Linking pages
- GitHub - microsoft/BlingFire: A lightning fast Finite State machine and REgular expression manipulation library. https://github.com/microsoft/blingfire 92 comments
- GitHub - llSourcell/DoctorGPT: DoctorGPT is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private. https://github.com/llSourcell/DoctorGPT 75 comments
- GitHub - niedev/RTranslator: Open source real-time translation app for Android that runs locally https://github.com/niedev/RTranslator 64 comments
- Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments - Lightning AI https://lightning.ai/pages/community/lora-insights/ 39 comments
- GitHub - google-research/bert: TensorFlow code and pre-trained models for BERT https://github.com/google-research/bert 21 comments
- GitHub - facebookresearch/LASER: Language-Agnostic SEntence Representations https://github.com/facebookresearch/LASER 11 comments
- spaCy meets Transformers: Fine-tune BERT, XLNet and GPT-2 · Explosion https://explosion.ai/blog/spacy-pytorch-transformers 11 comments
- How the BPE tokenization algorithm used by large language models works. | sidsite https://sidsite.com/posts/bpe/ 11 comments
- Towards an ImageNet Moment for Speech-to-Text https://thegradient.pub/towards-an-imagenet-moment-for-speech-to-text/ 10 comments
- GitHub - maziarraissi/Applied-Deep-Learning: Applied Deep Learning Course https://github.com/maziarraissi/Applied-Deep-Learning 6 comments
- ML-Enhanced Code Completion Improves Developer Productivity – Google AI Blog https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html 3 comments
- GitHub - VKCOM/YouTokenToMe: Unsupervised text tokenizer focused on computational efficiency https://github.com/VKCOM/YouTokenToMe 3 comments
- Overview of tokenization algorithms in NLP | by Ane Berasategi | Towards Data Science https://towardsdatascience.com/overview-of-nlp-tokenization-algorithms-c41a7d5ec4f9 3 comments
- GitHub - argosopentech/argos-train: Training scripts for Argos Translate https://github.com/argosopentech/argos-train 1 comment
- How we used Universal Sentence Encoder and FAISS to make our search 10x smarter | by Maxim Leonovich | OneBar https://blog.onebar.io/building-a-semantic-search-engine-using-open-source-components-e15af5ed7885 1 comment
- GitHub - amrzv/awesome-colab-notebooks: Collection of google colaboratory notebooks for fast and easy experiments https://github.com/amrzv/awesome-colab-notebooks 0 comments
- spaCy meets Transformers: Fine-tune BERT, XLNet and GPT-2 · Explosion https://explosion.ai/blog/spacy-transformers 0 comments
- GitHub - ml-tooling/best-of-ml-python: 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly. https://github.com/ml-tooling/best-of-ml-python 0 comments
- Subword Tokenization - Handling Misspellings and Multilingual Data - the Thought Vector blog - Blog Vector https://www.thoughtvector.io/blog/subword-tokenization/ 0 comments
- Advances in Semantic Textual Similarity – Google AI Blog https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html 0 comments
Linked pages
- [1609.08144] Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation http://arxiv.org/abs/1609.08144 97 comments
- GitHub - microsoft/vcpkg: C++ Library Manager for Windows, Linux, and MacOS https://github.com/microsoft/vcpkg 50 comments
- MeCab: Yet Another Part-of-Speech and Morphological Analyzer https://taku910.github.io/mecab/ 15 comments
- Apache License, Version 2.0 – Open Source Initiative https://opensource.org/licenses/Apache-2.0 6 comments
- CMake - Upgrade Your Software Build System https://cmake.org/ 4 comments
- SLSA • Supply-chain Levels for Software Artifacts http://slsa.dev/ 3 comments
- [1804.10959] Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates https://arxiv.org/abs/1804.10959 0 comments