Hacker News
- How the BPE tokenization algorithm used by large language models works https://sidsite.com/posts/bpe/ 11 comments
Linked pages
- OpenAI Platform https://platform.openai.com/tokenizer 175 comments
- Understanding GPT tokenizers https://simonwillison.net/2023/Jun/8/gpt-tokenizers/ 131 comments
- Trie - Wikipedia https://en.wikipedia.org/wiki/Trie 126 comments
- [2305.07185] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers https://arxiv.org/abs/2305.07185 94 comments
- GitHub - openai/tiktoken https://github.com/openai/tiktoken 74 comments
- GitHub - huggingface/tokenizers: 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production https://github.com/huggingface/tokenizers 47 comments
- GitHub - google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation. https://github.com/google/sentencepiece 0 comments