Hacker News
- Code for the Byte Pair Encoding algorithm, commonly used in LLM tokenization https://github.com/karpathy/minbpe 31 comments
Linking pages
- GitHub - kuprel/minbpe-pytorch: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA https://github.com/kuprel/minbpe-pytorch 9 comments
- Direct Preference Optimization Explained In-depth https://www.tylerromero.com/posts/2024-04-dpo/ 0 comments
Linked pages
- GitHub - openai/tiktoken https://github.com/openai/tiktoken 74 comments
- GitHub - openai/gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners" https://github.com/openai/gpt-2 2 comments
- Language Models are Unsupervised Multitask Learners (PDF) https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment