Hacker News
Linking pages
- All languages are NOT created (tokenized) equal https://blog.yenniejun.com/p/all-languages-are-not-created-tokenized 171 comments
- GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision https://github.com/openai/whisper/ 126 comments
- GitHub - PyO3/pyo3: Rust bindings for the Python interpreter https://github.com/PyO3/PyO3 118 comments
- You Should Probably Pay Attention to Tokenizers - Cybernetist https://cybernetist.com/2024/10/21/you-should-probably-pay-attention-to-tokenizers/ 94 comments
- Giving GPT "Infinite" Knowledge - by Samir Khoja https://sudoapps.substack.com/p/giving-gpt-infinite-knowledge 86 comments
- GitHub - karpathy/minbpe: Minimal, clean, educational code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. https://github.com/karpathy/minbpe 31 comments
- How the BPE tokenization algorithm used by large language models works. | sidsite https://sidsite.com/posts/bpe/ 11 comments
- GitHub - taishi-i/awesome-ChatGPT-repositories: A curated list of resources dedicated to open source GitHub repositories related to ChatGPT https://github.com/taishi-i/awesome-ChatGPT-repositories 5 comments
- GitHub - simonw/ttok: Count and truncate text based on tokens https://github.com/simonw/ttok 5 comments
- GitHub - adalkiran/llama-nuts-and-bolts: A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation. https://github.com/adalkiran/llama-nuts-and-bolts 5 comments
- Announcing pg_tiktoken: A Postgres Extension for Fast BPE Tokenization - Neon https://neon.tech/blog/announcing-pg_tiktoken-a-postgres-extension-for-fast-bpe-tokenization 2 comments
- How well does ChatGPT speak Japanese? https://www.passaglia.jp/gpt-japanese/ 2 comments
- GitHub - yagil/tokmon: CLI to monitor OpenAI token usage. Put `tokmon` before the name of your program to measure your token cost. https://github.com/yagil/tokmon 1 comment
- Counting OpenAI tokens • Harry Marr https://hmarr.com/blog/counting-openai-tokens/ 1 comment
- Meta Llama 3 available on Cloudflare Workers AI https://blog.cloudflare.com/meta-llama-3-available-on-cloudflare-workers-ai 1 comment
- Tokens for LLMs: Byte Pair Encoding in Go - Eli Bendersky's website https://eli.thegreenplace.net/2024/tokens-for-llms-byte-pair-encoding-in-go/ 1 comment
- Five years of progress in GPTs - by Finbarr Timbers https://finbarrtimbers.substack.com/p/five-years-of-progress-in-gpts 0 comments
- GitHub - 1223423/statGPT: Simple plugin to instantly transform broken English into working code and analysis in RStudio using Open AI's API https://github.com/1223423/statGPT 0 comments
- GitHub - nlpfromscratch/nlp-llms-resources: Master list of curated resources on NLP and LLMs https://github.com/nlpfromscratch/nlp-llms-resources 0 comments
- How to represent a protein sequence | Liam's Blog https://liambai.com/protein-representation/ 0 comments
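Several of the pages above (karpathy/minbpe, the sidsite post, Eli Bendersky's Go write-up, pg_tiktoken) are about byte pair encoding (BPE), the tokenization scheme tiktoken uses. The code below is a minimal, illustrative sketch of byte-level BPE training and encoding. It is not tiktoken's implementation. Real tokenizers such as tiktoken also split text with a regex before merging and ship pretrained merge tables, and this sketch omits both. The helper names (`train_bpe`, `encode`, `decode`), the tie-breaking rule (the first-seen pair wins), and the sample string are assumptions made for illustration.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count adjacent token-id pairs in a sequence."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and ids[i] == pair[0] and ids[i + 1] == pair[1]:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Ids 0-255 are raw bytes; the k-th learned merge gets id 256 + k.
    (Hypothetical helper; ties go to the first pair seen, an assumption.)
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for k in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + k
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply learned merges, always taking the earliest-learned pair first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        candidates = [p for p in set(zip(ids, ids[1:])) if p in merges]
        if not candidates:
            break
        pair = min(candidates, key=lambda p: merges[p])
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, merges):
    """Map ids back to bytes (rebuilding merged tokens) and decode UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    merges = train_bpe("aaabdaaabac", 3)
    print(encode("aaabdaaabac", merges))  # [258, 100, 258, 97, 99]
```

Because training works on raw UTF-8 bytes, any string, including non-Latin text like the Japanese examples discussed in some of the linked posts, round-trips through `encode` and `decode` even when no merges apply to it. Those strings simply stay as more tokens per character, which is the unequal tokenization that the first link describes.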
Linked page: GitHub - openai/tiktoken