Hacker News
- Language models cost much more in some languages than others https://blog.yenniejun.com/p/all-languages-are-not-created-tokenized 171 comments
Linked pages
- Duolingo Max Uses OpenAI’s GPT-4 For New Learning Features https://blog.duolingo.com/duolingo-max/ 244 comments
- Language family - Wikipedia http://en.wikipedia.org/wiki/Language_family 202 comments
- OpenAI Platform https://platform.openai.com/tokenizer 175 comments
- Why is GPT-3 15.77x more expensive for certain languages? | by Denys Linkov | Medium https://denyslinkov.medium.com/why-is-gpt-3-15-77x-more-expensive-for-certain-languages-2b19a4adc4bc 149 comments
- List of languages by total number of speakers - Wikipedia https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers#Ethnologue_(2022,_25th_edition) 129 comments
- GitHub - openai/tiktoken https://github.com/openai/tiktoken 74 comments
- Amazon releases 51-language dataset for language understanding - Amazon Science https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding 49 comments
- Yokosuka becomes Japan's first city to use ChatGPT for administrative tasks | The Japan Times https://www.japantimes.co.jp/news/2023/04/20/national/chatgpt-yokosuka-trial/ 18 comments
- Usage Statistics and Market Share of Content Languages for Websites, October 2022 https://w3techs.com/technologies/overview/content_language 11 comments
- WALS Online - Home http://wals.info/ 4 comments
- https://www.similarweb.com/website/chat.openai.com 3 comments
- Summary of the tokenizers https://huggingface.co/docs/transformers/tokenizer_summary 1 comment
- Earth mover's distance - Wikipedia https://en.wikipedia.org/wiki/Earth_mover%27s_distance 0 comments
- W3Techs - extensive and reliable web technology surveys http://w3techs.com/ 0 comments