Linked pages
- How Many People Have Ever Lived on Earth? | PRB https://www.prb.org/articles/how-many-people-have-ever-lived-on-earth/ 171 comments
- Common Crawl https://commoncrawl.org/ 85 comments
- RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models https://together.ai/blog/redpajama-data-v2 60 comments
- Software Heritage https://www.softwareheritage.org/ 28 comments
- Introducing Meta Llama 3: The most capable openly available LLM to date https://ai.meta.com/blog/meta-llama-3/ 19 comments
- Meta releases new AI assistant powered by Llama 3 model - The Verge https://www.theverge.com/2024/4/18/24133808/meta-ai-assistant-llama-3-chatgpt-openai-rival 4 comments
- HuggingFaceFW/fineweb · Datasets at Hugging Face https://huggingface.co/datasets/HuggingFaceFW/fineweb 4 comments
- Library of Congress - Wikipedia http://en.wikipedia.org/wiki/library_of_congress 2 comments
- 15 years of Google Books https://www.blog.google/products/search/15-years-google-books/ 0 comments
- GitHub - togethercomputer/RedPajama-Data: The RedPajama-Data repository contains code for preparing large datasets for training large language models. https://github.com/togethercomputer/RedPajama-Data 0 comments
- Anna's Archive - Wikipedia https://en.wikipedia.org/wiki/Anna%27s_Archive 0 comments
- https://aclanthology.org/D07-1090.pdf 0 comments
- TubeStats https://tubestats.org 0 comments
- How U.S. Adults Use TikTok | Pew Research Center https://www.pewresearch.org/internet/2024/02/22/how-u-s-adults-use-tiktok/ 0 comments
- bigcode/the-stack-v2 · Datasets at Hugging Face https://huggingface.co/datasets/bigcode/the-stack-v2 0 comments
Related searches:
Search whole site: site:www.educatingsilicon.com
Search title: How much LLM training data is there, in the limit? – Educating Silicon