Hacker News
- 3T Token Open Corpus for Language Model Pretraining https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64 5 comments
Linking pages
- An Industry Insider Drives an Open Alternative to Big Tech’s A.I. - The New York Times https://www.nytimes.com/2023/10/19/technology/allen-institute-open-source-ai.html 2 comments
- AI2 drops biggest open dataset yet for training language models | TechCrunch https://techcrunch.com/2023/08/18/ai2-drops-biggest-open-dataset-yet-for-training-language-models/ 1 comment
- Dolma world's largest free dataset with 3 trillion tokens for LLM training released - KiNews24.de https://kinews24.de/dolma-worlds-largest-free-dataset-with-3-trillion-tokens-for-llm-training-released 0 comments