Linked pages
- GitHub: Let’s build from here · GitHub https://github.com 3047 comments
- reddit: the front page of the internet https://www.reddit.com/ 2911 comments
- Free eBooks | Project Gutenberg https://gutenberg.org 2028 comments
- GitHub - nomic-ai/gpt4all: gpt4all: a chatbot trained on a massive collection of clean assistant data including code, stories and dialogue https://github.com/nomic-ai/gpt4all 325 comments
- arXiv.org e-Print archive https://arxiv.org/ 312 comments
- The Pile http://pile.eleuther.ai/ 294 comments
- https://arxiv.org/pdf/2212.13138.pdf 209 comments
- [2310.03214] FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation https://arxiv.org/abs/2310.03214 141 comments
- Alpaca Eval Leaderboard https://tatsu-lab.github.io/alpaca_eval/ 132 comments
- Gorilla https://gorilla.cs.berkeley.edu/ 121 comments
- Hot Questions - Stack Exchange http://stackexchange.com/ 111 comments
- [2306.11644] Textbooks Are All You Need https://arxiv.org/abs/2306.11644 106 comments
- Common Crawl https://commoncrawl.org/ 85 comments
- [2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling https://arxiv.org/abs/2101.00027 81 comments
- Wikimedia Downloads https://dumps.wikimedia.org/ 80 comments
- GitHub - sahil280114/codealpaca https://github.com/sahil280114/codealpaca 63 comments
- How Long Can Open-Source LLMs Truly Promise on Context Length? | LMSYS Org https://lmsys.org/blog/2023-06-29-longchat/ 62 comments
- RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models https://together.ai/blog/redpajama-data-v2 60 comments
- Free Dolly: Introducing the World's First Open and Commercially Viable Instruction-Tuned LLM - The Databricks Blog https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm 54 comments
- [2310.10631] Llemma: An Open Language Model For Mathematics https://arxiv.org/abs/2310.10631 46 comments
Related searches:
Search title: GitHub - lmmlzn/Awesome-LLMs-Datasets: Summarize existing representative LLMs text datasets.