- [R] Cramming: Training a Language Model on a Single GPU in One Day https://arxiv.org/abs/2212.14034 24 comments machinelearning
Linking pages
- Notes on training BERT from scratch on an 8GB consumer GPU | sidsite https://sidsite.com/posts/bert-from-scratch/ 67 comments
- Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
- GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
- Understanding Large Language Models -- A Transformative Reading List https://sebastianraschka.com/blog/2023/llm-reading-list.html 26 comments
- GitHub - janhq/awesome-local-ai: An awesome repository of local AI tools https://github.com/janhq/awesome-local-ai 3 comments
- 🎇 Your guide to AI: January 2023 https://nathanbenaich.substack.com/p/your-guide-to-ai-january-2023 1 comment
- GitHub - AakashKumarNain/annotated_research_papers: This repo contains annotated research papers that I found really good and useful https://github.com/AakashKumarNain/annotated_research_papers 0 comments
- GitHub - JonasGeiping/cramming: Cramming the training of a (BERT-type) language model into limited compute. https://github.com/JonasGeiping/cramming 0 comments