[2212.14034] Cramming: Training a Language Model on a Single GPU in One Day - discu.eu

Reddit

[R] Cramming: Training a Language Model on a Single GPU in One Day https://arxiv.org/abs/2212.14034 24 comments 29/12/2022 machinelearning

Linking pages

Notes on training BERT from scratch on an 8GB consumer GPU | sidsite https://sidsite.com/posts/bert-from-scratch/ 67 comments
Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
Understanding Large Language Models -- A Transformative Reading List https://sebastianraschka.com/blog/2023/llm-reading-list.html 26 comments
GitHub - janhq/awesome-local-ai: An awesome repository of local AI tools https://github.com/janhq/awesome-local-ai 3 comments
🎇 Your guide to AI: January 2023 https://nathanbenaich.substack.com/p/your-guide-to-ai-january-2023 1 comment
GitHub - AakashKumarNain/annotated_research_papers: This repo contains annotated research papers that I found really good and useful https://github.com/AakashKumarNain/annotated_research_papers 0 comments
GitHub - JonasGeiping/cramming: Cramming the training of a (BERT-type) language model into limited compute. https://github.com/JonasGeiping/cramming 0 comments
