[2203.15556] Training Compute-Optimal Large Language Models

Linking pages

Google "We Have No Moat, And Neither Does OpenAI" https://www.semianalysis.com/p/google-we-have-no-moat-and-neither 1571 comments
Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance – Google AI Blog https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html 279 comments
AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
Chinchilla data-optimal scaling laws: In plain English – Dr Alan D. Thompson – Life Architect https://lifearchitect.ai/chinchilla/ 151 comments
Check Out This DeepMind's New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks - MarkTechPost https://www.marktechpost.com/2022/04/09/check-out-this-deepminds-new-language-model-chinchilla-70b-parameters-which-significantly-outperforms-gopher-280b-and-gpt-3-175b-on-a-large-range-of-downstream-evaluation-tasks/ 146 comments
The AI War and How to Win It - by Alexandr Wang https://alexw.substack.com/p/war 136 comments
Large Language Models: Scaling Laws and Emergent Properties - Clément Thiriet https://cthiriet.com/articles/scaling-laws 124 comments
My AI Timelines Have Sped Up (Again) https://www.alexirpan.com/2024/01/10/ai-timelines-2024.html 95 comments
Mamba Explained | Kola Ayonrinde https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html 93 comments
Characterizing Emergent Phenomena in Large Language Models – Google AI Blog https://ai.googleblog.com/2022/11/characterizing-emergent-phenomena-in.html 57 comments
Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
Normcore LLM Reads · GitHub https://gist.github.com/veekaybee/be375ab33085102f9027853128dc5f0e 52 comments
Chinchilla’s Death https://espadrine.github.io/blog/posts/chinchilla-s-death.html#Can_Chinchillas_picture_a_Llama_s_sights_ 50 comments
Mamba Explained https://thegradient.pub/mamba-explained/ 44 comments
It Looks Like You’re Trying To Take Over The World · Gwern.net https://www.gwern.net/fiction/Clippy 33 comments
Understanding Large Language Models -- A Transformative Reading List https://sebastianraschka.com/blog/2023/llm-reading-list.html 26 comments
GitHub - google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models. https://github.com/google-research/tuning_playbook 21 comments
NLP Research in the Era of LLMs - by Sebastian Ruder https://nlpnewsletter.substack.com/p/nlp-research-in-the-era-of-llms 17 comments
Transformer Math 101 | EleutherAI Blog https://blog.eleuther.ai/transformer-math/ 13 comments
Efficient LLM inference - by Finbarr Timbers https://www.artfintel.com/p/efficient-llm-inference 11 comments