[2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Linking pages

How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/
What Every User Should Know About Mixed Precision Training in PyTorch | PyTorch https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/
Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/
OneFlow Made Training GPT-3 Easier（Part 1） | by OneFlow | Medium https://oneflow2020.medium.com/oneflow-made-training-gpt-3-easier-part-1-5b6b65d70d3c
Extrapolating to Unnatural Language Processing with GPT-3’s In-context Learning: The Good, the Bad, and the Mysterious | SAIL Blog https://ai.stanford.edu/blog/in-context-learning/
Rotary Embeddings: A Relative Revolution | EleutherAI Blog https://blog.eleuther.ai/rotary-embeddings/
AI's Carbon Footprint: Understanding and Reducing the Environmental Impact of Large Models https://theaiobserverx.substack.com/p/ais-carbon-footprint-understanding
NVIDIA, Stanford & Microsoft Propose Efficient Trillion-Parameter Language Model Training on GPU Clusters | Synced https://syncedreview.com/2021/04/15/nvidia-stanford-microsoft-propose-efficient-trillion-parameter-language-model-training-on-gpu-clusters/
GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers
GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models https://github.com/PiotrNawrot/nanoT5
GitHub - Mooler0410/LLMsPracticalGuide: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) https://github.com/Mooler0410/LLMsPracticalGuide
GitHub - Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model https://github.com/Hannibal046/Awesome-LLM
Pipeline-Parallelism: Distributed Training via Model Partitioning https://siboehm.com/articles/22/pipeline-parallel-training