Hacker News
- How to train large models on many GPUs? (2021) https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
Linking pages
- What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- What are Diffusion Models? | Lil'Log https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ 18 comments
- Visualizing 6D Mesh Parallelism · main https://main-horse.github.io/posts/visualizing-6d/ 3 comments
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- Everything about Distributed Training and Efficient Finetuning | Sumanth's Personal Website https://sumanthrh.com/post/distributed-and-efficient-finetuning/ 1 comment
- Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log https://lilianweng.github.io/posts/2021-12-05-semi-supervised/ 0 comments
- Pipeline-Parallelism: Distributed Training via Model Partitioning https://siboehm.com/articles/22/pipeline-parallel-training 0 comments
Linked pages
- The world’s fastest framework for building websites | Hugo http://gohugo.io/ 396 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- Coefficient of variation - Wikipedia https://en.wikipedia.org/wiki/Coefficient_of_variation 21 comments
- What are Diffusion Models? | Lil'Log https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ 18 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- [1710.03740] Mixed Precision Training https://arxiv.org/abs/1710.03740 1 comment
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1811.06965] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://arxiv.org/abs/1811.06965 0 comments
- Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log https://lilianweng.github.io/posts/2021-12-05-semi-supervised/ 0 comments
- [1604.06174] Training Deep Nets with Sublinear Memory Cost https://arxiv.org/abs/1604.06174 0 comments
Related searches:
Search whole site: site:lilianweng.github.io
Search title: How to Train Really Large Models on Many GPUs? | Lil'Log