Hacker News
- How to train large models on many GPUs? (2021) https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
Linking pages
- What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- What are Diffusion Models? | Lil'Log https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ 18 comments
- Visualizing 6D Mesh Parallelism · main https://main-horse.github.io/posts/visualizing-6d/ 3 comments
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- Everything about Distributed Training and Efficient Finetuning | Sumanth's Personal Website https://sumanthrh.com/post/distributed-and-efficient-finetuning/ 1 comment
- Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log https://lilianweng.github.io/posts/2021-12-05-semi-supervised/ 0 comments
- Pipeline-Parallelism: Distributed Training via Model Partitioning https://siboehm.com/articles/22/pipeline-parallel-training 0 comments
Linked pages
- The world’s fastest framework for building websites | Hugo http://gohugo.io/ 396 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- Coefficient of variation - Wikipedia https://en.wikipedia.org/wiki/Coefficient_of_variation 21 comments
- What are Diffusion Models? | Lil'Log https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ 18 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- [1710.03740] Mixed Precision Training https://arxiv.org/abs/1710.03740 1 comment
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1811.06965] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://arxiv.org/abs/1811.06965 0 comments
- Learning with not Enough Data Part 1: Semi-Supervised Learning | Lil'Log https://lilianweng.github.io/posts/2021-12-05-semi-supervised/ 0 comments
- [1604.06174] Training Deep Nets with Sublinear Memory Cost https://arxiv.org/abs/1604.06174 0 comments
Related searches:
Search whole site: site:lilianweng.github.io
Search title: How to Train Really Large Models on Many GPUs? | Lil'Log