[1604.06174] Training Deep Nets with Sublinear Memory Cost

Linking pages

How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
GitHub - cybertronai/gradient-checkpointing: Make huge neural nets fit in memory https://github.com/openai/gradient-checkpointing 3 comments
Local Large Language Models - beginners guide - int8.io https://int8.io/local-large-language-models-beginners-guide/ 2 comments
OneFlow Made Training GPT-3 Easier（Part 1） | by OneFlow | Medium https://oneflow2020.medium.com/oneflow-made-training-gpt-3-easier-part-1-5b6b65d70d3c 1 comment
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR https://nv-adlr.github.io/MegatronLM 1 comment
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
Scaling to trillion-parameter model training on AWS - Amazon Science https://www.amazon.science/blog/scaling-to-trillion-parameter-model-training-on-aws 0 comments
Training larger-than-memory PyTorch models using gradient checkpointing https://spell.ml/blog/gradient-checkpointing-pytorch-YGypLBAAACEAefHs 0 comments
Waybackprop https://magenta.tensorflow.org/blog/2017/06/01/waybackprop/ 0 comments
Constructing Transformers For Longer Sequences with Sparse Attention Methods – Google AI Blog https://ai.googleblog.com/2021/03/constructing-transformers-for-longer.html 0 comments