Linking pages
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- GitHub - cybertronai/gradient-checkpointing: Make huge neural nets fit in memory https://github.com/openai/gradient-checkpointing 3 comments
- Local Large Language Models - beginners guide - int8.io https://int8.io/local-large-language-models-beginners-guide/ 2 comments
- OneFlow Made Training GPT-3 Easier (Part 1) | by OneFlow | Medium https://oneflow2020.medium.com/oneflow-made-training-gpt-3-easier-part-1-5b6b65d70d3c 1 comment
- MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR https://nv-adlr.github.io/MegatronLM 1 comment
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- Scaling to trillion-parameter model training on AWS - Amazon Science https://www.amazon.science/blog/scaling-to-trillion-parameter-model-training-on-aws 0 comments
- Training larger-than-memory PyTorch models using gradient checkpointing https://spell.ml/blog/gradient-checkpointing-pytorch-YGypLBAAACEAefHs 0 comments
- Waybackprop https://magenta.tensorflow.org/blog/2017/06/01/waybackprop/ 0 comments
- Constructing Transformers For Longer Sequences with Sparse Attention Methods – Google AI Blog https://ai.googleblog.com/2021/03/constructing-transformers-for-longer.html 0 comments
Related searches:
Search whole site: site:arxiv.org
Search title: [1604.06174] Training Deep Nets with Sublinear Memory Cost