- Tensor and Fully Sharded Data Parallelism - How Trillion Parameter Models Are Trained https://martynassubonis.substack.com/p/tensor-and-fully-sharded-data-parallelism 1 comment deeplearning
- Tensor and Fully Sharded Data Parallelism - How Trillion Parameter Models Are Trained https://martynassubonis.substack.com/p/tensor-and-fully-sharded-data-parallelism 4 comments mlquestions
- [R] Tensor and Fully Sharded Data Parallelism https://martynassubonis.substack.com/p/tensor-and-fully-sharded-data-parallelism 0 comments machinelearning
- Tensor and Fully Sharded Data Parallelism - How Trillion Parameter Models Are Trained https://martynassubonis.substack.com/p/tensor-and-fully-sharded-data-parallelism 0 comments learnmachinelearning
Linked pages
- Fully Sharded Data Parallel: faster AI training with fewer GPUs Engineering at Meta - https://engineering.fb.com/2021/07/15/open-source/fsdp/ 2 comments
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- Distributed Data Parallel Training - by Martynas Šubonis https://martynassubonis.substack.com/p/distributed-data-parallel-training 0 comments
- Model and Pipeline Parallelism - MLOps Shenanigans https://martynassubonis.substack.com/p/model-and-pipeline-parallelism 0 comments