Reddit
Tensor and Fully Sharded Data Parallelism - How Trillion Parameter Models Are Trained
https://martynassubonis.substack.com/p/tensor-and-fully-sharded-data-parallelism
4 comments
19/1/2025
mlquestions