Linked pages
- [2404.02258] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models https://arxiv.org/abs/2404.02258 103 comments
- Mark Zuckerberg indicates Meta is spending billions on Nvidia AI chips https://www.cnbc.com/2024/01/18/mark-zuckerberg-indicates-meta-is-spending-billions-on-nvidia-ai-chips.html 71 comments
- [2311.08105] DiLoCo: Distributed Low-Communication Training of Language Models https://arxiv.org/abs/2311.08105 14 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2101.06840] ZeRO-Offload: Democratizing Billion-Scale Model Training https://arxiv.org/abs/2101.06840 1 comment
- [2212.13345] The Forward-Forward Algorithm: Some Preliminary Investigations https://arxiv.org/abs/2212.13345 1 comment
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- [2206.01288] Decentralized Training of Foundation Models in Heterogeneous Environments https://arxiv.org/abs/2206.01288 0 comments
- [2104.07857] ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning https://arxiv.org/abs/2104.07857 0 comments
- [2301.11913] SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient https://arxiv.org/abs/2301.11913 0 comments