Hacker News
Linked pages
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- GitHub - yandex/YaFSDP: YaFSDP: Yet another Fully Sharded Data Parallel https://github.com/yandex/YaFSDP 27 comments
- [2308.00951] From Sparse to Soft Mixtures of Experts https://arxiv.org/abs/2308.00951 3 comments
- Fully Sharded Data Parallel: faster AI training with fewer GPUs - Engineering at Meta https://engineering.fb.com/2021/07/15/open-source/fsdp/ 2 comments
- Mixture of Experts Explained https://huggingface.co/blog/moe 2 comments
- Demystifying Tensor Parallelism | Robot Chinwag https://robotchinwag.com/posts/demystifying-tensor-parallelism/ 1 comment
- ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters - Microsoft Research https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/ 0 comments
- Tensor Parallelism in Three Levels of Difficulty | Determined AI https://www.determined.ai/blog/tp 0 comments
- [2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments
- [Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch - torchtitan - PyTorch Forums https://discuss.pytorch.org/t/distributed-w-torchtitan-introducing-async-tensor-parallelism-in-pytorch/209487 0 comments