Hacker News
Linked pages
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- GitHub - yandex/YaFSDP: YaFSDP: Yet another Fully Sharded Data Parallel https://github.com/yandex/YaFSDP 27 comments
- [2308.00951] From Sparse to Soft Mixtures of Experts https://arxiv.org/abs/2308.00951 3 comments
- Fully Sharded Data Parallel: faster AI training with fewer GPUs - Engineering at Meta https://engineering.fb.com/2021/07/15/open-source/fsdp/ 2 comments
- Mixture of Experts Explained https://huggingface.co/blog/moe 2 comments
- Demystifying Tensor Parallelism | Robot Chinwag https://robotchinwag.com/posts/demystifying-tensor-parallelism/ 1 comment
- ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters - Microsoft Research https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/ 0 comments
- Tensor Parallelism in Three Levels of Difficulty | Determined AI https://www.determined.ai/blog/tp 0 comments
- [2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments
- [Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch - torchtitan - PyTorch Forums https://discuss.pytorch.org/t/distributed-w-torchtitan-introducing-async-tensor-parallelism-in-pytorch/209487 0 comments