Hacker News
- Fully Sharded Data Parallel: Faster AI Training with Fewer GPUs https://engineering.fb.com/2021/07/15/open-source/fsdp/ 2 comments
Linking pages
- GitHub - openlm-research/open_llama: OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset https://github.com/openlm-research/open_llama 183 comments
- GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. https://github.com/QwenLM/Qwen 51 comments
- Transformer Math 101 | EleutherAI Blog https://blog.eleuther.ai/transformer-math/ 13 comments
- GitHub - Alpha-VLLM/LLaMA2-Accessory: An Open-source Toolkit for LLM Development https://github.com/Alpha-VLLM/LLaMA2-Accessory 3 comments
- Visualizing 6D Mesh Parallelism · main https://main-horse.github.io/posts/visualizing-6d/ 3 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- GitHub - upgundecha/applied-ai: A repository of curated use cases, articles, blogs, videos on how companies are using Artificial Intelligence and Machine Learning. https://github.com/upgundecha/applied-ai 1 comment
- Announcing Lightning 1.4. Lightning 1.4 Release adds TPU pods… | by PyTorch Lightning team | PyTorch Lightning Developer Blog https://devblog.pytorchlightning.ai/announcing-lightning-1-4-8cd20482aee9 0 comments
- The History of Open-Source LLMs: Early Days (Part One) https://cameronrwolfe.substack.com/p/the-history-of-open-source-llms-early 0 comments
- GitHub - stanford-crfm/haliax: Named Tensors for Legible Deep Learning in JAX https://github.com/stanford-crfm/haliax 0 comments
- Dolma, OLMo, and the Future of Open-Source LLMs https://cameronrwolfe.substack.com/p/dolma-olmo-and-the-future-of-open 0 comments