Linking pages
- What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
- GitHub - kingoflolz/mesh-transformer-jax: Model parallel transformers in JAX and Haiku https://github.com/kingoflolz/mesh-transformer-jax 146 comments
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. https://github.com/EleutherAI/gpt-neox 67 comments
- DeepSpeed/blogs/deepspeed-chat/README.md at master · microsoft/DeepSpeed · GitHub https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/README.md 55 comments
- GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud. https://github.com/QwenLM/Qwen 51 comments
- GitHub - punica-ai/punica: Serving multiple LoRA-finetuned LLMs as one https://github.com/punica-ai/punica 26 comments (a minimal LoRA sketch follows this list)
- GitHub - linkedin/Liger-Kernel: Efficient Triton Kernels for LLM Training https://github.com/linkedin/Liger-Kernel 19 comments
- Snowflake Arctic - LLM for Enterprise AI https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/ 6 comments
- GitHub - tensorchord/Awesome-LLMOps: An awesome & curated list of best LLMOps tools for developers https://github.com/tensorchord/Awesome-LLMOps 5 comments
- GitHub - Alpha-VLLM/LLaMA2-Accessory: An Open-source Toolkit for LLM Development https://github.com/Alpha-VLLM/LLaMA2-Accessory 3 comments
- GitHub - janhq/awesome-local-ai: An awesome repository of local AI tools https://github.com/janhq/awesome-local-ai 3 comments
- PyTorch Lightning vs DeepSpeed vs FSDP vs FFCV vs … | by William Falcon | Towards Data Science https://william-falcon.medium.com/pytorch-lightning-vs-deepspeed-vs-fsdp-vs-ffcv-vs-e0d6b2a95719 2 comments
- GitHub - OpenGVLab/InternImage: [CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions https://github.com/OpenGVLab/InternImage 2 comments
- GitHub - OpenLLMAI/OpenRLHF: A Ray-based high-performance RLHF framework (7B models on an RTX 4090, 34B on an A100) https://github.com/OpenLLMAI/OpenRLHF 2 comments
- OneFlow Made Training GPT-3 Easier (Part 1) | by OneFlow | Medium https://oneflow2020.medium.com/oneflow-made-training-gpt-3-easier-part-1-5b6b65d70d3c 1 comment
- GitHub - ai-forever/ru-gpts: Russian GPT3 models. https://github.com/sberbank-ai/ru-gpts 1 comment
- GitHub - enhuiz/vall-e: An unofficial PyTorch implementation of the audio LM VALL-E, WIP https://github.com/enhuiz/vall-e 1 comment
- DeepSpeedExamples/applications/DeepSpeed-Chat at master · microsoft/DeepSpeedExamples · GitHub https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat 1 comment
- GitHub - deepseek-ai/DeepSeek-Coder: DeepSeek Coder: Let the Code Write Itself https://github.com/deepseek-ai/DeepSeek-Coder 1 comment
- GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
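
Several entries above presuppose LoRA finetuning (punica, per its tagline, serves many LoRA-finetuned models as one). For orientation only, here is a minimal LoRA sketch using the Hugging Face peft library; the base model, rank, and target modules are illustrative assumptions, not taken from the punica repo.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_cfg = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Because each adapter is just a pair of small rank-r matrices, many adapters can share one copy of the base weights at serving time, which is the premise behind punica's "as one" framing.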
Linked pages
- GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters https://github.com/yandex/YaLM-100B 902 comments
- Turing-NLG: A 17-billion-parameter language model by Microsoft - Microsoft Research https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/ 139 comments
- PyTorch http://pytorch.org/ 100 comments
- 20B-parameter Alexa model sets new marks in few-shot learning - Amazon Science https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning 87 comments
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. https://github.com/EleutherAI/gpt-neox 67 comments
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model - Microsoft Research https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/ 11 comments
- GitHub - THUDM/GLM-130B: GLM-130B: An Open Bilingual Pre-Trained Model https://github.com/THUDM/GLM-130B 1 comment
- [2101.06840] ZeRO-Offload: Democratizing Billion-Scale Model Training https://arxiv.org/abs/2101.06840 1 comment
- Ultimate Guide To Scaling ML Models - Megatron-LM | ZeRO | DeepSpeed | Mixed Precision - YouTube https://youtu.be/hc0u4avAkuM 0 comments
- Latest News - DeepSpeed https://www.deepspeed.ai/ 0 comments
- The Technology Behind BLOOM Training https://huggingface.co/blog/bloom-megatron-deepspeed 0 comments
- [2206.01859] Extreme Compression for Pre-trained Transformers Made Simple and Efficient https://arxiv.org/abs/2206.01859 0 comments
- [2104.07857] ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning https://arxiv.org/abs/2104.07857 0 comments (a ZeRO config sketch follows this list)
- DeepSpeed/blogs/deepspeed-chat at master · microsoft/DeepSpeed · GitHub https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat 0 comments
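
A large share of the links on this page (the DeepSpeed-Chat READMEs, the ZeRO-Offload and ZeRO-Infinity papers, the BLOOM and Megatron-Turing training writeups) revolve around DeepSpeed's ZeRO memory optimizations. As a rough sketch only, this is the usual shape of enabling ZeRO stage 3 with CPU offload; the toy model and every hyperparameter below are placeholder assumptions, not values from any linked page.

```python
import torch
import deepspeed

# Toy stand-in for a real transformer; everything here is a placeholder.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, and optimizer state
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: optimizer state on CPU
        "offload_param": {"device": "cpu"},      # ZeRO-Infinity-style parameter offload
    },
}

# deepspeed.initialize returns an engine that handles sharding, offload,
# and loss scaling; scripts using it are launched with `deepspeed train.py`.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Stage 3 shards parameters as well as gradients and optimizer state across data-parallel ranks; the two offload entries push optimizer state and parameters to CPU memory, which is the core idea of the ZeRO-Offload and ZeRO-Infinity papers linked above.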