- [P] Don't have enough GPUs to train Mixtral? Why not try LLaMA-MoE~ https://github.com/pjlab-sys4nlp/llama-moe 6 comments machinelearning
Linked pages
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- SlimPajama: A 627B token cleaned and deduplicated version of RedPajama - Cerebras https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama 7 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
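
The papers linked above both center on sparsely-gated, top-k routed Mixture-of-Experts layers, the same building block LLaMA-MoE constructs from LLaMA's feed-forward weights. Below is a minimal PyTorch sketch of such a layer; the class, parameter names, and MLP expert shape are illustrative assumptions, not code taken from the LLaMA-MoE repository.

```python
# Minimal sketch of a top-k sparsely-gated MoE layer in the spirit of the
# linked papers (Shazeer et al. 2017; Switch Transformer). Names are
# illustrative, not from pjlab-sys4nlp/llama-moe.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router: scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent feed-forward networks (simple 2-layer MLPs here).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.gate(tokens)                            # (n_tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)              # renormalize over the chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens whose top-k choices include expert e.
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_ff=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Because each token only runs through k of the n_experts feed-forward networks, parameter count grows with the number of experts while per-token compute stays roughly constant, which is why MoE models of this kind can be continually pre-trained on comparatively modest GPU budgets.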