- [R] New Paper on Mixture of Experts (MoE) 🚀 https://github.com/arpita8/Awesome-Mixture-of-Experts-Papers 17 comments machinelearning
Linked pages
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2310.06825] Mistral 7B https://arxiv.org/abs/2310.06825 124 comments
- [2401.04081] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts https://arxiv.org/abs/2401.04081 39 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [2310.16795] QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models https://arxiv.org/abs/2310.16795 13 comments
- [2403.18814] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models https://arxiv.org/abs/2403.18814 7 comments
- https://arxiv.org/abs/2111.12763 5 comments
- [2110.03888] M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining https://arxiv.org/abs/2110.03888 3 comments
- [2106.05974] Scaling Vision with Sparse Mixture of Experts https://arxiv.org/abs/2106.05974 2 comments
- [2112.06905] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905 1 comment
- [2211.01324] eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers https://arxiv.org/abs/2211.01324 1 comment
- https://dl.acm.org/doi/abs/10.1145/3503221.3508417 0 comments
- [2202.08906] ST-MoE: Designing Stable and Transferable Sparse Expert Models https://arxiv.org/abs/2202.08906 0 comments
- [2305.14705] Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models https://arxiv.org/abs/2305.14705 0 comments
- [2312.09979] LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment https://arxiv.org/abs/2312.09979 0 comments
- [2401.06066] DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models https://arxiv.org/abs/2401.06066 0 comments
- [2401.15947] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models https://arxiv.org/abs/2401.15947 0 comments
- [2404.07413] JetMoE: Reaching Llama2 Performance with 0.1M Dollars https://arxiv.org/abs/2404.07413 0 comments