[2202.08906] ST-MoE: Designing Stable and Transferable Sparse Expert Models - discu.eu

Linking pages

GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
Mixtures of Experts - Javid Lakha https://blog.javid.io/p/mixtures-of-experts 2 comments
Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model – Google AI Blog https://ai.googleblog.com/2022/06/limoe-learning-multiple-modalities-with.html 0 comments
UL2 20B: An Open Source Unified Language Learner – Google AI Blog https://ai.googleblog.com/2022/10/ul2-20b-open-source-unified-language.html 0 comments
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
GitHub - Mooler0410/LLMsPracticalGuide: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) https://github.com/Mooler0410/LLMsPracticalGuide 0 comments
GitHub - XueFuzhao/OpenMoE: A family of open-sourced Mixture-of-Experts (MoE) Large Language Models https://github.com/XueFuzhao/OpenMoE 0 comments
Mixture-of-Experts (MoE): The Birth and Rise of Conditional Computation https://cameronrwolfe.substack.com/p/conditional-computation-the-birth 0 comments
