[1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer - discu.eu

Hacker News

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts (2017) https://arxiv.org/abs/1701.06538 10 comments 8/12/2023

Outrageously Large Neural Nets: Sparsely-Gated Mixture-of-Experts Layer (2017) https://arxiv.org/abs/1701.06538 33 comments 8/6/2019
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer https://arxiv.org/abs/1701.06538 81 comments 30/1/2017

Linking pages

Introducing Gemini 1.5, Google's next-generation AI model https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ 715 comments
GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. https://github.com/EleutherAI/gpt-neo/ 127 comments
Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini 113 comments
Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
Google Brain’s new super fast and highly accurate AI: the Mixture of Experts Layer. | by Théo Szymkowiak | Medium https://medium.com/@thoszymkowiak/google-brains-new-super-fast-and-highly-accurate-ai-the-mixture-of-experts-layer-dd3972c25663 15 comments
GitHub - AviSoori1x/makeMoE: From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :) https://github.com/AviSoori1x/makeMoE 14 comments
The Google Brain Team — Looking Back on 2017 (Part 1 of 2) – Google AI Blog https://research.googleblog.com/2018/01/the-google-brain-team-looking-back-on.html 6 comments
GitHub - pjlab-sys4nlp/llama-moe: ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training https://github.com/pjlab-sys4nlp/llama-moe 6 comments
Mixtures of Experts - Javid Lakha https://blog.javid.io/p/mixtures-of-experts 2 comments
Alpa: Automated Model-Parallel Deep Learning – Google AI Blog https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html 1 comment
Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
GitHub - amrzv/awesome-colab-notebooks: Collection of google colaboratory notebooks for fast and easy experiments https://github.com/amrzv/awesome-colab-notebooks 0 comments
Core Modeling at Instagram. At Instagram we have many Machine… | by Thomas Bredillet | Instagram Engineering https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48 0 comments
Exploring Massively Multilingual, Massive Neural Machine Translation – Google AI Blog https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html 0 comments
DeepMind’s PathNet: A Modular Deep Learning Architecture for AGI | by Carlos E. Perez | Intuition Machine | Medium https://medium.com/intuitionmachine/pathnet-a-modular-deep-learning-architecture-for-agi-5302fcf53273 0 comments
General and Scalable Parallelization for Neural Networks – Google AI Blog https://ai.googleblog.com/2021/12/general-and-scalable-parallelization.html 0 comments
Alpa: Automated Model-Parallel Deep Learning – Google AI Blog https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html?m=1 0 comments
