Hacker News
- Outrageously Large Neural Nets: Sparsely-Gated Mixture-of-Experts Layer (2017) https://arxiv.org/abs/1701.06538 33 comments
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer https://arxiv.org/abs/1701.06538 81 comments
Linking pages
- GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. https://github.com/EleutherAI/gpt-neo/ 127 comments
- Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini 113 comments
- Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- Google Brain’s new super fast and highly accurate AI: the Mixture of Experts Layer. | by Théo Szymkowiak | Medium https://medium.com/@thoszymkowiak/google-brains-new-super-fast-and-highly-accurate-ai-the-mixture-of-experts-layer-dd3972c25663 15 comments
- The Google Brain Team — Looking Back on 2017 (Part 1 of 2) – Google AI Blog https://research.googleblog.com/2018/01/the-google-brain-team-looking-back-on.html 6 comments
- Mixtures of Experts - Javid Lakha https://blog.javid.io/p/mixtures-of-experts 2 comments
- Alpa: Automated Model-Parallel Deep Learning – Google AI Blog https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html 1 comment
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- GitHub - amrzv/awesome-colab-notebooks: Collection of google colaboratory notebooks for fast and easy experiments https://github.com/amrzv/awesome-colab-notebooks 0 comments
- Core Modeling at Instagram. At Instagram we have many Machine… | by Thomas Bredillet | Instagram Engineering https://instagram-engineering.com/core-modeling-at-instagram-a51e0158aa48 0 comments
- Exploring Massively Multilingual, Massive Neural Machine Translation – Google AI Blog https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html 0 comments
- DeepMind’s PathNet: A Modular Deep Learning Architecture for AGI | by Carlos E. Perez | Intuition Machine | Medium https://medium.com/intuitionmachine/pathnet-a-modular-deep-learning-architecture-for-agi-5302fcf53273 0 comments
- General and Scalable Parallelization for Neural Networks – Google AI Blog https://ai.googleblog.com/2021/12/general-and-scalable-parallelization.html 0 comments
- Accelerating Deep Learning Research with the Tensor2Tensor Library – Google AI Blog https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html 0 comments
- GitHub - adeshpande3/Machine-Learning-Links-And-Lessons-Learned: List of all the lessons learned, best practices, and links from my time studying machine learning https://github.com/adeshpande3/Machine-Learning-Links-And-Lessons-Learned 0 comments
- Mixture of Variational Autoencoders — a Fusion Between MoE and VAE | by Yoel Zeldes | Towards Data Science https://towardsdatascience.com/mixture-of-variational-autoencoders-a-fusion-between-moe-and-vae-22c0901a6675 0 comments