- [R] Scaling Vision with Sparse Mixture of Experts https://arxiv.org/abs/2106.05974 2 comments machinelearning
Linking pages
- Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- GitHub - cmhungsteve/Awesome-Transformer-Attention: An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites https://github.com/cmhungsteve/Awesome-Transformer-Attention 13 comments
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- General and Scalable Parallelization for Neural Networks – Google AI Blog https://ai.googleblog.com/2021/12/general-and-scalable-parallelization.html 0 comments
- Mixture-of-Experts with Expert Choice Routing – Google AI Blog https://ai.googleblog.com/2022/11/mixture-of-experts-with-expert-choice.html 0 comments
- Mixture-of-Experts with Expert Choice Routing – Google Research Blog https://blog.research.google/2022/11/mixture-of-experts-with-expert-choice.html 0 comments