- [R] "Unified Scaling Laws for Routed Language Models", Clark et al. 2022, DeepMind (detailed MoE scaling analysis; the MoE advantage currently disappears at ~900B dense parameters) https://arxiv.org/abs/2202.01169#deepmind 2 comments machinelearning
Linking pages
- GPT-4 Is Coming Soon. Here’s What We Know About It | by Alberto Romero | Towards Data Science https://towardsdatascience.com/gpt-4-is-coming-soon-heres-what-we-know-about-it-64db058cfd45?gi=6b5ffbdc901c 20 comments
- Code Interpreter == GPT 4.5 (w/ Simon Willison & Alex Volkov) https://www.latent.space/p/code-interpreter 4 comments
- Mixtures of Experts - Javid Lakha https://blog.javid.io/p/mixtures-of-experts 2 comments
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments