Reddit
[R] "Unified Scaling Laws for Routed Language Models", Clark et al 2022 DeepMind (detailed MoE scaling analysis; the MoE advantage currently disappears at ~900B dense parameters)
https://arxiv.org/abs/2202.01169#deepmind
4/2/2022
r/MachineLearning