Linked pages
- Mistral 7B | Mistral AI | Open source models https://mistral.ai/news/announcing-mistral-7b/ 618 comments
- Mixtral of experts | Mistral AI | Open source models https://mistral.ai/news/mixtral-of-experts/ 300 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 151 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- Understanding LSTM Networks -- colah's blog https://colah.github.io/posts/2015-08-Understanding-LSTMs/ 64 comments
- But what is a convolution? - YouTube https://www.youtube.com/watch?v=KuXjwB4LzSA 24 comments
- What Every User Should Know About Mixed Precision Training in PyTorch | PyTorch https://pytorch.org/blog/what-every-user-should-know-about-mixed-precision-training-in-pytorch/ 24 comments
- Coefficient of variation - Wikipedia https://en.wikipedia.org/wiki/Coefficient_of_variation 21 comments
- Directed acyclic graph - Wikipedia https://en.wikipedia.org/wiki/Directed_acyclic_graph 12 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- Mixture of Experts Explained https://huggingface.co/blog/moe 2 comments
- Open Release of Grok-1 https://x.ai/blog/grok-os 2 comments
- Data Parallelism VS Model Parallelism in Distributed Deep Learning Training - Lei Mao's Log Book https://leimao.github.io/blog/Data-Parallelism-vs-Model-Paralelism/ 0 comments
- [2202.08906] ST-MoE: Designing Stable and Transferable Sparse Expert Models https://arxiv.org/abs/2202.08906 0 comments
- c4 · Datasets at Hugging Face https://huggingface.co/datasets/c4 0 comments
- Decoder-Only Transformers: The Workhorse of Generative LLMs https://cameronrwolfe.substack.com/p/decoder-only-transformers-the-workhorse 0 comments