Hacker News
Linked pages
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 5 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- T5 https://huggingface.co/docs/transformers/model_doc/t5 3 comments
- [2308.00951] From Sparse to Soft Mixtures of Experts https://arxiv.org/abs/2308.00951 3 comments
- [2202.01169] Unified Scaling Laws for Routed Language Models https://arxiv.org/abs/2202.01169#deepmind 2 comments
- GitHub - stanford-futuredata/megablocks https://github.com/stanford-futuredata/megablocks 1 comment
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- [2203.15556] Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 0 comments
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 0 comments
- Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors) - YouTube https://youtu.be/ccBMRryxGog 0 comments
- How does GPT-3 spend its 175B parameters? - by Robert Huben https://aizi.substack.com/p/how-does-gpt-3-spend-its-175b-parameters 0 comments
- [2202.08906] ST-MoE: Designing Stable and Transferable Sparse Expert Models https://arxiv.org/abs/2202.08906 0 comments
- [2305.14705] Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models https://arxiv.org/abs/2305.14705 0 comments
- c4 · Datasets at Hugging Face https://huggingface.co/datasets/c4 0 comments