[2404.02258] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models - discu.eu

Hacker News

Mixture-of-Depths: Dynamically allocating compute in transformers https://arxiv.org/abs/2404.02258 83 comments 7/4/2024

Mixture-of-Depths: Dynamically allocating compute in transformer language models https://arxiv.org/abs/2404.02258 2 comments 4/4/2024

Reddit

[R] Deepmind - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models https://arxiv.org/abs/2404.02258 17 comments 4/4/2024 machinelearning

Linking pages

How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? https://magazine.sebastianraschka.com/p/how-good-are-the-latest-open-llms 1 comment
