Hacker News
- Mixture-of-Depths: Dynamically allocating compute in transformers https://arxiv.org/abs/2404.02258 83 comments
- Mixture-of-Depths: Dynamically allocating compute in transformer language models https://arxiv.org/abs/2404.02258 2 comments
- [R] Deepmind - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models https://arxiv.org/abs/2404.02258 17 comments machinelearning