- [R] Mega: Moving Average Equipped Gated Attention. By using LSTM-style gates, Mega outperforms Transformer and S4 over Long Range Area, NMT, ImageNet, Wikitext-103 and raw speech classification. https://arxiv.org/abs/2209.10655 14 comments machinelearning
Linking pages
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
- Transformers Revolutionized AI. What Will Replace Them? https://www.forbes.com/sites/robtoews/2023/09/03/transformers-revolutionized-ai-what-will-replace-them/ 1 comment
- GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
Would you like to stay up to date with Computer science? Checkout Computer science
Weekly.
Related searches:
Search whole site: site:arxiv.org
Search title: [2209.10655] Mega: Moving Average Equipped Gated Attention
See how to search.