- [R] Scaling TransNormer to 175 Billion Parameters https://arxiv.org/abs/2307.14995 22 comments machinelearning
Linking pages
- GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
- GitHub - sustcsonglin/flash-linear-attention: Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton https://github.com/sustcsonglin/flash-linear-attention 0 comments