[2307.14995] Scaling TransNormer to 175 Billion Parameters - discu.eu

Reddit

[R] Scaling TransNormer to 175 Billion Parameters https://arxiv.org/abs/2307.14995 22 comments 28/7/2023 machinelearning

Linking pages

GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
GitHub - sustcsonglin/flash-linear-attention: Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton https://github.com/sustcsonglin/flash-linear-attention 0 comments