Hacker News
- X-Transformers: A fully-featured transformer with experimental features https://github.com/lucidrains/x-transformers 37 comments
- [D] unable to overfit transformer decoder model https://github.com/lucidrains/x-transformers#xval---continuous-and-discrete 3 comments (r/MachineLearning)
Linking pages
- GitHub - lucidrains/imagen-pytorch: Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch https://github.com/lucidrains/imagen-pytorch 117 comments
- GitHub - borisdayma/dalle-mini: DALL·E Mini - Generate images from a text prompt https://github.com/borisdayma/dalle-mini 11 comments
- GitHub - neonsecret/stable-diffusion-webui: Stable Diffusion web UI (neonsecret fork) https://github.com/neonsecret/stable-diffusion-webui 8 comments
- GitHub - CompVis/latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models https://github.com/CompVis/latent-diffusion 5 comments
- GitHub - CompVis/stable-diffusion: A latent text-to-image diffusion model https://github.com/CompVis/stable-diffusion 4 comments
- GitHub - lucidrains/vit-pytorch: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch https://github.com/lucidrains/vit-pytorch#vision-transformer-for-small-datasets 3 comments
- GitHub - conceptofmind/CaiT-Flax https://github.com/conceptofmind/CaiT-Flax 1 comment
- Rotary Embeddings: A Relative Revolution | EleutherAI Blog https://blog.eleuther.ai/rotary-embeddings/ 1 comment
- GitHub - amrzv/awesome-colab-notebooks: Collection of Google Colaboratory notebooks for fast and easy experiments https://github.com/amrzv/awesome-colab-notebooks 0 comments
- GitHub - sd-webui/stable-diffusion-webui: Stable Diffusion web UI https://github.com/hlky/stable-diffusion-webui 0 comments
- GitHub - hlky/stable-diffusion https://github.com/hlky/stable-diffusion 0 comments
- GitHub - darkhemic/stable-diffusion-cpuonly: a fork that installs and runs on PyTorch, CPU-only https://github.com/darkhemic/stable-diffusion-cpuonly 0 comments
- GitHub - Sygil-Dev/sygil-webui: Stable Diffusion web UI https://github.com/sd-webui/stable-diffusion-webui 0 comments
- ColossalAI/examples/images/diffusion at main · hpcaitech/ColossalAI · GitHub https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion 0 comments
Linked pages
- GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters https://github.com/yandex/YaLM-100B 902 comments
- GitHub - deepmind/alphafold: Open source code for AlphaFold. https://github.com/deepmind/alphafold 315 comments
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance – Google AI Blog https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html 279 comments
- Introducing LLaMA: A foundational, 65-billion-parameter language model https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ 204 comments
- Releasing Persimmon-8B https://www.adept.ai/blog/persimmon-8b 56 comments
- [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory https://arxiv.org/abs/2112.05682 37 comments
- [2212.14034] Cramming: Training a Language Model on a Single GPU in One Day https://arxiv.org/abs/2212.14034 25 comments
- [2307.14995] Scaling TransNormer to 175 Billion Parameters https://arxiv.org/abs/2307.14995 22 comments
- [2109.08668] Primer: Searching for Efficient Transformers for Language Modeling https://arxiv.org/abs/2109.08668 18 comments
- Improving language models by retrieving from trillions of tokens https://deepmind.com/research/publications/2021/improving-language-models-by-retrieving-from-trillions-of-tokens 6 comments
- [2105.13290] CogView: Mastering Text-to-Image Generation via Transformers https://arxiv.org/abs/2105.13290 6 comments
- bigscience/bloom · Hugging Face https://huggingface.co/bigscience/bloom 4 comments
- [2305.19466] The Impact of Positional Encoding on Length Generalization in Transformers https://arxiv.org/abs/2305.19466 4 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- GitHub - HazyResearch/flash-attention: Fast and memory-efficient exact attention https://github.com/HazyResearch/flash-attention 3 comments
- [2305.19268] Intriguing Properties of Quantization at Scale https://arxiv.org/abs/2305.19268 2 comments
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer https://arxiv.org/abs/1910.10683 1 comment
- [2204.02311] PaLM: Scaling Language Modeling with Pathways https://arxiv.org/abs/2204.02311 0 comments
- [1910.05895] Transformers without Tears: Improving the Normalization of Self-Attention https://arxiv.org/abs/1910.05895 0 comments