Linking pages
- Transformers are Graph Neural Networks https://thegradient.pub/transformers-are-graph-neural-networks/ 25 comments
- GitHub - amitness/learning: A log of things I'm learning https://github.com/amitness/learning 17 comments
- GPT-3 and A Typology of Hype - by Delip Rao https://pagestlabs.substack.com/p/gpt-3-and-a-typology-of-hype 0 comments
- Training Google’s Reformer - takeaways, code, and weights | Svilen Todorov https://svilentodorov.xyz/blog/reformer-99m/ 0 comments
- Aman's AI Journal • Primers • Transformers https://aman.ai/primers/ai/transformers/ 0 comments
Linked pages
- Free eBooks | Project Gutenberg https://gutenberg.org 2028 comments
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
- A new model and dataset for long-range memory https://deepmind.com/blog/article/A_new_model_and_dataset_for_long-range_memory 13 comments
- Making Transformer networks simpler and more efficient https://ai.facebook.com/blog/making-transformer-networks-simpler-and-more-efficient/ 1 comment
- trax/trax/models/reformer at master · google/trax · GitHub https://github.com/google/trax/tree/master/trax/models/reformer 1 comment
- [1904.10509] Generating Long Sequences with Sparse Transformers https://arxiv.org/abs/1904.10509 1 comment
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- [2007.14062] Big Bird: Transformers for Longer Sequences https://arxiv.org/abs/2007.14062 0 comments
- [2001.04451] Reformer: The Efficient Transformer https://arxiv.org/abs/2001.04451 0 comments
- [1906.04341] What Does BERT Look At? An Analysis of BERT's Attention https://arxiv.org/abs/1906.04341 0 comments
- Google's Natural Questions https://ai.google.com/research/NaturalQuestions 0 comments
- [2006.16236] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention https://arxiv.org/abs/2006.16236 0 comments
- [2006.04768] Linformer: Self-Attention with Linear Complexity https://arxiv.org/abs/2006.04768 0 comments