[1904.10509] Generating Long Sequences with Sparse Transformers

Linking pages

Mistral 7B | Mistral AI | Open source models https://mistral.ai/news/announcing-mistral-7b/ 618 comments
Jukebox https://openai.com/blog/jukebox/ 130 comments
How GPT3 Works - Visualizations and Animations – Jay Alammar – Visualizing machine learning one concept at a time. https://jalammar.github.io/how-gpt3-works-visualizations-animations/ 109 comments
The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
Generating music in the waveform domain – Sander Dieleman https://sander.ai/2020/03/24/audio-generation.html 41 comments
10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
Generative Modeling with Sparse Transformers https://openai.com/blog/sparse-transformer/ 9 comments
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
GitHub - amrzv/awesome-colab-notebooks: Collection of google colaboratory notebooks for fast and easy experiments https://github.com/amrzv/awesome-colab-notebooks 0 comments
NLP Newsletter #10 [EN]: Improving Reproducibility in ML, Privacy and Security in NLP, XTREME, Longformer, VilBERT, exBERT,… – DAIR.AI https://dair.ai/NLP_Newsletter_10_en/ 0 comments
Generating music in the waveform domain – Sander Dieleman https://benanne.github.io/2020/03/24/audio-generation.html 0 comments
A Survey of Long-Term Context in Transformers https://www.pragmatic.ml/a-survey-of-methods-for-incorporating-long-term-context/ 0 comments
OpenAI and the road to text-guided image generation: DALL·E, CLIP, GLIDE, DALL·E 2 (unCLIP) | by Grigory Sapunov | Intento https://blog.inten.to/openai-and-the-road-to-text-guided-image-generation-dall-e-clip-glide-dall-e-2-unclip-c6e28f7194ea?gi=53c11ab07fab 0 comments
GPT-3: Language Models are Few-Shot Learners | by Grigory Sapunov | Intento https://blog.inten.to/gpt-3-language-models-are-few-shot-learners-a13d1ae8b1f9 0 comments
Speeding up BERT. How to make BERT models faster | by Grigory Sapunov | Intento https://blog.inten.to/speeding-up-bert-5528e18bb4ea 0 comments
GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers 0 comments
OpenAI Sparse Transformer Improves Predictable Sequence Length by 30x | by Synced | SyncedReview | Medium https://medium.com/syncedreview/openai-sparse-transformer-improves-predictable-sequence-length-by-30x-5a65ef2592b9 0 comments
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
How does GPT-3 spend its 175B parameters? - by Robert Huben https://aizi.substack.com/p/how-does-gpt-3-spend-its-175b-parameters 0 comments
Ahead of AI #12: LLM Businesses and Busyness https://magazine.sebastianraschka.com/p/ahead-of-ai-12-llm-businesses 0 comments