[2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Linking pages

Testing AMD’s Giant MI300X – Chips and Cheese https://chipsandcheese.com/2024/06/25/testing-amds-giant-mi300x/ 343 comments
AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever | PyTorch https://pytorch.org/blog/pytorch-2.0-release/ 122 comments
Mamba Explained | Kola Ayonrinde https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html 93 comments
Meta Open-Sources Computer Vision Foundation Model DINOv2 https://www.infoq.com/news/2023/05/meta-dinov2-vision/ 88 comments
How to make LLMs go fast https://vgel.me/posts/faster-inference/ 54 comments
Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html 44 comments
Mamba Explained https://thegradient.pub/mamba-explained/ 44 comments
GitHub - mlc-ai/web-stable-diffusion: Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support. https://github.com/mlc-ai/web-stable-diffusion 42 comments
GitHub - tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only) https://github.com/tspeterkim/flash-attention-minimal 42 comments
GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
The open source learning curve for AI researchers https://www.supervised.news/p/the-open-source-learning-curve-for 35 comments
Inside the Matrix: Visualizing Matrix Multiplication, Attention and Beyond | PyTorch https://pytorch.org/blog/inside-the-matrix/ 34 comments
Understanding Large Language Models -- A Transformative Reading List https://sebastianraschka.com/blog/2023/llm-reading-list.html 26 comments
NLP Research in the Era of LLMs - by Sebastian Ruder https://nlpnewsletter.substack.com/p/nlp-research-in-the-era-of-llms 17 comments
A guide to LLM inference and performance https://www.baseten.co/blog/llm-transformer-inference-guide/ 14 comments
Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention 11 comments
GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments