[1911.02150] Fast Transformer Decoding: One Write-Head is All You Need

Linking pages

How to make LLMs go fast https://vgel.me/posts/faster-inference/ 54 comments
GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
What is Llama 2? Meta’s large language model explained | InfoWorld https://www.infoworld.com/article/3706470/what-is-llama-2-metas-large-language-model-explained.html 6 comments
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
Mixtures of Experts - Javid Lakha https://blog.javid.io/p/mixtures-of-experts 2 comments
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
Five years of progress in GPTs - by Finbarr Timbers https://finbarrtimbers.substack.com/p/five-years-of-progress-in-gpts 0 comments
Transformer Deep Dive: Parameter Counting https://orenleung.com/transformer-parameter-counting 0 comments
GitHub - conceptofmind/PaLM: An open-source implementation of Google's PaLM models https://github.com/conceptofmind/PaLM 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
Where do LLMs spend their FLOPS? - by Finbarr Timbers https://www.artfintel.com/p/where-do-llms-spend-their-flops 0 comments
INT4 Decoding GQA CUDA Optimizations for LLM Inference | PyTorch https://pytorch.org/blog/int4-decoding/ 0 comments