[2305.13245] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Linking pages

Llama 3 implemented in pure NumPy · The Missing Papers https://docs.likejazz.com/llama3.np/ 50 comments
GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
Ahead of AI #11: New Foundation Models https://magazine.sebastianraschka.com/p/ahead-of-ai-11-new-foundation-models 34 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
What is Llama 2? Meta’s large language model explained | InfoWorld https://www.infoworld.com/article/3706470/what-is-llama-2-metas-large-language-model-explained.html 6 comments
LLAMA 2: an incredible open-source LLM - by Nathan Lambert https://www.interconnects.ai/p/llama-2-from-meta 5 comments
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
Meta Llama 3 available on Cloudflare Workers AI https://blog.cloudflare.com/meta-llama-3-available-on-cloudflare-workers-ai 1 comment
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
You Only Cache Once: Decoder-Decoder Architectures for Language Models https://gonzoml.substack.com/p/you-only-cache-once-decoder-decoder 0 comments
INT4 Decoding GQA CUDA Optimizations for LLM Inference | PyTorch https://pytorch.org/blog/int4-decoding/ 0 comments