Linking pages
- My AI Timelines Have Sped Up (Again) https://www.alexirpan.com/2024/01/10/ai-timelines-2024.html 95 comments
- GitHub - jzhang38/TinyLlama https://github.com/jzhang38/TinyLlama 60 comments
- GitHub - 01-ai/Yi: A series of large language models trained from scratch by developers @01-ai https://github.com/01-ai/Yi 52 comments
- GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. https://github.com/QwenLM/Qwen 51 comments
- GitHub - tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only) https://github.com/tspeterkim/flash-attention-minimal 41 comments
- GitHub - THUDM/LongWriter: LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs https://github.com/THUDM/LongWriter 29 comments
- GitHub - linkedin/Liger-Kernel: Efficient Triton Kernels for LLM Training https://github.com/linkedin/Liger-Kernel 19 comments
- Medical large language models are vulnerable to data-poisoning attacks | Nature Medicine https://www.nature.com/articles/s41591-024-03445-1 7 comments
- Llemma: An Open Language Model For Mathematics | EleutherAI Blog https://blog.eleuther.ai/llemma/ 6 comments
- GitHub - pjlab-sys4nlp/llama-moe: ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training https://github.com/pjlab-sys4nlp/llama-moe 6 comments
- distributed-training-guide/06-training-llama-405b at main · LambdaLabsML/distributed-training-guide · GitHub https://github.com/LambdaLabsML/distributed-training-guide/tree/main/06-training-llama-405b 4 comments
- GitHub - Alpha-VLLM/LLaMA2-Accessory: An Open-source Toolkit for LLM Development https://github.com/Alpha-VLLM/LLaMA2-Accessory 3 comments
- GitHub - QwenLM/Qwen-7B: The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. https://github.com/QwenLM/Qwen-7B 1 comment
- ALiBi FlashAttention - Speeding up ALiBi by 3-5x with a hardware-efficient implementation | Princeton Language and Intelligence https://pli.princeton.edu/blog/2024/alibi-flashattention-speeding-alibi-3-5x-hardware-efficient-implementation 1 comment
- GitHub - GreenBitAI/green-bit-llm: A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. https://github.com/GreenBitAI/green-bit-llm 1 comment
- Subclassing wheel builds for fun and profit | Pierce Freeman https://freeman.vc/notes/subclassing-wheel-builds-for-fun-and-profit 0 comments
- Accelerating Generative AI with PyTorch: Segment Anything, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai/ 0 comments
- GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
- GitHub - RichardKelley/dendron: A library for building software agents using behavior trees and language models. https://github.com/RichardKelley/dendron 0 comments
- unilm/kosmos-2.5 at master · microsoft/unilm · GitHub https://github.com/microsoft/unilm/tree/master/kosmos-2.5 0 comments
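Several of the entries above (flash-attention-minimal, the ALiBi FlashAttention post, Liger-Kernel) center on attention kernels. For orientation, here is the exact attention those kernels compute, written naively in PyTorch; this is a minimal sketch for reference, not code from any of the linked repos:

```python
# Reference (naive) scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
# FlashAttention computes exactly this result, but tiled in on-chip SRAM so
# the full (seqlen x seqlen) score matrix is never materialized in GPU memory.
import math
import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, nheads, seqlen, headdim)
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, nheads, seqlen, seqlen)
    return torch.softmax(scores, dim=-1) @ v         # (batch, nheads, seqlen, headdim)

q = k = v = torch.randn(1, 8, 256, 64)
out = naive_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 256, 64])
```

The "forward pass only" caveat in the flash-attention-minimal entry refers to the first half of this computation; the backward pass additionally requires recomputing the score tiles during backpropagation.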
Linked pages
- Mistral 7B | Mistral AI | Open source models https://mistral.ai/news/announcing-mistral-7b/ 618 comments
- We’re Training AI Twice as Fast This Year as Last - IEEE Spectrum https://spectrum.ieee.org/mlperf-rankings-2022 35 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- Mistral AI | Open source models https://mistral.ai/ 1 comment
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Tri Dao https://tridao.me/blog/2024/flash3/ 0 comments
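All of these pages link to or from the Dao-AILab/flash-attention repository. As a quick orientation, this is a minimal sketch of calling its fused kernel through the `flash_attn` Python package; the call follows the repo's documented `flash_attn_func` interface, but the shapes, dtypes, and values here are illustrative assumptions, not canonical usage:

```python
# Minimal sketch: invoking the FlashAttention fused kernel via the
# flash_attn package (pip install flash-attn). Requires a CUDA GPU;
# inputs must be fp16 or bf16 per the repo's README.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64  # assumed example sizes
# flash_attn_func expects (batch, seqlen, nheads, headdim) layout.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel computes softmax(Q K^T / sqrt(headdim)) V without
# materializing the seqlen x seqlen attention matrix.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # (2, 1024, 16, 64)
```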