- Awesome LLM/GenAI Systems Papers https://github.com/AmberLJC/LLMSys-PaperList/ 1 comment (r/learnmachinelearning)
Linked pages
- GitHub - deepseek-ai/open-infra-index https://github.com/deepseek-ai/open-infra-index 236 comments
- GitHub - ray-project/llm-numbers: Numbers every LLM developer should know https://github.com/ray-project/llm-numbers 113 comments
- [2309.07062] Large Language Models for Compiler Optimization https://arxiv.org/abs/2309.07062 112 comments
- Chatbot Arena (LMSYS) https://chat.lmsys.org/ 51 comments
- Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4 https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard 51 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention https://vllm.ai/ 42 comments
- GitHub - ai-dynamo/dynamo: A Datacenter Scale Distributed Inference Serving Framework https://github.com/ai-dynamo/dynamo 38 comments
- [2303.06865] High-throughput Generative Inference of Large Language Models with a Single GPU https://arxiv.org/abs/2303.06865 36 comments
- [2412.19437] DeepSeek-V3 Technical Report https://arxiv.org/abs/2412.19437 34 comments
- [2404.08801] Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length https://arxiv.org/abs/2404.08801 31 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- [2310.01889] Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/abs/2310.01889 20 comments
- Aviary (Anyscale) https://aviary.anyscale.com/ 12 comments
- [2503.05139] Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs https://arxiv.org/abs/2503.05139 9 comments
- GitHub - NVIDIA/NeMo: NeMo: a toolkit for conversational AI https://github.com/NVIDIA/NeMo 8 comments
- [2311.03285] S-LoRA: Serving Thousands of Concurrent LoRA Adapters https://arxiv.org/abs/2311.03285 8 comments
- [2104.10350] Carbon Emissions and Large Neural Network Training https://arxiv.org/abs/2104.10350 5 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2307.10169] Challenges and Applications of Large Language Models https://arxiv.org/abs/2307.10169 3 comments