- Awesome LLM/GenAI Systems Papers https://github.com/AmberLJC/LLMSys-PaperList/ 1 comment (r/learnmachinelearning)
Linked pages
- GitHub - deepseek-ai/open-infra-index https://github.com/deepseek-ai/open-infra-index 236 comments
- GitHub - ray-project/llm-numbers: Numbers every LLM developer should know https://github.com/ray-project/llm-numbers 113 comments
- [2309.07062] Large Language Models for Compiler Optimization https://arxiv.org/abs/2309.07062 112 comments
- Chatbot Arena (LMSYS) https://chat.lmsys.org/ 51 comments
- Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4 https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard 51 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention https://vllm.ai/ 42 comments
- GitHub - ai-dynamo/dynamo: A Datacenter Scale Distributed Inference Serving Framework https://github.com/ai-dynamo/dynamo 38 comments
- [2303.06865] High-throughput Generative Inference of Large Language Models with a Single GPU https://arxiv.org/abs/2303.06865 36 comments
- [2412.19437] DeepSeek-V3 Technical Report https://arxiv.org/abs/2412.19437 34 comments
- [2404.08801] Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length https://arxiv.org/abs/2404.08801 31 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- [2310.01889] Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/abs/2310.01889 20 comments
- Aviary (Anyscale) https://aviary.anyscale.com/ 12 comments
- [2503.05139] Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs https://arxiv.org/abs/2503.05139 9 comments
- GitHub - NVIDIA/NeMo: NeMo: a toolkit for conversational AI https://github.com/NVIDIA/NeMo 8 comments
- [2311.03285] S-LoRA: Serving Thousands of Concurrent LoRA Adapters https://arxiv.org/abs/2311.03285 8 comments
- [2104.10350] Carbon Emissions and Large Neural Network Training https://arxiv.org/abs/2104.10350 5 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2307.10169] Challenges and Applications of Large Language Models https://arxiv.org/abs/2307.10169 3 comments