Linking pages
- Serve with vLLM - Outlines 〰️ https://outlines-dev.github.io/outlines/reference/vllm/ 3 comments
- Beat GPT-4o at Python by searching with 100 dumb LLaMAs | Modal Blog https://modal.com/blog/llama-human-eval 2 comments
- Unbowed, Unbent, Unbroken – Decoder Only https://decoderonlyblog.wordpress.com/2024/04/19/unbowed-unbent-unbroken/ 0 comments
Linked pages
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention https://vllm.ai/ 42 comments
- How continuous batching enables 23x throughput in LLM inference while reducing p50 latency | Anyscale https://www.anyscale.com/blog/continuous-batching-llm-inference 20 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments
- [2306.00978] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://arxiv.org/abs/2306.00978 2 comments
- [2306.07629] SqueezeLLM: Dense-and-Sparse Quantization https://arxiv.org/abs/2306.07629 1 comment
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs https://github.com/vllm-project/vllm 0 comments