Linking pages
- FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention | PyTorch https://pytorch.org/blog/flexattention/ 24 comments
- Introducing SkyServe: 50% Cheaper AI Serving on Any Cloud with High Availability | SkyPilot Blog https://blog.skypilot.co/introducing-sky-serve/ 0 comments
- DeepSeek R1 inference performance: MI300X vs. H200 - dstack https://dstack.ai/blog/h200-mi300x-deepskeek-benchmark/ 0 comments
Article: Fast and Expressive LLM Inference with RadixAttention and SGLang | LMSYS Org