Hacker News
- A guide to open-source LLM inference and performance https://www.baseten.co/blog/llm-transformer-inference-guide/ 14 comments
Linked pages
- How to Do Great Work http://paulgraham.com/greatwork.html 435 comments
- GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs https://github.com/turboderp/exllamav2 125 comments
- Making Deep Learning go Brrrr From First Principles https://horace.io/brrr_intro.html 20 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2302.13971] LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/pdf/2302.13971.pdf 0 comments
- Transformer Inference Arithmetic | kipply's blog https://kipp.ly/transformer-inference-arithmetic/ 0 comments