Hacker News
- A guide to open-source LLM inference and performance https://www.baseten.co/blog/llm-transformer-inference-guide/ 14 comments
Linked pages
- How to Do Great Work http://paulgraham.com/greatwork.html 435 comments
- GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs https://github.com/turboderp/exllamav2 125 comments
- Making Deep Learning go Brrrr From First Principles https://horace.io/brrr_intro.html 20 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2302.13971] LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/pdf/2302.13971.pdf 0 comments
- Transformer Inference Arithmetic | kipply's blog https://kipp.ly/transformer-inference-arithmetic/ 0 comments