Hacker News
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 16 comments
Linking pages
- How to make LLMs go fast https://vgel.me/posts/faster-inference/ 54 comments
- Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken 3 comments
- Burn - Deep Learning Framework https://burn.dev/blog/autotune-for-gpu-kernels 1 comment
Linked pages
- https://godbolt.org 766 comments
- Computers can be understood - Made of Bugs https://blog.nelhage.com/post/computers-can-be-understood/ 83 comments
- Excalidraw | Hand-drawn look & feel • Collaborative • Secure https://excalidraw.com/ 80 comments
- [1804.06826] Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking https://arxiv.org/abs/1804.06826 32 comments
- https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf 27 comments
- PyTorch internals : ezyang’s blog http://blog.ezyang.com/2019/05/pytorch-internals/ 10 comments
- GitHub - openai/triton: Development repository for the Triton language and compiler https://github.com/openai/triton 5 comments
- GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines https://github.com/NVIDIA/cutlass 0 comments