Beating cuBLAS in Single-Precision General Matrix Multiplication - discu.eu

Hacker News

Beating cuBLAS in Single-Precision General Matrix Multiplication https://salykova.github.io/sgemm-gpu 8 comments 15/1/2025

Linked pages

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
GitHub - NervanaSystems/maxas: Assembler for NVIDIA Maxwell architecture https://github.com/nervanasystems/maxas 16 comments
PTX ISA :: CUDA Toolkit Documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html 4 comments
Beating OpenBLAS in FP32 Matrix Multiplication https://salykova.github.io/matmul 1 comment
GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ https://github.com/tinygrad/tinygrad 0 comments
CUDA Matrix Multiplication Optimization - Lei Mao's Log Book https://leimao.github.io/article/CUDA-Matrix-Multiplication-Optimization/ 0 comments
