GitHub - arekpaterek/Faster_SGEMM_CUDA: FP32 matrix multiplication of large square matrices in some cases faster than cuBLAS. - discu.eu

Hacker News

Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090 https://github.com/arekpaterek/Faster_SGEMM_CUDA 5 comments 31/7/2024

Linked pages

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
CUDA GPUs - Compute Capability | NVIDIA Developer https://developer.nvidia.com/cuda-gpus#compute 22 comments
