Hacker News
- Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090 (https://github.com/arekpaterek/Faster_SGEMM_CUDA), 5 comments