Hacker News
- Outperforming cuBLAS on H100: A Worklog https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog 0 comments
Linked pages
- GPUs Go Brrr · Hazy Research https://hazyresearch.stanford.edu/blog/2024-05-12-tk 267 comments
- Hilbert curve - Wikipedia https://en.wikipedia.org/wiki/Hilbert_curve 66 comments
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
- NVIDIA Hopper Architecture In-Depth | NVIDIA Technical Blog https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ 20 comments
- [2407.08608] FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision https://arxiv.org/abs/2407.08608 6 comments
- PTX ISA :: CUDA Toolkit Documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html 4 comments
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short] https://www.thonking.ai/p/strangely-matrix-multiplications 2 comments
- bfloat16 floating-point format - Wikipedia https://en.wikipedia.org/wiki/Bfloat16_floating-point_format 1 comment
- Dissecting the Ampere GPU Architecture through Microbenchmarking | GTC Digital April 2021 | NVIDIA On-Demand https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s33322/ 0 comments