Linking pages
- DeepSeek's AI breakthrough bypasses Nvidia's industry-standard CUDA, uses assembly-like PTX programming instead | Tom's Hardware https://www.tomshardware.com/tech-industry/artificial-intelligence/deepseeks-ai-breakthrough-bypasses-industry-standard-cuda-uses-assembly-like-ptx-programming-instead 441 comments
- Rust CUDA project update | Rust GPU https://rust-gpu.github.io/blog/2025/03/18/rust-cuda-update/ 72 comments
- GitHub - deepseek-ai/DeepGEMM: DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling https://github.com/deepseek-ai/DeepGEMM 67 comments
- Rebooting the Rust CUDA project | Rust GPU https://rust-gpu.github.io/blog/2025/01/27/rust-cuda-reboot 51 comments
- Decorator JITs - Python as a DSL - Eli Bendersky's website https://eli.thegreenplace.net/2025/decorator-jits-python-as-a-dsl/ 44 comments
- On GPUs, ranges, latency, and superoptimisers · Paweł Dziepak https://pdziepak.github.io/2019/09/01/on-gpus-ranges-latency-and-superoptimisers/ 38 comments
- Nvidia GPU on bare metal NixOS Kubernetes cluster explained – Fang-Pen's coding note https://fangpenlin.com/posts/2025/03/01/nvidia-gpu-on-bare-metal-nixos-k8s-explained/ 15 comments
- Overview - CUDA Python 12.0.0 documentation https://nvidia.github.io/cuda-python/overview.html 11 comments
- Beating cuBLAS in Single-Precision General Matrix Multiplication https://salykova.github.io/sgemm-gpu 8 comments
- TornadoVM: Accelerating Java with GPUs and FPGAs https://www.infoq.com/articles/tornadovm-java-gpu-fpga/ 5 comments
- Benchmarking and Dissecting the Nvidia Hopper GPU Architecture https://arxiv.org/html/2402.13499v1 4 comments
- The Longest Nvidia PTX Instruction | Ash's Blog https://ashvardanian.com/posts/longest-ptx-instruction/ 3 comments
- CPP_from_1998_to_2020/Cpp-Technical-Note.md at main · burlachenkok/CPP_from_1998_to_2020 · GitHub https://github.com/burlachenkok/CPP_from_1998_to_2020/blob/main/Cpp-Technical-Note.pdf 2 comments
- rNdN: Fast Query Compilation for NVIDIA GPUs | ACM Transactions on Architecture and Code Optimization https://dl.acm.org/doi/10.1145/3603503 1 comment
- GitHub - gvilums/ptoxide: Virtual machine for executing CUDA PTX without a GPU https://github.com/gvilums/ptoxide 1 comment
- Outperforming cuBLAS on H100: a Worklog https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog 1 comment
- Modular: Democratizing AI Compute, Part 4: CUDA is the incumbent, but is it any good? https://www.modular.com/blog/democratizing-ai-compute-part-4-cuda-is-the-incumbent-but-is-it-any-good 1 comment
- GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines https://github.com/NVIDIA/cutlass 0 comments
- Level up Your Java Performance with TornadoVM https://www.infoq.com/articles/java-performance-tornadovm/ 0 comments
- XLA: Optimizing Compiler for Machine Learning | TensorFlow https://www.tensorflow.org/xla 0 comments
Related searches:
Search whole site: site:docs.nvidia.com
Search title: PTX ISA :: CUDA Toolkit Documentation