Linking pages
- GitHub - huggingface/candle: Minimalist ML framework for Rust https://github.com/huggingface/candle 205 comments
- Introducing Triton: Open-Source GPU Programming for Neural Networks https://openai.com/blog/triton/ 116 comments
- GitHub - facebookincubator/AITemplate: AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference. https://github.com/facebookincubator/AITemplate 71 comments
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
- DeepSeek-V3 Technical Report https://arxiv.org/html/2412.19437v1 42 comments
- GitHub - mikeroyal/Unreal-Engine-Guide: Unreal Engine 5 Guide. Learn to develop games for Windows, Linux, macOS, iOS, Android, Xbox Series X|S, PlayStation 4 & 5, Nintendo Switch. https://github.com/mikeroyal/Unreal-Engine-Guide#linux-development 12 comments
- GitHub - mikeroyal/Neuromorphic-Computing-Guide: Learn about the Neuromorphic engineering process of creating large-scale integration (VLSI) systems containing electronic analog circuits to mimic neuro-biological architectures. https://github.com/mikeroyal/Neuromorphic-Computing-Guide 7 comments
- The Longest Nvidia PTX Instruction | Ash's Blog https://ashvardanian.com/posts/longest-ptx-instruction/ 3 comments
- GitHub - mikeroyal/Machine-Learning-Guide: Machine learning Guide. Learn all about Machine Learning Tools, Libraries, Frameworks, and Training Models. https://github.com/mikeroyal/Machine-Learning-Guide 2 comments
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short] https://www.thonking.ai/p/strangely-matrix-multiplications 2 comments
- GitHub - mikeroyal/Game-Console-Dev-Guide: Game Console Dev Guide. Learn to develop games for Xbox Series X|S, PlayStation 4 & 5, Nintendo Switch, Steam Deck, and Apple Silicon. https://github.com/mikeroyal/Game-Console-Dev-Guide 1 comment
- Modular: Democratizing AI Compute, Part 4: CUDA is the incumbent, but is it any good? https://www.modular.com/blog/democratizing-ai-compute-part-4-cuda-is-the-incumbent-but-is-it-any-good 1 comment
- GitHub - NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework https://github.com/NVlabs/tiny-cuda-nn 0 comments
- GitHub - mikeroyal/CUDA-Guide: CUDA Guide https://github.com/mikeroyal/CUDA-Guide 0 comments
- GitHub - mikeroyal/Unity-Guide: Unity Engine Guide https://github.com/mikeroyal/Unity-Guide 0 comments
- GitHub - mikeroyal/MATLAB-Guide: MATLAB Guide https://github.com/mikeroyal/MATLAB-Guide 0 comments
- GitHub - mit-han-lab/smoothquant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models https://github.com/mit-han-lab/smoothquant 0 comments
- GitHub - andylolu2/simpleGEMM: The simplest but fast implementation of matrix multiplication in CUDA. https://github.com/andylolu2/simpleGEMM 0 comments
- GitHub - mikeroyal/AMX-Guide: Advanced Matrix Extensions (AMX) Guide https://github.com/mikeroyal/AMX-Guide 0 comments
- GitHub - efeslab/Nanoflow: A throughput-oriented high-performance serving framework for LLMs https://github.com/efeslab/Nanoflow 0 comments
Linked pages
- NVIDIA A100 | NVIDIA https://www.nvidia.com/en-us/data-center/a100/ 280 comments
- CUDA Toolkit - Free Tools and Training | NVIDIA Developer https://developer.nvidia.com/cuda-toolkit 6 comments
- CUDA Toolkit 12.1 Downloads | NVIDIA Developer https://developer.nvidia.com/cuda-downloads 5 comments
- PTX ISA :: CUDA Toolkit Documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html 4 comments
- H100 Tensor Core GPU | NVIDIA https://www.nvidia.com/en-us/data-center/h100/ 3 comments
- NVIDIA L40 GPU for Data Center | NVIDIA https://www.nvidia.com/en-us/data-center/l40/ 1 comment