Linking pages
- GitHub - huggingface/candle: Minimalist ML framework for Rust https://github.com/huggingface/candle 205 comments
- Introducing Triton: Open-Source GPU Programming for Neural Networks https://openai.com/blog/triton/ 116 comments
- GitHub - facebookincubator/AITemplate: AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference. https://github.com/facebookincubator/AITemplate 71 comments
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
- GitHub - mikeroyal/Unreal-Engine-Guide: Unreal Engine 5 Guide. Learn to develop games for Windows, Linux, macOS, iOS, Android, Xbox Series X|S, PlayStation 4 & 5, Nintendo Switch. https://github.com/mikeroyal/Unreal-Engine-Guide#linux-development 12 comments
- GitHub - mikeroyal/Neuromorphic-Computing-Guide: Learn about the Neuromorphic engineering process of creating large-scale integration (VLSI) systems containing electronic analog circuits to mimic neuro-biological architectures. https://github.com/mikeroyal/Neuromorphic-Computing-Guide 7 comments
- GitHub - mikeroyal/Machine-Learning-Guide: Machine learning Guide. Learn all about Machine Learning Tools, Libraries, Frameworks, and Training Models. https://github.com/mikeroyal/Machine-Learning-Guide 2 comments
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short] https://www.thonking.ai/p/strangely-matrix-multiplications 2 comments
- GitHub - mikeroyal/Game-Console-Dev-Guide: Game Console Dev Guide. Learn to develop games for Xbox Series X|S, PlayStation 4 & 5, Nintendo Switch, Steam Deck, and Apple Silicon. https://github.com/mikeroyal/Game-Console-Dev-Guide 1 comment
- GitHub - NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework https://github.com/NVlabs/tiny-cuda-nn 0 comments
- GitHub - mikeroyal/CUDA-Guide: CUDA Guide https://github.com/mikeroyal/CUDA-Guide 0 comments
- GitHub - mikeroyal/Unity-Guide: Unity Engine Guide https://github.com/mikeroyal/Unity-Guide 0 comments
- GitHub - mikeroyal/MATLAB-Guide: MATLAB Guide https://github.com/mikeroyal/MATLAB-Guide 0 comments
- GitHub - mit-han-lab/smoothquant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models https://github.com/mit-han-lab/smoothquant 0 comments
- GitHub - andylolu2/simpleGEMM: The simplest but fast implementation of matrix multiplication in CUDA. https://github.com/andylolu2/simpleGEMM 0 comments
- GitHub - mikeroyal/AMX-Guide: Advanced Matrix Extensions (AMX) Guide https://github.com/mikeroyal/AMX-Guide 0 comments
- GitHub - efeslab/Nanoflow: A throughput-oriented high-performance serving framework for LLMs https://github.com/efeslab/Nanoflow 0 comments
- Efficient GEMM Kernel Designs with Pipelining | SIGARCH https://www.sigarch.org/efficient-gemm-kernel-designs-with-pipelining/ 0 comments
- FireAttention V3: Enabling AMD as a Viable Alternative for GPU Inference https://fireworks.ai/blog/fireattention-v3 0 comments
Linked pages
- NVIDIA A100 | NVIDIA https://www.nvidia.com/en-us/data-center/a100/ 280 comments
- CUDA Toolkit - Free Tools and Training | NVIDIA Developer https://developer.nvidia.com/cuda-toolkit 6 comments
- CUDA Toolkit 12.1 Downloads | NVIDIA Developer https://developer.nvidia.com/cuda-downloads 5 comments
- PTX ISA :: CUDA Toolkit Documentation https://docs.nvidia.com/cuda/parallel-thread-execution/index.html 4 comments
- H100 Tensor Core GPU | NVIDIA https://www.nvidia.com/en-us/data-center/h100/ 3 comments
- NVIDIA L40 GPU for Data Center | NVIDIA https://www.nvidia.com/en-us/data-center/l40/ 1 comment