Hacker News
- Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels https://mobiusml.github.io/gemlite_blogpost/ 2 comments
Linked pages
- Accelerating Generative AI with PyTorch II: GPT, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai-2/ 69 comments
- [2306.00978] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://arxiv.org/abs/2306.00978 2 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- GitHub - mobiusml/gemlite: Simple and fast low-bit matmul kernels in CUDA https://github.com/mobiusml/gemlite 0 comments