Linking pages
- Towards 1-bit Machine Learning Models https://mobiusml.github.io/1bit_blog/ 157 comments
- Introducing Aana SDK https://mobiusml.github.io/aana-sdk-introducing-blog/ 1 comment
- Fast Inference of Mixture-of-Experts Language Models with Offloading https://browse.arxiv.org/html/2312.17238v1 0 comments
- Faster and Smaller Whisper: A Deep Dive into Quantization and Torch Compilation https://mobiusml.github.io/whisper-static-cache-blog/ 0 comments
Linked pages
- Llama 2 - Meta AI https://ai.meta.com/llama/ 820 comments
- ImageNet http://image-net.org/index 12 comments
- A Gentle Introduction to torch.autograd — PyTorch Tutorials 1.13.1+cu117 documentation https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html 6 comments
- [2306.00978] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://arxiv.org/abs/2306.00978 2 comments
- [2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 0 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch https://github.com/TimDettmers/bitsandbytes 0 comments
- GitHub - PanQiWei/AutoGPTQ: An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. https://github.com/PanQiWei/AutoGPTQ 0 comments
- GitHub - mit-han-lab/llm-awq: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://github.com/mit-han-lab/llm-awq 0 comments