Hacker News
- A Visual Guide to LLM Quantization https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization 18 comments
- [P] A Visual Guide to Quantization https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization 10 comments machinelearning
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs https://github.com/turboderp/exllamav2 125 comments
- VRAM Calculator https://vram.asmirnov.xyz/ 38 comments
- [2208.07339] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale https://arxiv.org/abs/2208.07339 33 comments
- IEEE 754 - Wikipedia https://en.wikipedia.org/wiki/IEEE_754 21 comments
- [2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models https://arxiv.org/abs/2310.11453 21 comments
- Transformer Math 101 | EleutherAI Blog https://blog.eleuther.ai/transformer-math/ 13 comments
- k-quants by ikawrakow · Pull Request #1684 · ggerganov/llama.cpp · GitHub https://github.com/ggerganov/llama.cpp/pull/1684 3 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes https://huggingface.co/blog/hf-bitsandbytes-integration 0 comments
- https://github.com/ggerganov/ggml/blob/master/docs/gguf.md 0 comments
- Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval https://huggingface.co/blog/embedding-quantization 0 comments