Hacker News
- LLM quantization severely damages model quality and perplexity https://github.com/ggerganov/llama.cpp/pull/1684 3 comments
Linking pages
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
- How to make LLMs go fast https://vgel.me/posts/faster-inference/ 54 comments
- A Visual Guide to Quantization - by Maarten Grootendorst https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization 29 comments
- GPTQ vs GGML vs Base Models: A Quick Speed and VRAM Test for Vicuna-33B on 2x A100 80GB SXM · GPU Utils ⚡️ https://gpus.llm-utils.org/gptq-vs-ggml-vs-base-models/ 0 comments
- How do I create a GGUF model file? https://www.secondstate.io/articles/convert-pytorch-to-gguf/ 0 comments
k-quants by ikawrakow · Pull Request #1684 · ggerganov/llama.cpp