Hacker News
- LLM quantization severely damages model quality and perplexity https://github.com/ggerganov/llama.cpp/pull/1684 3 comments
Linking pages
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
- How to make LLMs go fast https://vgel.me/posts/faster-inference/ 54 comments
- A Visual Guide to Quantization - by Maarten Grootendorst https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization 29 comments
- GPTQ vs GGML vs Base Models: A Quick Speed and VRAM Test for Vicuna-33B on 2x A100 80GB SXM · GPU Utils ⚡️ https://gpus.llm-utils.org/gptq-vs-ggml-vs-base-models/ 0 comments
- How do I create a GGUF model file? https://www.secondstate.io/articles/convert-pytorch-to-gguf/ 0 comments
k-quants by ikawrakow · Pull Request #1684 · ggerganov/llama.cpp