[2306.00978] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - discu.eu

Hacker News

Activation-Aware Weight Quantization for LLM Compression Outperforms GPTQ https://arxiv.org/abs/2306.00978 2 comments 2/6/2023

Linking pages

HQQ quantization https://mobiusml.github.io/hqq_blog/ 2 comments
Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels https://mobiusml.github.io/gemlite_blogpost/ 2 comments
GitHub - mit-han-lab/llm-awq: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration https://github.com/mit-han-lab/llm-awq 0 comments
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs https://github.com/vllm-project/vllm 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
GitHub - horseee/Awesome-Efficient-LLM: A curated list for Efficient Large Language Models https://github.com/horseee/Awesome-Efficient-LLM 0 comments
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
The Path to Achieve Ultra-Low Inference Latency With LLaMA 65B on PyTorch/XLA | PyTorch https://pytorch.org/blog/path-achieve-low-inference-latency/ 0 comments
Welcome to vLLM! — vLLM https://docs.vllm.ai/en/latest/ 0 comments
LLMs for your iPhone: Whole-Tensor 4 Bit Quantization https://stephenpanaro.com/blog/llm-quantization-for-iphone 0 comments
Selecting GPUs for LLM serving on GKE | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/selecting-gpus-for-llm-serving-on-gke/ 0 comments
GitHub - NexaAI/Awesome-LLMs-on-device: Awesome LLMs on Device: A Comprehensive Survey https://github.com/NexaAI/Awesome-LLMs-on-device 0 comments

