Hacker News
- Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/ 26 comments
Linked pages
- [2301.00774] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot https://arxiv.org/abs/2301.00774 128 comments
- [2306.03078] SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression https://arxiv.org/abs/2306.03078 2 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- [2310.06927] Sparse Fine-tuning for Inference Acceleration of Large Language Models https://arxiv.org/abs/2310.06927 0 comments