Hacker News
- Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/ 26 comments
Linked pages
- [2301.00774] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot https://arxiv.org/abs/2301.00774 128 comments
- [2306.03078] SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression https://arxiv.org/abs/2306.03078 2 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- [2310.06927] Sparse Fine-tuning for Inference Acceleration of Large Language Models https://arxiv.org/abs/2310.06927 0 comments