Hacker News
- LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale (2022) https://arxiv.org/abs/2208.07339 23 comments
- LLM.bit8 - Quantization via Matrices to cut inference memory in half https://arxiv.org/abs/2208.07339 8 comments machinelearning
Linking pages
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- Local Large Language Models - beginners guide - int8.io https://int8.io/local-large-language-models-beginners-guide/ 2 comments
- Everything about Distributed Training and Efficient Finetuning | Sumanth's Personal Website https://sumanthrh.com/post/distributed-and-efficient-finetuning/ 1 comment
- Navigating the Complexities of LLM Quantization: Techniques, Trade-offs, and Real-World Implications https://open.substack.com/pub/tinyml/p/navigating-the-complexities-of-llm 0 comments
- Navigating the Complexities of LLM Quantization: Techniques, Trade-offs, and Real-World Implications https://tinyml.substack.com/p/navigating-the-complexities-of-llm 0 comments
- Accelerating Large Language Models with Mixed-Precision Techniques - Lightning AI https://lightning.ai/pages/community/tutorial/accelerating-large-language-models-with-mixed-precision-techniques/ 0 comments
- Accelerating Large Language Models with Mixed-Precision Techniques https://sebastianraschka.com/blog/2023/llm-mixed-precision.html 0 comments
- GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch https://github.com/TimDettmers/bitsandbytes 0 comments
- GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments