[2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Linking pages

ChatGLM-6B/README_en.md at main · THUDM/ChatGLM-6B · GitHub https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md 93 comments
Accelerating Generative AI with PyTorch II: GPT, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai-2/ 69 comments
Falcon 180B: Can It Run on Your Computer? https://kaitchup.substack.com/p/falcon-180b-can-it-run-on-your-computer 35 comments
Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse - Neural Magic https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/ 26 comments
Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
Efficient LLM inference - by Finbarr Timbers https://www.artfintel.com/p/efficient-llm-inference 11 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline | by Yang You | Mar, 2023 | Medium https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b 4 comments
Local Large Language Models - beginners guide - int8.io https://int8.io/local-large-language-models-beginners-guide/ 2 comments
LLM Ecosystem: Quantization, RAG, Agents, and More | Pinecone https://www.pinecone.io/learn/llm-ecosystem/ 2 comments
HQQ quantization https://mobiusml.github.io/hqq_blog/ 2 comments
Announcing GPTQ & GGML Quantized LLM support for Huggingface Transformers - PostgresML https://postgresml.org/blog/announcing-gptq-and-ggml-quantized-llm-support-for-huggingface-transformers 1 comment
Everything about Distributed Training and Efficient Finetuning | Sumanth's Personal Website https://sumanthrh.com/post/distributed-and-efficient-finetuning/ 1 comment
GitHub - qwopqwop200/GPTQ-for-LLaMa: 4 bits quantization of LLaMa using GPTQ https://github.com/qwopqwop200/GPTQ-for-LLaMa 0 comments
Int-4 LLaMa is not enough - Int-3 and beyond. https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and 0 comments
ColossalChat: An Open-source Solution for Cloning ChatGPT with A Complete RLHF Pipeline | Synced https://syncedreview.com/2023/03/29/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline/ 0 comments
GitHub - shm007g/LLaMA-Cult-and-More: Keeping Track of Affordable Language Models, 🦙 Cult and More https://github.com/shm007g/LLaMA-Cult-and-More 0 comments
Running LLMs in the Browser with Rust + WebGPU https://fleetwood.dev/posts/running-llms-in-the-browser 0 comments
Navigating the Complexities of LLM Quantization: Techniques, Trade-offs, and Real-World Implications https://open.substack.com/pub/tinyml/p/navigating-the-complexities-of-llm 0 comments
Navigating the Complexities of LLM Quantization: Techniques, Trade-offs, and Real-World Implications https://tinyml.substack.com/p/navigating-the-complexities-of-llm 0 comments