Hacker News
- Large Transformer Model Inference Optimization https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
Linking pages
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- CNN vs. Vision Transformer: A Practitioner’s Guide to Selecting the Right Model | Tobias’ blog https://tobiasvanderwerff.github.io/2024/05/15/cnn-vs-vit.html 9 comments
- GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
- Foundation Models: The future (still) isn't happening fast enough https://www.madrona.com/foundation-models/ 1 comment
- Speeding up the GPT - KV cache | Becoming The Unbeatable https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/ 0 comments
- How is LLaMa.cpp possible? - by Finbarr Timbers https://finbarrtimbers.substack.com/p/how-is-llamacpp-possible 0 comments
- Some Math behind Neural Tangent Kernel | Lil'Log https://lilianweng.github.io/posts/2022-09-08-ntk/ 0 comments
Linked pages
- [2205.01068] OPT: Open Pre-trained Transformer Language Models https://arxiv.org/abs/2205.01068 318 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- [2208.07339] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale https://arxiv.org/abs/2208.07339 33 comments
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- [1803.03635] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks https://arxiv.org/abs/1803.03635 32 comments
- [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 5 comments
- [2111.12763] Sparse is Enough in Scaling Transformers https://arxiv.org/abs/2111.12763 5 comments
- [1902.09574] The State of Sparsity in Deep Neural Networks https://arxiv.org/abs/1902.09574 3 comments
- [2106.05974] Scaling Vision with Sparse Mixture of Experts https://arxiv.org/abs/2106.05974 2 comments
- [2209.01667] A Review of Sparse Expert Models in Deep Learning https://arxiv.org/abs/2209.01667 1 comment
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- [2001.04451] Reformer: The Efficient Transformer https://arxiv.org/abs/2001.04451 0 comments
- [2009.06732] Efficient Transformers: A Survey https://arxiv.org/abs/2009.06732 0 comments
- [2207.07061] Confident Adaptive Language Modeling https://arxiv.org/abs/2207.07061 0 comments
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- GitHub - mit-han-lab/smoothquant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models https://github.com/mit-han-lab/smoothquant 0 comments
- Some Math behind Neural Tangent Kernel | Lil'Log https://lilianweng.github.io/posts/2022-09-08-ntk/ 0 comments