Hacker News
Linking pages
- Representation Engineering Mistral-7B an Acid Trip https://vgel.me/posts/representation-engineering/ 75 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/p/dec-2023 0 comments
- The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/i/140396949/mixtral-sparks-a-gpuinference-war 0 comments
Linked pages
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- The Best GPUs for Deep Learning in 2023 — An In-depth Analysis https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/ 145 comments
- I made a transformer by hand (no training!) https://vgel.me/posts/handmade-transformer/ 95 comments
- GitHub - 1rgs/jsonformer https://github.com/1rgs/jsonformer 83 comments
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
- Compiling ML models to C for fun | Max Bernstein https://bernsteinbear.com/blog/compiling-ml-models/ 47 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments
- LLM.int8() and Emergent Features — Tim Dettmers https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/ 15 comments
- GitHub - artidoro/qlora: QLoRA: Efficient Finetuning of Quantized LLMs https://github.com/artidoro/qlora 5 comments
- https://twitter.com/voooooogel/status/1730726744314069190 4 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- [2302.10866] Hyena Hierarchy: Towards Larger Convolutional Language Models https://arxiv.org/abs/2302.10866 3 comments
- k-quants by ikawrakow · Pull Request #1684 · ggerganov/llama.cpp · GitHub https://github.com/ggerganov/llama.cpp/pull/1684 3 comments
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | LMSYS Org https://lmsys.org/blog/2023-11-21-lookahead-decoding/ 2 comments
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- GitHub - karpathy/micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API https://github.com/karpathy/micrograd 0 comments
- https://proceedings.neurips.cc/paper_files/paper/2022/file/47e288629a6996a17ce50b90a056a0e1-Paper-Conference.pdf 0 comments
- GitHub - TimDettmers/bitsandbytes: 8-bit CUDA functions for PyTorch https://github.com/TimDettmers/bitsandbytes 0 comments
Article: How to make LLMs go fast (vgel.me)