Hacker News
Linked pages
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
- GitHub - karpathy/llama2.c: Inference Llama 2 in pure C, single file, fp32, haha https://github.com/karpathy/llama2.c 167 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 151 comments
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
- zeux.io - LLM inference speed of light https://zeux.io/2024/03/15/llm-inference-sol/ 3 comments
- Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
- Cartesia https://www.cartesia.ai/ 0 comments
Related searches:
Search whole site: site:andrewkchan.dev
Search title: Fast LLM Inference From Scratch