Fast LLM Inference From Scratch - discu.eu

Hacker News

Fast LLM Inference From Scratch (using CUDA) https://andrewkchan.dev/posts/yalm.html 57 comments 14/12/2024

Linking pages

Data Science Weekly - Issue 598 https://datascienceweekly.substack.com/p/data-science-weekly-issue-598 0 comments

Linked pages

GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
GitHub - karpathy/llama2.c: Inference Llama 2 in pure C, single file, fp32, haha https://github.com/karpathy/llama2.c 167 comments
[2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 150 comments
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog https://siboehm.com/articles/22/CUDA-MMM 49 comments
[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
[2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
zeux.io - LLM inference speed of light https://zeux.io/2024/03/15/llm-inference-sol/ 3 comments
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
Cartesia https://www.cartesia.ai/ 0 comments
GitHub - zeux/calm: CUDA/Metal accelerated language model inference https://github.com/zeux/calm 0 comments

