Linking pages
- The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI https://www.latent.space/p/transformers-math#details 66 comments
- RWKV: Reinventing RNNs for the Transformer Era — with Eugene Cheah of UIlicious https://www.latent.space/p/rwkv#%C2%A7the-eleuther-mafia 66 comments
- The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis https://www.latent.space/p/semianalysis 40 comments
- LLMs Everywhere: Running 70B models in browsers and iPhones using MLC — with Tianqi Chen of CMU / OctoML https://www.latent.space/p/llms-everywhere#details 1 comment
- Cursor.so: The AI-first Code Editor — with Aman Sanger of Anysphere https://www.latent.space/p/cursor 1 comment
- The Busy Person's Intro to Finetuning & Open Source AI - Wing Lian, Axolotl https://www.latent.space/p/axolotl 0 comments
- Cloud Intelligence at the speed of 5000 tok/s - with Ce Zhang and Vipul Ved Prakash of Together AI https://www.latent.space/p/together 0 comments
Linked pages
- From Deep to Long Learning? · Hazy Research https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning 124 comments
- TSMC's 3nm Node: No SRAM Scaling Implies More Expensive CPUs and GPUs | Tom's Hardware https://www.tomshardware.com/news/no-sram-scaling-implies-on-more-expensive-cpus-and-gpus 115 comments
- Petaflops to the People: from Personal Compute Cluster to Person of Compute — with George Hotz of the tiny corp https://www.latent.space/p/geohot#details 10 comments
- The Hardware Lottery https://hardwarelottery.github.io/ 9 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- Is Attention All You Need? http://www.isattentionallyouneed.com/ 0 comments
- CUTLASS 3.0 is now available! · Discussion #787 · NVIDIA/cutlass · GitHub https://github.com/NVIDIA/cutlass/discussions/787 0 comments
- Hyena Hierarchy: Towards Larger Convolutional Language Models · Hazy Research https://hazyresearch.stanford.edu/blog/2023-03-07-hyena 0 comments
- MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
- Llama 2: The New Open LLM SOTA (ft. Nathan Lambert, Matt Bornstein, Anton Troynikov, Russell Kaplan, Whole Mars Catalog et al.) https://www.latent.space/p/llama2#details 0 comments
Related searches:
- Whole site: site:latent.space
- Title: FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI