Hacker News
- The Transformer Family https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
Linking pages
- Prompt Engineering | Lil'Log https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/ 59 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
- [AINews] The world's first fully autonomous AI Engineer • Buttondown https://buttondown.email/ainews/archive/ainews-the-worlds-first-fully-autonomous-ai/ 0 comments
- Diffusion Models for Video Generation | Lil'Log https://lilianweng.github.io/posts/2024-04-12-diffusion-video/ 0 comments
- GitHub - fabiochiusano/ai-news-tracker: ~300 news for quickly getting up-to-date with the generative AI landscape https://github.com/fabiochiusano/ai-news-tracker 0 comments
- Ab Analytica – Dr. Mindle's Musings https://www.drmindle.com/ab-analytica/ 0 comments
- GitHub - fabiochiusano/Awesome-AI-News: ~300 news for quickly getting up-to-date with the generative AI landscape https://github.com/fabiochiusano/Awesome-AI-News/tree/main 0 comments
- (Opinionated) Guide to ML Engineer Job Hunting | Yuan Meng https://www.yuan-meng.com/posts/mle_interviews/ 0 comments
Linked pages
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- GitHub - facebookresearch/faiss: A library for efficient similarity search and clustering of dense vectors. https://github.com/facebookresearch/faiss 100 comments
- Dirac delta function - Wikipedia http://en.wikipedia.org/wiki/Dirac_delta_function 66 comments
- Locality-sensitive hashing - Wikipedia https://en.wikipedia.org/wiki/Locality-sensitive_hashing 40 comments
- Directed graph - Wikipedia http://en.wikipedia.org/wiki/Directed_graph 34 comments
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- [2203.08913] Memorizing Transformers https://arxiv.org/abs/2203.08913 32 comments
- Attention is All you Need https://papers.nips.cc/paper/7181-attention-is-all-you-need 30 comments
- https://github.com/google-research/google-research/tree/master/scann 25 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- Rotation matrix - Wikipedia https://en.wikipedia.org/wiki/Rotation_matrix#Rotation_matrix_from_axis_and_angle 17 comments
- [2108.12409] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation https://arxiv.org/abs/2108.12409 17 comments
- Contrastive Representation Learning | Lil'Log https://lilianweng.github.io/posts/2021-05-31-contrastive/ 10 comments
- [2106.01345] Decision Transformer: Reinforcement Learning via Sequence Modeling https://arxiv.org/abs/2106.01345 9 comments
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
- A (Long) Peek into Reinforcement Learning | Lil'Log https://lilianweng.github.io/posts/2018-02-19-rl-overview/ 8 comments
- Attention? Attention! | Lil'Log https://lilianweng.github.io/posts/2018-06-24-attention/ 2 comments
- [1904.10509] Generating Long Sequences with Sparse Transformers https://arxiv.org/abs/1904.10509 1 comment
- [2007.14062] Big Bird: Transformers for Longer Sequences https://arxiv.org/abs/2007.14062 0 comments
- [2001.04451] Reformer: The Efficient Transformer https://arxiv.org/abs/2001.04451 0 comments