Hacker News
- Why the original transformer figure is wrong, and some other tidbits about LLMs https://magazine.sebastianraschka.com/p/why-the-original-transformer-figure 49 comments
- [P] Why the Original Transformer Figure Is Wrong, And Some Other Interesting Tidbits https://magazine.sebastianraschka.com/p/why-the-original-transformer-figure 11 comments (r/machinelearning)
Linked pages
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
- [2102.11174] Linear Transformers Are Secretly Fast Weight Programmers https://arxiv.org/abs/2102.11174 2 comments
- [1801.06146] Universal Language Model Fine-tuning for Text Classification https://arxiv.org/abs/1801.06146 0 comments
- Neural nets learn to program neural nets with fast weights (1991) https://people.idsia.ch/~juergen/fast-weight-programmer-1991-transformer.html 0 comments
- [2112.11446] Scaling Language Models: Methods, Analysis & Insights from Training Gopher https://arxiv.org/abs/2112.11446 0 comments