Hacker News
- Transformers from Scratch (2019) http://peterbloem.nl/blog/transformers 9 comments
- Transformers from Scratch http://www.peterbloem.nl/blog/transformers 28 comments
- Slightly confused on the matrix multiplication setup for transformer's attention http://peterbloem.nl/blog/transformers 3 comments (r/learnmachinelearning)
Linking pages
- ML Resources https://sgfin.github.io/learning-resources/ 21 comments
- How To Make Custom AI-Generated Text With GPT-2 | Max Woolf's Blog https://minimaxir.com/2019/09/howto-gpt2/ 21 comments
- GPT-2 Neural Network Poetry · Gwern.net https://www.gwern.net/GPT-2 13 comments
- GitHub - lucidrains/vit-pytorch: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch https://github.com/lucidrains/vit-pytorch#vision-transformer-for-small-datasets 3 comments
- Let Us Show You How GPT Works — Using Jane Austen - The New York Times https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html 1 comment
- Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention | by Jesse Vig | Towards Data Science https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1 0 comments
- Long term credit assignment with temporal reward transport · EFAVDB https://www.efavdb.com/ltca 0 comments
- NLP Year in Review — 2019. NLP highlights for the year 2019. | by elvis | DAIR.AI | Medium https://medium.com/dair-ai/nlp-year-in-review-2019-fb8d523bcb19 0 comments
- GitHub - liuliu/s4nnc: Swift for NNC https://github.com/liuliu/s4nnc/ 0 comments
- 2019: The Year of BERT. As we wrap up 2019, it’s interesting to… | by Natasha Latysheva | Towards Data Science https://medium.com/@natasha.latysheva/2019-the-year-of-bert-354e8106f7ba 0 comments
- Schedule | EECS 498-007 / 598-005: Deep Learning for Computer Vision https://web.eecs.umich.edu/~justincj/teaching/eecs498/WI2022/schedule.html 0 comments
Linked pages
- The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/ 434 comments
- 500'000€ Prize for Compressing Human Knowledge http://prize.hutter1.net/ 253 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- Better Language Models and Their Implications https://openai.com/blog/better-language-models/ 99 comments
- Understanding LSTM Networks -- colah's blog https://colah.github.io/posts/2015-08-Understanding-LSTMs/ 64 comments
- Deep Learning with PyTorch: A 60 Minute Blitz — PyTorch Tutorials 1.12.1+cu102 documentation https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html 59 comments
- Tim Rocktäschel https://rockt.github.io/2018/04/30/einsum 48 comments
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time. https://jalammar.github.io/illustrated-transformer/ 36 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- Researchers create 'malicious' writing AI - BBC News https://www.bbc.com/news/technology-47249163 18 comments
- Generative Modeling with Sparse Transformers https://openai.com/blog/sparse-transformer/ 9 comments
- The Annotated Transformer https://nlp.seas.harvard.edu/2018/04/03/attention.html 3 comments
- A ten-minute introduction to sequence-to-sequence learning in Keras https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html 0 comments
- 💥 Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups | by Thomas Wolf | HuggingFace | Medium https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255 0 comments