Linking pages
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time. https://jalammar.github.io/illustrated-transformer/ 36 comments
- Big O notation - Wikipedia http://en.wikipedia.org/wiki/Big_O_notation 29 comments
- A New Chip Cluster Will Make Massive AI Models Possible | WIRED https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/ 18 comments
- A Mathematical Framework for Transformer Circuits https://transformer-circuits.pub/2021/framework/index.html 9 comments
- Building a ML Transformer in a Spreadsheet - YouTube https://www.youtube.com/watch?v=S9eKuRVigjY 2 comments
- https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment
- [1904.10509] Generating Long Sequences with Sparse Transformers https://arxiv.org/abs/1904.10509 1 comment
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf 0 comments
Related searches:
Search whole site: site:aizi.substack.com
Search title: How does GPT-3 spend its 175B parameters? - by Robert Huben
See how to search.