Linking pages
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- [1803.03635] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks https://arxiv.org/abs/1803.03635 32 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- How Transformers Work. Transformers are a type of neural… | by Giuliano Giacaglia | Towards Data Science https://towardsdatascience.com/transformers-141e32e69591 6 comments
- [2303.18223] A Survey of Large Language Models https://arxiv.org/abs/2303.18223 3 comments
- https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment
- [2203.15556] Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 0 comments
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf 0 comments
Related searches:
Search whole site: site:fabricatedknowledge.com
Search title: AI Foundations Part 1: Transformers, Pre-Training and Fine-Tuning, and Scaling
See how to search.