Hacker News
- Outperforming larger language models with less training data and smaller models https://blog.research.google/2023/09/distilling-step-by-step-outperforming.html 123 comments
Linked pages
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance – Google AI Blog https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html 279 comments
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer – Google AI Blog https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html 66 comments
- [2305.02301] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes https://arxiv.org/abs/2305.02301 56 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 5 comments
- [2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/abs/2201.11903 1 comment
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer https://arxiv.org/abs/1910.10683 1 comment
- [1801.06146] Universal Language Model Fine-tuning for Text Classification https://arxiv.org/abs/1801.06146 0 comments
- Vertex AI | Google Cloud https://cloud.google.com/vertex-ai 0 comments
- [1910.14599] Adversarial NLI: A New Benchmark for Natural Language Understanding https://arxiv.org/abs/1910.14599 0 comments