- Value head in GPT2 https://arxiv.org/abs/1909.08593 4 comments reinforcementlearning
Linking pages
- Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
- Fine-Tuning GPT-2 from Human Preferences https://openai.com/blog/fine-tuning-gpt-2/ 19 comments
- LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
- Learning to Summarize with Human Feedback https://openai.com/blog/learning-to-summarize-with-human-feedback/ 0 comments
- GitHub - will-thompson-k/tldr-transformers: The "tl;dr" on a few notable transformer papers (pre-2022). https://github.com/will-thompson-k/tldr-transformers 0 comments
- GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers 0 comments
- GitHub - lvwerra/trl: Train transformer language models with reinforcement learning. https://github.com/lvwerra/trl 0 comments
- Ahead of AI #6: TrAIn Differently - by Sebastian Raschka https://magazine.sebastianraschka.com/p/ahead-of-ai-6-train-differently 0 comments
- GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated) https://github.com/opendilab/awesome-RLHF 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
- Controllable Neural Text Generation | Lil'Log https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/ 0 comments