Linking pages
- AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
- Open challenges in LLM research https://huyenchip.com/2023/08/16/llm-research-open-challenges.html 72 comments
- Bringing LLM Fine-Tuning and RLHF to Everyone https://argilla.io/blog/argilla-for-llms/ 11 comments
- MLOps guide https://huyenchip.com/mlops/ 3 comments
- Chewing On AI Privacy Scenarios | Drew Breunig https://www.dbreunig.com/2023/05/15/ai-privacy-scenarios.html 0 comments
- Unfortunately, OpenAI and Google have moats https://www.interconnects.ai/p/openai-google-llm-moats 0 comments
- Effective ChatGPT Prompting for software developers https://boliv.substack.com/p/effective-chatgpt-prompting-for-software 0 comments
- ML pipelines for fine-tuning LLMs | Dagster Blog https://dagster.io/blog/finetuning-llms 0 comments
- Multimodality and Large Multimodal Models (LMMs) https://huyenchip.com/2023/10/10/multimodal.html 0 comments
- GitHub - nlpfromscratch/nlp-llms-resources: Master list of curated resources on NLP and LLMs https://github.com/nlpfromscratch/nlp-llms-resources 0 comments
- RLHF learning resources in 2024 - by Nathan Lambert https://www.interconnects.ai/p/rlhf-resources 0 comments
- Unbowed, Unbent, Unbroken – Decoder Only https://decoderonlyblog.wordpress.com/2024/04/19/unbowed-unbent-unbroken/ 0 comments
- Train RLHF Models with Dagster and Modal: Step-by-Step Guide https://kyrylai.com/2024/06/10/rlhf-with-dagster-and-modal/ 0 comments
- A Heuristic Proof of Practical Aligned Superintelligence https://transhumanaxiology.substack.com/p/a-heuristic-proof-of-practical-aligned 0 comments
- What is the difference between a novelist and a language model? https://walfred.substack.com/p/what-is-the-difference-between-a 0 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- Data API Terms - Reddit https://www.redditinc.com/policies/data-api-terms 151 comments
- [2110.10819] Shaking the foundations: delusions in sequence models for interaction and control https://arxiv.org/abs/2110.10819 6 comments
- John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges - YouTube https://www.youtube.com/watch?v=hhiLw5Q_UFg 3 comments
- GitHub - tatsu-lab/stanford_alpaca https://github.com/tatsu-lab/stanford_alpaca 2 comments
- [2211.04325] Will we run out of data? Limits of LLM scaling based on human-generated data https://arxiv.org/abs/2211.04325 1 comment
- [2204.05862] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2204.05862 1 comment
- Aligning language models to follow instructions https://openai.com/research/instruction-following 1 comment
- [2203.02155] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 0 comments
- GitHub - togethercomputer/RedPajama-Data: The RedPajama-Data repository contains code for preparing large datasets for training large language models. https://github.com/togethercomputer/RedPajama-Data 0 comments
- rl-for-llms.md · GitHub https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81 0 comments
- [2112.11446] Scaling Language Models: Methods, Analysis & Insights from Training Gopher https://arxiv.org/abs/2112.11446 0 comments
- [2302.13971] LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/abs/2302.13971 0 comments