Hacker News
Linking pages
- Specifying objectives in RLHF - by Nathan Lambert https://www.interconnects.ai/p/specifying-objectives-in-rlhf 0 comments
- RLHF learning resources in 2024 - by Nathan Lambert https://www.interconnects.ai/p/rlhf-resources 0 comments
- Why reward models are key for alignment - by Nathan Lambert https://www.interconnects.ai/p/why-reward-models-matter 0 comments
Linked pages
- Inside the AI Factory: the humans that make tech seem human - The Verge https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots 17 comments
- [2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/abs/2305.18290 8 comments
- [2112.09332] WebGPT: Browser-assisted question-answering with human feedback https://arxiv.org/abs/2112.09332 3 comments
- John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges - YouTube https://www.youtube.com/watch?v=hhiLw5Q_UFg 3 comments
- [2303.08774] GPT-4 Technical Report https://arxiv.org/abs/2303.08774 1 comment
- [2204.05862] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2204.05862 1 comment
- [2203.02155] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 0 comments
- [2112.00861] A General Language Assistant as a Laboratory for Alignment https://arxiv.org/abs/2112.00861 0 comments
- [2302.07459] The Capacity for Moral Self-Correction in Large Language Models https://arxiv.org/abs/2302.07459 0 comments
- Specifying hallucinations - by Nathan Lambert https://www.interconnects.ai/p/specifying-hallucinations-llms 0 comments