[1706.03741] Deep reinforcement learning from human preferences

Linking pages

GitHub - brexhq/prompt-engineering: Tips and tricks for working with Large Language Models like OpenAI's GPT-4. https://github.com/brexhq/prompt-engineering 105 comments
In Continued Defense Of Effective Altruism https://www.astralcodexten.com/p/in-continued-defense-of-effective 49 comments
Lessons Learned Reproducing a Deep Reinforcement Learning Paper http://amid.fish/reproducing-deep-rl 37 comments
Learning from Human Preferences https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/ 7 comments
ChatGPT Decoded: An expert guide to mastering the technology and building domain-specific intelligent bots with GPT and reinforcement learning on AWS SageMaker | by Arun Shankar | Feb, 2023 | Medium https://medium.com/@shankar.arunp/chatgpt-decoded-an-expert-guide-to-mastering-the-technology-and-building-domain-specific-3a95b42827bb?sk=e025c40b1a15863f94c1a6105d089222&source=friends_link 7 comments
GitHub - Xpitfire/symbolicai: Compositional Differentiable Programming Library https://github.com/Xpitfire/symbolicai 5 comments
Reinforcement Learning without Reward Engineering | by Nikita Pavlichenko | Toloka Tech | Medium https://medium.com/p/60c63402c59f 1 comment
DeepMind now learns from human preferences – just like a toddler | New Scientist https://www.newscientist.com/article/2134740-deepmind-now-learns-from-human-preferences-just-like-a-toddler/ 0 comments
Learning What To Do by Simulating the Past – The Berkeley Artificial Intelligence Research Blog https://bair.berkeley.edu/blog/2021/05/03/rlsp/ 0 comments
BASALT: A Benchmark for Learning from Human Feedback – The Berkeley Artificial Intelligence Research Blog https://bair.berkeley.edu/blog/2021/07/08/basalt/ 0 comments
Two Giants of AI Team Up to Head Off the Robot Apocalypse | WIRED https://www.wired.com/story/two-giants-of-ai-team-up-to-head-off-the-robot-apocalypse/ 0 comments
GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated) https://github.com/opendilab/awesome-RLHF 0 comments
Uncertain Simulators Don't Always Simulate Uncertain Agents | Daniel D. Johnson https://www.danieldjohnson.com/2023/03/27/uncertain_simulators/ 0 comments
GitHub - Mooler0410/LLMsPracticalGuide: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) https://github.com/Mooler0410/LLMsPracticalGuide 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
Is AI alignment on track? Is it progressing... too fast? - Alexey Guzey https://guzey.com/ai/alignment-on-track/ 0 comments
Controllable Neural Text Generation | Lil'Log https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/ 0 comments
RLHF 201 - with Nathan Lambert of AI2 and Interconnects https://www.latent.space/p/rlhf-201 0 comments