learning through human feedback - discu.eu

Hacker News

Learning to summarize from human feedback (2022) https://arxiv.org/abs/2009.01325 12 comments 4/3/2023

Learning to summarize with human feedback (2020) https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html 2 comments 6/12/2022
I built a small Python library to help AI agents learn from human feedback https://pypi.org/project/dead-simple-self-learning/ 0 comments 6/5/2025

Reddit

A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 4 comments 20/1/2023 learnmachinelearning
[R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 15 comments 18/1/2023 machinelearning
Reinforcement Learning with Human Feedback — Free Workshop https://lu.ma/RLHF 7 comments 30/5/2023 learnmachinelearning
He Helped Train ChatGPT. It Traumatized Him. A look at the mental toll that Reinforcement Learning from Human Feedback takes on the trainers. https://www.bigtechnology.com/p/he-helped-train-chatgpt-it-traumatized 75 comments 22/5/2023 artificial
Introduction to Reinforcement Learning with Human Feedback [D] https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1 6 comments 12/1/2023 machinelearning
[R] Illustrating Reinforcement Learning from Human Feedback (RLHF) https://huggingface.co/blog/rlhf 12 comments 9/12/2022 machinelearning
Most major LLMs behind the AIs can identify when they are being given personality tests and adjust their responses to appear more socially desirable; they "learn" social desirability through human feedback during training https://academic.oup.com/pnasnexus/article/3/12/pgae533/7919163 62 comments 19/12/2024 science
"Reinforcement Learning from Human Feedback: Progress and Challenges", John Schulman 2023-04-19 {OA} (fighting confabulations) https://www.youtube.com/watch?t=1098s&v=hhiLw5Q_UFg 3 comments 22/4/2023 reinforcementlearning
GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM https://github.com/lucidrains/PaLM-rlhf-pytorch 2 comments 29/12/2022 python
I made a library to parse human readable numbers list (e.g. `1:10,15:20`) into an iterator, this is my first time making a rust library and I wanted to learn better about generics and packaging. Please provide me with feedback. https://github.com/Atreyagaurav/number_range 10 comments 4/2/2023 rust
"Echo Chess: The Quest for Solvability" (level design preference learning: predicting high-quality soluble mazes using human feedback from quitting rates) https://samiramly.com/chess 9 comments 31/8/2023 reinforcementlearning
ChatLLaMA 🦙 the first open source implementation of LLaMA based on Reinforcement Learning from Human Feedback (RLHF) https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama 5 comments 27/2/2023 deeplearning
[R] Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2307.16039 3 comments 14/8/2023 machinelearning
Artificially intelligent robot signs with Dentons, the world's largest law firm. The machine understands legal concepts, and learns from questions and feedback, "Just like a human, it’s getting its experience in a law firm and being able to learn and get better" http://www.theglobeandmail.com/report-on-business/industry-news/the-law-page/u-of-t-students-artificially-intelligent-robot-signs-with-dentons-law-firm/article25898779/ 7 comments 12/8/2015 worldnews