Hacker News
- Learning to summarize from human feedback (2022) https://arxiv.org/abs/2009.01325 12 comments
- Learning to summarize with human feedback (2020) https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html 2 comments
- A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 4 comments learnmachinelearning
- [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 15 comments machinelearning
- Reinforcement Learning with Human Feedback — Free Workshop https://lu.ma/RLHF 7 comments learnmachinelearning
- He Helped Train ChatGPT. It Traumatized Him. A look at the mental toll that Reinforcement Learning from Human Feedback takes on the trainers. https://www.bigtechnology.com/p/he-helped-train-chatgpt-it-traumatized 75 comments artificial
- Introduction to Reinforcement Learning with Human Feedback [D] https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1 6 comments machinelearning
- [R] Illustrating Reinforcement Learning from Human Feedback (RLHF) https://huggingface.co/blog/rlhf 12 comments machinelearning
- Most major LLMs behind the AIs can identify when they are being given personality tests and adjust their responses to appear more socially desirable, they "learn" social desirability through human feedback during training https://academic.oup.com/pnasnexus/article/3/12/pgae533/7919163 62 comments science
- "Reinforcement Learning from Human Feedback: Progress and Challenges", John Schulman 2023-04-19 {OA} (fighting confabulations) https://www.youtube.com/watch?t=1098s&v=hhiLw5Q_UFg 3 comments reinforcementlearning
- GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM https://github.com/lucidrains/PaLM-rlhf-pytorch 2 comments python
- I made a library to parse human readable numbers list (e.g. `1:10,15:20`) into an iterator, this is my first time making a rust library and I wanted to learn better about generics and packaging. Please provide me with feedback. https://github.com/Atreyagaurav/number_range 10 comments rust
- "Echo Chess: The Quest for Solvability" (level design preference learning: predicting high-quality soluble mazes using human feedback from quitting rates) https://samiramly.com/chess 9 comments reinforcementlearning
- ChatLLaMA 🦙 the first open source implementation of LLaMA based on Reinforcement Learning from Human Feedback (RLHF): https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama 5 comments deeplearning
- [R] Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2307.16039 3 comments machinelearning
- Artificially intelligent robot signs with Dentons, the world's largest law firm. The machine understands legal concepts, and learns from questions and feedback, "Just like a human, it’s getting its experience in a law firm and being able to learn and get better" http://www.theglobeandmail.com/report-on-business/industry-news/the-law-page/u-of-t-students-artificially-intelligent-robot-signs-with-dentons-law-firm/article25898779/ 7 comments worldnews