Hacker News
- Learning to summarize from human feedback (2022) https://arxiv.org/abs/2009.01325 12 comments
- Learning to summarize with human feedback (2020) https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html 2 comments
- A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 4 comments learnmachinelearning
- [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 15 comments machinelearning
- Reinforcement Learning with Human Feedback — Free Workshop https://lu.ma/RLHF 7 comments learnmachinelearning
- He Helped Train ChatGPT. It Traumatized Him. A look at the mental toll that Reinforcement Learning from Human Feedback takes on the trainers. https://www.bigtechnology.com/p/he-helped-train-chatgpt-it-traumatized 75 comments artificial
- Introduction to Reinforcement Learning with Human Feedback [D] https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1 6 comments machinelearning
- [R] Illustrating Reinforcement Learning from Human Feedback (RLHF) https://huggingface.co/blog/rlhf 12 comments machinelearning
- "Reinforcement Learning from Human Feedback: Progress and Challenges", John Schulman 2023-04-19 {OA} (fighting confabulations) https://www.youtube.com/watch?t=1098s&v=hhiLw5Q_UFg 3 comments reinforcementlearning
- GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM https://github.com/lucidrains/PaLM-rlhf-pytorch 2 comments python
- I made a library to parse human-readable number lists (e.g. `1:10,15:20`) into an iterator. This is my first time making a Rust library, and I wanted to learn more about generics and packaging. Please provide me with feedback. https://github.com/Atreyagaurav/number_range 10 comments rust
- "Echo Chess: The Quest for Solvability" (level design preference learning: predicting high-quality soluble mazes using human feedback from quitting rates) https://samiramly.com/chess 9 comments reinforcementlearning
- ChatLLaMA 🦙 the first open source implementation of LLaMA based on Reinforcement Learning from Human Feedback (RLHF) https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama 5 comments deeplearning
- [R] Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2307.16039 3 comments machinelearning
- Artificially intelligent robot signs with Dentons, the world's largest law firm. The machine understands legal concepts and learns from questions and feedback: "Just like a human, it’s getting its experience in a law firm and being able to learn and get better" http://www.theglobeandmail.com/report-on-business/industry-news/the-law-page/u-of-t-students-artificially-intelligent-robot-signs-with-dentons-law-firm/article25898779/ 7 comments worldnews