Linking pages
- The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic data https://www.interconnects.ai/p/q-star 64 comments
- Specifying objectives in RLHF - by Nathan Lambert https://www.interconnects.ai/p/specifying-objectives-in-rlhf 0 comments
- RLHF learning resources in 2024 - by Nathan Lambert https://www.interconnects.ai/p/rlhf-resources 0 comments
Linked pages
- Snapchat sees spike in 1-star reviews as users pan the 'My AI' feature, calling for its removal | TechCrunch https://techcrunch.com/2023/04/24/snapchat-sees-spike-in-1-star-reviews-as-users-pan-the-my-ai-feature-calling-for-its-removal/ 606 comments
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | by the Team with members from UC Berkeley, CMU, Stanford, and UC San Diego https://vicuna.lmsys.org/ 168 comments
- https://chat.lmsys.org/ 51 comments
- Illustrating Reinforcement Learning from Human Feedback (RLHF) https://huggingface.co/blog/rlhf 14 comments
- John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges - YouTube https://www.youtube.com/watch?v=hhiLw5Q_UFg 3 comments
- Teaching large language models to zip their lips https://gretel.ai/blog/teaching-large-language-models-to-zip-their-lips 2 comments
- AI Spam Is Already Flooding the Internet and It Has an Obvious Tell https://www.vice.com/en/article/5d9bvn/ai-spam-is-already-flooding-the-internet-and-it-has-an-obvious-tell 1 comment
- rl-for-llms.md · GitHub https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81 0 comments
Related searches
Search whole site: site:interconnects.ai
Search title: Beyond human data: RLAIF needs a rebrand