Linking pages
- The implicit dynamics of optimizing costs vs. rewards vs. preferences https://robotic.substack.com/p/costs-v-rewards-v-preferences 3 comments
- Cognitive Collective: RLHF Is Not a Magic Wand for Alignment | Scale Venture Partners https://www.scalevp.com/blog/cognitive-collective-rlhf-is-not-a-magic-wand-for-alignment 1 comment
- Three seasons of RL: Metaphor, tool, and framework https://robotic.substack.com/p/rl-tool-or-framework-or-agi 0 comments