- The implicit dynamics of optimizing costs vs. rewards vs. preferences https://robotic.substack.com/p/costs-v-rewards-v-preferences 3 comments reinforcementlearning
Linking pages
Linked pages
- Reward is not enough - by Nathan Lambert https://robotic.substack.com/p/reward-is-not-enough 9 comments
- Bellman equation - Wikipedia https://en.wikipedia.org/wiki/Bellman_equation 6 comments
- [2210.10760] Scaling Laws for Reward Model Overoptimization https://arxiv.org/abs/2210.10760 0 comments
- Reward is Enough https://www.deepmind.com/publications/reward-is-enough 0 comments