Linking pages
- Summarizing Books with Human Feedback https://openai.com/blog/summarizing-books/ 19 comments
- AI-Written Critiques Help Humans Notice Flaws https://openai.com/blog/critiques/ 6 comments
- Reward Modeling for Large Language Models (with code) https://explodinggradients.com/reward-modeling-for-large-language-models-with-code 1 comment
- GitHub - Jakobovski/ai-safety-cheatsheet: A compilation of AI safety ideas, problems and solutions. https://github.com/Jakobovski/ai-safety-cheatsheet 0 comments
- Alignment of Language Agents. By Zachary Kenton, Tom Everitt, Laura… | by DeepMind Safety Research | Medium https://medium.com/@deepmindsafetyresearch/alignment-of-language-agents-9fbc7dd52c6c 0 comments
- Designing agent incentives to avoid reward tampering | by DeepMind Safety Research | Medium https://medium.com/@deepmindsafetyresearch/designing-agent-incentives-to-avoid-reward-tampering-4380c1bb6cd 0 comments
- Designing agent incentives to avoid side effects | by DeepMind Safety Research | Medium https://medium.com/@deepmindsafetyresearch/designing-agent-incentives-to-avoid-side-effects-e1ac80ea6107 0 comments
- Neurotechnology is Critical for AI Alignment https://milan.cvitkovic.net/writing/neurotechnology_is_critical_for_ai_alignment/ 0 comments
- Specification gaming: the flip side of AI ingenuity - Google DeepMind https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/ 0 comments
Linked pages
- [1712.01815] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm https://arxiv.org/abs/1712.01815 573 comments
- DeepMind and Blizzard open StarCraft II as an AI research environment https://deepmind.com/blog/deepmind-and-blizzard-open-starcraft-ii-ai-research-environment/ 444 comments
- Dota 2 https://blog.openai.com/dota-2/ 87 comments
- Safety-first AI for autonomous data centre cooling and industrial control https://deepmind.com/blog/safety-first-ai-autonomous-data-centre-cooling-and-industrial-control/ 32 comments
- [1606.06565] Concrete Problems in AI Safety https://arxiv.org/abs/1606.06565 3 comments
- Human-level control through deep reinforcement learning | Nature https://www.nature.com/articles/nature14236 3 comments
- Research Priorities for Robust and Beneficial Artificial Intelligence http://futureoflife.org/data/documents/research_priorities.pdf 3 comments
- Learning complex goals with iterated amplification https://blog.openai.com/amplifying-ai-training/ 3 comments
- Building safe artificial intelligence: specification, robustness, and assurance | by DeepMind Safety Research | Medium https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1 0 comments
- [1606.03137] Cooperative Inverse Reinforcement Learning https://arxiv.org/abs/1606.03137 0 comments
- [1805.00899] AI safety via debate https://arxiv.org/abs/1805.00899 0 comments
- Learning through human feedback https://deepmind.com/blog/learning-through-human-feedback/ 0 comments
- [1711.09883] AI Safety Gridworlds https://arxiv.org/abs/1711.09883 0 comments