Hacker News
- AIs Will Increasingly Fake Alignment https://thezvi.substack.com/p/ais-will-increasingly-fake-alignment 102 comments
Linked pages
- Beware Isolated Demands For Rigor | Slate Star Codex http://slatestarcodex.com/2014/08/14/beware-isolated-demands-for-rigor/ 26 comments
- http://blank 5 comments
- Coherent Extrapolated Volition - LessWrong https://www.lesswrong.com/tag/coherent-extrapolated-volition 1 comment
- https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf 1 comment
- The Leopold Model: Analysis and Reactions https://thezvi.substack.com/p/the-leopold-model-analysis-and-reactions 0 comments
- AIs Will Increasingly Attempt Shenanigans https://thezvi.substack.com/p/ais-will-increasingly-attempt-shenanigans 0 comments
- Claude Fights Back - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/claude-fights-back 0 comments
- No, LLMs are not "scheming" - by Rohit Krishnan https://www.strangeloopcanon.com/p/no-llms-are-not-scheming 0 comments