Linked pages
- [2212.09251] Discovering Language Model Behaviors with Model-Written Evaluations https://arxiv.org/abs/2212.09251
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566
- [1606.06565] Concrete Problems in AI Safety https://arxiv.org/abs/1606.06565
- II. From AGI to Superintelligence: the Intelligence Explosion - SITUATIONAL AWARENESS https://situational-awareness.ai/from-agi-to-superintelligence/
- [2311.08379] Scheming AIs: Will AIs fake alignment during training in order to get power? https://arxiv.org/abs/2311.08379
The Checklist: What Succeeding at AI Safety Will Involve - Sam Bowman