The Checklist: What Succeeding at AI Safety Will Involve - Sam Bowman - discu.eu

Linking pages

Anthropic has hired an 'AI welfare' researcher https://www.transformernews.ai/p/anthropic-ai-welfare-researcher 56 comments
AI #80: Never Will It Ever - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-80-never-will-it-ever 0 comments

Linked pages

[2212.09251] Discovering Language Model Behaviors with Model-Written Evaluations https://arxiv.org/abs/2212.09251 50 comments
[2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
[1606.06565] Concrete Problems in AI Safety https://arxiv.org/abs/1606.06565 3 comments
II. From AGI to Superintelligence: the Intelligence Explosion - SITUATIONAL AWARENESS https://situational-awareness.ai/from-agi-to-superintelligence/ 3 comments
[2311.08379] Scheming AIs: Will AIs fake alignment during training in order to get power? https://arxiv.org/abs/2311.08379 0 comments

