[1906.01820] Risks from Learned Optimization in Advanced Machine Learning Systems - discu.eu

Linking pages

Will Humans Treat AI Better Than We Treat Animals? - The Atlantic https://www.theatlantic.com/ideas/archive/2023/05/humans-ai-jacy-reese-anthis-sociologist-perspective/673972/ 617 comments
Deceptively Aligned Mesa-Optimizers: It's Not Funny If I Have To Explain It https://astralcodexten.substack.com/p/deceptively-aligned-mesa-optimizers 1 comment
GitHub - Jakobovski/ai-safety-cheatsheet: A compilation of AI safety ideas, problems and solutions. https://github.com/Jakobovski/ai-safety-cheatsheet 0 comments
Nintil - Set Sail For Fail? On AI risk https://nintil.com/ai-safety 0 comments
Truth https://compphil.github.io/truth/ 0 comments
GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments
Simple probes can catch sleeper agents \ Anthropic https://www.anthropic.com/research/probes-catch-sleeper-agents 0 comments
AI #61: Meta Trouble - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-61-meta-trouble 0 comments

