Linking pages
- Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken 3 comments
- Simple probes can catch sleeper agents \ Anthropic https://www.anthropic.com/research/probes-catch-sleeper-agents 0 comments
- A new initiative for developing third-party model evaluations \ Anthropic https://www.anthropic.com/news/a-new-initiative-for-developing-third-party-model-evaluations 0 comments
Related searches:
Search whole site: site:www.anthropic.com
Search title: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training \ Anthropic