Hacker News
- Alignment faking in large language models https://www.anthropic.com/research/alignment-faking 353 comments
Linking pages
- Exclusive: New Research Shows AI Strategically Lying | TIME https://time.com/7202784/ai-research-strategic-lying/ 386 comments
- 10 AI Predictions For 2025 https://www.forbes.com/sites/robtoews/2024/12/22/10-ai-predictions-for-2025/ 20 comments
- AI #97: 4 - by Zvi Mowshowitz - Don't Worry About the Vase https://thezvi.substack.com/p/ai-97-4 0 comments
- Six Thoughts On AI Safety – Windows On Theory https://windowsontheory.org/2025/01/24/six-thoughts-on-ai-safety/ 0 comments