Hacker News
Linked pages
- Things we learned about LLMs in 2024 https://simonwillison.net/2024/Dec/31/llms-in-2024/ 691 comments
- Alignment faking in large language models \ Anthropic https://www.anthropic.com/research/alignment-faking 353 comments
- Why our structure must evolve to advance our mission https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission 321 comments
- Introduction - SITUATIONAL AWARENESS: The Decade Ahead https://situational-awareness.ai/ 77 comments
- Clio: Privacy-preserving insights into real-world AI use \ Anthropic https://www.anthropic.com/research/clio 40 comments
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
- [2406.14546] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data https://arxiv.org/abs/2406.14546 13 comments
- Introducing SWE-bench Verified https://openai.com/index/introducing-swe-bench-verified/ 10 comments
- By default, capital will matter more than ever after AGI — LessWrong https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi 8 comments
- [2409.12822] Language Models Learn to Mislead Humans via RLHF https://arxiv.org/abs/2409.12822 7 comments
- https://archive.ph/uBMVw 7 comments
- [2407.13692] Prover-Verifier Games improve legibility of LLM outputs https://arxiv.org/abs/2407.13692 3 comments
- Scheming reasoning evaluations — Apollo Research https://www.apolloresearch.ai/research/scheming-reasoning-evaluations 3 comments
- Chicago Pile-1 - Wikipedia https://en.wikipedia.org/wiki/Chicago_Pile-1 2 comments
- [2403.03218] The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning https://arxiv.org/abs/2403.03218 1 comment
- [2406.04313] Improving Alignment and Robustness with Circuit Breakers https://arxiv.org/abs/2406.04313 1 comment
- The Leopold Model: Analysis and Reactions https://thezvi.substack.com/p/the-leopold-model-analysis-and-reactions 0 comments
- [2407.04622] On scalable oversight with weak LLMs judging strong LLMs https://arxiv.org/abs/2407.04622#deepmind 0 comments
- [2410.13722] Persistent Pre-Training Poisoning of LLMs https://arxiv.org/abs/2410.13722 0 comments
- The Mask Comes Off: At What Price? - by Zvi Mowshowitz https://thezvi.substack.com/p/the-mask-comes-off-at-what-price 0 comments
Source post: AI #97: 4 - by Zvi Mowshowitz - Don't Worry About the Vase