Hacker News
Linked pages
- Things we learned about LLMs in 2024 https://simonwillison.net/2024/Dec/31/llms-in-2024/ 691 comments
- Alignment faking in large language models \ Anthropic https://www.anthropic.com/research/alignment-faking 353 comments
- Why our structure must evolve to advance our mission https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission 321 comments
- Introduction - SITUATIONAL AWARENESS: The Decade Ahead https://situational-awareness.ai/ 77 comments
- Clio: Privacy-preserving insights into real-world AI use \ Anthropic https://www.anthropic.com/research/clio 40 comments
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
- [2406.14546] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data https://arxiv.org/abs/2406.14546 13 comments
- Introducing SWE-bench Verified https://openai.com/index/introducing-swe-bench-verified/ 10 comments
- By default, capital will matter more than ever after AGI — LessWrong https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi 8 comments
- [2409.12822] Language Models Learn to Mislead Humans via RLHF https://arxiv.org/abs/2409.12822 7 comments
- https://archive.ph/uBMVw 7 comments
- [2407.13692] Prover-Verifier Games improve legibility of LLM outputs https://arxiv.org/abs/2407.13692 3 comments
- Scheming reasoning evaluations — Apollo Research https://www.apolloresearch.ai/research/scheming-reasoning-evaluations 3 comments
- Chicago Pile-1 - Wikipedia https://en.wikipedia.org/wiki/Chicago_Pile-1 2 comments
- [2403.03218] The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning https://arxiv.org/abs/2403.03218 1 comment
- [2406.04313] Improving Alignment and Robustness with Circuit Breakers https://arxiv.org/abs/2406.04313 1 comment
- The Leopold Model: Analysis and Reactions https://thezvi.substack.com/p/the-leopold-model-analysis-and-reactions 0 comments
- [2407.04622] On scalable oversight with weak LLMs judging strong LLMs https://arxiv.org/abs/2407.04622#deepmind 0 comments
- [2410.13722] Persistent Pre-Training Poisoning of LLMs https://arxiv.org/abs/2410.13722 0 comments
- The Mask Comes Off: At What Price? - by Zvi Mowshowitz https://thezvi.substack.com/p/the-mask-comes-off-at-what-price 0 comments
Source post: AI #97: 4 - by Zvi Mowshowitz - Don't Worry About the Vase