Hacker News
- Refusal in LLMs is mediated by a single direction https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction 20 comments
Lobsters
- Refusal in LLMs is mediated by a single direction https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction 2 comments ai
Linking pages
- On Emergent Misalignment - by Zvi Mowshowitz https://thezvi.substack.com/p/on-emergent-misalignment 1 comment
- Q&A on Proposed SB 1047 - by Zvi Mowshowitz https://thezvi.substack.com/p/q-and-a-on-proposed-sb-1047 0 comments
- AI #62: Too Soon to Tell - by Zvi Mowshowitz https://thezvi.substack.com/p/ai-62-too-soon-to-tell 0 comments