Hacker News
- Refusal in LLMs is mediated by a single direction https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction 20 comments
Lobsters
- Refusal in LLMs is mediated by a single direction https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction 2 comments ai