[2310.03693] Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! - discu.eu

Linking pages

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside | WIRED https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/ 62 comments
Personal Information Exploit on OpenAI’s ChatGPT Raise Privacy Concerns - The New York Times https://www.nytimes.com/interactive/2023/12/22/technology/openai-chatgpt-privacy-exploit.html 4 comments
Mapping the Mind of a Large Language Model \ Anthropic https://www.anthropic.com/news/mapping-mind-language-model 2 comments
Mayor Eric Adams deepfakes himself https://aipoliticalpulse.substack.com/p/mayor-eric-adams-deepfakes-himself 1 comment
89% of Workers Use AI–Far Fewer Understand the Risks https://www.kolide.com/blog/89-of-workers-use-ai-far-fewer-understand-the-risks 1 comment
Mapping the Mind of a Large Language Model \ Anthropic https://www.anthropic.com/research/mapping-mind-language-model 1 comment
Model alignment protects against accidental harms, not intentional ones https://www.aisnakeoil.com/p/model-alignment-protects-against 0 comments

