- [R] Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google http://far.ai/post/2025-02-r1-redteaming/ 9 comments machinelearning
Linked pages
- Character.ai Faces Lawsuit After Teen’s Suicide - The New York Times https://www.nytimes.com/2024/10/23/technology/characterai-lawsuit-teen-suicide.html 504 comments
- https://openai.com/index/disrupting-deceptive-uses-of-AI-by-covert-influence-operations/ 71 comments
- [2309.07864] The Rise and Potential of Large Language Model Based Agents: A Survey https://arxiv.org/abs/2309.07864 1 comment
- [2404.12699] SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models https://arxiv.org/abs/2404.12699 1 comment
- Risk compensation - Wikipedia https://en.wikipedia.org/wiki/Risk_compensation 0 comments
- Stampy https://aisafety.info/ 0 comments
- [2408.02946] Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws https://arxiv.org/abs/2408.02946 0 comments