Hacker News
- Best-of-N (BoN) Jailbreaking is an algorithm that jailbreaks AI systems. It works by repeatedly sampling variations of a prompt with a combination of augmentations, such as random shuffling or capitalization for textual prompts, until a harmful response is elicited. https://arxiv.org/abs/2412.03556