Hacker News
- Old Advocacy, New Algorithms: How “Devil's Advocates” Shaped AI Red Teaming https://royapakzad.substack.com/p/old-advocacy-new-algorithms 5 comments
Linked pages
- https://cdn.openai.com/papers/gpt-4-system-card.pdf 245 comments
- Prompt injection: What’s the worst that can happen? https://simonwillison.net/2023/Apr/14/worst-that-can-happen/ 206 comments
- ChatGPT Prompt Engineering for Developers - DeepLearning.AI https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/ 31 comments
- [2112.04359] Ethical and social risks of harm from Language Models https://arxiv.org/abs/2112.04359 22 comments
- Illustrating Reinforcement Learning from Human Feedback (RLHF) https://huggingface.co/blog/rlhf 14 comments
- Jailbreaking ChatGPT: How AI Chatbot Safeguards Can be Bypassed - Bloomberg https://www.bloomberg.com/news/articles/2023-04-08/jailbreaking-chatgpt-how-ai-chatbot-safeguards-can-be-bypassed 1 comment
- [1908.07125] Universal Adversarial Triggers for Attacking and Analyzing NLP https://arxiv.org/abs/1908.07125 0 comments
- https://rightscon.org 0 comments
Related searches:
Search whole site: site:royapakzad.substack.com
Search title: Old Advocacy, New Algorithms: How 16th century “Devil's Advocates” Shaped AI Red Teaming