Linking pages
- LLM Powered Autonomous Agents | Lil'Log https://lilianweng.github.io/posts/2023-06-23-agent/ 177 comments
- Attacks on machine learning models | Nikhil. R https://rnikhil.com/2024/01/07/attacking-neural-networks.html 38 comments
- We Are Running Out of Low-Background Tokens (Nov 2023 Recap) https://www.latent.space/i/139368545/the-concept-of-low-background-tokens 6 comments
- Thinking about High-Quality Human Data | Lil'Log https://lilianweng.github.io/posts/2024-02-05-human-data-quality/ 4 comments
- LLM Psychometrics: A Speculative Approach to AI Safety https://pascal.cc/blog/artificial-psychometrics 3 comments
Linked pages
- Jailbreak Chat https://www.jailbreakchat.com 528 comments
- https://arxiv.org/abs/2303.15056 205 comments
- Perspective API http://perspectiveapi.com/ 182 comments
- LLM Powered Autonomous Agents | Lil'Log https://lilianweng.github.io/posts/2023-06-23-agent/ 177 comments
- [2302.10149] Poisoning Web-Scale Training Datasets is Practical https://arxiv.org/abs/2302.10149 95 comments
- Prompt Engineering | Lil'Log https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/ 59 comments
- [2302.12173] Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection https://arxiv.org/abs/2302.12173 26 comments
- [2307.15043] Universal and Transferable Adversarial Attacks on Aligned Language Models https://arxiv.org/abs/2307.15043 3 comments
- Using GPT-4 for content moderation https://openai.com/blog/using-gpt-4-for-content-moderation 3 comments
- https://cdn.openai.com/papers/gpt-4.pdf 1 comment
- [2303.08774] GPT-4 Technical Report https://arxiv.org/abs/2303.08774 1 comment
- [1712.06751] HotFlip: White-Box Adversarial Examples for Text Classification https://arxiv.org/abs/1712.06751 0 comments
- [2005.05909] TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP https://arxiv.org/abs/2005.05909 0 comments
- [1908.07125] Universal Adversarial Triggers for Attacking and Analyzing NLP https://arxiv.org/abs/1908.07125 0 comments
- [2012.07805] Extracting Training Data from Large Language Models https://arxiv.org/abs/2012.07805 0 comments
- Learning with not Enough Data Part 3: Data Generation | Lil'Log https://lilianweng.github.io/posts/2022-04-15-data-gen/ 0 comments
- Controllable Neural Text Generation | Lil'Log https://lilianweng.github.io/posts/2021-01-02-controllable-text-generation/ 0 comments
Search title: Adversarial Attacks on LLMs | Lil'Log