Linked pages
- All the Hard Stuff Nobody Talks About when Building Products with LLMs | Honeycomb https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm 126 comments
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? https://dl.acm.org/doi/pdf/10.1145/3442188.3445922 122 comments
- [2302.10149] Poisoning Web-Scale Training Datasets is Practical https://arxiv.org/abs/2302.10149 95 comments
- FakeToxicityPrompts: Automatic Red Teaming https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red 55 comments
- [1609.02943] Stealing Machine Learning Models via Prediction APIs https://arxiv.org/abs/1609.02943 37 comments
- [1905.02175] Adversarial Examples Are Not Bugs, They Are Features https://arxiv.org/abs/1905.02175 28 comments
- [2302.12173] Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection https://arxiv.org/abs/2302.12173 26 comments
- Edition 21: A framework to securely use LLMs in companies - Part 1: Overview of Risks https://boringappsec.substack.com/p/edition-21-a-framework-to-securely 25 comments
- [2307.03718] Frontier AI Regulation: Managing Emerging Risks to Public Safety https://arxiv.org/abs/2307.03718 7 comments
- [2106.09898] Bad Characters: Imperceptible NLP Attacks https://arxiv.org/abs/2106.09898 6 comments
- Hacking Auto-GPT and escaping its docker container | Positive Security https://positive.security/blog/auto-gpt-rce 5 comments
- [2308.03825] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models https://arxiv.org/abs/2308.03825 4 comments
- [2307.15008] A LLM Assisted Exploitation of AI-Guardian https://arxiv.org/abs/2307.15008 1 comment
- Securing LLM Systems Against Prompt Injection | NVIDIA Technical Blog https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/ 1 comment
- Secure your machine learning with Semgrep | Trail of Bits Blog https://blog.trailofbits.com/2022/10/03/semgrep-maching-learning-static-analysis/ 0 comments
- [2012.07805] Extracting Training Data from Large Language Models https://arxiv.org/abs/2012.07805 0 comments
- [2006.03463] Sponge Examples: Energy-Latency Attacks on Neural Networks https://arxiv.org/abs/2006.03463 0 comments
- GitHub - Trusted-AI/adversarial-robustness-toolbox: Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams https://github.com/Trusted-AI/adversarial-robustness-toolbox 0 comments
- [2305.10036] Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark https://arxiv.org/abs/2305.10036 0 comments
- [2205.12700] BITE: Textual Backdoor Attacks with Iterative Trigger Injection https://arxiv.org/abs/2205.12700 0 comments