Linked pages
- All the Hard Stuff Nobody Talks About when Building Products with LLMs | Honeycomb https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm 126 comments
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? https://dl.acm.org/doi/pdf/10.1145/3442188.3445922 122 comments
- [2302.10149] Poisoning Web-Scale Training Datasets is Practical https://arxiv.org/abs/2302.10149 95 comments
- FakeToxicityPrompts: Automatic Red Teaming https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red 55 comments
- [1609.02943] Stealing Machine Learning Models via Prediction APIs https://arxiv.org/abs/1609.02943 37 comments
- [1905.02175] Adversarial Examples Are Not Bugs, They Are Features https://arxiv.org/abs/1905.02175 28 comments
- [2302.12173] Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection https://arxiv.org/abs/2302.12173 26 comments
- Edition 21: A framework to securely use LLMs in companies - Part 1: Overview of Risks https://boringappsec.substack.com/p/edition-21-a-framework-to-securely 25 comments
- [2307.03718] Frontier AI Regulation: Managing Emerging Risks to Public Safety https://arxiv.org/abs/2307.03718 7 comments
- [2106.09898] Bad Characters: Imperceptible NLP Attacks https://arxiv.org/abs/2106.09898 6 comments
- Hacking Auto-GPT and escaping its docker container | Positive Security https://positive.security/blog/auto-gpt-rce 5 comments
- [2308.03825] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models https://arxiv.org/abs/2308.03825 4 comments
- [2307.15008] A LLM Assisted Exploitation of AI-Guardian https://arxiv.org/abs/2307.15008 1 comment
- Securing LLM Systems Against Prompt Injection | NVIDIA Technical Blog https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/ 1 comment
- Secure your machine learning with Semgrep | Trail of Bits Blog https://blog.trailofbits.com/2022/10/03/semgrep-maching-learning-static-analysis/ 0 comments
- [2012.07805] Extracting Training Data from Large Language Models https://arxiv.org/abs/2012.07805 0 comments
- [2006.03463] Sponge Examples: Energy-Latency Attacks on Neural Networks https://arxiv.org/abs/2006.03463 0 comments
- GitHub - Trusted-AI/adversarial-robustness-toolbox: Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams https://github.com/Trusted-AI/adversarial-robustness-toolbox 0 comments
- [2305.10036] Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark https://arxiv.org/abs/2305.10036 0 comments
- [2205.12700] BITE: Textual Backdoor Attacks with Iterative Trigger Injection https://arxiv.org/abs/2205.12700 0 comments