LLM Jailbreak
Type: technique
Description: An adversary may use a carefully crafted prompt injection designed to place LLM in a state in which it will freely respond to any user input, bypassing any controls, restrictions, or guardrails placed on the LLM. Once successfully jailbroken, the LLM can be used in unintended ways by the adversary..
Version: 0.1.0
Created At: 2025-06-19 08:13:23 -0400
Last Modified At: 2025-06-19 08:13:23 -0400
External References
- L1B3RT45: Jailbreak prompts for all major AI models., Github
- Prompt injection and jailbreaking are not the same thing., Simon Willison's blog
- AI jailbreaks: What they are and how they can be mitigated., Microsoft
Related Objects
- --> Privilege Escalation (tactic): An adversary can override system-level prompts using user-level prompts.
- --> Defense Evasion (tactic): An adversary can bypass detections by jailbreaking the LLM.
- <-- LLM Prompt Crafting (technique): Prompt crafting typically involves jailbreaking.
- <-- Off-Target Language (technique): Sub-technique of
- <-- Crescendo (technique): Sub-technique of
- <-- System Instruction Keywords (technique): Sub-technique of
- <-- Copilot M365 Lures Victims Into a Phishing Site (procedure): The exploit circumvents copilot's system instructions and provides new ones that specify how copilot should respond character-by-character.
- <-- X Bot Exposing Itself After Training on a Poisoned Github Repository (procedure): The bot was hijacked to execute the instructions from the poisoned github repository.
- <-- ChatGPT and Gemini jailbreak using the Crescendo technique (procedure): The model's protection mechanisms are effectively circumvented, thus creating a jailbreak from its original safety filters.
- <-- Financial Transaction Hijacking With M365 Copilot As An Insider (procedure): The exploit circumvents copilot's system instructions and provides new ones that specify how copilot should respond character-by-character and which references it should output.
- <-- EchoLeak: Zero-Click Data Exfiltration using M365 Copilot (procedure): The exploit circumvents copilot's system instructions and provides new ones that specify copilot to embed sensitive data into a markdown image and return it to the user.
- <-- AI ClickFix: Hijacking Computer-Use Agents Using ClickFix (procedure): The exploit circumvents the agents's original instructions and executes the malicious ones on the website.