LLM Jailbreak
Type: technique
Description: An adversary may use a carefully crafted prompt injection designed to place LLM in a state in which it will freely respond to any user input, bypassing any controls, restrictions, or guardrails placed on the LLM. Once successfully jailbroken, the LLM can be used in unintended ways by the adversary..
Version: 0.1.0
Created At: 2025-03-04 10:27:40 -0500
Last Modified At: 2025-03-04 10:27:40 -0500
External References
- L1B3RT45: Jailbreak prompts for all major AI models., Github
- Prompt injection and jailbreaking are not the same thing., Simon Willison's blog
- AI jailbreaks: What they are and how they can be mitigated., Microsoft
Related Objects
- --> Privilege Escalation (tactic): An adversary can override system-level prompts using user-level prompts.
- --> Defense Evasion (tactic): An adversary can bypass detections by jailbreaking the LLM.
- <-- Off-Target Language (technique): Sub-technique of
- <-- LLM Prompt Crafting (technique): Prompt crafting typically involves jailbreaking.
- <-- System Instruction Keywords (technique): Sub-technique of
- <-- Crescendo (technique): Sub-technique of
- <-- Financial Transaction Hijacking With M365 Copilot As An Insider (procedure): The exploit circumvents copilot's system instructions and provides new ones that specify how copilot should respond character-by-character and which references it should output.
- <-- ChatGPT and Gemini jailbreak using the Crescendo technique (procedure): The model's protection mechanisms are effectively circumvented, thus creating a jailbreak from its original safety filters.
- <-- Copilot M365 Lures Victims Into a Phishing Site (procedure): The exploit circumvents copilot's system instructions and provides new ones that specify how copilot should respond character-by-character.