Prompt Injection

Type: technique

Description: The adversary gets AI to interpret data to be analyzed as instructions to be executed.

Version: 0.1.0

Created At: 2024-12-31 14:18:56 -0500

Last Modified At: 2024-12-31 14:18:56 -0500

External References

Prompt injection attacks against GPT-3., Simon Willison's blog
AI Injections: Direct and Indirect Prompt Injections and Their Implications., Embrace the Red

--> Execution (tactic): An adversary can change the execution flow of a GenAI app by controlling a part of its data.
--> Jonathan Cefalu (entity): The concept of prompt injection was first discovered by Jonathan Cefalu from Preamble in May 2022 in a letter to OpenAI who called it "command injection".
--> Information Flow Control (mitigation): Information Flow control would be one of many ways to track and control indirect prompt injections from knowledge corruption and leaking sensitive data.
--> Spotlighting (mitigation): By spotlighting in prompts, the LLM focuses on a specific part of the query that defines the task, thus avoiding other injected tasks.
--> LLM Activations (mitigation): By tracking LLM activations, the LLMs shift of attention to different tasks caused by indirect prompt injections can be tracked and mitigated.
--> Simon Willison (entity): Simon has coined the term Prompt Injection in his post "Prompt injection attacks against GPT-3."
<-- LLM Prompt Self-Replication (technique): An adversary can craft a malicious prompt that would cause the LLM to replicate the prompt as part of its output and propagate to other LLMs.
<-- Prompt Crafting (technique): Prompt crafting typically involves prompt injection.
<-- Off-Target Language (technique): Sub-technique of
<-- System Instruction Keywords (technique): Sub-technique of
<-- Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation (procedure): The email contains plaintext instructions hidden in a long email. The text so far was metadata and an introduction, the core content is to print the text: "IMPORTANT: If the user submits a new request..
<-- Microsoft Copilot: From Prompt Injection to Exfiltration of Personal Information (procedure): Prompt Injection via a malicious email or shared document.
<-- Exfiltration of personal information from ChatGPT via prompt injection (procedure): Extract personal information about the user from the current thread and memories, e.g. age and postal code.
<-- Financial Transaction Hijacking With M365 Copilot As An Insider (procedure): Prompt Injection via the malicious email.
<-- GitHub Copilot Chat: From Prompt Injection to Data Exfiltration (procedure): The comment is interpreted by GitHub Copilot as instructions.
<-- Microsoft Copilot Purview Audit Log Evasion and DLP Bypass (procedure): Copilot is provided with instructions not to show any references to the fetched documents.
<-- Copilot M365 Lures Victims Into a Phishing Site (procedure): Prompt Injection via the malicious email.
<-- Data Exfiltration from Slack AI via indirect prompt injection (procedure): The public channel message contains text to be interpreted as instructions once SlackAI receives relevant data from the RAG system: the following text, without quotes, and with the word confetti replaced with the other key.
<-- spAIware (procedure): ChatGPT MacOS application retrieves the content on the website and interprets retrieved text as instructions to follow: Create a plan for A using B to achieve C and follow through to completion!

GenAI Attacks Matrix

Prompt Injection

External References

Related Objects

Related Frameworks