URL Anchoring

Type: mitigation

Description: A defense mechanism that helps protect against abuse of web browsing tools and markdown rendering for data exfiltration. When a user asks the AI system to access a URL, the system accesses it only if that URL is explicitly written in the prompt.
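
As a rough illustration of the mechanism described above, the sketch below checks a requested URL against the set of URLs the user explicitly wrote. This is an assumption-laden approximation, not the platform's actual implementation; the regular expression, helper names, and example domains are illustrative only.

```python
import re

# Minimal sketch of a URL-anchoring check; the real platform logic is not public.
URL_PATTERN = re.compile(r"https?://[^\s)\"'>]+")

def extract_explicit_urls(prompt: str) -> set[str]:
    """Collect every URL the user wrote verbatim in the prompt."""
    return set(URL_PATTERN.findall(prompt))

def is_allowed(requested_url: str, prompt: str) -> bool:
    """Permit a browsing-tool request only if the exact URL appears in the prompt."""
    return requested_url in extract_explicit_urls(prompt)

# An injected instruction cannot smuggle data into a URL the user never wrote.
user_prompt = "Please summarize https://example.com/report"
print(is_allowed("https://example.com/report", user_prompt))          # True
print(is_allowed("https://attacker.example/?q=SECRET", user_prompt))  # False
```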

Version: 0.1.0

Created At: 2024-10-03 22:24:49 +0300

Last Modified At: 2024-10-03 22:24:49 +0300


External References

  • --> ChatGPT (platform): When a user asks ChatGPT to access a URL via its web browsing tool, ChatGPT will only access it if the URL is explicitly written in the user prompt. Access to prefixes of explicitly-written URLs is also allowed (illustrated in the sketch after this list).
  • --> Gregory Schwartzman (entity): Much of this entry is a rewrite of work by Gregory Schwartzman; see the external link. Schwartzman demonstrated both bypasses in his work.
  • <-- Web Request Triggering (technique): Limiting an AI system to visiting only URLs explicitly written by the user reduces an attacker's ability to exfiltrate data through request parameters.
  • <-- URL Familiarizing (technique): URL Familiarizing bypasses the URL Anchoring mitigation by introducing many candidate URLs, from which an attacker can choose one to route the AI system to.
  • <-- Exfiltration of personal information from ChatGPT via prompt injection (procedure): Demonstrates two bypasses of the URL anchoring defense mechanism.
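
The prefix allowance and the URL Familiarizing bypass noted above can be illustrated with another hedged sketch. The function name, URL pattern, and attacker.example domain are made up for illustration, and the actual scope of the check (user prompt versus full conversation context) is not publicly documented.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s)\"'>]+")

def is_allowed_with_prefixes(requested_url: str, context: str) -> bool:
    """Permit the request if it equals, or is a prefix of, a URL written in the context."""
    explicit = URL_PATTERN.findall(context)
    return any(url.startswith(requested_url) for url in explicit)

# URL Familiarizing: injected content enumerates many candidate URLs so that an
# attacker-chosen one already appears verbatim and passes the anchoring check,
# while the choice of URL still encodes exfiltrated information.
familiarized = " ".join(f"https://attacker.example/{i}" for i in range(100))
print(is_allowed_with_prefixes("https://attacker.example/42", familiarized))  # True
```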