Indirect Prompt Injection
Threat Intelligence
Definition
A variant of prompt injection in which malicious instructions are embedded in external data sources (web pages, documents, or emails) that an LLM retrieves and processes, causing it to act on the attacker's commands.
Technical Details
Indirect prompt injection is particularly dangerous in agentic and RAG-enabled LLM systems because the attack surface extends to any data the model can read. Invisible HTML text, metadata, or whitespace-encoded instructions in retrieved content can hijack model actions. Defenses include treating retrieved content as untrusted data, using separate processing pipelines, and sandboxing tool-use capabilities.
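As a rough illustration of treating retrieved content as untrusted data, the sketch below strips text from HTML elements that are commonly used to hide instructions, removes invisible Unicode format characters, and wraps the result in explicit delimiters before it reaches the model context. The names `VisibleTextExtractor`, `sanitize_retrieved_html`, `wrap_untrusted`, and the `<untrusted_document>` delimiter are hypothetical, chosen for this example. It assumes reasonably well-formed HTML and only inspects inline attributes, so it would not catch white-on-white text or rules applied through external stylesheets. Delimiters on their own are not a reliable defense; they only help the rest of the pipeline keep data and instructions apart.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose content is never rendered as page text.
NON_TEXT_TAGS = {"script", "style", "template", "noscript"}
# Inline styles commonly used to hide text (a heuristic, not exhaustive).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0", re.I)


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside an obviously hidden element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        is_hidden = (
            tag in NON_TEXT_TAGS
            or "hidden" in attr
            or attr.get("aria-hidden") == "true"
            or HIDDEN_STYLE.search(attr.get("style") or "") is not None
        )
        # Once inside a hidden subtree, every nested element deepens it.
        if self.hidden_depth or is_hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def sanitize_retrieved_html(html_text):
    """Return visible text with invisible format characters removed."""
    parser = VisibleTextExtractor()
    parser.feed(html_text)
    parser.close()
    text = "".join(parser.parts)
    # Category "Cf" covers zero-width spaces, joiners, and tag characters.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return re.sub(r"\s+", " ", text).strip()


def wrap_untrusted(text):
    """Mark text as data; drop any delimiter the content tries to spoof."""
    body = text.replace("<untrusted_document>", "")
    body = body.replace("</untrusted_document>", "")
    return f"<untrusted_document>\n{body}\n</untrusted_document>"
```

A caller would pass each retrieved page through `sanitize_retrieved_html` and then `wrap_untrusted` before adding it to the prompt, with the system prompt stating that delimited content is reference material and must never be followed as instructions.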
Practical Usage
An attacker embeds hidden instructions in a public webpage that an AI assistant is asked to summarize. When the assistant processes the page, the instructions cause it to forward the user's private data to an attacker-controlled URL. Organizations deploying RAG systems must sanitize retrieved content before including it in the model context.
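The exfiltration step in this scenario depends on the assistant being able to send data to an arbitrary URL. One way to limit that, in the spirit of sandboxing tool-use capabilities, is to check every outbound request a tool makes against an allowlist of destinations. The sketch below is a minimal version of such a check; the function name `is_allowed_destination` and the hosts in `ALLOWED_HOSTS` are hypothetical placeholders, not part of any particular framework.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from config.
ALLOWED_HOSTS = frozenset({"api.example.com", "docs.example.com"})


def is_allowed_destination(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for HTTPS URLs whose host is explicitly allowed."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower().rstrip(".")
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # Exact match only: subdomains and look-alike suffixes are not allowed.
    return host in allowed_hosts
```

The tool layer would call this check before performing any fetch or send on the model's behalf and refuse the action when it returns `False`. Matching on the parsed hostname rather than the raw string means a URL such as `https://api.example.com@evil.test/` is evaluated by its real host, `evil.test`, and rejected.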
Examples
- Hidden white-on-white text on a webpage instructs an AI browser agent to exfiltrate session cookies.
- A malicious calendar invite processed by an AI scheduling assistant instructs it to forward meeting notes to an external address.
- A poisoned Wikipedia edit adds hidden instructions that cause research LLMs to produce biased summaries.