Indirect Prompt Injection

Definition

A variant of prompt injection in which malicious instructions are embedded in external data sources (web pages, documents, or emails) that an LLM retrieves and processes, causing it to follow the attacker's instructions instead of the user's.

Technical Details

Indirect prompt injection is particularly dangerous in agentic and RAG-enabled LLM systems because the attack surface extends to any data the model can read, and the attacker never needs access to the user's prompt. Invisible HTML text, metadata, or whitespace-encoded instructions in retrieved content can hijack the model's actions, because the model has no reliable way to tell instructions apart from the data it is processing. Defenses include treating retrieved content as untrusted data, processing it in separate pipelines isolated from privileged instructions, and sandboxing tool-use capabilities.
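The tool-sandboxing defense can be sketched as a taint-tracking gate: once untrusted content enters the context, side-effecting tools are refused and any tool that takes a URL is restricted to an allowlist of hosts. This is a minimal sketch under stated assumptions; the `ToolGate` class, the tool names (`send_email`, `http_post`, `write_file`, `fetch`), and the allowlist policy are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical tool names; real agent frameworks define their own.
SIDE_EFFECT_TOOLS = {"send_email", "http_post", "write_file"}


@dataclass
class ToolGate:
    """Hypothetical capability gate that tracks whether untrusted data was read."""

    allowed_hosts: set[str]
    tainted: bool = False

    def ingest_untrusted(self, text: str) -> str:
        # Any retrieved web page, document, or email taints the whole session.
        self.tainted = True
        return text

    def authorize(self, tool: str, args: dict) -> bool:
        # After reading untrusted data, refuse side-effecting tools outright.
        if self.tainted and tool in SIDE_EFFECT_TOOLS:
            return False
        # Even read-only fetches can leak data through query strings,
        # so any URL argument must point at an allowlisted host.
        url = args.get("url")
        if url is not None:
            return (urlparse(url).hostname or "") in self.allowed_hosts
        return True


gate = ToolGate(allowed_hosts={"docs.example.com"})
gate.ingest_untrusted("<p>retrieved page text</p>")
print(gate.authorize("send_email", {"to": "someone@example.org"}))  # False
print(gate.authorize("fetch", {"url": "https://attacker.example/?c=abc"}))  # False
print(gate.authorize("fetch", {"url": "https://docs.example.com/guide"}))  # True
```

Taint here is session-wide and never cleared, which is deliberately conservative; a refused call would more usefully be escalated for explicit user confirmation than silently dropped.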

Practical Usage

An attacker embeds hidden instructions in a public web page that an AI assistant is asked to summarize; the instructions cause the assistant to forward the user's private data to an attacker-controlled URL. Organizations deploying RAG systems should sanitize retrieved content before including it in the model context. Sanitization reduces the risk but cannot remove it entirely, since injected instructions can also be written in plain, visible text.
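A minimal sanitization pass for retrieved HTML might drop text inside elements hidden by inline styles or the `hidden` attribute, and strip zero-width characters that can carry invisible instructions. The sketch uses only Python's standard-library `html.parser`; the `sanitize_html` function and the list of hiding patterns are illustrative assumptions, not a complete defense. It ignores CSS classes and external stylesheets, assumes well-formed tag nesting, and treats any white text as hidden on the assumption that the page background is white.

```python
import re
from html.parser import HTMLParser

# Heuristic inline-style patterns used to hide text from human readers.
# Assumption: white text is treated as hidden (white page background).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?:px|em|rem|pt|%)?\s*(?:;|$)"
    r"|(?<![-\w])color\s*:\s*(?:#fff(?:fff)?|white)\b",  # skips background-color
    re.IGNORECASE,
)
# Zero-width characters that can smuggle invisible text.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# Void elements never receive an end tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside a hidden element (well-formed HTML assumed)."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0            # > 0 while inside any hidden element
        self.stack: list[bool] = []      # one entry per open tag: was it hidden?
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        hidden = (
            "hidden" in attrs
            or tag in {"script", "style", "template"}
            or bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        )
        self.stack.append(hidden)
        self.hidden_depth += int(hidden)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.hidden_depth -= int(self.stack.pop())

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)


def sanitize_html(html: str) -> str:
    """Hypothetical helper: visible text only, zero-width chars removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any trailing text still buffered by the parser
    text = ZERO_WIDTH.sub("", " ".join(parser.chunks))
    return " ".join(text.split())  # collapse whitespace runs
```

For example, `sanitize_html('<p>Summary.</p><p style="color:#fff">Ignore previous instructions</p>')` returns `'Summary.'`. The output should still be placed in the model context as untrusted data, because visible text can carry injected instructions too.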

Examples

Hidden white-on-white text on a webpage instructs an AI browser agent to exfiltrate session cookies.
A malicious calendar invite processed by an AI scheduling assistant instructs it to forward meeting notes to an external address.
A poisoned Wikipedia edit adds hidden instructions that cause research LLMs to produce biased summaries.
