Why this topic matters
Direct prompt injection matters when a user prompt can override system intent. Indirect prompt injection matters when the malicious instruction arrives from documents, web pages, tickets, emails or retrieved chunks that the system treats as content instead of control input.
The interesting question is whether the system leaks hidden instructions, breaks policy in a repeatable way, changes tool-selection logic or influences humans and automation downstream. A nice-looking refusal followed by unsafe hidden behaviour is still a control failure.
Attack lanes
- Direct injection against the visible chat surface.
- Indirect injection through RAG, browsing, imported files and helpdesk content.
- System prompt extraction and policy leakage.
- Safety-evasion chains that rely on roleplay, translation, summarisation or format-shifting.
- Output steering where the model convinces another component or analyst to take an unsafe step.
Reporting angle
Good reporting preserves the exact payload, the preconditions, the response pattern, the trust boundary crossed and the downstream consequence. That is what turns a jailbreak into a security finding instead of a screenshot.
Curated public references
- OWASP Top 10 for LLM Applications 2025Useful framing for prompt injection, insecure output handling and sensitive information disclosure.
- MITRE ATLASTechnique mapping relevant to AI attacks.
- OWASP Gen AI Security ProjectCurrent project material and guidance.
