Why this topic matters
A poisoned document, malicious knowledge-base page, crafted email or hostile ticket can shift retrieval output, steer the planner and alter tool selection. That means the compromise point may live in content storage, not in the model API itself.
Agents are dangerous when they inherit broad tool scopes, weak confirmation logic or fake confidence from model output. A bad plan plus real authority is where AI risk stops being theoretical.
What to test
- Poison retrieval corpora and test whether malicious instructions survive chunking and ranking.
- Review connector permissions, delegated tokens and over-broad tool access.
- Test whether agents can call actions from weak evidence or fabricated confidence.
- Check confirmation prompts, approval bypass paths and replay of sensitive actions.
- Trace whether outputs can exfiltrate data from memory, tools, connectors or hidden context.
Operator framing
The practical model is simple: what can the model see, what can it call, what can it change, and who believes the answer enough to act on it. Most agent abuse findings reduce to insecure orchestration with a language layer on top.
Curated public references
- OWASP Top 10 for Agentic Applications 2026Agent-specific risks and mitigations.
- MITRE ATLASTechnique mapping for AI-enabled workflows.
- NIST AI RMF 1.0Operational risk-management framing.
