
AI Attack Surface Primer

A useful AI attack map starts wherever language can change system behaviour. Models, hidden instructions, retrieval, memory, connectors, tool routers, approval workflows and human operators all form part of the same control plane once a product turns prompts into actions.


Why this topic matters

Most AI failures are not model-only failures. They happen because developers connect the model to data, tools and business workflows, then assume the model will preserve the trust rules they had in mind. That assumption is exactly what an operator needs to test.

A practical surface map separates what the model can read, what it can remember, what it can call, what it can influence and what happens when its output is trusted. Without that map, teams chase isolated jailbreaks and miss the actual control boundary.
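The five questions above can be recorded per component. A minimal sketch, assuming illustrative component and capability names (nothing here reflects a real product's schema):

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One node in the surface map, answering the five questions."""
    name: str
    reads: list[str] = field(default_factory=list)       # what it can read
    remembers: list[str] = field(default_factory=list)   # what it can persist
    calls: list[str] = field(default_factory=list)       # tools it can invoke
    influences: list[str] = field(default_factory=list)  # downstream consumers
    output_trusted_by: list[str] = field(default_factory=list)

# Hypothetical planner component for illustration.
planner = Component(
    name="planner",
    reads=["user_prompt", "retrieved_docs"],
    remembers=["conversation_memory"],
    calls=["web_search", "send_email"],
    influences=["approval_queue"],
    output_trusted_by=["automation_runner"],
)

UNTRUSTED = {"user_prompt", "retrieved_docs"}

def high_risk(c: Component) -> bool:
    """A component that reads untrusted text and can call tools is a priority."""
    return bool(set(c.reads) & UNTRUSTED) and bool(c.calls)

print(high_risk(planner))  # True: untrusted input can reach tool calls
```

Even this crude record makes the control boundary visible: any component where `high_risk` returns true deserves a dedicated test plan rather than isolated jailbreak probes.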

Surface map

Look at hidden instructions, prompt templates, retrieval layers, memory stores, embeddings, tool definitions, plugin permissions, system actions, approval checkpoints and user-visible output channels as one joined graph. Attack surface grows at every step where untrusted content is mixed with privileged context or where model output is allowed to drive an action.
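The joined-graph view can be sketched as a directed graph with taint propagation: mark attacker-reachable sources, walk the edges, and intersect with privileged nodes. Node and edge names below are illustrative assumptions, not a real pipeline:

```python
from collections import deque

# Directed edges: which component's content flows into which.
edges = {
    "uploaded_file": ["retrieval"],
    "retrieval":     ["model_context"],
    "system_prompt": ["model_context"],
    "model_context": ["model_output"],
    "model_output":  ["tool_router", "user_display"],
    "tool_router":   ["send_email"],  # side effect
}
untrusted_sources = {"uploaded_file"}
privileged = {"model_context", "tool_router", "send_email"}

def tainted_nodes(edges: dict, sources: set) -> set:
    """BFS: every node reachable from attacker-controlled content."""
    seen, queue = set(sources), deque(sources)
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Attack surface: privileged nodes reachable from untrusted content.
exposure = tainted_nodes(edges, untrusted_sources) & privileged
print(sorted(exposure))  # ['model_context', 'send_email', 'tool_router']
```

The interesting finding is not the taint itself but each edge along the path: every hop where untrusted text joins privileged context, or model output drives an action, is a place to test.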

Operator focus points

  • Map every source of attacker-controlled text, files and URLs that can enter context.
  • Trace whether retrieved content can outrank or distort system-level instructions.
  • Identify all tools, connectors and side effects reachable from the planner.
  • Check whether model output is consumed by code, analysts or automated actions without sanitisation.
  • Separate surprising output from reachable impact and prove the business consequence.
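The output-consumption check in particular lends itself to automation. A hedged sketch, assuming a made-up edge schema in which each consumer of model output is annotated with whether the content is sanitised or approval-gated:

```python
# Each tuple: (source, sink, sanitised?, approval_gate?). Names are
# hypothetical; real inventories would come from the product's config.
edges = [
    ("model_output", "analyst_report", True,  False),
    ("model_output", "shell_runner",   False, False),  # raw output runs as code
    ("model_output", "send_email",     False, True),
]

def unguarded_sinks(edges: list) -> list:
    """Sinks consuming model output with no sanitisation and no approval."""
    return [sink for _, sink, sanitised, approved in edges
            if not sanitised and not approved]

print(unguarded_sinks(edges))  # ['shell_runner']
```

A flagged sink is only a lead: the last bullet still applies, and the finding matters once the operator can show a concrete business consequence reachable through it.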

Curated public references