AI Security // Offensive AI Operations

LLM and agent systems as a live attack surface.

AI security here is treated as offensive work against model-backed products, retrieval stacks, agent loops, tool routers and inference services. The useful question is never whether a model can say something strange. It is whether untrusted language can cross a control boundary, influence automation, leak protected context, trigger real actions or corrupt downstream decision-making.

6 notesred-team focusmodel abuse

Domain overview

This domain is built for operators who want to test AI systems like real systems, not like demos. The model is only one node in the chain. The real attack surface lives across hidden instructions, retrieval, embeddings, long-context ingestion, agent memory, connector permissions, tool execution, approval gates, orchestration code and the humans who over-trust the output.

Good AI assessment work combines application security, API review, auth logic, cloud exposure and workflow abuse with model-specific pressure. Prompt injection, context poisoning, output steering, authority confusion, unsafe tool invocation, retrieval exfiltration and agent compromise are all just different ways of asking whether language can seize control of automation.

Primary operator questions

  • Can untrusted content override or reshape the hidden instruction hierarchy?
  • Can a retrieved document, email, web page or ticket poison the model's planning path?
  • Can the assistant call tools, query data stores or send actions with more authority than it should?
  • Can model output be trusted by code, analysts or business workflows without verification?
  • Can the system be pushed from harmless chat into data exposure, lateral movement or destructive action?

Red-team pressure lines

Useful pressure usually follows five lanes. First, instruction attacks: direct prompt injection, indirect prompt injection, jailbreak chaining and system prompt leakage. Second, retrieval attacks: poisoned corpora, malicious documents, embedded instructions and confidence laundering through RAG. Third, agent abuse: unauthorized tool use, connector overreach, action replay and confirmation bypass. Fourth, API and inference weaknesses: weak auth, file-handling mistakes, quota abuse, plugin boundaries and tenant leakage. Fifth, reporting discipline: proving whether the behavior is reachable, repeatable and tied to real business impact.

Related certification context

These certifications are not the point of the domain, but they are useful orientation anchors for operators who want a formal practice path beside the field notes.

Curated public references

Brief index

brief

AI Attack Surface Primer

Maps where hidden instructions, memory, retrieval, tools and human approvals create real attack paths.

surface maptrust boundaries
brief

Prompt Injection & Jailbreaks

Direct and indirect instruction hijacking, safety bypassing, prompt leakage and response steering under attacker control.

prompt supply chainpolicy bypass
brief

RAG, Agents & Tool Abuse

Poisoned retrieval, unsafe planners, over-privileged connectors, confirmation bypass and action-layer compromise.

ragagent abuse
brief

Model API & Inference Security

Model endpoints, auth, file handling, quota pressure, tenant isolation, inference routing and plugin boundaries.

apiinference
brief

AI Red Teaming Methodology

Scoping, replayability, harm framing, evidence discipline and reporting patterns that survive scrutiny.

workflowreporting
brief

LLM Pentesting Note

Existing specialist note linked back into the wider advanced surface.

cross-linkspecialist note