Systems-Theoretic Process Analysis (STPA) is a hazard analysis method developed by Nancy Leveson at MIT. Unlike traditional methods that decompose systems into failure-prone components, STPA models systems as control structures and asks: under what conditions do control actions become unsafe?
Traditional safety analysis (fault trees, FMEA) assumes accidents result from component failures in a chain of events. STPA recognizes that accidents can occur without component failure — through unsafe interactions between components that individually function correctly. This matters for AI systems, where the LLM may behave exactly as designed yet produce unsafe outcomes.
- Leveson, N.G. Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press, 2012.
- Young, W. & Leveson, N.G. "An Integrated Approach to Safety and Security Based on Systems Theory." Communications of the ACM, 57(2), 2014.
- Leveson, N.G. & Thomas, J.P. STPA Handbook. MIT, 2018. Available at psas.scripts.mit.edu
- Ayvali, E. "Systematic Hazard Analysis for Frontier AI using STPA." arXiv:2506.01782, 2025.
STPA-Sec adapts STPA for security analysis. Where STPA asks "what control actions are unsafe?", STPA-Sec asks "what control actions are insecure?" — that is, what happens when an adversary manipulates the control structure?
The method treats security violations as losses (not just failures), and identifies how an attacker can cause the controller to issue control actions that are:
- Not provided — the security control fails to act (e.g., injection scanner doesn't flag a payload)
- Incorrectly provided — the control takes the wrong action (e.g., trust system promotes an untrusted user)
- Provided at wrong time — the control acts too early or late (e.g., PII scan runs before decoding)
- Stopped too soon / applied too long — duration error (e.g., rate limiter resets prematurely)
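The four categories above can be captured as a small taxonomy for later analysis steps. A minimal sketch (the enum name and values are illustrative, not part of any STPA tooling):

```python
from enum import Enum

class UCAType(Enum):
    """The four ways a control action can be unsafe (STPA Step 2)."""
    NOT_PROVIDED = "not provided"
    INCORRECTLY_PROVIDED = "incorrectly provided"
    WRONG_TIMING = "provided at wrong time"
    WRONG_DURATION = "stopped too soon / applied too long"

# Example from the list above: an injection scanner that fails to
# flag a payload is a "not provided" unsafe control action.
scanner_miss = UCAType.NOT_PROVIDED
print(scanner_miss.value)  # → not provided
```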
AI agents are control systems. The agent receives inputs (user messages, tool results, web content), processes them through a control structure (security proxy, LLM, tool permissions), and produces outputs (responses, tool calls, web requests). Traditional pentesting probes individual components; STPA-Sec reveals emergent vulnerabilities in the interactions between components.
```
USER ──message──> SECURITY PROXY ──filtered message──> LLM AGENT
                  [PII scan]                           [generates response]
                  [injection defense]                  [tool calls]
                  [trust check]                        [web fetches]

USER <──response── SECURITY PROXY <──raw response──── LLM AGENT
                   [PII scan outbound]
                   [audit log]
```
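The inbound leg of this control structure can be sketched as an ordered pipeline. All function names below are hypothetical stand-ins, not the actual proxy API; the point is that each stage is a distinct control action, and the ordering itself is safety-relevant (the "Wrong Timing" column in the table that follows):

```python
import re
from typing import Optional

def check_trust(level: int, required: int = 1) -> bool:
    # Control action: check trust level.
    return level >= required

def detect_injection(message: str) -> bool:
    # Control action: filter inbound message. Placeholder logic only;
    # real detection is the hard part (see the UCA example below).
    return "ignore previous instructions" in message.lower()

def redact_pii(message: str) -> str:
    # Control action: redact PII. Redacts things shaped like email
    # addresses (illustrative only).
    return re.sub(r"\b\S+@\S+\.\S+\b", "[REDACTED]", message)

def inbound(message: str, trust_level: int) -> Optional[str]:
    """Returns the filtered message to forward, or None if blocked."""
    if not check_trust(trust_level):
        return None
    if detect_injection(message):
        return None
    return redact_pii(message)

print(inbound("contact me at alice@example.com", trust_level=2))
# → contact me at [REDACTED]
```

Reordering these stages, or running one after the message has already been forwarded, produces exactly the wrong-timing hazards the table enumerates.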
| Control Action | Not Provided | Incorrect | Wrong Timing | Wrong Duration |
|---|---|---|---|---|
| Filter inbound message | Injection passes through | Legitimate message blocked | Scan runs after message forwarded | — |
| Check trust level | Untrusted user gets elevated access | Trusted user denied | Trust checked before farming completes | Trust decays too fast/slow |
| Scan web content | Indirect injection enters context | Clean content flagged | Scan runs on intermediate redirect, not final page | Scanner timeout on large pages |
| Redact PII | Sensitive data exposed | Non-PII redacted | Redaction after response sent | — |
| Log to audit trail | Attack unrecorded | False positive logged as attack | Log written after response (allows tampering window) | — |
What must not happen? For an AI agent proxy:
- L-1: Unauthorized data disclosure (PII, credentials, system prompts)
- L-2: Unauthorized actions (tool calls, file writes, network requests)
- L-3: Loss of agent integrity (context poisoning, trust manipulation)
- L-4: Loss of audit integrity (undetected attacks)
For each control action the proxy performs, enumerate the four ways it can be unsafe (see table above).
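This enumeration is mechanical: a cross product of control actions and the four UCA types. A sketch (the control-action names come from the table above; everything else is illustrative):

```python
from itertools import product

CONTROL_ACTIONS = [
    "filter inbound message",
    "check trust level",
    "scan web content",
    "redact PII",
    "log to audit trail",
]
UCA_TYPES = ["not provided", "incorrect", "wrong timing", "wrong duration"]

# Each pair is a *candidate* UCA; the analyst keeps the ones that can
# lead to a loss (L-1 .. L-4) and writes a causal scenario for each.
candidates = [
    {"id": f"UCA-{i + 1}", "action": action, "type": uca_type}
    for i, (action, uca_type) in enumerate(product(CONTROL_ACTIONS, UCA_TYPES))
]
print(len(candidates))  # → 20 (5 actions x 4 types)
```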
For each unsafe control action, describe the causal scenario. Example:
UCA: Injection scanner does not flag a message containing a prompt injection.
Scenario: Attacker encodes the injection using Cyrillic homoglyphs that survive NFKC normalization. The 11 pattern detectors match against ASCII keyword patterns, so the ensemble score stays below threshold and the message reaches the LLM unmodified.
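This scenario works because NFKC is a canonicalization, not a confusables fold: it normalizes compatibility characters but leaves distinct scripts distinct. A quick demonstration:

```python
import unicodedata

homoglyph = "\u0456gnore"  # Cyrillic і (U+0456) in place of Latin i
folded = unicodedata.normalize("NFKC", homoglyph)

print(folded == "ignore")           # → False: NFKC does not map Cyrillic і to Latin i
print("ignore" in folded)           # → False: an ASCII keyword pattern misses it
print(unicodedata.name(folded[0]))  # → CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
```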
What must the system guarantee to prevent each loss scenario?
Constraint: The injection scanner must detect semantically equivalent instructions regardless of character encoding, including homoglyph substitution, zero-width characters, and variation selectors.
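A minimal sketch of what enforcing this constraint might look like, assuming a pre-scan folding pass. A real implementation would use the full Unicode confusables data (UTS #39); this tiny table covers only a few Cyrillic homoglyphs for illustration:

```python
import re
import unicodedata

# Illustrative subset of Cyrillic-to-Latin confusables (not exhaustive).
CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o",
               "\u0440": "p", "\u0441": "c", "\u0456": "i"}

# Zero-width characters and variation selectors named in the constraint.
INVISIBLES = re.compile(r"[\u200b-\u200d\u2060\ufeff\ufe00-\ufe0f]")

def fold_for_scanning(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # canonical/compatibility forms first
    text = INVISIBLES.sub("", text)             # drop invisible code points
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

print(fold_for_scanning("\u0456gn\u200bore"))  # → ignore
print(fold_for_scanning("p\u0430yload"))       # → payload
```

Scanning the folded text rather than the raw input is what makes detection invariant to the encodings the constraint names.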
Our red team assessment treats the STPA-Sec unsafe control actions as test cases. Each phase of testing maps to specific UCAs:
| Phase | UCAs Tested |
|---|---|
| 0 (Recon) | None directly; determines which controls run in monitor vs. enforce mode |
| 1 (Trust) | UCAs related to trust checking — not provided, incorrect |
| 2 (Injection) | UCAs related to message filtering — not provided |
| 3 (Indirect) | UCAs related to web content scanning — not provided, wrong timing |
| 4 (Exfiltration) | UCAs related to PII redaction — not provided |
| 5 (Chains) | Multi-UCA scenarios combining findings |
| 6 (Detection) | UCAs related to audit logging — not provided (detection gaps) |
This assessment also draws on recent agent security research:
- "The Attacker Moves Second" (Anthropic, OpenAI, DeepMind et al., 2025) — Adaptive attacks defeat 12 published injection defenses with >90% success. Human red-teaming achieved 100% bypass.
- "Agents Rule of Two" (Meta AI, 2025) — An agent must hold no more than 2 of: untrusted input, sensitive data, external action capability. Most personal AI assistants hold all 3.
- Log-To-Leak (2025) — MCP-specific exfiltration via coerced tool invocation.
- ToolHijacker (arXiv:2504.19793, 2025) — Tool selection manipulation achieving 96.7% attack success rate.
- Phantom (arXiv:2602.16958, 2026) — Automated agent hijacking via Structural Template Injection.
- CVE-2026-22708 — Cursor allowlist bypass via environment variable poisoning; relevant to any proxy that maintains tool-level allowlists.
- Greshake et al. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." AISec, 2023.
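The "Agents Rule of Two" cited above reduces to a simple property check over an agent's configuration. A sketch (the property names are illustrative, not taken from the Meta AI write-up):

```python
def violates_rule_of_two(untrusted_input: bool,
                         sensitive_data: bool,
                         external_actions: bool) -> bool:
    """An agent should hold at most 2 of the 3 capabilities per session."""
    return untrusted_input + sensitive_data + external_actions > 2

# A typical personal assistant: reads web pages (untrusted input), sees
# the user's email (sensitive data), and can send requests (external
# actions) — holding all 3, as the summary above notes.
print(violates_rule_of_two(True, True, True))   # → True
print(violates_rule_of_two(True, False, True))  # → False
```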