
There is a shift happening in how the industry talks about agentic AI security. A year ago the conversation was speculative – what might go wrong, what we might do about it. Now it is specific. The platforms, primitives and patterns for operating agents safely exist as named things you can point at on a slide. The vocabulary is converging across vendors. The reference architectures are documented.
In this blog I explore the themes that mattered most, and what they mean for security teams.
Agents are software
The industry is steadily converging on spec-driven development – write the spec, generate the implementation, then apply the necessary activities: review, testing, documentation, threat modelling. Vibe coding produces a prototype. Production software needs an SDLC.
This is the easiest cultural shift to push, because it does not require security teams to invent new disciplines. It just requires us to stop pretending that an agent is something other than software. Once that is conceded, the existing playbook mostly applies. The wrinkle is that this software is non-deterministic, takes actions on the world and operates at machine speed with whatever blast radius you gave it. That changes what production-ready means. It does not change the obligation to define it.
A four-pillar mental model
The cleanest framework I saw organises agent operations around four pillars: data, trust, ops and reliability. It is worth dwelling on each, because they are interdependent in ways that matter.
Data is the foundation: the corpus an agent reasons over, the tools it can call, the memory it persists between sessions. The shift is that data quality is now a security concern, not just a machine-learning concern. PII flowing into a context window is a data-residency problem. Tool definitions are an attack surface. Persistent memory across sessions is a privacy obligation that most teams have not thought through. The organisations that took provenance, classification and lineage seriously two years ago are going to find this pillar much cheaper than the ones that did not.
Trust is the pillar where existing application security teams already know most of the answer: security, identity, guardrails, privacy. The new requirement is applying these earlier in the lifecycle – planning and development – rather than as a sign-off at the end.
Ops covers observability, cost control and anomaly detection. The non-obvious insight is that cost anomaly detection is now a security signal. An agent in a runaway loop is either a bug, a misconfiguration or someone exploiting your reasoning budget to extract value. All three are things security wants to know about quickly. Per-agent cost attribution and real-time dashboards are now part of your detection stack.
Reliability is where the rubber meets the road. The defining challenge is that the same input can produce different outputs and trust breaks when that happens in front of a user. “The model is non-deterministic” stops being acceptable somewhere between proof-of-concept and customer-facing deployment. Continuous evaluation, ground-truth datasets, drift detection, A/B testing between agent versions, automated rollback when quality thresholds break – these are the practices that turn a demo into something you can defend in front of a regulator.
The pillars overlap in ways that reward treating them as a system. Reliability evaluations should test for security properties. Ops observability feeds reliability metrics. Trust controls depend on data classification. if you run them as four separate workstreams – the seams between them will be where things fail.
Defence-in-depth, applied with discipline
At the infrastructure level: dedicated runtimes with session isolation, where each user session runs in its own ephemeral environment that is destroyed afterwards. This closes off an entire class of cross-tenant leakage scenarios that you would otherwise have to defend against in application code. For any multi-tenant agent deployment, this is the architectural choice I would push hardest on.
Formal policy enforcement on tool calls, enforced outside the model’s reasoning loop, so a jailbroken model still cannot exfiltrate data if the policy says no. Content guardrails for prompt injection, PII detection and prompt-leakage protection. Cross-account safeguards so security teams can enforce policy centrally instead of per-account.
None of this is conceptually new. It is identity, authorisation, network segmentation and content filtering applied to a new compute pattern. The reason to take it seriously is that the bar for getting an agent into production is much lower than the bar for traditional software, and teams routinely skip controls they would never skip elsewhere. Over-permissioned roles, agents with production database write access “just for now,” logging that does not capture tool calls, no separation between development and production credentials. These are the misconfigurations that may lead to security incidents.
Same attack chain, new entry points
A useful counterweight to all the AI-specific content was the steady reminder that the classic kill chain (initial access, persistence, defence evasion, lateral movement, impact) still describes how compromise actually unfolds. AI does not change the fundamentals. It changes the entry points and the pivot surfaces.
Prompt injection inside a document-processing pipeline is the new phishing email. An agent with access to your CRM, your email, your wiki and your support tooling is a beautiful place to pivot to once it is compromised. Over-permissioned tool definitions are the new exposed admin endpoints. The defensive disciplines (segmentation, least privilege, comprehensive logging, anomaly detection) are exactly the ones we already know. We just have to remember to apply them to a workload type where most teams have not yet built the muscle.
For regulated industries, auditability is the constraint
In financial services and other regulated sectors, the auditability story matters as much as the security story. Every tool call needs to be logged. Every decision needs to be traceable to its inputs. Every model swap needs to go through change control. Human-in-the-loop checkpoints belong on anything irreversible.
Observability platforms now integrate cleanly with OpenTelemetry, so the telemetry side is mostly a solved problem. The harder work is operating-model design: who reviews evaluation results, who approves prompt changes, who owns each control. The teams who answer “everybody” (which is the same as nobody) are the teams who will be writing post-incident reports.
The shift security teams need to make
Stop treating agent security as a separate discipline. It is network security, identity, application security, data governance and incident response applied to a new workload type. The teams that own those disciplines should own the agentic equivalent, with a clear extension of their existing standards rather than a parallel set.
Build the evaluation muscle before you need it. Ground-truth datasets, offline evaluation, safe deployment patterns, online monitoring. These are cheap to establish early and very expensive to retrofit once an agent is doing real work for real customers. Evaluation is now a security function, not a machine-learning one.
Push for named ownership of responsible-AI controls. Privacy, security, fairness, safety, explainability. These need to be engineering requirements with owners, the way encryption-at-rest and backup verification have owners. “We all care about this” produces nothing.