
TL;DR: Nearly half of organizations (48.9%) cannot observe machine-to-machine traffic in their AI agent deployments. The monitoring tools they rely on were built for humans clicking buttons, not agents making 10,000 API calls per minute. When you are blind to M2M traffic, you cannot detect credential forwarding, capability escalation, or data exfiltration through legitimate-looking tool calls. OWASP now ranks insecure agent communication in its Agentic Top 10. The fix is a purpose-built observability stack: OpenTelemetry with GenAI semantic conventions, runtime policy enforcement, per-agent identity, and behavioral baselines.

Server room fiber optic cables, half illuminated in blue and half fading into darkness


The monitoring tools you have were built for a different species of traffic

SALT Security’s 1H 2026 State of AI and API Security Report surveyed over 300 security leaders and landed on a number that should end careers: 48.9% of organizations are entirely blind to non-human traffic in their environments. Not partially blind. Not “we see some of it.” Entirely blind.

The companion finding is almost worse: 48.3% cannot differentiate legitimate AI agents from malicious bots. So roughly half the industry cannot see agent traffic at all, and nearly as many cannot tell whether the traffic they do see is friend or foe.

This is not a tooling gap. It is an architectural mismatch.

Every API monitoring tool in widespread use today was designed around a mental model of human-initiated traffic. A user logs in. The user clicks a button. The button triggers an API call. The API returns a response. The user sees the result. One request, one response, one human, one session. The monitoring tool captures the request, logs the response code, measures latency, and calls it done.

Agent traffic does not work this way. A single user prompt can trigger an orchestrator agent that delegates to four specialist agents, each of which makes dozens of tool calls across multiple external services, some of which spawn sub-agents of their own. The call chain fans out, converges, fans out again. There is no single session. There is no predictable timing. There is no human in the loop for most of the calls. And the agent might modify its own routing mid-execution based on intermediate results.

Your API gateway sees each of these calls individually. It has no concept of the chain. It cannot correlate the orchestrator’s initial call with the sub-agent’s eventual write to your production database three hops later. Each call looks legitimate in isolation. The attack surface lives in the relationships between calls, and your monitoring cannot see relationships.

What M2M traffic actually looks like in production

To understand why existing tools fail, you need to see what they are trying to monitor. Here is a simplified flow of a multi-agent system processing a single request:

flowchart TB
    User["User request"] --> Orch["Orchestrator agent"]
    
    Orch --> A1["Research agent"]
    Orch --> A2["Analysis agent"]
    Orch --> A3["Writer agent"]
    
    A1 --> T1["Web search API"]
    A1 --> T2["Document retrieval API"]
    A1 --> T3["Knowledge graph API"]
    
    A2 --> T4["Data warehouse API"]
    A2 --> T5["Analytics service"]
    A2 --> A1
    
    A3 --> T6["Template service"]
    A3 --> T7["Image generation API"]
    A3 --> A2
    
    subgraph Visible["What your API gateway sees"]
        T1
        T2
        T3
        T4
        T5
        T6
        T7
    end
    
    subgraph Invisible["What your API gateway misses"]
        direction TB
        Orch
        A1
        A2
        A3
    end
    
    style Visible fill:#1a3a5c,stroke:#4a9eff,color:#fff
    style Invisible fill:#3d1a1a,stroke:#ff4a4a,color:#fff

The API gateway captures the leaf-node calls to external services. It misses the inter-agent communication entirely: the orchestrator delegating to specialists, agents calling other agents for intermediate results, the implicit trust relationships in every delegation.

In a production multi-agent system I have instrumented, a single user query generated 847 internal API calls across 6 agents over 12 seconds. The external API gateway logged 23 of those calls – the ones that crossed the network boundary to third-party services. The other 824 calls were invisible. Agent-to-agent delegation, tool selection reasoning, intermediate result passing, retry logic, fallback routing – all dark.

Those 824 invisible calls are where the security incidents happen.

The three attack vectors that thrive in darkness

When you cannot see M2M traffic, three categories of attack become nearly undetectable.

Credential forwarding

Agent A has a token scoped to read customer records. Agent A delegates to Agent B, passing its token along because the framework’s default behavior is to propagate credentials. Agent B now has read access to customer records despite never being authorized for it. Agent B delegates to Agent C with the same token. Three hops later, an agent with no legitimate need for customer data is querying your customer database with valid credentials.
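A toy sketch makes the failure mode concrete. The agents, scopes, and `delegate` function here are all hypothetical; the point is that a framework which forwards the parent's token unchanged authenticates every hop while silently widening access:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Minimal agent model; names and scopes are illustrative."""
    name: str
    granted_scopes: set = field(default_factory=set)  # what this agent SHOULD have

def delegate(token_scopes: set, parent: Agent, child: Agent) -> set:
    # Framework default: forward the parent's token unchanged.
    return token_scopes

a = Agent("report-writer", {"customers:read"})
b = Agent("formatter", set())   # never authorized for customer data
c = Agent("mailer", set())      # never authorized for customer data

token = set(a.granted_scopes)
token = delegate(token, a, b)   # b now holds customers:read
token = delegate(token, b, c)   # c now holds customers:read

# Every hop authenticates with a valid token, yet two agents hold
# scopes they were never granted -- invisible to per-call monitoring.
overprivileged = [ag.name for ag in (b, c) if token - ag.granted_scopes]
print(overprivileged)  # ['formatter', 'mailer']
```

A safer default is the inverse: intersect the forwarded token with the child's own grant at every hop, so scope can only narrow down the chain.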

Every individual API call authenticates successfully. No monitoring tool flags anything. The SALT Security report confirms this: 99% of attack attempts they analyzed originated from authenticated sources. The credentials are real. The authorization chain is not.

The Cybersecurity Insiders/Saviynt 2026 CISO AI Risk Report puts the identity governance crisis in stark terms: 92% of organizations lack full visibility into their AI identities, and 71% of CISOs say AI has access to core business systems while only 16% govern that access. Traditional IAM was built for people who log in, assume roles, and follow human workflows. It was not built for systems that create their own accounts, act autonomously, and escalate privileges without asking.

Capability escalation through tool chains

Agent A is authorized to call a search API. The search API returns results that include URLs. Agent A passes those URLs to Agent B, which is authorized to fetch web content. Agent B fetches the content and passes it to Agent C, which is authorized to write to the internal knowledge base. No single agent exceeded its permissions. But the chain accomplished something none of them were individually authorized to do: writing arbitrary external content into the internal knowledge base.
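A chain-level audit is the countermeasure existing tools lack. This sketch (the capability names and the forbidden combination are invented for illustration) flags a delegation chain whose combined capabilities are dangerous even though each agent's individual grant is legitimate:

```python
# Each agent's individually granted capabilities; names are illustrative.
chain = [
    ("search-agent", {"search:query"}),
    ("fetch-agent", {"web:fetch"}),
    ("kb-agent", {"kb:write"}),
]

# Capability pairs that are dangerous in combination even though
# each capability is individually legitimate.
FORBIDDEN_COMBOS = [
    ({"web:fetch", "kb:write"}, "external content written to internal KB"),
]

def audit_chain(chain):
    """Flag forbidden combinations across the chain's combined capabilities."""
    combined = set().union(*(caps for _, caps in chain))
    return [reason for combo, reason in FORBIDDEN_COMBOS if combo <= combined]

print(audit_chain(chain))  # ['external content written to internal KB']
```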

This is OWASP ASI07 – Insecure Inter-Agent Communication – in action. The attack vectors include agent-in-the-middle interception, message injection into agent communication channels, and message spoofing where forged instructions appear to come from trusted agents. Each of these exploits the fact that most agent frameworks trust inter-agent messages implicitly.

Data exfiltration via tool calls

An agent tasked with “summarize this quarter’s sales data” makes a legitimate call to the analytics API. It receives the data. Then it makes a tool call to a “formatting service” that is actually an external endpoint controlled by an attacker. The data leaves the building through a call that looks exactly like a normal tool invocation.

Gravitee’s 2026 State of AI Agent Security Report found that 88% of organizations have already experienced confirmed or suspected agent security incidents. Their analysis identified six failure patterns, and data exposure – sensitive information appearing in logs, prompts, and outputs – was among the most common. Over 50% of agents in production run without any security oversight or logging. They are invisible by default.

Why legacy tools cannot be patched to solve this

The instinct is to extend existing tools. Add agent awareness to the API gateway. Teach the SIEM to correlate agent calls. Bolt on an agent plugin to the APM.

This does not work for three structural reasons.

The correlation problem. Traditional monitoring correlates by session ID, user ID, or request ID. Agent traffic has none of these in a stable form. An orchestrator spawns sub-agents dynamically. Each sub-agent might use a different credential. The “session” is a directed acyclic graph of agent invocations, not a linear sequence. Retrofitting graph-aware correlation into tools designed for linear request-response is not a plugin – it is a rewrite.

The volume problem. API gateways are optimized for human-scale traffic patterns: hundreds to thousands of requests per second, with recognizable patterns (login, browse, checkout). Multi-agent systems generate machine-scale bursts: thousands of internal calls in seconds for a single task, with patterns that change based on the LLM’s reasoning. The SALT report notes that 47% of organizations saw API growth of 51-100% in the past year, driven largely by agent deployments. Legacy rate limiting and anomaly detection models trained on human traffic patterns produce nothing but false positives when pointed at agent traffic.

The context problem. A single API call from an agent carries almost no security-relevant information in isolation. The call “GET /api/v1/customers?limit=100” is benign or catastrophic depending entirely on which agent made it, what chain of delegation led to the call, what credentials were used, and whether the agent was authorized for that specific task. Your API gateway has the HTTP headers. It does not have the delegation chain, the originating user intent, or the agent’s authorization scope. It cannot make the security determination.
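To see why the delegation chain is the missing input, consider a hypothetical context-aware authorizer. The field names are illustrative, not from any real gateway; the point is that the identical request flips from allowed to denied based on data the gateway never has:

```python
def authorize(call: dict, context: dict) -> bool:
    """Decide using the delegation chain and task scope, not just the
    HTTP request. Field names are illustrative, not from any gateway."""
    if call["path"].startswith("/api/v1/customers"):
        # Benign only if the originating task involved customer data
        # and every hop in the chain is an approved agent.
        return ("customers:read" in context["task_scope"]
                and all(hop in context["approved_agents"]
                        for hop in context["chain"]))
    return True  # non-sensitive paths pass through in this sketch

call = {"method": "GET", "path": "/api/v1/customers?limit=100"}
ok = authorize(call, {
    "task_scope": {"reports:generate"},   # user asked for a report, not customer data
    "chain": ["orchestrator", "writer"],
    "approved_agents": {"orchestrator", "writer"},
})
print(ok)  # False -- same request the gateway would wave through
```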

The observability stack that actually works

Closing the M2M visibility gap requires four layers, each addressing a specific blind spot. None of them is optional.

flowchart TB
    subgraph L1["Layer 1: Distributed Tracing"]
        OTel["OpenTelemetry + GenAI semantic conventions"]
        OTel --> Spans["Agent spans with tool call attribution"]
        OTel --> Ctx["Cross-agent context propagation"]
        OTel --> Export["Export to Jaeger / Grafana Tempo / Datadog"]
    end
    
    subgraph L2["Layer 2: Runtime Policy Enforcement"]
        Runtime["NVIDIA OpenShell / gVisor / Firecracker"]
        Runtime --> Sandbox["Process-isolated agent execution"]
        Runtime --> Policy["Deny-by-default policy engine"]
        Runtime --> Audit["Complete allow/deny audit trail"]
    end
    
    subgraph L3["Layer 3: Per-Agent Identity"]
        Identity["SPIFFE/SPIRE + short-lived credentials"]
        Identity --> mTLS["mTLS between all agent pairs"]
        Identity --> Scope["Per-agent capability scoping"]
        Identity --> Rotate["Automatic credential rotation"]
    end
    
    subgraph L4["Layer 4: Behavioral Analysis"]
        Behavior["Baseline + anomaly detection"]
        Behavior --> Pattern["Normal call patterns per agent role"]
        Behavior --> Drift["Behavioral drift detection"]
        Behavior --> Alert["Alert on deviation from baseline"]
    end
    
    L1 --> L2
    L2 --> L3
    L3 --> L4
    
    style L1 fill:#1a3a5c,stroke:#4a9eff,color:#fff
    style L2 fill:#2a4a2a,stroke:#4aff4a,color:#fff
    style L3 fill:#4a3a1a,stroke:#ffaa4a,color:#fff
    style L4 fill:#3a1a4a,stroke:#aa4aff,color:#fff

Layer 1: Distributed tracing with OpenTelemetry

OpenTelemetry’s GenAI semantic conventions provide the vocabulary for agent observability that did not exist 18 months ago. Every tool call, LLM invocation, and agent delegation becomes a span in a distributed trace. The critical capability is cross-agent context propagation: when Agent A delegates to Agent B, the trace context follows, creating a complete graph of the execution.

This is not theoretical. Microsoft’s Agent Framework, IBM’s Bee Stack, CrewAI, AutoGen, and LangGraph all support OpenTelemetry instrumentation now. The semantic conventions standardize the span attributes so that any backend – Jaeger, Grafana Tempo, Datadog, New Relic – can render the same traces.

The gap that remains: auto-instrumentation covers LLM calls and tool invocations, but inter-agent delegation tracing still requires explicit instrumentation in most frameworks. If you are building a multi-agent system today, instrument the delegation points manually. Do not assume the framework handles it.
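What context propagation buys you can be sketched without any SDK. This dependency-free stand-in mimics what OpenTelemetry's tracer does when you instrument delegation points by hand: each hop inherits the trace ID and records its parent span, so the backend can rebuild the delegation DAG:

```python
import uuid

def new_span(parent_ctx: dict, agent: str, operation: str) -> dict:
    """Start a child span under the caller's context -- a dependency-free
    stand-in for tracer.start_as_current_span in the OpenTelemetry SDK."""
    return {
        "trace_id": parent_ctx["trace_id"],      # shared across the whole task
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_ctx.get("span_id"),  # edge in the delegation graph
        "agent": agent,
        "op": operation,
    }

root = {"trace_id": uuid.uuid4().hex, "span_id": None}
orch = new_span(root, "orchestrator", "invoke_agent")
research = new_span(orch, "research-agent", "invoke_agent")  # ctx follows the delegation
search = new_span(research, "research-agent", "execute_tool web_search")

# Every span carries the same trace_id; parent_id links let the backend
# rebuild the full delegation DAG that the API gateway never sees.
print(search["trace_id"] == orch["trace_id"])  # True
```

With the real SDK, the same shape falls out of nesting `start_as_current_span` calls and passing the active context across process boundaries via trace headers.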

Layer 2: Runtime policy enforcement

Tracing tells you what happened. Runtime enforcement prevents what should not happen. This is where NVIDIA OpenShell enters the stack.

OpenShell’s architecture moves policy enforcement outside the agent process entirely. The agent cannot override its own constraints, even if compromised. The policy engine evaluates actions at the binary, destination, method, and path level across filesystem, network, and process layers. Every decision – allow or deny – is logged with full context.

Three properties matter for M2M visibility:

  1. Deny-by-default posture. Agents must be explicitly granted access to every resource. No implicit inheritance, no ambient authority.
  2. Live policy updates. When an agent needs new capabilities, it proposes a policy change. A human approves or rejects. The policy updates without restarting the agent.
  3. Credential isolation. The privacy router ensures that sensitive credentials and data are routed to local models when policy requires, and never forwarded to agents that lack authorization.
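A minimal deny-by-default evaluator shows the shape of these three properties. The rule format here is invented for illustration (OpenShell's actual policy language will differ); what matters is that absence of a matching rule means denial, and every decision lands in the audit trail:

```python
from fnmatch import fnmatch

# Explicit grants per agent; rule shape is illustrative, not OpenShell's format.
POLICY = {
    "research-agent": [
        ("network", "GET", "api.search.example/*"),
    ],
}

def evaluate(agent: str, layer: str, method: str, path: str, audit: list) -> bool:
    """Deny-by-default: allow only if an explicit rule matches; log every decision."""
    rules = POLICY.get(agent, [])  # unknown agents get no rules, hence no access
    allowed = any(layer == l and method == m and fnmatch(path, p)
                  for l, m, p in rules)
    audit.append({"agent": agent, "layer": layer, "method": method,
                  "path": path, "decision": "allow" if allowed else "deny"})
    return allowed

audit = []
evaluate("research-agent", "network", "GET", "api.search.example/v1/query", audit)
evaluate("research-agent", "network", "POST", "warehouse.internal/export", audit)
evaluate("unknown-agent", "filesystem", "read", "/etc/passwd", audit)
print([e["decision"] for e in audit])  # ['allow', 'deny', 'deny']
```

Note what is absent: there is no "deny list". Anything not explicitly granted, including agents the policy has never heard of, is refused.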

OpenShell launched in early preview at GTC 2026 with 17 enterprise partners including Cisco, CrowdStrike, Google Cloud, Microsoft Security, and Trend Micro. It is open source under Apache 2.0, which matters because vendor lock-in on your security control plane is an anti-pattern.

For organizations that cannot adopt OpenShell immediately, the same principles apply through gVisor (Google’s application kernel for container sandboxing) or Firecracker (AWS’s microVM for serverless). The point is runtime isolation with policy enforcement and audit logging, not any specific tool.

Layer 3: Per-agent identity

Every agent needs its own identity. Not a shared API key. Not the orchestrator’s token passed down the chain. Its own, short-lived, scoped credential.

SPIFFE (Secure Production Identity Framework For Everyone) and its reference implementation SPIRE provide exactly this: workload identity for services that do not have human users. Each agent receives a SPIFFE Verifiable Identity Document (SVID) that is cryptographically signed, short-lived (minutes, not hours), and automatically rotated.

When combined with mTLS, every agent-to-agent call is mutually authenticated. Agent B can verify that the call actually came from Agent A, and Agent A can verify it is talking to the real Agent B. This directly addresses OWASP ASI07’s attack vectors: message spoofing fails because the attacker cannot forge the SVID, and agent-in-the-middle attacks fail because mTLS prevents interception.

The Gravitee report found that 45.6% of teams still rely on shared API keys for agent-to-agent authentication. Shared keys mean that if any agent is compromised, every agent that shares the key is compromised. Per-agent identity with SPIFFE eliminates this blast radius entirely.
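The server-side check is short. This sketch assumes an illustrative trust domain and path layout; in production SPIRE issues and rotates the SVIDs, and your mTLS library surfaces the peer's SPIFFE ID from the certificate's URI SAN:

```python
import re

# Per-agent SPIFFE IDs; trust domain and path layout are illustrative.
ALLOWED_CALLERS = {
    "spiffe://prod.example.com/agent/orchestrator",
    "spiffe://prod.example.com/agent/research",
}

SPIFFE_ID = re.compile(r"^spiffe://[^/]+/.+$")

def accept_call(peer_spiffe_id: str) -> bool:
    """Check the caller's identity after the mTLS handshake. The SVID's
    URI SAN carries the SPIFFE ID; with shared API keys there is
    nothing per-agent to check at this point."""
    if not SPIFFE_ID.match(peer_spiffe_id):
        return False
    return peer_spiffe_id in ALLOWED_CALLERS

print(accept_call("spiffe://prod.example.com/agent/orchestrator"))  # True
print(accept_call("spiffe://prod.example.com/agent/compromised"))   # False
```

Because the SVID is short-lived and bound to one workload, a stolen credential expires in minutes and never grants another agent's identity.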

Layer 4: Behavioral analysis

The first three layers give you visibility and enforcement. The fourth layer gives you detection.

Every agent role should have a behavioral baseline: the normal set of APIs it calls, the typical call frequency, the expected data volumes, the usual delegation patterns. A research agent that normally makes 50-200 search API calls per task and suddenly makes 3,000 calls to a data warehouse it has never accessed before is exhibiting anomalous behavior, regardless of whether each individual call is authorized.

This is where the 48.3% bot-differentiation gap from the SALT report gets addressed. You cannot differentiate agents from bots by looking at individual requests. You differentiate them by looking at behavioral patterns over time. Legitimate agents have predictable, role-consistent patterns. Malicious bots masquerading as agents do not.

Build baselines per agent role, not per agent instance. Alert on three things: calls to services outside the agent’s normal pattern, call volumes that deviate more than 2 standard deviations from the baseline, and delegation chains longer than the role’s expected depth.
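A checker implementing those three alerts fits in a few lines. The baseline numbers and service names here are invented; real baselines come from the trace data Layer 1 gives you:

```python
from statistics import mean, stdev

# Per-role baseline built from historical task runs; numbers are illustrative.
baseline_calls = [120, 95, 140, 110, 130, 105, 125, 115]
mu, sigma = mean(baseline_calls), stdev(baseline_calls)
KNOWN_SERVICES = {"web-search", "doc-retrieval", "knowledge-graph"}
MAX_DEPTH = 3  # expected delegation depth for this role

def check_task(services_called: set, call_count: int, delegation_depth: int) -> list:
    """Return the three baseline alerts described above, if triggered."""
    alerts = []
    unknown = services_called - KNOWN_SERVICES
    if unknown:
        alerts.append(f"unknown services: {sorted(unknown)}")
    if abs(call_count - mu) > 2 * sigma:
        alerts.append(f"call volume {call_count} outside 2 sigma of baseline")
    if delegation_depth > MAX_DEPTH:
        alerts.append(f"delegation depth {delegation_depth} exceeds {MAX_DEPTH}")
    return alerts

# A research agent suddenly hammering a data warehouse it has never touched:
print(check_task({"web-search", "data-warehouse"}, 3000, 2))  # two alerts fire
```

Note that every one of those 3,000 calls could be individually authenticated and authorized; only the deviation from the role baseline makes the incident visible.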

What 92% security immaturity actually means

The SALT report’s most damning finding is not the 48.9% visibility number. It is that 92% of organizations lack the security maturity to defend agentic environments. Only 23.5% find their existing security tools effective. And 78.6% of security leaders report increased executive scrutiny of AI risks – meaning leadership is asking questions that security teams cannot answer.

This maps directly to the Saviynt findings: 86% of organizations do not enforce access policies for AI identities, and 95% doubt they could detect or contain misuse if it occurred. The scrutiny is coming from the board. The capability to respond is not.

The organizations that will close this gap first are the ones that treat agent observability as infrastructure, not as a feature. You would never deploy a microservice without logging, tracing, and monitoring. The fact that we deploy agents without these is a failure of habits, not of technology. The tools exist. The standards exist. OpenTelemetry has the semantic conventions. OpenShell has the runtime enforcement. SPIFFE has the identity layer. The four-layer stack described above can be implemented incrementally, starting with tracing (highest visibility gain for lowest effort) and adding enforcement, identity, and behavioral analysis as the deployment matures.

The alternative is to remain in the 48.9% – blind to what your agents are doing, unable to distinguish them from attackers, and hoping that the 88% incident rate that Gravitee measured does not include you.

It already does. You just cannot see it.



Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch