Why Your SIEM Can't Detect AI Threats: Building an AI-Native Security Operations Capability
Your SIEM has thousands of detection rules. It correlates logs from firewalls, endpoints, identity providers, and applications. It catches lateral movement, credential stuffing, data exfiltration, and command-and-control traffic. And when someone submits a prompt injection attack against your AI customer service agent that causes it to exfiltrate customer records into its natural language responses - your SIEM sees nothing.
AI security operations requires a fundamentally different approach to detection than traditional security operations. Not because the stakes are different - they aren’t - but because the attack vectors, the telemetry, and the nature of “anomalous behavior” are completely different for AI systems.
This guide explains what the detection gap is, why it exists, and how to close it.
The Detection Gap: What SIEMs and EDR Miss
Why Traditional Tools Fail for AI Threats
Traditional security tools are built on a fundamental assumption: software behaves deterministically. Given the same inputs under the same conditions, a deterministic system produces the same output. Security anomaly detection works by establishing a baseline of normal behavior (normal network traffic, normal process execution, normal login patterns) and alerting when observed behavior deviates significantly from baseline.
AI systems are probabilistic. The same input can produce different outputs across runs. “Normal” behavior for an LLM is a distribution, not a fixed point. More importantly, the meaningful behavioral signals in an AI system - the semantic content of inputs and outputs, the intent behind a tool call, the appropriateness of a retrieved document - are not represented in any telemetry that traditional security tools collect.
What SIEMs See
When your SIEM receives logs from your AI application:
- HTTP requests and responses (status codes, latencies, byte counts)
- Authentication events
- API call counts and error rates
- Network connections from your inference infrastructure
None of these signals can detect:
- A prompt injection attack that uses normal HTTP traffic
- A jailbreak that produces harmful content (the response is still HTTP 200)
- A model generating false outputs (latency and byte counts are normal)
- Cross-user data leakage through the RAG system (the query and response both look like normal API calls)
- An agent making anomalous tool calls due to indirect injection (the tool calls go through normal API endpoints)
What EDR Sees
EDR tools monitor process execution, file system activity, network connections, and memory on individual endpoints. On your inference server:
- Python process running (normal)
- GPU memory in use (normal)
- Network connections to model API endpoint (normal)
- No unusual file system activity
A model that has been manipulated to exfiltrate data via its natural language outputs doesn’t write files, spawn unusual processes, or make anomalous network connections. The “exfiltration” is semantically encoded in normal HTTP responses. EDR cannot detect it.
AI-Specific Threat Categories
Before designing detection capabilities, you need to know what you’re detecting. AI systems face threat categories that have no equivalent in traditional security:
Semantic Threats
Attacks where the harm is carried in the meaning of content, not in its technical properties:
- Prompt injection - attacker instructions that override legitimate system prompts
- Jailbreaking - inputs that bypass safety controls to elicit prohibited content
- Adversarial inputs - inputs crafted to cause specific misbehavior
- Hallucination exploitation - causing the model to produce false outputs that are relied upon
Detection approach: Semantic analysis of inputs and outputs against policy classifiers, not byte-level pattern matching.
Behavioral Threats
Attacks where the model’s behavior is the indicator, not the specific input:
- Data exfiltration via output - model responses encoding sensitive information
- Unauthorized action execution - an agent taking actions it shouldn’t
- Resource exhaustion - inputs causing excessive computation or API costs
- Scope violation - model providing information or taking actions outside its intended scope
Detection approach: Behavioral baselines for model outputs and agent actions, with anomaly detection against those baselines.
Supply Chain Threats
Compromise of AI system components affecting integrity:
- Model weight tampering - backdoored or modified model weights
- Dependency compromise - malicious ML packages
- Plugin compromise - malicious MCP servers or tool integrations
Detection approach: Integrity verification at load time, behavioral testing after updates, vendor monitoring.
Infrastructure Threats
Traditional threats against AI-specific infrastructure:
- GPU cluster attacks - targeting training or inference infrastructure
- Training pipeline compromise - injecting into the model development process
- API key theft - targeting the credentials that access foundation model APIs
Detection approach: Traditional security controls applied to AI-specific infrastructure, with particular attention to credential and API key management.
Architecture for an AI-Native SOC
Building AI security operations capability requires new telemetry sources, new detection logic, and new response playbooks. Here is a reference architecture:
Component 1: AI Observability Layer
The foundation is comprehensive telemetry collection from AI systems. Traditional application logging captures what you need for debugging. AI security operations requires additional telemetry:
Input telemetry:
- Full text of every input to the AI system (with appropriate PII handling, such as masking or tokenization)
- Input source (user session ID, upstream agent ID, tool name)
- Input token count and entropy metrics
- Input anomaly pre-screening scores (classifier outputs)
Output telemetry:
- Full text of every output from the AI system
- Output classification against policy categories (harmful content, PII, sensitive topics)
- Output confidence and consistency metrics where available
- Output semantic similarity to known harmful patterns
Agent action telemetry:
- Every tool call: tool name, full parameters, calling context
- Tool call authorization status (approved, flagged, blocked)
- Tool call outcome (success, error, timeout)
- Tool call sequences (for detecting anomalous chains)
Session telemetry:
- Session start/end, duration, turn count
- User/agent identity and session context
- Cost and resource consumption per session
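The telemetry groups above can be sketched as a single flattened log record. This is a minimal illustration, not a production schema: the class name, fields, and the entropy heuristic are assumptions chosen for the example, and a real pipeline would add output classification scores and tool-call details per the lists above.

```python
import json
import math
from collections import Counter
from dataclasses import dataclass, field, asdict

def shannon_entropy(text: str) -> float:
    """Bits per character of the input text (a cheap pre-screening signal)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

@dataclass
class AITelemetryEvent:
    """One input/output exchange, flattened for log shipping (illustrative fields)."""
    session_id: str
    user_id: str
    input_text: str
    output_text: str
    input_source: str = "user"                # user | upstream_agent | tool
    tool_calls: list = field(default_factory=list)
    cost_usd: float = 0.0

    def to_log_record(self) -> str:
        record = asdict(self)
        # Derived metrics computed at log time, per the input telemetry list above
        record["input_entropy"] = round(shannon_entropy(self.input_text), 3)
        record["input_tokens_approx"] = len(self.input_text.split())
        return json.dumps(record)
```

Emitting these as structured JSON lines keeps them compatible with standard log aggregation, which matters for the SIEM integration patterns discussed later.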
Component 2: AI Behavioral Analytics Engine
The analytics engine processes telemetry to detect behavioral anomalies. This is where AI-native detection differs most from traditional SIEM:
Semantic classifiers:
- Input policy classifier: detects prompt injection attempts, jailbreak patterns, PII in inputs, off-topic requests
- Output policy classifier: detects harmful content, PII in outputs, data exfiltration patterns, scope violations
- Classifiers must be regularly updated as new attack patterns emerge
Statistical baselines:
- Per-user and per-model baseline profiles for: typical input topics, typical output topics, typical tool call patterns, typical session duration and cost
- Anomaly scoring against these baselines - a user whose session cost is 50x their normal average warrants investigation
Tool call sequence analysis:
- Graph-based analysis of tool call sequences, looking for unusual chains
- Specific detection rules for known dangerous sequences (read sensitive file → call outbound webhook)
- Volume anomalies in tool call rates
Cross-session correlation:
- Detection of coordinated attack campaigns across multiple sessions
- Tracking of similar adversarial payloads across user accounts
- Detection of systematic probing behavior
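A minimal sketch of the statistical-baseline idea, using session cost as the tracked metric. The class and thresholds are assumptions for illustration: it combines a z-score against a rolling per-user baseline with the blunt 50x-mean ratio check mentioned above, which still fires when the standard deviation is near zero for low-activity users.

```python
from dataclasses import dataclass

@dataclass
class UserBaseline:
    mean_session_cost: float   # rolling mean over recent sessions
    std_session_cost: float    # rolling standard deviation

def cost_anomaly(baseline: UserBaseline, current_cost: float,
                 z_threshold: float = 3.0, ratio_threshold: float = 50.0) -> bool:
    """Flag a session whose cost is a statistical outlier for this user."""
    # Blunt ratio test (e.g. 50x the mean, per the guidance above)
    if current_cost > baseline.mean_session_cost * ratio_threshold:
        return True
    # Z-score test against the rolling baseline
    if baseline.std_session_cost > 0:
        z = (current_cost - baseline.mean_session_cost) / baseline.std_session_cost
        return z > z_threshold
    return False
```

The same pattern generalizes to turn counts, tool-call rates, or topic-distribution distances; only the baseline statistics and thresholds change.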
Component 3: AI-Specific Detection Rules
Beyond statistical baselines, implement specific detection rules for known AI attack patterns. Examples:
```
RULE: Prompt injection attempt via authority impersonation
  IF input_text CONTAINS ["system update", "new instructions", "ignore previous", "override"]
  AND input_source == "user"
  AND (NOT user.is_admin)
  THEN alert(severity=HIGH, category="prompt_injection")

RULE: Anomalous tool call parameter
  IF tool_call.tool == "send_email"
  AND tool_call.params.recipient NOT IN user.authorized_email_domains
  THEN block_and_alert(severity=CRITICAL, category="unauthorized_tool_use")

RULE: Session cost anomaly
  IF session.current_cost > user.average_session_cost * 10
  THEN rate_limit_and_alert(severity=MEDIUM, category="resource_abuse")

RULE: Cross-turn context manipulation
  IF conversation.turns > 10
  AND conversation.topics CONTAINS ["jailbreak_keywords"]
  AND conversation.current_request.policy_score > threshold
  THEN flag_for_human_review(severity=HIGH, category="multi_turn_attack")
```
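One way to make a rule like "Anomalous tool call parameter" concrete is as an inline guard in the tool dispatch path. This is a sketch under assumed data shapes (a `tool_call` dict with `tool` and `params` keys), not a prescription for how your dispatcher is structured.

```python
def evaluate_send_email(tool_call: dict, authorized_domains: set) -> str:
    """Block send_email tool calls whose recipient domain falls outside
    the user's authorized domains. Returns "allow" or "block_and_alert".
    Field names here are illustrative assumptions."""
    if tool_call.get("tool") != "send_email":
        return "allow"
    recipient = tool_call.get("params", {}).get("recipient", "")
    # Everything after the last "@" is treated as the domain
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain not in authorized_domains:
        # Corresponds to severity=CRITICAL, category="unauthorized_tool_use"
        return "block_and_alert"
    return "allow"
```

Running the guard before the tool executes (rather than alerting after) is what turns this from detection into prevention; the rule engine above supports both via `alert` versus `block_and_alert`.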
Component 4: Response Playbooks
AI security incidents require different response actions than traditional incidents:
For suspected prompt injection:
- Preserve full conversation transcript (evidentiary)
- Assess whether any tool calls resulted from the injection (blast radius)
- If tool calls occurred: assess reversibility and trigger reversal where possible
- Block continued access for the session/user
- Review similar recent sessions for the same attack pattern
- Update detection classifier with the new attack pattern
For model output policy violation:
- Flag the specific output for human review
- Assess whether output was delivered (or can be intercepted)
- If delivered: assess downstream harm (was it relied upon? by whom?)
- Capture attack vector for classifier retraining consideration
- If pattern is new: fast-track to red team for characterization
For anomalous agent behavior:
- Immediately rate-limit the agent’s tool call capabilities
- Review all tool calls from the current session
- For each tool call: verify it was legitimate and authorized
- Roll back or remediate any unauthorized actions where possible
- Conduct full behavioral review before re-enabling agent
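The first containment steps of the prompt-injection playbook can be automated. This is a sketch against hypothetical data structures (the session dict fields, the evidence store, and the blocklist are all assumptions); real playbooks would wire these into your SOAR platform.

```python
def contain_prompt_injection(session: dict, evidence_store: dict,
                             blocked_users: set) -> dict:
    """Automated first steps: preserve the transcript, scope the blast
    radius (tool calls at or after the injected turn), block the user.
    All field names and storage abstractions here are hypothetical."""
    # Step 1: evidentiary copy of the full conversation transcript
    evidence_store[session["id"]] = list(session["transcript"])
    # Step 2: blast radius - tool calls that may have resulted from the injection
    injected_turn = session["injection_turn"]
    blast_radius = [c for c in session["tool_calls"] if c["turn"] >= injected_turn]
    # Step 3: block continued access for this user
    blocked_users.add(session["user_id"])
    return {"preserved_turns": len(session["transcript"]),
            "tool_calls_to_review": len(blast_radius),
            "user_blocked": session["user_id"]}
```

The remaining playbook steps (reversal, pattern sweep, classifier update) stay human-driven; automating only evidence preservation and blocking keeps the automation safe while buying the analyst time.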
Build vs Buy vs Hybrid
Building AI security operations capability involves choices across the stack:
Build
When to build: telemetry collection and detection logic tailored to your AI architecture. No vendor has a product that understands the semantics of your specific agent's tool calls.
What to build: Input/output telemetry pipeline, semantic classifiers tuned to your use case, tool call authorization logic, session behavioral baselines.
Realistic effort: 3-6 months to initial capability for a team with ML and security operations expertise.
Buy
When to buy: Infrastructure and correlation capabilities that are generic. You don’t need to build a SIEM from scratch to add AI-specific detection on top.
What to buy: The underlying SIEM/SOAR infrastructure, observability platforms (Datadog, Grafana, etc.) for metrics, log aggregation infrastructure.
What you cannot buy yet: no commercial product currently covers the full threat landscape described above with production-grade maturity. Products are emerging, but they require significant customization.
Hybrid
The practical approach for most organizations: Deploy standard observability infrastructure for log collection and dashboarding. Build custom AI-specific detection logic (classifiers, behavioral analytics) as code deployed alongside your AI systems. Use the SIEM as the correlation and alerting layer, feeding it AI-specific signals from your custom detection components.
Measuring Detection Effectiveness
Before investing in AI-native SOC capabilities, establish how you will measure whether they’re working. The key metrics:
True Positive Rate by Attack Category
For each AI threat category, track: of all attacks of this type that occur, what percentage generate an alert? This requires adversarial testing - you need to run attacks against your own systems to know if your detection catches them.
Red team testing cadence: Run structured adversarial tests against your AI systems quarterly. Include at minimum: prompt injection via direct user input, indirect injection via controlled content sources, jailbreak attempts across known technique categories, and anomalous agent tool call sequences.
For each test, note whether the detection fired, at what latency, and what alert was generated. This gives you empirical coverage metrics rather than theoretical coverage claims.
Mean Time to Detect (MTTD) for AI Incidents
For AI threats, MTTD should be measured separately for each threat category because detection mechanisms have very different latencies:
- Real-time classifiers (prompt injection, output policy) should have sub-second detection
- Behavioral anomaly detection (session-level) may have 10-30 minute latency as data accumulates
- Drift detection (model-level) may have hours of latency depending on monitoring cadence
- Supply chain and infrastructure threats depend on audit log frequency and SIEM ingestion latency
False Positive Rate and Alert Fatigue
AI-native detection is particularly prone to alert fatigue if classifiers are tuned aggressively. Track:
- Alerts per week by category
- Analyst time per alert (triage + investigation)
- False positive rate per category (alerts that were reviewed and found benign)
If any category has a false positive rate above 20%, tune the classifier before expanding coverage. Alert fatigue from poorly tuned classifiers is worse than no detection - analysts learn to ignore the noise.
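The false-positive tracking above reduces to a small aggregation over triaged alerts. A minimal sketch, assuming alerts arrive as (category, verdict) pairs after analyst review:

```python
from collections import defaultdict

def false_positive_rates(triaged_alerts):
    """triaged_alerts: iterable of (category, verdict) pairs, where verdict
    is 'benign' (false positive) or 'malicious' (true positive) after review."""
    totals, benign = defaultdict(int), defaultdict(int)
    for category, verdict in triaged_alerts:
        totals[category] += 1
        if verdict == "benign":
            benign[category] += 1
    return {c: benign[c] / totals[c] for c in totals}

def categories_to_tune(triaged_alerts, threshold=0.20):
    """Categories breaching the 20% false positive ceiling suggested above."""
    return sorted(c for c, r in false_positive_rates(triaged_alerts).items()
                  if r > threshold)
```

Running this weekly against the triage queue makes the "tune before expanding coverage" rule measurable rather than aspirational.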
SIEM Integration Patterns for AI Telemetry
Even though your SIEM can’t natively detect AI threats, it remains the correlation and alerting layer. The integration challenge is getting AI-specific signals into the SIEM in a format that analysts can work with.
Structured Event Schema for AI Alerts
Define a consistent event schema for AI security events before building integrations. A useful schema:
```json
{
  "event_type": "ai_security",
  "timestamp": "ISO8601",
  "severity": "critical|high|medium|low|info",
  "category": "prompt_injection|output_violation|tool_anomaly|behavioral_drift|supply_chain",
  "ai_system_id": "system identifier",
  "session_id": "session identifier (for correlation)",
  "user_id": "user identifier (hashed if PII-sensitive)",
  "finding": {
    "description": "human-readable description",
    "confidence": 0.0-1.0,
    "evidence": "relevant excerpt or indicator",
    "attack_vector": "direct|indirect|multimodal|infrastructure"
  },
  "context": {
    "conversation_turn": integer,
    "tool_calls_in_session": integer,
    "session_cost_usd": float
  }
}
```
This schema maps to SIEM event fields in a predictable way, enabling correlation rules that join AI security events with other event types (authentication, network, endpoint).
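Enforcing the schema at the point where events are emitted prevents malformed records from silently breaking SIEM correlation. A minimal validator sketch (the function and constant names are assumptions; a production pipeline might use JSON Schema instead):

```python
REQUIRED_TOP_LEVEL = {"event_type", "timestamp", "severity", "category",
                      "ai_system_id", "session_id", "user_id", "finding"}
VALID_SEVERITIES = {"critical", "high", "medium", "low", "info"}

def validate_ai_event(event: dict) -> list:
    """Return a list of schema problems (empty list means the event is valid).
    Checks the fields from the schema above; 'context' is treated as optional."""
    problems = [f"missing field: {k}"
                for k in sorted(REQUIRED_TOP_LEVEL - event.keys())]
    if event.get("severity") not in VALID_SEVERITIES:
        problems.append("invalid severity")
    confidence = event.get("finding", {}).get("confidence")
    if not (isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0):
        problems.append("finding.confidence must be in [0.0, 1.0]")
    return problems
```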
Correlation Rules Across AI and Traditional Telemetry
Once AI events are in the SIEM, write correlation rules that join them with traditional security telemetry:
Rule: AI attack followed by anomalous infrastructure access
If an AI system generates a prompt_injection alert, and within 10 minutes the same user’s credentials are used to access infrastructure they don’t normally access - this correlation may indicate the injection was part of a broader attack.
Rule: Repeated AI policy violations from same source
Five or more output_violation or prompt_injection events from the same user in 24 hours suggests systematic probing rather than accidental policy violation. Correlate with authentication logs to check for account compromise vs. malicious user.
Rule: AI behavioral drift coinciding with deployment event
A behavioral_drift event that correlates with a deployment event in your CI/CD logs may indicate an unauthorized deployment. Correlate with your deployment pipeline telemetry.
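The first correlation rule can be sketched as a time-windowed join. This is illustrative only: the field names (`user_id`, `timestamp`, `unusual`) are assumptions, and in practice this logic would live in your SIEM's correlation engine rather than application code.

```python
from datetime import datetime, timedelta

def injection_then_access(ai_alerts, access_events, window_minutes=10):
    """Pair each prompt_injection alert with unusual infrastructure access
    by the same user within the window. Both inputs are lists of dicts with
    hypothetical fields: user_id, timestamp (ISO 8601, no timezone suffix),
    and, for access events, an 'unusual' flag set by baseline analytics."""
    window = timedelta(minutes=window_minutes)
    pairs = []
    for alert in ai_alerts:
        if alert["category"] != "prompt_injection":
            continue
        t_alert = datetime.fromisoformat(alert["timestamp"])
        for event in access_events:
            if event["user_id"] != alert["user_id"] or not event.get("unusual"):
                continue
            delta = datetime.fromisoformat(event["timestamp"]) - t_alert
            if timedelta(0) <= delta <= window:
                pairs.append((alert, event))
    return pairs
```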
Getting Started: Minimum Viable AI Security Operations
If you’re starting from zero, implement these five capabilities first:
1. Full input/output logging - If you’re not logging every input and output with session context, you cannot investigate AI security incidents. This is table stakes. Log everything; you will thank yourself when the first incident occurs.
2. Output policy classifier - Detect harmful, out-of-policy outputs before they become incidents. Even a simple classifier running in your application middleware catches obvious violations. Start with an off-the-shelf classifier and tune it over 4-6 weeks before building custom models.
3. Tool call audit log - For any agentic system: every tool call with full parameters must be logged with tamper-evident storage. Implement this before deploying agents to production. This is non-negotiable for incident investigation capability.
4. Cost and resource anomaly alerting - The cheapest detection capability with significant coverage for DoS and resource abuse. Set a per-user session cost threshold that is 10x the expected cost for your use case. This catches both attacks and runaway behavior.
5. Behavioral baseline (after 4-6 weeks) - Once you have logging in place and a baseline of normal usage, add statistical anomaly detection for the metrics that matter most for your application: input topic distribution, output policy violation rate, tool call patterns.
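For capability 2, "even a simple classifier in your application middleware" can start as a handful of patterns. These regexes are deliberately naive placeholder assumptions (not a recommended ruleset); the point is that a trained classifier can slot in behind the same interface later.

```python
import re

# Hypothetical starter patterns; a production deployment would replace these
# with a trained classifier, but even regex middleware catches obvious leaks.
POLICY_PATTERNS = {
    "pii_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "pii_email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "api_key_like": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def screen_output(text: str) -> list:
    """Return the policy categories a model output appears to violate."""
    return [name for name, pattern in POLICY_PATTERNS.items()
            if pattern.search(text)]
```

Wiring `screen_output` into the response path before delivery gives you interception, not just detection, for the obvious cases while the tuned classifier is being built.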
The minimum viable AI security operations stack takes 2-4 weeks to deploy for a single AI application, assuming you have an existing logging infrastructure to build on. The investment is justified for any AI application with access to sensitive data or the ability to take real-world actions.
Our AI-Powered SOC service provides the full AI-native security operations stack: telemetry collection, behavioral analytics, detection rules, and 24/7 monitoring by analysts who understand AI-specific attack patterns. Contact us to discuss coverage for your AI systems.
For the offensive side - validating that your defenses can detect the attacks they’re designed to catch - see infosec.qa for AI red teaming and adversarial testing services.
Defend AI with AI
Start with a free AI SOC Readiness Assessment and see where your AI defenses stand.
Assess Your AI SOC Readiness