Why Your SIEM Can't Detect AI Threats: Building an AI-Native Security Operations Capability
Your SIEM has thousands of detection rules. It correlates logs from firewalls, endpoints, identity providers, and applications. It catches lateral movement, credential stuffing, data exfiltration, and command-and-control traffic. And when someone submits a prompt injection attack against your AI customer service agent that causes it to exfiltrate customer records into its natural language responses - your SIEM sees nothing.
AI security operations requires a fundamentally different approach to detection than traditional security operations. Not because the stakes are different - they aren’t - but because the attack vectors, the telemetry, and the nature of “anomalous behavior” are completely different for AI systems.
This guide explains what the detection gap is, why it exists, and how to close it.
The Detection Gap: What SIEMs and EDR Miss
Why Traditional Tools Fail for AI Threats
Traditional security tools are built on a fundamental assumption: software behaves deterministically. Given the same inputs under the same conditions, a deterministic system produces the same output. Security anomaly detection works by establishing a baseline of normal behavior (normal network traffic, normal process execution, normal login patterns) and alerting when observed behavior deviates significantly from baseline.
AI systems are probabilistic. The same input can produce different outputs across runs. “Normal” behavior for an LLM is a distribution, not a fixed point. More importantly, the meaningful behavioral signals in an AI system - the semantic content of inputs and outputs, the intent behind a tool call, the appropriateness of a retrieved document - are not represented in any telemetry that traditional security tools collect.
What SIEMs See
When your SIEM receives logs from your AI application:
- HTTP requests and responses (status codes, latencies, byte counts)
- Authentication events
- API call counts and error rates
- Network connections from your inference infrastructure
None of these signals can detect:
- A prompt injection attack that uses normal HTTP traffic
- A jailbreak that produces harmful content (the response is still HTTP 200)
- A model generating false outputs (latency and byte counts are normal)
- Cross-user data leakage through the RAG system (the query and response both look like normal API calls)
- An agent making anomalous tool calls due to indirect injection (the tool calls go through normal API endpoints)
What EDR Sees
EDR tools monitor process execution, file system activity, network connections, and memory on individual endpoints. On your inference server:
- Python process running (normal)
- GPU memory in use (normal)
- Network connections to model API endpoint (normal)
- No unusual file system activity
A model that has been manipulated to exfiltrate data via its natural language outputs doesn’t write files, spawn unusual processes, or make anomalous network connections. The “exfiltration” is semantically encoded in normal HTTP responses. EDR cannot detect it.
AI-Specific Threat Categories
Before designing detection capabilities, you need to know what you’re detecting. AI systems face threat categories that have no equivalent in traditional security:
Semantic Threats
Attacks where the harm is carried in the meaning of content, not in its technical properties:
- Prompt injection - attacker instructions that override legitimate system prompts
- Jailbreaking - inputs that bypass safety controls to elicit prohibited content
- Adversarial inputs - inputs crafted to cause specific misbehavior
- Hallucination exploitation - causing the model to produce false outputs that are relied upon
Detection approach: Semantic analysis of inputs and outputs against policy classifiers, not byte-level pattern matching.
Behavioral Threats
Attacks where the model’s behavior is the indicator, not the specific input:
- Data exfiltration via output - model responses encoding sensitive information
- Unauthorized action execution - an agent taking actions it shouldn’t
- Resource exhaustion - inputs causing excessive computation or API costs
- Scope violation - model providing information or taking actions outside its intended scope
Detection approach: Behavioral baselines for model outputs and agent actions, with anomaly detection against those baselines.
Supply Chain Threats
Compromise of AI system components affecting integrity:
- Model weight tampering - backdoored or modified model weights
- Dependency compromise - malicious ML packages
- Plugin compromise - malicious MCP servers or tool integrations
Detection approach: Integrity verification at load time, behavioral testing after updates, vendor monitoring.
Infrastructure Threats
Traditional threats against AI-specific infrastructure:
- GPU cluster attacks - targeting training or inference infrastructure
- Training pipeline compromise - injecting into the model development process
- API key theft - targeting the credentials that access foundation model APIs
Detection approach: Traditional security controls applied to AI-specific infrastructure, with particular attention to credential and API key management.
Architecture for an AI-Native SOC
Building AI security operations capability requires new telemetry sources, new detection logic, and new response playbooks. Here is a reference architecture:
Component 1: AI Observability Layer
The foundation is comprehensive telemetry collection from AI systems. Traditional application logging captures what you need for debugging. AI security operations requires additional telemetry:
Input telemetry:
- Full text of every input to the AI system (with appropriate PII handling, such as masking or tokenization)
- Input source (user session ID, upstream agent ID, tool name)
- Input token count and entropy metrics
- Input anomaly pre-screening scores (classifier outputs)
Output telemetry:
- Full text of every output from the AI system
- Output classification against policy categories (harmful content, PII, sensitive topics)
- Output confidence and consistency metrics where available
- Output semantic similarity to known harmful patterns
Agent action telemetry:
- Every tool call: tool name, full parameters, calling context
- Tool call authorization status (approved, flagged, blocked)
- Tool call outcome (success, error, timeout)
- Tool call sequences (for detecting anomalous chains)
Session telemetry:
- Session start/end, duration, turn count
- User/agent identity and session context
- Cost and resource consumption per session
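The telemetry groups above can be sketched as a single flattened log record. This is a minimal illustration, not a production schema: the class name, fields, and the entropy heuristic are assumptions chosen for the example, and a real pipeline would add output classification scores and tool-call details per the lists above.

```python
import json
import math
from collections import Counter
from dataclasses import dataclass, field, asdict

def shannon_entropy(text: str) -> float:
    """Bits per character of the input text (a cheap pre-screening signal)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

@dataclass
class AITelemetryEvent:
    """One input/output exchange, flattened for log shipping (illustrative fields)."""
    session_id: str
    user_id: str
    input_text: str
    output_text: str
    input_source: str = "user"                # user | upstream_agent | tool
    tool_calls: list = field(default_factory=list)
    cost_usd: float = 0.0

    def to_log_record(self) -> str:
        record = asdict(self)
        # Derived metrics computed at log time, per the input telemetry list above
        record["input_entropy"] = round(shannon_entropy(self.input_text), 3)
        record["input_tokens_approx"] = len(self.input_text.split())
        return json.dumps(record)
```

Emitting these as structured JSON lines keeps them compatible with standard log aggregation, which matters for the SIEM integration patterns discussed later.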
Component 2: AI Behavioral Analytics Engine
The analytics engine processes telemetry to detect behavioral anomalies. This is where AI-native detection differs most from traditional SIEM:
Semantic classifiers:
- Input policy classifier: detects prompt injection attempts, jailbreak patterns, PII in inputs, off-topic requests
- Output policy classifier: detects harmful content, PII in outputs, data exfiltration patterns, scope violations
- Classifiers must be regularly updated as new attack patterns emerge
Statistical baselines:
- Per-user and per-model baseline profiles for: typical input topics, typical output topics, typical tool call patterns, typical session duration and cost
- Anomaly scoring against these baselines - a user whose session cost is 50x their normal average warrants investigation
Tool call sequence analysis:
- Graph-based analysis of tool call sequences, looking for unusual chains
- Specific detection rules for known dangerous sequences (read sensitive file → call outbound webhook)
- Volume anomalies in tool call rates
Cross-session correlation:
- Detection of coordinated attack campaigns across multiple sessions
- Tracking of similar adversarial payloads across user accounts
- Detection of systematic probing behavior
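A minimal sketch of the statistical-baseline idea, using session cost as the tracked metric. The class and thresholds are assumptions for illustration: it combines a z-score against a rolling per-user baseline with the blunt 50x-mean ratio check mentioned above, which still fires when the standard deviation is near zero for low-activity users.

```python
from dataclasses import dataclass

@dataclass
class UserBaseline:
    mean_session_cost: float   # rolling mean over recent sessions
    std_session_cost: float    # rolling standard deviation

def cost_anomaly(baseline: UserBaseline, current_cost: float,
                 z_threshold: float = 3.0, ratio_threshold: float = 50.0) -> bool:
    """Flag a session whose cost is a statistical outlier for this user."""
    # Blunt ratio test (e.g. 50x the mean, per the guidance above)
    if current_cost > baseline.mean_session_cost * ratio_threshold:
        return True
    # Z-score test against the rolling baseline
    if baseline.std_session_cost > 0:
        z = (current_cost - baseline.mean_session_cost) / baseline.std_session_cost
        return z > z_threshold
    return False
```

The same pattern generalizes to turn counts, tool-call rates, or topic-distribution distances; only the baseline statistics and thresholds change.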
Component 3: AI-Specific Detection Rules
Beyond statistical baselines, implement specific detection rules for known AI attack patterns. Examples:
```
RULE: Prompt injection attempt via authority impersonation
  IF input_text CONTAINS ["system update", "new instructions", "ignore previous", "override"]
  AND input_source == "user"
  AND (NOT user.is_admin)
  THEN alert(severity=HIGH, category="prompt_injection")

RULE: Anomalous tool call parameter
  IF tool_call.tool == "send_email"
  AND tool_call.params.recipient NOT IN user.authorized_email_domains
  THEN block_and_alert(severity=CRITICAL, category="unauthorized_tool_use")

RULE: Session cost anomaly
  IF session.current_cost > user.average_session_cost * 10
  THEN rate_limit_and_alert(severity=MEDIUM, category="resource_abuse")

RULE: Cross-turn context manipulation
  IF conversation.turns > 10
  AND conversation.topics CONTAINS ["jailbreak_keywords"]
  AND conversation.current_request.policy_score > threshold
  THEN flag_for_human_review(severity=HIGH, category="multi_turn_attack")
```
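One way to make a rule like "Anomalous tool call parameter" concrete is as an inline guard in the tool dispatch path. This is a sketch under assumed data shapes (a `tool_call` dict with `tool` and `params` keys), not a prescription for how your dispatcher is structured.

```python
def evaluate_send_email(tool_call: dict, authorized_domains: set) -> str:
    """Block send_email tool calls whose recipient domain falls outside
    the user's authorized domains. Returns "allow" or "block_and_alert".
    Field names here are illustrative assumptions."""
    if tool_call.get("tool") != "send_email":
        return "allow"
    recipient = tool_call.get("params", {}).get("recipient", "")
    # Everything after the last "@" is treated as the domain
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain not in authorized_domains:
        # Corresponds to severity=CRITICAL, category="unauthorized_tool_use"
        return "block_and_alert"
    return "allow"
```

Running the guard before the tool executes (rather than alerting after) is what turns this from detection into prevention; the rule engine above supports both via `alert` versus `block_and_alert`.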
Component 4: Response Playbooks
AI security incidents require different response actions than traditional incidents:
For suspected prompt injection:
- Preserve full conversation transcript (evidentiary)
- Assess whether any tool calls resulted from the injection (blast radius)
- If tool calls occurred: assess reversibility and trigger reversal where possible
- Block continued access for the session/user
- Review similar recent sessions for the same attack pattern
- Update detection classifier with the new attack pattern
For model output policy violation:
- Flag the specific output for human review
- Assess whether output was delivered (or can be intercepted)
- If delivered: assess downstream harm (was it relied upon? by whom?)
- Capture attack vector for classifier retraining consideration
- If pattern is new: fast-track to red team for characterization
For anomalous agent behavior:
- Immediately rate-limit the agent’s tool call capabilities
- Review all tool calls from the current session
- For each tool call: verify it was legitimate and authorized
- Roll back or remediate any unauthorized actions where possible
- Conduct full behavioral review before re-enabling agent
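The first containment steps of the prompt-injection playbook can be automated. This is a sketch against hypothetical data structures (the session dict fields, the evidence store, and the blocklist are all assumptions); real playbooks would wire these into your SOAR platform.

```python
def contain_prompt_injection(session: dict, evidence_store: dict,
                             blocked_users: set) -> dict:
    """Automated first steps: preserve the transcript, scope the blast
    radius (tool calls at or after the injected turn), block the user.
    All field names and storage abstractions here are hypothetical."""
    # Step 1: evidentiary copy of the full conversation transcript
    evidence_store[session["id"]] = list(session["transcript"])
    # Step 2: blast radius - tool calls that may have resulted from the injection
    injected_turn = session["injection_turn"]
    blast_radius = [c for c in session["tool_calls"] if c["turn"] >= injected_turn]
    # Step 3: block continued access for this user
    blocked_users.add(session["user_id"])
    return {"preserved_turns": len(session["transcript"]),
            "tool_calls_to_review": len(blast_radius),
            "user_blocked": session["user_id"]}
```

The remaining playbook steps (reversal, pattern sweep, classifier update) stay human-driven; automating only evidence preservation and blocking keeps the automation safe while buying the analyst time.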
Build vs Buy vs Hybrid
Building AI security operations capability involves choices across the stack:
Build
When to build: telemetry collection and detection logic tailored to your AI architecture. No vendor has a product that understands the semantics of your specific agent's tool calls.
What to build: Input/output telemetry pipeline, semantic classifiers tuned to your use case, tool call authorization logic, session behavioral baselines.
Realistic effort: 3-6 months to initial capability for a team with ML and security operations expertise.
Buy
When to buy: Infrastructure and correlation capabilities that are generic. You don’t need to build a SIEM from scratch to add AI-specific detection on top.
What to buy: The underlying SIEM/SOAR infrastructure, observability platforms (Datadog, Grafana, etc.) for metrics, log aggregation infrastructure.
What you cannot buy yet: no commercial product currently covers the full threat landscape described above with production-grade maturity. Products are emerging, but they require significant customization.
Hybrid
The practical approach for most organizations: Deploy standard observability infrastructure for log collection and dashboarding. Build custom AI-specific detection logic (classifiers, behavioral analytics) as code deployed alongside your AI systems. Use the SIEM as the correlation and alerting layer, feeding it AI-specific signals from your custom detection components.
Measuring Detection Effectiveness
Before investing in AI-native SOC capabilities, establish how you will measure whether they’re working. The key metrics:
True Positive Rate by Attack Category
For each AI threat category, track: of all attacks of this type that occur, what percentage generate an alert? This requires adversarial testing - you need to run attacks against your own systems to know if your detection catches them.
Red team testing cadence: Run structured adversarial tests against your AI systems quarterly. Include at minimum: prompt injection via direct user input, indirect injection via controlled content sources, jailbreak attempts across known technique categories, and anomalous agent tool call sequences.
For each test, note whether the detection fired, at what latency, and what alert was generated. This gives you empirical coverage metrics rather than theoretical coverage claims.
Mean Time to Detect (MTTD) for AI Incidents
For AI threats, MTTD should be measured separately for each threat category because detection mechanisms have very different latencies:
- Real-time classifiers (prompt injection, output policy) should have sub-second detection
- Behavioral anomaly detection (session-level) may have 10-30 minute latency as data accumulates
- Drift detection (model-level) may have hours of latency depending on monitoring cadence
- Supply chain and infrastructure threats depend on audit log frequency and SIEM ingestion latency
False Positive Rate and Alert Fatigue
AI-native detection is particularly prone to alert fatigue if classifiers are tuned aggressively. Track:
- Alerts per week by category
- Analyst time per alert (triage + investigation)
- False positive rate per category (alerts that were reviewed and found benign)
If any category has a false positive rate above 20%, tune the classifier before expanding coverage. Alert fatigue from poorly tuned classifiers is worse than no detection - analysts learn to ignore the noise.
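The false-positive tracking above reduces to a small aggregation over triaged alerts. A minimal sketch, assuming alerts arrive as (category, verdict) pairs after analyst review:

```python
from collections import defaultdict

def false_positive_rates(triaged_alerts):
    """triaged_alerts: iterable of (category, verdict) pairs, where verdict
    is 'benign' (false positive) or 'malicious' (true positive) after review."""
    totals, benign = defaultdict(int), defaultdict(int)
    for category, verdict in triaged_alerts:
        totals[category] += 1
        if verdict == "benign":
            benign[category] += 1
    return {c: benign[c] / totals[c] for c in totals}

def categories_to_tune(triaged_alerts, threshold=0.20):
    """Categories breaching the 20% false positive ceiling suggested above."""
    return sorted(c for c, r in false_positive_rates(triaged_alerts).items()
                  if r > threshold)
```

Running this weekly against the triage queue makes the "tune before expanding coverage" rule measurable rather than aspirational.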
SIEM Integration Patterns for AI Telemetry
Even though your SIEM can’t natively detect AI threats, it remains the correlation and alerting layer. The integration challenge is getting AI-specific signals into the SIEM in a format that analysts can work with.
Structured Event Schema for AI Alerts
Define a consistent event schema for AI security events before building integrations. A useful schema:
```json
{
  "event_type": "ai_security",
  "timestamp": "ISO8601",
  "severity": "critical|high|medium|low|info",
  "category": "prompt_injection|output_violation|tool_anomaly|behavioral_drift|supply_chain",
  "ai_system_id": "system identifier",
  "session_id": "session identifier (for correlation)",
  "user_id": "user identifier (hashed if PII-sensitive)",
  "finding": {
    "description": "human-readable description",
    "confidence": 0.0-1.0,
    "evidence": "relevant excerpt or indicator",
    "attack_vector": "direct|indirect|multimodal|infrastructure"
  },
  "context": {
    "conversation_turn": integer,
    "tool_calls_in_session": integer,
    "session_cost_usd": float
  }
}
```
This schema maps to SIEM event fields in a predictable way, enabling correlation rules that join AI security events with other event types (authentication, network, endpoint).
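Enforcing the schema at the point where events are emitted prevents malformed records from silently breaking SIEM correlation. A minimal validator sketch (the function and constant names are assumptions; a production pipeline might use JSON Schema instead):

```python
REQUIRED_TOP_LEVEL = {"event_type", "timestamp", "severity", "category",
                      "ai_system_id", "session_id", "user_id", "finding"}
VALID_SEVERITIES = {"critical", "high", "medium", "low", "info"}

def validate_ai_event(event: dict) -> list:
    """Return a list of schema problems (empty list means the event is valid).
    Checks the fields from the schema above; 'context' is treated as optional."""
    problems = [f"missing field: {k}"
                for k in sorted(REQUIRED_TOP_LEVEL - event.keys())]
    if event.get("severity") not in VALID_SEVERITIES:
        problems.append("invalid severity")
    confidence = event.get("finding", {}).get("confidence")
    if not (isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0):
        problems.append("finding.confidence must be in [0.0, 1.0]")
    return problems
```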
Correlation Rules Across AI and Traditional Telemetry
Once AI events are in the SIEM, write correlation rules that join them with traditional security telemetry:
Rule: AI attack followed by anomalous infrastructure access
If an AI system generates a prompt_injection alert, and within 10 minutes the same user’s credentials are used to access infrastructure they don’t normally access - this correlation may indicate the injection was part of a broader attack.
Rule: Repeated AI policy violations from same source
Five or more output_violation or prompt_injection events from the same user in 24 hours suggests systematic probing rather than accidental policy violation. Correlate with authentication logs to check for account compromise vs. malicious user.
Rule: AI behavioral drift coinciding with deployment event
A behavioral_drift event that correlates with a deployment event in your CI/CD logs may indicate an unauthorized deployment. Correlate with your deployment pipeline telemetry.
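The first correlation rule can be sketched as a time-windowed join. This is illustrative only: the field names (`user_id`, `timestamp`, `unusual`) are assumptions, and in practice this logic would live in your SIEM's correlation engine rather than application code.

```python
from datetime import datetime, timedelta

def injection_then_access(ai_alerts, access_events, window_minutes=10):
    """Pair each prompt_injection alert with unusual infrastructure access
    by the same user within the window. Both inputs are lists of dicts with
    hypothetical fields: user_id, timestamp (ISO 8601, no timezone suffix),
    and, for access events, an 'unusual' flag set by baseline analytics."""
    window = timedelta(minutes=window_minutes)
    pairs = []
    for alert in ai_alerts:
        if alert["category"] != "prompt_injection":
            continue
        t_alert = datetime.fromisoformat(alert["timestamp"])
        for event in access_events:
            if event["user_id"] != alert["user_id"] or not event.get("unusual"):
                continue
            delta = datetime.fromisoformat(event["timestamp"]) - t_alert
            if timedelta(0) <= delta <= window:
                pairs.append((alert, event))
    return pairs
```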
Getting Started: Minimum Viable AI Security Operations
If you’re starting from zero, implement these five capabilities first:
1. Full input/output logging - If you’re not logging every input and output with session context, you cannot investigate AI security incidents. This is table stakes. Log everything; you will thank yourself when the first incident occurs.
2. Output policy classifier - Detect harmful, out-of-policy outputs before they become incidents. Even a simple classifier running in your application middleware catches obvious violations. Start with an off-the-shelf classifier and tune it over 4-6 weeks before building custom models.
3. Tool call audit log - For any agentic system: every tool call with full parameters must be logged with tamper-evident storage. Implement this before deploying agents to production. This is non-negotiable for incident investigation capability.
4. Cost and resource anomaly alerting - The cheapest detection capability with significant coverage for DoS and resource abuse. Set a per-user session cost threshold that is 10x the expected cost for your use case. This catches both attacks and runaway behavior.
5. Behavioral baseline (after 4-6 weeks) - Once you have logging in place and a baseline of normal usage, add statistical anomaly detection for the metrics that matter most for your application: input topic distribution, output policy violation rate, tool call patterns.
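For capability 2, "even a simple classifier in your application middleware" can start as a handful of patterns. These regexes are deliberately naive placeholder assumptions (not a recommended ruleset); the point is that a trained classifier can slot in behind the same interface later.

```python
import re

# Hypothetical starter patterns; a production deployment would replace these
# with a trained classifier, but even regex middleware catches obvious leaks.
POLICY_PATTERNS = {
    "pii_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "pii_email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "api_key_like": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def screen_output(text: str) -> list:
    """Return the policy categories a model output appears to violate."""
    return [name for name, pattern in POLICY_PATTERNS.items()
            if pattern.search(text)]
```

Wiring `screen_output` into the response path before delivery gives you interception, not just detection, for the obvious cases while the tuned classifier is being built.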
The minimum viable AI security operations stack takes 2-4 weeks to deploy for a single AI application, assuming you have an existing logging infrastructure to build on. The investment is justified for any AI application with access to sensitive data or the ability to take real-world actions.
Our AI-Powered SOC service provides the full AI-native security operations stack: telemetry collection, behavioral analytics, detection rules, and 24/7 monitoring by analysts who understand AI-specific attack patterns. Contact us to discuss coverage for your AI systems.
For the offensive side - validating that your defenses can detect the attacks they’re designed to catch - see infosec.qa for AI red teaming and adversarial testing services.
Defend AI with AI
Start with a free AI SOC Readiness Assessment and see where your AI defenses stand.
Assess Your AI SOC Readiness