Detection engineering is the discipline of designing, building, testing, and maintaining detection logic — the rules, queries, and behavioral analytics that turn raw log telemetry into actionable security alerts. It sits at the boundary between security operations and software engineering. Done well, it's the foundation everything else in a SOC rests on. Done poorly — or not done at all — it explains why organizations with expensive SIEM deployments still get surprised by incidents that left observable traces for weeks.
The role of detection engineer is distinct from SOC analyst, and the distinction matters. Analysts triage and investigate. Detection engineers design the system that produces what analysts triage. The separation resembles the one between a network architect and a network operations center technician: one designs the system; the other operates it. When both roles are collapsed into the same person, both functions suffer.
What Detection Engineers Actually Do
A detection engineer's primary output is detection logic that is accurate, maintainable, and version-controlled. The work involves:
Translating threat intelligence into detection hypotheses. When a new adversary technique is published — either in a MITRE ATT&CK update, a vendor threat report, or a community disclosure — the detection engineer asks: what observable evidence would this technique leave in our telemetry? Not in a generic environment, but in our specific log sources, our specific EDR deployment, our specific network architecture. That environment-specific translation is the core craft.
Writing and testing detection rules. In a SIEM-native workflow, this means writing Splunk SPL searches or Elastic KQL queries. In a vendor-agnostic workflow, it means writing Sigma rules (the open-source, platform-neutral detection format) and then converting them to platform-specific queries using tools like sigma-cli or pySigma. A basic Sigma rule structure looks like this:
title: PowerShell Encoded Command Execution
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains:
            - ' -EncodedCommand '
            - ' -enc '
            - ' -EC '
    condition: selection
level: medium
tags:
    - attack.execution
    - attack.t1059.001
This rule targets T1059.001 by looking for command lines that carry PowerShell's encoded-command flags. Note that it matches on the command line alone and does not check that the process image is PowerShell. Writing the rule takes minutes. Testing it takes considerably longer: validating that it fires on actual malicious payloads, doesn't fire on legitimate software that uses encoded commands for benign reasons, and handles the flag and encoding variations used by common malware loaders. The testing work is where detection engineering diverges most sharply from "just writing a query."
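To make the testing step concrete, here is a minimal Python sketch that stands in for the rule's CommandLine|contains selection and runs it against a few command lines. The sample strings are illustrative, not real telemetry, and the function names are hypothetical. It assumes Sigma's default case-insensitive string matching and relies on the commonly documented behavior that PowerShell accepts abbreviated forms of -EncodedCommand.

```python
# Hypothetical stand-in for the rule's `CommandLine|contains` selection.
# Sigma string matching is case-insensitive by default, so both sides are lowercased.
PATTERNS = [" -EncodedCommand ", " -enc ", " -EC "]


def rule_matches(command_line: str) -> bool:
    """Return True if any rule pattern occurs in the command line."""
    lowered = command_line.lower()
    return any(pattern.lower() in lowered for pattern in PATTERNS)


# Illustrative samples (assumptions, not drawn from real logs).
SHOULD_FIRE = [
    "powershell.exe -NoProfile -EncodedCommand SQBFAFgA",
    "powershell.exe -w hidden -enc SQBFAFgA",
]
SHOULD_NOT_FIRE = [
    "powershell.exe -File C:\\Scripts\\backup.ps1",
    "cmd.exe /c echo -encoding utf8",
]
# Abbreviated flag forms that the rule as written does not list.
KNOWN_GAPS = [
    "powershell.exe -e SQBFAFgA",
    "powershell.exe -enco SQBFAFgA",
]


def run_tests() -> dict:
    """Collect misses, false positives, and uncovered variants."""
    return {
        "missed": [c for c in SHOULD_FIRE if not rule_matches(c)],
        "false_positives": [c for c in SHOULD_NOT_FIRE if rule_matches(c)],
        "gaps": [c for c in KNOWN_GAPS if not rule_matches(c)],
    }
```

Even this toy harness surfaces the kind of finding that real testing produces: the rule fires on the listed flags but misses shorter and longer abbreviations, which is a coverage decision to document or fix.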
Managing rule lifecycle. Rules don't stay accurate indefinitely. Environments change, adversary techniques evolve, software updates introduce new behavioral patterns. Rule decay — the gradual degradation of a rule's accuracy over time without explicit maintenance — is one of the most common sources of false coverage claims. Detection engineers own rule lifecycle: tracking which rules are active, monitoring false-positive rates, scheduling periodic re-validation against current telemetry, and deprecating rules whose data sources have changed.
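One piece of lifecycle tracking, false-positive monitoring, can be sketched in a few lines of Python. This assumes analysts record a verdict for each closed alert. The verdict labels, thresholds, and rule IDs are hypothetical, not taken from any particular SIEM.

```python
from collections import Counter


def false_positive_rates(dispositions: list) -> dict:
    """Map rule ID to the fraction of its closed alerts marked false positive.

    `dispositions` is a list of (rule_id, verdict) pairs, where verdict is
    'true_positive' or 'false_positive' (hypothetical labels).
    """
    totals, false_positives = Counter(), Counter()
    for rule_id, verdict in dispositions:
        totals[rule_id] += 1
        if verdict == "false_positive":
            false_positives[rule_id] += 1
    return {rule_id: false_positives[rule_id] / totals[rule_id] for rule_id in totals}


def rules_needing_review(dispositions: list, max_fp_rate: float = 0.5,
                         min_alerts: int = 10) -> list:
    """Return rule IDs whose false-positive rate exceeds the threshold.

    Rules with fewer than `min_alerts` closed alerts are skipped, since a
    rate computed from a handful of alerts is not meaningful.
    """
    totals = Counter(rule_id for rule_id, _ in dispositions)
    rates = false_positive_rates(dispositions)
    return sorted(
        rule_id for rule_id, rate in rates.items()
        if totals[rule_id] >= min_alerts and rate > max_fp_rate
    )
```

The output is a review queue, not an automatic disable list: a high false-positive rate says the rule or its data source has changed, and a person still decides what to do about it.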
Detection-as-Code: The Workflow That Scales
Detection-as-code is the practice of managing detection logic in version control (Git), with automated testing pipelines and peer review processes borrowed from software engineering. The core insight is that detection rules have the same failure modes as application code: they can have bugs, they can regress after changes, they can accumulate technical debt, and they need to be tested before deployment. Treating them like code — with review, history, and rollback capability — addresses all of these.
A mature detection-as-code workflow has several components. Rules are stored as Sigma YAML (or vendor-native in cases where Sigma conversion loses fidelity) in a Git repository with branch protection. Changes go through pull request review: a second detection engineer checks logic, evaluates false-positive risk, and confirms the rule targets the intended technique variant. CI pipelines run automated validation: syntax checks, sigma-cli conversion to the target platform(s), and, if available, replay tests against a corpus of known-malicious log samples.
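As a rough illustration of the syntax-check stage, the Python sketch below validates a rule that has already been parsed from YAML into a dictionary. It checks for the top-level keys Sigma requires and confirms that the condition only names selections that exist. The function name and error messages are hypothetical, and the sketch handles only a single string condition. A real pipeline would more likely lean on sigma-cli or pySigma for this.

```python
import re

REQUIRED_KEYS = ("title", "logsource", "detection")
# Words that are condition syntax rather than selection names.
CONDITION_KEYWORDS = {"and", "or", "not", "of", "all", "them"}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the rule passed."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in rule:
            errors.append(f"missing required key: {key}")

    detection = rule.get("detection", {})
    condition = detection.get("condition")
    if not isinstance(condition, str):
        errors.append("detection.condition missing or not a string")
        return errors

    # Every identifier in the condition must be a defined selection.
    defined = set(detection) - {"condition"}
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_*]*", condition):
        if token.lower() in CONDITION_KEYWORDS or "*" in token:
            continue  # keyword or wildcard pattern such as selection_*
        if token not in defined:
            errors.append(f"condition references undefined selection: {token}")
    return errors
```

A CI job could run this over every changed rule file and fail the build when the returned list is non-empty, so a typo in a condition is caught in review and never reaches the SIEM.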
The benefit isn't just engineering hygiene. It's operational traceability. When a rule produces a spike in false positives at 3am, the on-call analyst can see exactly when the rule was last modified, what changed, and who reviewed it. When a rule is disabled to stop a false-positive flood, that disable action is a commit with a reason — not an invisible configuration change in the SIEM GUI that gets forgotten.
Atomic Rules vs. Correlation Rules
Detection logic divides into two categories with different design patterns and different operational characteristics.
Atomic rules fire on a single event that is itself suspicious: a process creation matching a known malware signature, a login attempt matching a known credential-stuffing pattern, a PowerShell invocation with an encoded command. These are high-fidelity but easy to evade for an adversary who knows the rule. They're the foundation of coverage but not the ceiling.
Correlation rules fire on patterns across multiple events within a defined time window. A single failed authentication is noise. Twenty-seven failed authentications from the same source IP against fifteen different user accounts in four minutes is a password spray (T1110.003). Correlation rules are harder to evade because they target behavioral patterns rather than individual indicators. They're also harder to write and tune: the temporal window, the event count threshold, and the grouping logic all affect false-positive rate and detection confidence.
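The password-spray pattern can be sketched as a sliding-window correlation in Python. This is a simplified illustration, not how any particular SIEM implements correlation. The function name and default thresholds are assumptions chosen to show the three tuning knobs named above: window, count threshold, and grouping key.

```python
from collections import defaultdict, deque
from datetime import timedelta


def detect_password_spray(events, window=timedelta(minutes=4),
                          min_failures=20, min_accounts=10):
    """Return the set of source IPs that look like password sprays.

    `events` is an iterable of (timestamp, source_ip, username) tuples for
    failed logins, sorted by timestamp. Thresholds are illustrative only.
    """
    recent = defaultdict(deque)  # grouping key: source IP
    flagged = set()
    for timestamp, source_ip, username in events:
        queue = recent[source_ip]
        queue.append((timestamp, username))
        # Drop failures that have aged out of the sliding window.
        while queue and timestamp - queue[0][0] > window:
            queue.popleft()
        distinct_accounts = {user for _, user in queue}
        if len(queue) >= min_failures and len(distinct_accounts) >= min_accounts:
            flagged.add(source_ip)
    return flagged
```

Requiring many distinct accounts as well as many failures is what separates a spray from a single user who forgot a password. Loosening either threshold widens coverage at the cost of more false positives.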
This doesn't make atomic rules a lesser form of detection; the two types serve different purposes. For technique coverage audits and ATT&CK mapping, atomic rules are the right unit. For catching sophisticated adversaries who are specifically evading indicator-based detection, correlation rules and behavioral analytics are necessary. A mature detection program has both, with clear documentation of which techniques are covered at what confidence level by which rule type.
The Baseline Problem
Many of the most powerful detection patterns (anomalous authentication times, unusual process relationships, unexpected network connections) require knowing what "normal" looks like first. Building behavioral baselines is not a one-time task. Environments evolve: new applications are deployed, remote work patterns shift authentication times, cloud workload scaling changes network traffic patterns. A baseline built six months ago may be meaningfully inaccurate for today's environment, which means anomaly-based detections built against it fire on ordinary environmental drift instead of on actual threats.
Baseline drift is the most common reason anomaly-based detection rules get disabled — not because the detection logic was wrong, but because the baseline it was measuring against became stale. Detection engineers who own anomaly-based rules need a process for baseline refresh that's separate from rule logic maintenance. The two are related but different engineering problems.
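The split between rule logic and baseline refresh can be illustrated with a small Python sketch. It assumes the monitored signal is a daily count, such as logins per day for a service account, and uses a simple z-score test. The window length, threshold, and function names are hypothetical. Real baselines are usually richer than a mean and standard deviation.

```python
from statistics import mean, stdev


def build_baseline(daily_counts, window_days=30):
    """Summarize the most recent `window_days` observations.

    Needs at least two observations, since stdev is undefined for one.
    """
    recent = daily_counts[-window_days:]
    return {"mean": mean(recent), "stdev": stdev(recent), "days": len(recent)}


def is_anomalous(value, baseline, z_threshold=3.0):
    """Rule logic: flag values far from the baseline mean.

    This function never changes when the environment shifts; only the
    baseline passed into it does.
    """
    if baseline["stdev"] == 0:
        return value != baseline["mean"]
    z_score = abs(value - baseline["mean"]) / baseline["stdev"]
    return z_score > z_threshold
```

If the environment's normal level doubles, the same is_anomalous call flags every day against the old baseline and goes quiet once build_baseline is re-run on recent data. That is why the refresh schedule deserves its own owner and process.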
The Detection Coverage Gap You Should Measure
Every SOC program should have an explicit answer to: what percentage of our covered ATT&CK techniques have been validated against our actual telemetry in the last 90 days? For most organizations, this number is lower than anyone is comfortable admitting. The detection engineering function exists in part to make it higher — through systematic atomic testing, regular purple-team exercises, and coverage gap analysis against the specific threat actors relevant to the organization's industry and geography.
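The 90-day validation metric is simple enough to compute directly. Here is a minimal Python sketch, assuming the team keeps a record of when each covered technique was last validated against live telemetry. The data structure and function name are hypothetical.

```python
from datetime import date, timedelta


def validated_coverage(last_validated: dict, today: date,
                       max_age_days: int = 90) -> float:
    """Percentage of covered techniques validated within `max_age_days`.

    `last_validated` maps an ATT&CK technique ID to the date it was last
    validated, or None if it has never been validated.
    """
    if not last_validated:
        return 0.0
    cutoff = today - timedelta(days=max_age_days)
    fresh = sum(
        1 for validated_on in last_validated.values()
        if validated_on is not None and validated_on >= cutoff
    )
    return 100.0 * fresh / len(last_validated)
```

Techniques that have never been validated count against the total on purpose: a rule that exists but has never been shown to fire is claimed coverage, not demonstrated coverage.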
Detection engineering is not a project that gets completed. It's a practice that runs continuously alongside the threat landscape it's tracking. The teams that treat it as a project — write the rules during the deployment phase, then move on — tend to find themselves with a SIEM full of stale detection logic and a board deck showing coverage that doesn't exist in the environment. The teams that treat it as a practice — with dedicated headcount, version-controlled rule sets, and quarterly validation cycles — build detection capability that compounds over time rather than decaying.