Detection Methodology
How Doorman's 4-layer detection engine is designed for maximum accuracy with minimal false positives. Verified against 2,464 test cases.
Doorman uses a layered detection approach. Each layer adds depth and precision, and findings are cross-validated between layers to filter out false positives.
Detection Pipeline
Every scan runs through all 4 layers in sequence:
1. Regex Engine: 2,508 pattern rules
2. Taint Tracking: data flow analysis
3. Scope Analysis: variable resolution
4. AST Engine: structural analysis
Layer 1: Regex Engine
The regex engine is the fast first pass. It scans every file against 2,508 pattern-based rules covering all 10 categories and 11 languages.
How it works
- Each rule is a regex pattern with metadata: ID, severity, category, description, and optional auto-fix.
- Rules are organized by language and category in src/rules/.
- The engine runs all applicable rules against each file based on file extension.
- Matches are reported as candidate findings for further validation.
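The steps above can be sketched in a few lines. This is a minimal illustration, not Doorman's implementation: the rule IDs, the field names, and the `scanFile` helper are all hypothetical, and the real schema in src/rules/ may differ.

```javascript
// Hypothetical rule shape: the fields mirror the metadata listed above,
// but the exact schema used in src/rules/ is an assumption.
const rules = [
  {
    id: "JS-SECRET-001",
    severity: "high",
    category: "secrets",
    description: "Hardcoded password assigned to a variable",
    extensions: [".js", ".ts"],
    pattern: /password\s*=\s*["'][^"']+["']/gi,
  },
  {
    id: "PY-EXEC-001",
    severity: "critical",
    category: "injection",
    description: "Call to os.system",
    extensions: [".py"],
    pattern: /os\.system\s*\(/g,
  },
];

// Run every rule that applies to the file's extension and collect candidates.
function scanFile(path, source) {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot);
  const found = [];
  for (const rule of rules) {
    if (!rule.extensions.includes(ext)) continue;
    for (const match of source.matchAll(rule.pattern)) {
      // Convert the match offset into a 1-based line number.
      const line = source.slice(0, match.index).split("\n").length;
      found.push({ ruleId: rule.id, severity: rule.severity, path, line });
    }
  }
  return found;
}

const candidates = scanFile("app.js", 'const a = 1;\nconst password = "test123";\n');
console.log(candidates);
```

The output is a list of candidates only. Nothing here knows whether the matched value is a real credential, which is why the later layers exist.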
Strengths
- Fast: roughly 1,000 files per second for the regex pass on its own (see Performance)
- Broad coverage — catches common patterns like hardcoded secrets, dangerous function calls, missing security headers
- Language-agnostic patterns work across similar languages
Limitations
- Cannot understand data flow or variable scope
- May produce false positives on its own (resolved by later layers)
Layer 2: Taint Tracking
The taint tracker traces untrusted user input through the code to dangerous sinks. This is critical for detecting injection vulnerabilities.
How it works
- Sources: Identifies where user input enters the application (request parameters, form data, URL params, file uploads, database reads).
- Propagation: Tracks how tainted data flows through assignments, function calls, string concatenation, and transformations.
- Sanitizers: Recognizes when data is properly sanitized (escaping, parameterized queries, validation).
- Sinks: Detects when tainted data reaches a dangerous operation (SQL query, HTML output, command execution, file system access).
Example
// Source: user input
const name = req.query.name;
// Propagation: data flows through concatenation
const query = "SELECT * FROM users WHERE name = '" + name + "'";
// Sink: tainted data reaches SQL execution
db.execute(query); // CRITICAL: SQL injection detected
Safe version:
// Fix: use parameterized queries instead of string concatenation
const name = req.query.name;
db.execute("SELECT * FROM users WHERE name = ?", [name]);
Note: The taint tracker treats the parameterized query in the safe version as sanitized, so it does not flag that call.
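The source, propagation, sanitizer, and sink steps can be sketched over a tiny hand-written intermediate form. Everything here is an assumption made for illustration: the statement shape, the `findTaintedSinks` helper, and the source, sanitizer, and sink names are hypothetical, and a real tracker works on parsed code rather than pre-built statement lists.

```javascript
// Hypothetical mini-IR: each statement is an assignment or a call.
const SOURCES = new Set(["req.query", "req.body", "req.params"]);
const SANITIZERS = new Set(["escapeSql", "escapeHtml"]);
const SINKS = new Set(["db.execute", "child_process.exec"]);

function findTaintedSinks(statements) {
  const tainted = new Set(); // variables currently holding untrusted data
  const findings = [];
  const isTainted = (operand) => tainted.has(operand) || SOURCES.has(operand);

  for (const stmt of statements) {
    if (stmt.kind === "assign") {
      // A sanitizer call yields clean data; otherwise taint propagates
      // whenever any operand is tainted.
      const clean = stmt.via !== undefined && SANITIZERS.has(stmt.via);
      if (!clean && stmt.operands.some(isTainted)) tainted.add(stmt.target);
      else tainted.delete(stmt.target);
    } else if (stmt.kind === "call" && SINKS.has(stmt.callee)) {
      // Simplification: only the first argument (the query or command text)
      // is treated as dangerous; bound parameters are treated as safe.
      if (isTainted(stmt.args[0])) findings.push({ line: stmt.line, sink: stmt.callee });
    }
  }
  return findings;
}

// The vulnerable example: input flows through concatenation into the query text.
const vulnerable = [
  { kind: "assign", target: "name", operands: ["req.query"], line: 1 },
  { kind: "assign", target: "query", operands: ["name"], line: 2 },
  { kind: "call", callee: "db.execute", args: ["query"], line: 3 },
];

// The safe example: the query text is a literal and the input is a bound parameter.
const safe = [
  { kind: "assign", target: "name", operands: ["req.query"], line: 1 },
  { kind: "call", callee: "db.execute", args: ["<literal>", "name"], line: 2 },
];

console.log(findTaintedSinks(vulnerable)); // one finding, at line 3
console.log(findTaintedSinks(safe)); // no findings
```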
Layer 3: Scope Analysis
The scope analyzer understands variable scope, hoisting, closures, and shadowing. This removes scope-related false positives produced by the regex engine.
How it works
- Builds a scope tree for each file, tracking where variables are declared and accessible.
- Resolves variable references to their declarations.
- Understands block scope (let/const), function scope (var), and module scope.
- Detects shadowed variables and ensures findings reference the correct declaration.
Why it matters
const password = process.env.DB_PASSWORD; // Safe: from environment
function test() {
  const password = "test123"; // Only in test scope, not a real leak
  return login(password);
}
Without scope analysis, the regex engine alone would flag password = "test123" as a hardcoded credential. The scope analyzer resolves it to a variable that is local to test() and shadows the module-level password, so it is not reported as a leaked credential.
Safe version:
// Fix: always load credentials from environment variables
const password = process.env.DB_PASSWORD;
function test() {
  const password = process.env.TEST_PASSWORD || "placeholder";
  return login(password);
}
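The resolution step behind the password example can be sketched as a small scope tree. The `Scope` class and its methods are hypothetical names chosen for illustration, and the sketch leaves out var hoisting and closures.

```javascript
// Minimal scope tree: each scope knows its parent and its own declarations.
// Class and method names are illustrative, not Doorman's internals.
class Scope {
  constructor(kind, parent = null) {
    this.kind = kind; // "module", "function", or "block"
    this.parent = parent;
    this.declarations = new Map();
  }

  declare(name, info) {
    this.declarations.set(name, { ...info, scope: this });
  }

  // Walk outward until a declaration is found; the innermost one wins,
  // which is exactly what shadowing means.
  resolve(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.declarations.has(name)) return scope.declarations.get(name);
    }
    return null;
  }
}

// Model the "Why it matters" example.
const moduleScope = new Scope("module");
moduleScope.declare("password", { line: 1, init: "process.env.DB_PASSWORD" });

const testFn = new Scope("function", moduleScope);
testFn.declare("password", { line: 3, init: '"test123"' });

const inner = testFn.resolve("password");
const outer = moduleScope.resolve("password");
console.log(inner.line, inner.scope.kind); // 3 function
console.log(outer.line, outer.scope.kind); // 1 module
```

With the reference resolved to the declaration on line 3, a finding about the literal can be tied to the function-local variable instead of the module-level one.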
Layer 4: AST Engine
The AST (Abstract Syntax Tree) engine uses tree-sitter to parse code into a structural representation, enabling deep analysis that regex cannot achieve.
How it works
- Parses source code into an AST using tree-sitter grammars for each supported language.
- Runs structural queries against the AST to find complex patterns.
- Analyzes function signatures, control flow, class hierarchies, and import graphs.
- Cross-references findings from earlier layers with structural context.
What it catches
- Missing authentication — API routes without auth middleware
- Unsafe deserialization — deserialize calls on untrusted data
- Prototype pollution — recursive merge without property checks
- Race conditions — shared state without synchronization
- Resource leaks — opened handles without close/dispose
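As an illustration of a structural check, the sketch below looks for the first item in the list, routes without auth middleware, on a hand-built tree. It does not use tree-sitter: the node shape, the `requireAuth` middleware name, and the rule that only paths under /api/ need protection are all assumptions made for the example.

```javascript
// Hypothetical, simplified AST for three Express-style route registrations.
// The node shape ({ type, callee, args, line }) is NOT tree-sitter's API.
const program = {
  type: "program",
  children: [
    { type: "call", callee: "app.get", line: 1, args: [
      { type: "string", value: "/api/users" },
      { type: "identifier", name: "requireAuth" },
      { type: "identifier", name: "listUsers" },
    ] },
    { type: "call", callee: "app.post", line: 2, args: [
      { type: "string", value: "/api/users" },
      { type: "identifier", name: "createUser" },
    ] },
    { type: "call", callee: "app.get", line: 3, args: [
      { type: "string", value: "/health" },
      { type: "identifier", name: "health" },
    ] },
  ],
};

const ROUTE_METHODS = new Set(["app.get", "app.post", "app.put", "app.delete"]);
const AUTH_MIDDLEWARE = new Set(["requireAuth"]);

// Depth-first traversal over child nodes and call arguments.
function* walk(node) {
  yield node;
  for (const child of node.children ?? []) yield* walk(child);
  for (const arg of node.args ?? []) yield* walk(arg);
}

// Flag /api/ routes whose handler chain contains no known auth middleware.
function findUnprotectedRoutes(root) {
  const findings = [];
  for (const node of walk(root)) {
    if (node.type !== "call" || !ROUTE_METHODS.has(node.callee)) continue;
    const [path, ...handlers] = node.args;
    if (!path || path.type !== "string" || !path.value.startsWith("/api/")) continue;
    const hasAuth = handlers.some(
      (h) => h.type === "identifier" && AUTH_MIDDLEWARE.has(h.name)
    );
    if (!hasAuth) findings.push({ line: node.line, route: path.value });
  }
  return findings;
}

const unprotected = findUnprotectedRoutes(program);
console.log(unprotected); // the POST /api/users registration on line 2
```

A regex can match the text app.post(, but deciding whether an argument in the middle of the call is an auth middleware needs the call's structure, which is what the tree provides.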
Cross-Validation
The key to Doorman's low false positive rate is cross-validation between layers:
- The regex engine generates candidate findings.
- The taint tracker confirms whether data actually flows to a dangerous sink.
- The scope analyzer verifies variable context and eliminates scope-based false positives.
- The AST engine validates structural patterns and adds findings that only structural analysis can detect.
A finding is reported only if it survives every layer that applies to it. This multi-pass approach is how Doorman keeps detection rates high while filtering out most false positives.
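The "survives all applicable layers" rule can be sketched as a filter. The validator names, the three-valued verdict, and the fields on each finding are hypothetical; they stand in for whatever the taint and scope layers actually compute.

```javascript
// Each validator returns "confirm", "reject", or "n/a" (layer does not apply).
// Validator names and the finding shape are assumptions for illustration.
const validators = {
  taint: (f) =>
    f.category !== "injection" ? "n/a" : f.reachesSink ? "confirm" : "reject",
  scope: (f) =>
    f.category !== "secrets" ? "n/a" : f.inTestScope ? "reject" : "confirm",
};

// Keep a candidate only if no applicable layer rejects it.
function crossValidate(candidates) {
  return candidates.filter((finding) =>
    Object.values(validators).every((validate) => validate(finding) !== "reject")
  );
}

const reported = crossValidate([
  { id: "sql-1", category: "injection", reachesSink: true },
  { id: "sql-2", category: "injection", reachesSink: false },
  { id: "secret-1", category: "secrets", inTestScope: true },
  { id: "hdr-1", category: "headers" },
]);

console.log(reported.map((f) => f.id)); // [ 'sql-1', 'hdr-1' ]
```

Treating "n/a" as a pass is the design choice that matters here: a missing security header has no data flow to confirm, so it should not be dropped just because the taint layer has nothing to say about it.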
Performance
| Metric | Value |
|---|---|
| Scan speed | ~1,000 files/second (regex), ~200 files/second (full pipeline) |
| Memory usage | Under 256 MB for most projects |
| Caching | Unchanged files are skipped on re-scan |
| Parallelism | Multi-file scanning with worker threads |
Run npx getdoorman check --profile to see per-engine timing on your codebase.
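The caching row in the table can be illustrated with a content-fingerprint check. This is a sketch under stated assumptions: how Doorman actually detects unchanged files is not described here, the `filesToScan` helper is hypothetical, and the FNV-1a hash is a stand-in for whatever fingerprint or timestamp a real cache would use.

```javascript
// FNV-1a 32-bit hash used as a stand-in content fingerprint (an assumption).
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const cache = new Map(); // path -> fingerprint of the last scanned content

// Return the paths that need scanning and record their new fingerprints.
function filesToScan(files) {
  const changed = [];
  for (const [path, content] of Object.entries(files)) {
    const fp = fingerprint(content);
    if (cache.get(path) === fp) continue; // unchanged since the last scan
    cache.set(path, fp);
    changed.push(path);
  }
  return changed;
}

const firstRun = filesToScan({ "a.js": "let x = 1;", "b.js": "let y = 2;" });
const secondRun = filesToScan({ "a.js": "let x = 1;", "b.js": "let y = 3;" });
console.log(firstRun); // [ 'a.js', 'b.js' ]
console.log(secondRun); // [ 'b.js' ]
```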