Detection Methodology

How Doorman's 4-layer detection engine is designed for maximum accuracy with minimal false positives. Verified against 2,464 test cases.

Doorman uses a layered detection approach. Each layer adds depth and precision, and findings are cross-validated between layers to filter out false positives.

Detection Pipeline

Every scan runs through all 4 layers in sequence:

1. Regex Engine (2,508 pattern rules) → 2. Taint Tracking (data flow analysis) → 3. Scope Analysis (variable resolution) → 4. AST Engine (structural analysis)

Layer 1: Regex Engine

The regex engine is the fast first pass. It scans every file against 2,508 pattern-based rules covering all 10 categories and 11 languages.

How it works

Each rule is a regex pattern with metadata: ID, severity, category, description, and optional auto-fix.
Rules are organized by language and category in src/rules/.
The engine runs all applicable rules against each file based on file extension.
Matches are reported as candidate findings for further validation.
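
The steps above can be sketched as follows. This is a minimal illustration, not Doorman's implementation: the rule IDs, the `extensions` field, and the `scanFile` helper are assumptions made for the example, and the real rule schema in src/rules/ may differ.

```javascript
// Hypothetical rule shape. Field names mirror the metadata listed above,
// but the IDs and the "extensions" field are invented for illustration.
const rules = [
  {
    id: "JS-SECRET-001",
    severity: "high",
    category: "secrets",
    description: "Hardcoded password assigned to a variable",
    extensions: [".js", ".ts"],
    pattern: /password\s*=\s*["'][^"']+["']/i,
  },
  {
    id: "PY-EXEC-001",
    severity: "critical",
    category: "injection",
    description: "Call to os.system",
    extensions: [".py"],
    pattern: /\bos\.system\s*\(/,
  },
];

// Return the extension of a file name, including the dot ("" if none).
function extensionOf(fileName) {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot);
}

// Run every rule that applies to the file's extension, line by line,
// and collect candidate findings for later layers to validate.
function scanFile(fileName, source) {
  const ext = extensionOf(fileName);
  const applicable = rules.filter((rule) => rule.extensions.includes(ext));
  const found = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of applicable) {
      // Patterns have no "g" flag, so test() keeps no state between calls.
      if (rule.pattern.test(text)) {
        found.push({ ruleId: rule.id, severity: rule.severity, file: fileName, line: index + 1 });
      }
    }
  });
  return found;
}

const candidates = scanFile("app.js", 'const user = "bob";\nconst password = "test123";');
console.log(candidates);
```

The result is a list of candidates only. Whether a candidate is a real issue is left to the later layers.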

Strengths

Fast: roughly 1,000 files per second (see Performance)
Broad coverage: catches common patterns like hardcoded secrets, dangerous function calls, and missing security headers
Language-agnostic patterns work across similar languages

Limitations

Cannot understand data flow or variable scope
May produce false positives on its own (resolved by later layers)

Layer 2: Taint Tracking

The taint tracker traces untrusted user input through the code to dangerous sinks. This is critical for detecting injection vulnerabilities.

How it works

Sources: Identifies where user input enters the application (request parameters, form data, URL params, file uploads, database reads).
Propagation: Tracks how tainted data flows through assignments, function calls, string concatenation, and transformations.
Sanitizers: Recognizes when data is properly sanitized (escaping, parameterized queries, validation).
Sinks: Detects when tainted data reaches a dangerous operation (SQL query, HTML output, command execution, file system access).

Example

// Source: user input
const name = req.query.name;

// Propagation: data flows through concatenation
const query = "SELECT * FROM users WHERE name = '" + name + "'";

// Sink: tainted data reaches SQL execution
db.execute(query);  // CRITICAL: SQL injection detected

Safe version:

// Fix: use parameterized queries instead of string concatenation
const name = req.query.name;
db.execute("SELECT * FROM users WHERE name = ?", [name]);

Note: The taint tracker treats parameterized queries as a sanitizer, so the safe version above is not flagged.
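
The source, propagation, sanitizer, and sink steps can be illustrated with a toy tracker. This is a sketch under strong assumptions: code is represented as a flat list of hand-written statement records (the `kind` values and the `trackTaint` function are hypothetical), whereas a real tracker works on parsed code and handles branches, function calls, and aliasing.

```javascript
// Walk simplified statement records in order and track which variables
// currently hold untrusted data.
function trackTaint(statements) {
  const tainted = new Set();
  const findings = [];
  for (const stmt of statements) {
    switch (stmt.kind) {
      case "source": // user input enters the program
        tainted.add(stmt.target);
        break;
      case "assign": // target is tainted if any input is tainted
        if (stmt.from.some((name) => tainted.has(name))) tainted.add(stmt.target);
        else tainted.delete(stmt.target);
        break;
      case "sanitize": // output of a recognized sanitizer is clean
        tainted.delete(stmt.target);
        break;
      case "sink": // dangerous operation: report tainted arguments
        for (const arg of stmt.args) {
          if (tainted.has(arg)) findings.push({ sink: stmt.name, variable: arg, line: stmt.line });
        }
        break;
    }
  }
  return findings;
}

// Models the vulnerable example: input is concatenated into the SQL text.
const vulnerable = [
  { kind: "source", target: "name" },
  { kind: "assign", target: "query", from: ["name"] },
  { kind: "sink", name: "db.execute", args: ["query"], line: 3 },
];

// Models a variant where the input is escaped before it is concatenated.
const escaped = [
  { kind: "source", target: "name" },
  { kind: "sanitize", target: "safeName", from: "name" },
  { kind: "assign", target: "query", from: ["safeName"] },
  { kind: "sink", name: "db.execute", args: ["query"], line: 4 },
];

console.log(trackTaint(vulnerable));
console.log(trackTaint(escaped));
```

In this model the first list yields one finding on `query`, and the second yields none because the sanitizer breaks the chain between source and sink.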

Layer 3: Scope Analysis

The scope analyzer understands variable scope, hoisting, closures, and shadowing. This removes false positives that the regex engine raises when it cannot tell which declaration a name refers to.

How it works

Builds a scope tree for each file, tracking where variables are declared and accessible.
Resolves variable references to their declarations.
Understands block scope (let/const), function scope (var), and module scope.
Detects shadowed variables and ensures findings reference the correct declaration.
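
A scope tree with shadowing-aware resolution can be sketched in a few lines. The `Scope` class and its fields are hypothetical, and the sketch deliberately ignores hoisting and the difference between block and function scope. It only shows how a reference resolves to the nearest enclosing declaration.

```javascript
// A scope records its own declarations and a link to the enclosing scope.
class Scope {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.declarations = new Map();
  }

  // Register a declaration in this scope.
  declare(identifier, info) {
    this.declarations.set(identifier, { ...info, scope: this.name });
  }

  // Walk outward until a declaration is found; inner scopes shadow outer ones.
  resolve(identifier) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.declarations.has(identifier)) return scope.declarations.get(identifier);
    }
    return null; // undeclared in every enclosing scope
  }
}

const moduleScope = new Scope("module");
moduleScope.declare("password", { line: 1, origin: "environment" });

const testScope = new Scope("function test", moduleScope);
testScope.declare("password", { line: 4, origin: "literal" });

const insideTest = testScope.resolve("password");
const atModuleLevel = moduleScope.resolve("password");
console.log(insideTest, atModuleLevel);
```

With the two declarations kept apart, a finding about the literal can be attached to the inner declaration only, and the module-level variable is left alone.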

Why it matters

const password = process.env.DB_PASSWORD;  // Safe: from environment

function test() {
  const password = "test123";  // Only in test scope — not a real leak
  return login(password);
}

Without scope analysis, a regex engine would flag password = "test123" as a hardcoded credential. The scope analyzer knows it is a local test variable, reducing false positives.

Safe version:

// Fix: load credentials from environment variables; "placeholder" is a non-secret fallback
const password = process.env.DB_PASSWORD;

function test() {
  const password = process.env.TEST_PASSWORD || "placeholder";
  return login(password);
}

Layer 4: AST Engine

The AST (Abstract Syntax Tree) engine uses tree-sitter to parse code into a structural representation, enabling deep analysis that regex cannot achieve.

How it works

Parses source code into an AST using tree-sitter grammars for each supported language.
Runs structural queries against the AST to find complex patterns.
Analyzes function signatures, control flow, class hierarchies, and import graphs.
Cross-references findings from earlier layers with structural context.

What it catches

Missing authentication — API routes without auth middleware
Unsafe deserialization — deserialize calls on untrusted data
Prototype pollution — recursive merge without property checks
Race conditions — shared state without synchronization
Resource leaks — opened handles without close/dispose
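
As an illustration of the first item, the sketch below walks a small hand-written syntax tree and reports route registrations that have no auth middleware among their arguments. The node shapes, the `requireAuth` middleware name, and the `findUnauthenticatedRoutes` helper are all invented for the example. It does not use tree-sitter, whose node types and query language differ.

```javascript
// Simplified tree standing in for parsed source. A real run would obtain
// the tree from a parser rather than writing it by hand.
const tree = {
  type: "program",
  children: [
    {
      type: "call", callee: "app.get", line: 1,
      args: [
        { type: "string", value: "/users" },
        { type: "identifier", name: "requireAuth" },
        { type: "identifier", name: "listUsers" },
      ],
    },
    {
      type: "function", name: "registerAdminRoutes", line: 3,
      children: [
        {
          type: "call", callee: "app.delete", line: 4,
          args: [
            { type: "string", value: "/users/:id" },
            { type: "identifier", name: "deleteUser" },
          ],
        },
      ],
    },
  ],
};

const ROUTE_METHODS = new Set(["app.get", "app.post", "app.put", "app.delete"]);
const AUTH_MIDDLEWARE = new Set(["requireAuth"]); // assumed middleware name

// Depth-first traversal over every node in the tree.
function* walk(node) {
  yield node;
  for (const child of node.children ?? []) yield* walk(child);
}

// Report route registrations whose arguments contain no known auth middleware.
function findUnauthenticatedRoutes(root) {
  const findings = [];
  for (const node of walk(root)) {
    if (node.type !== "call" || !ROUTE_METHODS.has(node.callee)) continue;
    const [path, ...handlers] = node.args;
    const hasAuth = handlers.some(
      (arg) => arg.type === "identifier" && AUTH_MIDDLEWARE.has(arg.name)
    );
    if (!hasAuth) findings.push({ path: path.value, line: node.line });
  }
  return findings;
}

console.log(findUnauthenticatedRoutes(tree));
```

Because the check works on call structure rather than text, it finds the nested `app.delete` registration regardless of formatting or line breaks.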

Cross-Validation

The key to Doorman's low false positive rate is cross-validation between layers:

The regex engine generates candidate findings.
The taint tracker confirms whether data actually flows to a dangerous sink.
The scope analyzer verifies variable context and eliminates scope-based false positives.
The AST engine validates structural patterns and adds findings that only structural analysis can detect.

A finding is reported only if it survives all applicable layers. This multi-pass approach is how Doorman keeps detection rates high while limiting noise.
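
The "survives all applicable layers" rule can be expressed as a filter. In this sketch the validator objects, the finding fields (`reachesSink`, `declaredIn`), and the `crossValidate` function are hypothetical stand-ins for whatever each layer actually computes.

```javascript
// Each validator says which findings it applies to and whether it confirms them.
const validators = [
  {
    layer: "taint",
    appliesTo: (finding) => finding.category === "injection",
    confirm: (finding) => finding.reachesSink === true,
  },
  {
    layer: "scope",
    appliesTo: (finding) => finding.category === "secrets",
    confirm: (finding) => finding.declaredIn !== "test",
  },
];

// Keep a finding only if every validator that applies to it confirms it.
// A finding with no applicable validator passes through unchanged.
function crossValidate(findings) {
  return findings.filter((finding) =>
    validators
      .filter((validator) => validator.appliesTo(finding))
      .every((validator) => validator.confirm(finding))
  );
}

const regexCandidates = [
  { id: "A", category: "injection", reachesSink: true },
  { id: "B", category: "injection", reachesSink: false },
  { id: "C", category: "secrets", declaredIn: "test" },
  { id: "D", category: "headers" },
];

const reported = crossValidate(regexCandidates).map((finding) => finding.id);
console.log(reported);
```

Here candidates B and C are dropped by the taint and scope checks respectively, while A is confirmed and D passes because no later layer has anything to say about it.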

Performance

Metric	Value
Scan speed	~1,000 files/second (regex), ~200 files/second (full pipeline)
Memory usage	Under 256 MB for most projects
Caching	Unchanged files are skipped on re-scan
Parallelism	Multi-file scanning with worker threads
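
One common way to skip unchanged files is to key cached results on a hash of the file content. The sketch below shows that idea only. How Doorman actually detects unchanged files is not stated here, and the `ScanCache` class and the 32-bit FNV-1a-style hash are assumptions chosen to keep the example dependency-free. A production cache would more likely use a cryptographic hash to avoid collisions.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative only).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class ScanCache {
  constructor() {
    this.entries = new Map(); // fileName -> { hash, findings }
  }

  // Return cached findings when the content hash is unchanged;
  // otherwise run scanFn and store its result.
  scan(fileName, content, scanFn) {
    const hash = fnv1a(content);
    const cached = this.entries.get(fileName);
    if (cached && cached.hash === hash) {
      return { findings: cached.findings, fromCache: true };
    }
    const findings = scanFn(content);
    this.entries.set(fileName, { hash, findings });
    return { findings, fromCache: false };
  }
}

let scanCount = 0;
const countingScan = (content) => {
  scanCount += 1;
  return content.includes("eval(") ? ["eval-call"] : [];
};

const cache = new ScanCache();
const first = cache.scan("a.js", "eval(x)", countingScan);
const second = cache.scan("a.js", "eval(x)", countingScan);
const third = cache.scan("a.js", "safe()", countingScan);
console.log(first.fromCache, second.fromCache, third.fromCache, scanCount);
```

The second call returns the stored findings without invoking the scanner, and the third rescans because the content changed.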

Run npx getdoorman check --profile to see per-engine timing on your codebase.