Detection Methodology

How Doorman's 4-layer detection engine is designed for maximum accuracy with minimal false positives. Verified against 2,464 test cases.

Doorman uses a layered detection approach. Each layer adds depth and precision, and findings are cross-validated between layers to filter out false positives.

Detection Pipeline

Every scan runs through all 4 layers in sequence:

  1. Regex Engine: 2,508 pattern rules
  2. Taint Tracking: data flow analysis
  3. Scope Analysis: variable resolution
  4. AST Engine: structural analysis

Layer 1: Regex Engine

The regex engine is the fast first pass. It scans every file against 2,508 pattern-based rules covering all 10 categories and 11 languages.
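As a rough illustration of what a pattern-based first pass looks like, the sketch below tests each line of a file against a small rule list and collects candidate findings. The rule shape (`id`, `category`, `pattern`), the two example rules, and the `scanSource` function are assumptions made for this example; they are not Doorman's actual rule format or API.

```javascript
// Hypothetical rule shape: the ids, categories, and patterns below are
// illustrative only and are not taken from Doorman's rule set.
const rules = [
  {
    id: "hardcoded-password",
    category: "secrets",
    pattern: /\bpassword\s*=\s*["'][^"']+["']/i,
  },
  {
    id: "eval-call",
    category: "injection",
    pattern: /\beval\s*\(/,
  },
];

// Test every line against every rule and collect candidate findings.
// The patterns carry no "g" flag, so RegExp.prototype.test is stateless here.
function scanSource(filename, source, ruleSet) {
  const findings = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((text, index) => {
    for (const rule of ruleSet) {
      if (rule.pattern.test(text)) {
        findings.push({
          rule: rule.id,
          category: rule.category,
          file: filename,
          line: index + 1, // report 1-based line numbers
        });
      }
    }
  });
  return findings;
}

const sample = [
  'const password = "test123";',
  "const total = add(1, 2);",
  "eval(userCode);",
].join("\n");

console.log(scanSource("example.js", sample, rules));
```

A pass like this is cheap because it needs no parsing, but it only sees text: it cannot tell whether a matched variable is test-only or whether the matched call ever receives untrusted data. Those questions are left to the later layers.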

How it works

Strengths

Limitations

Layer 2: Taint Tracking

The taint tracker traces untrusted user input through the code to dangerous sinks. This is critical for detecting injection vulnerabilities.

How it works

Example

// Source: user input
const name = req.query.name;

// Propagation: data flows through concatenation
const query = "SELECT * FROM users WHERE name = '" + name + "'";

// Sink: tainted data reaches SQL execution
db.execute(query);  // CRITICAL: SQL injection detected

Safe version:

// Fix: use parameterized queries instead of string concatenation
const name = req.query.name;
db.execute("SELECT * FROM users WHERE name = ?", [name]);

Note: With a parameterized query, the user input is bound as a parameter instead of being concatenated into the SQL string. The taint tracker recognizes this as safe handling and does not flag the call.
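The source, propagation, and sink steps in the example can be sketched as a small forward pass over a simplified program model. Everything here is an assumption for illustration: the statement objects, the `isUserInput` flag, the split between `args` (variables used as query text) and `params` (variables bound as parameters), and the `findTaintedSinks` function. A real tracker would derive this information from parsed code.

```javascript
// Hypothetical, simplified model of the vulnerable example:
// each statement is either an assignment or a call.
const vulnerableProgram = [
  { kind: "assign", target: "name", from: [], isUserInput: true, line: 1 },
  { kind: "assign", target: "query", from: ["name"], isUserInput: false, line: 2 },
  { kind: "call", callee: "db.execute", args: ["query"], params: [], line: 3 },
];

// Model of the parameterized version: the tainted value is bound, not concatenated.
const safeProgram = [
  { kind: "assign", target: "name", from: [], isUserInput: true, line: 1 },
  { kind: "call", callee: "db.execute", args: [], params: ["name"], line: 2 },
];

const sinkNames = new Set(["db.execute"]);

function findTaintedSinks(statements, sinks) {
  const tainted = new Set();
  const findings = [];
  for (const stmt of statements) {
    if (stmt.kind === "assign") {
      // A variable is tainted if it is user input or is built from a tainted variable.
      const carriesTaint = stmt.isUserInput || stmt.from.some((v) => tainted.has(v));
      if (carriesTaint) {
        tainted.add(stmt.target);
      } else {
        tainted.delete(stmt.target); // reassigned from clean data
      }
    } else if (stmt.kind === "call" && sinks.has(stmt.callee)) {
      // Only variables used as query text count; bound params are ignored.
      const taintedArgs = stmt.args.filter((v) => tainted.has(v));
      if (taintedArgs.length > 0) {
        findings.push({ sink: stmt.callee, line: stmt.line, taintedArgs });
      }
    }
  }
  return findings;
}

console.log(findTaintedSinks(vulnerableProgram, sinkNames)); // one finding at line 3
console.log(findTaintedSinks(safeProgram, sinkNames)); // no findings
```

This sketch handles only straight-line code. Branches, loops, and function calls require a proper data flow analysis over a control flow graph.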

Layer 3: Scope Analysis

The scope analyzer resolves variable scope, hoisting, closures, and shadowing. This removes scope-related false positives produced by the regex engine.

How it works

Why it matters

const password = process.env.DB_PASSWORD;  // Safe: from environment

function test() {
  const password = "test123";  // Only in test scope — not a real leak
  return login(password);
}

Without scope analysis, a regex engine would flag password = "test123" as a hardcoded credential. The scope analyzer knows it is a local test variable, reducing false positives.

Safe version:

// Fix: always load credentials from environment variables
const password = process.env.DB_PASSWORD;

function test() {
  const password = process.env.TEST_PASSWORD || "placeholder";
  return login(password);
}
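The shadowing in the example above can be modeled with a chain of scopes, where a lookup starts in the innermost scope and walks outward. The `Scope` class, its `declare` and `resolve` methods, and the `origin` labels are hypothetical names chosen for this sketch. It is a minimal model of lexical scoping, not Doorman's implementation, and it leaves out hoisting and closures.

```javascript
// Minimal lexical scope chain: each scope knows its parent and its own bindings.
class Scope {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.bindings = new Map();
  }

  // Record a declaration in this scope, remembering where it was declared.
  declare(identifier, info) {
    this.bindings.set(identifier, { ...info, scope: this.name });
  }

  // Walk outward from this scope until a binding is found.
  resolve(identifier) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.bindings.has(identifier)) {
        return scope.bindings.get(identifier);
      }
    }
    return null; // undeclared in every enclosing scope
  }
}

const moduleScope = new Scope("module");
moduleScope.declare("password", { origin: "environment" });

// test() declares its own password, which shadows the module-level one.
const testScope = new Scope("test", moduleScope);
testScope.declare("password", { origin: "literal" });

// A sibling function without its own declaration sees the module binding.
const helperScope = new Scope("helper", moduleScope);

console.log(testScope.resolve("password"));
console.log(helperScope.resolve("password"));
```

With bindings resolved this way, an analyzer can tell that the string literal is visible only inside `test()` and that every other reference to `password` resolves to the environment-backed value.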

Layer 4: AST Engine

The AST (Abstract Syntax Tree) engine uses tree-sitter to parse code into a structural representation, enabling deep analysis that regex cannot achieve.
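To show the kind of question a structural pass can answer, the sketch below walks a small hand-written tree and reports calls to `eval` whose argument is not a literal. The tree uses ESTree-style node names (`CallExpression`, `Identifier`, `Literal`) as an assumption for readability. It does not use tree-sitter or reproduce tree-sitter's node types, and `walk` and `findDynamicEvalCalls` are hypothetical helpers.

```javascript
// Hand-written tree standing in for parser output of:
//   eval(userCode);
//   eval("1 + 1");
const tree = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "eval" },
        arguments: [{ type: "Identifier", name: "userCode" }],
      },
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "eval" },
        arguments: [{ type: "Literal", value: "1 + 1" }],
      },
    },
  ],
};

// Depth-first traversal that calls visit() on every object with a string "type".
function walk(node, visit) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (typeof node.type === "string") visit(node);
  for (const value of Object.values(node)) walk(value, visit);
}

// Report eval(...) calls where at least one argument is not a literal.
function findDynamicEvalCalls(root) {
  const hits = [];
  walk(root, (node) => {
    if (
      node.type === "CallExpression" &&
      node.callee.type === "Identifier" &&
      node.callee.name === "eval" &&
      node.arguments.some((arg) => arg.type !== "Literal")
    ) {
      hits.push(node);
    }
  });
  return hits;
}

console.log(findDynamicEvalCalls(tree).length); // 1
```

A text pattern such as `eval\s*\(` matches both calls. The structural check separates them because it can inspect the type of each argument node.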

How it works

What it catches

Cross-Validation

The key to Doorman's low false positive rate is cross-validation between layers:

  1. The regex engine generates candidate findings.
  2. The taint tracker confirms whether data actually flows to a dangerous sink.
  3. The scope analyzer verifies variable context and eliminates scope-based false positives.
  4. The AST engine validates structural patterns and adds findings that only structural analysis can detect.

A finding is reported only if it survives every layer that applies to it. This multi-pass approach is how Doorman keeps detection rates high while limiting noise.
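The "survives all applicable layers" rule can be sketched as a filter in which each layer either confirms a finding, rejects it, or declares itself not applicable. The validator functions, the finding fields (`reachesSink`, `testOnlyScope`), and `crossValidate` are assumptions for this example and do not describe Doorman's internal data model.

```javascript
// Hypothetical validators. Each returns:
//   true  -> the layer confirms the finding
//   false -> the layer rejects the finding
//   null  -> the layer does not apply to this kind of finding
const layerChecks = {
  taint: (finding) =>
    finding.category === "injection" ? finding.reachesSink : null,
  scope: (finding) =>
    finding.category === "secrets" ? !finding.testOnlyScope : null,
};

// Keep a candidate only if no applicable layer rejects it.
function crossValidate(candidates, checks) {
  return candidates.filter((finding) =>
    Object.values(checks).every((check) => check(finding) !== false)
  );
}

const candidates = [
  { id: "sql-injection", category: "injection", reachesSink: true },
  { id: "sql-injection-unreachable", category: "injection", reachesSink: false },
  { id: "hardcoded-password", category: "secrets", testOnlyScope: true },
  { id: "hardcoded-api-key", category: "secrets", testOnlyScope: false },
];

const reported = crossValidate(candidates, layerChecks).map((f) => f.id);
console.log(reported); // [ 'sql-injection', 'hardcoded-api-key' ]
```

Treating "not applicable" differently from "rejected" matters: a secrets finding should not be dropped just because the taint layer has nothing to say about it.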

Performance

Metric        Value
Scan speed    ~1,000 files/second (regex), ~200 files/second (full pipeline)
Memory usage  Under 256 MB for most projects
Caching       Unchanged files are skipped on re-scan
Parallelism   Multi-file scanning with worker threads

Run npx getdoorman check --profile to see per-engine timing on your codebase.
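One common way to skip unchanged files on re-scan is to store a fingerprint per file and compare it on the next run. The sketch below uses file size and modification time, the fields `fs.stat` would provide. How Doorman actually keys its cache is not documented here, so `fingerprint`, `selectFilesToScan`, and the file records are all hypothetical.

```javascript
// Build a cheap fingerprint from metadata (assumed to come from fs.stat).
function fingerprint(file) {
  return `${file.size}:${file.mtimeMs}`;
}

// Return the paths whose fingerprint differs from the cached one,
// and update the cache so the next run can skip them.
function selectFilesToScan(files, cache) {
  const toScan = [];
  for (const file of files) {
    const current = fingerprint(file);
    if (cache.get(file.path) !== current) {
      toScan.push(file.path);
      cache.set(file.path, current);
    }
  }
  return toScan;
}

const cache = new Map();

// First run: nothing is cached, so every file is scanned.
const firstRun = selectFilesToScan(
  [
    { path: "a.js", size: 120, mtimeMs: 1000 },
    { path: "b.js", size: 80, mtimeMs: 1000 },
  ],
  cache
);

// Second run: only b.js changed, so only b.js is scanned again.
const secondRun = selectFilesToScan(
  [
    { path: "a.js", size: 120, mtimeMs: 1000 },
    { path: "b.js", size: 95, mtimeMs: 2000 },
  ],
  cache
);

console.log(firstRun); // [ 'a.js', 'b.js' ]
console.log(secondRun); // [ 'b.js' ]
```

Metadata fingerprints are fast but can miss edits that preserve both size and timestamp. Hashing file contents is the stricter alternative, at the cost of reading every file.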