May 13, 2026

What GhostMap's scope files actually do

Andrew Gonzaga · Software Engineer

I've been building a security tool called GhostMap — an open-source web reconnaissance tool for authorized bug bounty research. You point it at a target, it crawls with a real browser, and it surfaces the endpoints most worth investigating manually.

There's a design decision I want to walk through, because I've seen people misread it: GhostMap requires a YAML scope file for every scan. The scope file doesn't replace command-line flags (GhostMap has plenty of those); it works alongside them. The two solve different problems, and keeping those layers separate is what makes the tool safe to use across multiple bug bounty programs without re-checking each program's rules by hand before every scan.

The two-layer split

When you run a scan, the command looks like this:

ghostmap https://target.example.com \
  -w example-bbp \
  --idor \
  --auth-header "Authorization: Bearer $TOKEN" \
  --html-report

The CLI flags here are per-run choices: --idor enables IDOR candidate detection on this scan; --html-report generates a dashboard; --auth-header supplies a credential for this invocation. These are operational decisions you make at the moment of scanning.
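To make the split concrete, here is a minimal Python sketch of how a parser for the command above might look, using the standard library's argparse. It is hypothetical: GhostMap's real parser, the help text, and the scope_path() lookup rule are assumptions. What it shows is structural. No option says which hosts or behaviors are allowed, and -w only names the engagement whose scope file holds those rules.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the command above; not GhostMap's source.
    parser = argparse.ArgumentParser(prog="ghostmap")
    parser.add_argument("target", help="URL to start crawling from")
    # The only link to authorization: a program name that resolves to a file.
    parser.add_argument("-w", dest="program", required=True,
                        help="program name; scope is read from scopes/<program>.yaml")
    # Everything below is a per-run choice that can differ on every invocation.
    parser.add_argument("--idor", action="store_true",
                        help="enable passive IDOR candidate detection")
    parser.add_argument("--auth-header",
                        help='credential header, e.g. "Authorization: Bearer ..."')
    parser.add_argument("--html-report", action="store_true",
                        help="write an HTML dashboard after the scan")
    return parser


def scope_path(program: str) -> str:
    # Hypothetical lookup rule based on the scopes/<program>.yaml layout in this post.
    return f"scopes/{program}.yaml"


args = build_parser().parse_args([
    "https://target.example.com", "-w", "example-bbp", "--idor",
    "--auth-header", "Authorization: Bearer abc123", "--html-report",
])
print(scope_path(args.program))  # scopes/example-bbp.yaml
```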

The scope file at scopes/example-bbp.yaml, referenced via -w example-bbp, is something different. It declares the authorization context for the entire engagement:

program: example-bbp
platform: hackerone

in_scope:
  hosts:
    - example.com
    - "*.example.com"

out_of_scope:
  hosts:
    - admin.example.com
  path_patterns:
    - "^/admin"
    - "^/internal"

permissions:
  active_idor_testing: false
  hidden_route_probing: true
  authenticated_scanning: true

notes: |
  Source: https://hackerone.com/example-bbp
  Last verified: 2026-05-12

This is not configuration in the usual “default settings” sense. This is a declaration of what the program's rules permit. It changes when the program's rules change, not when my scanning needs change.
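Because the scope file is a declaration rather than a bag of defaults, it helps to load it strictly. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and maps it onto frozen dataclasses. The class names, treating omitted permissions as false, and rejecting unknown keys are illustrative choices, not GhostMap's documented behavior.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Permissions:
    # Hypothetical deny-by-default: a permission the file omits is treated as false.
    active_idor_testing: bool = False
    hidden_route_probing: bool = False
    authenticated_scanning: bool = False


@dataclass(frozen=True)
class Scope:
    program: str
    platform: str
    in_scope_hosts: tuple[str, ...]
    out_of_scope_hosts: tuple[str, ...]
    out_of_scope_paths: tuple[str, ...]
    permissions: Permissions
    notes: str = ""


def load_scope(raw: dict) -> Scope:
    """Validate a parsed scope file (e.g. the output of yaml.safe_load)."""
    perms_raw = raw.get("permissions") or {}
    known = {f.name for f in fields(Permissions)}
    unknown = set(perms_raw) - known
    if unknown:
        # A typo like "authenticated_scaning" should fail loudly, not be ignored.
        raise ValueError(f"unknown permission keys: {sorted(unknown)}")
    for key, value in perms_raw.items():
        if not isinstance(value, bool):
            raise ValueError(f"permission {key!r} must be true or false")
    in_scope = raw.get("in_scope") or {}
    out_of_scope = raw.get("out_of_scope") or {}
    hosts = tuple(in_scope.get("hosts") or ())
    if not hosts:
        raise ValueError("scope file lists no in-scope hosts")
    return Scope(
        program=raw["program"],
        platform=raw.get("platform", ""),
        in_scope_hosts=hosts,
        out_of_scope_hosts=tuple(out_of_scope.get("hosts") or ()),
        out_of_scope_paths=tuple(out_of_scope.get("path_patterns") or ()),
        permissions=Permissions(**perms_raw),
        notes=raw.get("notes") or "",
    )


# The example file above, as yaml.safe_load would return it.
example = {
    "program": "example-bbp",
    "platform": "hackerone",
    "in_scope": {"hosts": ["example.com", "*.example.com"]},
    "out_of_scope": {
        "hosts": ["admin.example.com"],
        "path_patterns": ["^/admin", "^/internal"],
    },
    "permissions": {
        "active_idor_testing": False,
        "hidden_route_probing": True,
        "authenticated_scanning": True,
    },
    "notes": "Source: https://hackerone.com/example-bbp\nLast verified: 2026-05-12\n",
}
scope = load_scope(example)
```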

Why split it this way

The split exists because the two layers fail differently when you get them wrong.

Getting a CLI flag wrong is recoverable. If I forget --idor, the scan runs without IDOR detection and I lose some signal. If I pass a malformed cookie, the scan fails fast with an auth error. These failures are loud and local — I notice immediately and re-run.

Getting authorization wrong is not recoverable. If I scan a host I'm not authorized to scan, the requests have already been sent. If I enable authenticated scanning on a program that forbids automated authenticated testing, the program's security team sees the traffic and the engagement is over. These failures are silent and global — I might not realize until the program contacts me.

So the design question is: where should the authorization context live, and how should it interact with the per-run flags? Two options I considered:

Option A: authorization as CLI flags. --allow-host=example.com --allow-host=*.example.com --deny-path=/admin --allow-authenticated, and so on. This works for one-off scans but breaks down over an ongoing engagement. Across the 20+ scans you'll run against a single program over weeks, you have to type the same authorization flags every time. Eventually you alias them, and the alias becomes an undeclared authorization context that isn't tracked in git and lives on one machine. If you re-image, switch laptops, or share work with someone else, the rules are tied to your shell setup instead of to the engagement, so they get lost or drift out of sync.

Option B: authorization as a per-engagement file, referenced by name. The scope file lives in version control alongside the tool. The CLI references it with -w <program>. Per-run flags can't override per-engagement authorization — if the scope file says authenticated_scanning: false, passing --auth-header on the command line fails with a scope-violation error before any HTTP traffic leaves the machine.

GhostMap uses Option B. The CLI is for run-time choices. The scope file is for engagement-level rules. Run-time choices that try to override engagement-level rules get rejected at startup.
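That startup rule can be sketched as a pure function that runs before anything network-facing is constructed, so a rejected run cannot have sent a request. This is a minimal sketch, assuming per-run options arrive as a small record. Only the credential case maps to flags shown in this post (--auth-header, plus the cookie mentioned earlier); the active_idor and probe_hidden_routes options, the ScopeViolation name, and the send_request callback are hypothetical.

```python
from dataclasses import dataclass


class ScopeViolation(Exception):
    """Raised at startup, before any HTTP request is sent."""


@dataclass(frozen=True)
class Permissions:
    active_idor_testing: bool
    hidden_route_probing: bool
    authenticated_scanning: bool


@dataclass(frozen=True)
class RunOptions:
    # Hypothetical shape for the per-run choices parsed from the command line.
    auth_headers: tuple[str, ...] = ()
    cookies: tuple[str, ...] = ()
    active_idor: bool = False
    probe_hidden_routes: bool = False


def check_run_against_scope(run: RunOptions, perms: Permissions) -> None:
    """Reject any per-run choice the engagement's scope file does not permit."""
    violations = []
    if (run.auth_headers or run.cookies) and not perms.authenticated_scanning:
        violations.append("credentials supplied but authenticated_scanning is false")
    if run.active_idor and not perms.active_idor_testing:
        violations.append("active IDOR testing requested but active_idor_testing is false")
    if run.probe_hidden_routes and not perms.hidden_route_probing:
        violations.append("hidden route probing requested but hidden_route_probing is false")
    if violations:
        # Report every conflict at once so the user sees them all in one run.
        raise ScopeViolation("; ".join(violations))


def start_scan(run: RunOptions, perms: Permissions, send_request) -> None:
    check_run_against_scope(run, perms)  # raises before send_request is ever called
    send_request("GET", "https://target.example.com/")
```

There is deliberately no parameter that lets RunOptions relax Permissions; the only way to change the outcome is to edit the scope file.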

What this prevents in practice

A few situations from real engagements where the split has mattered:

Cross-program credential leakage. If I'm working on two programs and switch contexts, I might still have $AUTH_TOKEN exported in my shell from the previous program. With CLI-flag authorization, that token would silently be sent on the next program's scan. With scope-file authorization, the new program's scope file declares whether authenticated scanning is permitted, and if it isn't, passing the credential stops the scan with a scope-violation error before the token is ever sent.

Out-of-scope host drift. Modern bug bounty programs often have 40+ in-scope assets and a similar number of out-of-scope exclusions. A wildcard like *.example.com covers the main app but might also match internal subdomains the program explicitly excludes. The scope file's out_of_scope.hosts list catches these, and out_of_scope.path_patterns does the same for paths such as /admin. The crawler still records the links a page references, but out-of-scope links are never fetched, so no request for them leaves my machine.
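A minimal sketch of that filter, under three assumptions: a "*.domain" entry matches subdomains but not the apex (which is why the example file lists example.com and *.example.com separately), out-of-scope entries always win over in-scope ones, and path_patterns are Python regular expressions, as their ^ anchors suggest. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def host_matches(pattern: str, host: str) -> bool:
    """Exact match, or "*.domain" matching any subdomain (not the apex itself)."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])  # pattern[1:] is ".domain"
    return host == pattern


def is_fetchable(url: str, in_hosts, out_hosts, out_paths) -> bool:
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not any(host_matches(p, host) for p in in_hosts):
        return False
    # Exclusions are checked after inclusion, so they override a wildcard.
    if any(host_matches(p, host) for p in out_hosts):
        return False
    path = parts.path or "/"
    return not any(re.search(p, path) for p in out_paths)


def partition_links(links, in_hosts, out_hosts, out_paths):
    """Split discovered links: all are kept for the report, only in-scope ones are fetched."""
    fetch, skipped = [], []
    for link in links:
        (fetch if is_fetchable(link, in_hosts, out_hosts, out_paths) else skipped).append(link)
    return fetch, skipped


IN_HOSTS = ("example.com", "*.example.com")
OUT_HOSTS = ("admin.example.com",)
OUT_PATHS = ("^/admin", "^/internal")

fetch, skipped = partition_links(
    [
        "https://www.example.com/account",
        "https://admin.example.com/",
        "https://example.com/internal/metrics",
        "https://cdn.other-site.net/app.js",
    ],
    IN_HOSTS, OUT_HOSTS, OUT_PATHS,
)
```

Only https://www.example.com/account lands in fetch; the other three are recorded in skipped without a request being made.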

Active behavior gating. Some programs forbid automated identifier mutation (active IDOR testing). The IDOR candidate detector is passive — it observes traffic and notes URLs with identifier-looking parameters. The IDOR auto-tester would actively mutate those identifiers and replay the request. The scope file's permissions.active_idor_testing gates the auto-tester only. So passive recon stays available across programs while active mutation is opt-in per engagement.
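The passive/active distinction can be made structural: the detector only reads URLs that were already observed, and the code path that would send mutated requests checks the permission before doing anything. This sketch is hypothetical; the identifier heuristics (parameter names like id, user_id, or userId, and UUID-shaped values) and the replay callback stand in for whatever GhostMap actually uses.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical heuristics for "identifier-looking" parameters.
ID_NAME = re.compile(r"(?:^|_)id$|[a-z]Id$")
UUID = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def idor_candidates(observed_urls):
    """Passive: inspects URLs already seen in traffic and sends nothing."""
    found = []
    for url in observed_urls:
        for name, value in parse_qsl(urlsplit(url).query):
            if ID_NAME.search(name) or UUID.fullmatch(value):
                found.append((url, name))
    return found


def run_idor_auto_tester(candidates, active_idor_testing: bool, replay):
    """Active: would mutate identifiers and replay requests, so it is gated."""
    if not active_idor_testing:
        return []  # the passive candidates are still reported
    return [replay(url, param) for url, param in candidates]
```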

Where the design got tested

Two things have tripped me up with this design in real use.

First, the workspace's scope file and the project's scope file can drift. When you initialize a workspace, the tool copies scopes/<program>.yaml into workspaces/<program>/scope.yaml. If you edit the source later, the workspace copy goes stale. I had scans refuse authenticated mode even after I'd updated the scope file — because the workspace was reading its own out-of-date copy. The fix was a ghostmap workspace sync-scope <program> command that re-syncs from source. But the underlying lesson was: every artifact derived from a source of truth needs an explicit, named path back to that source.
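One way to give the derived copy an explicit path back to its source, and to make drift loud instead of silent, is to compare content hashes at scan startup. This is a minimal sketch using the scopes/<program>.yaml and workspaces/<program>/scope.yaml layout described above; the hashing approach and the function names are assumptions, not how the sync-scope command is actually implemented.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scope_paths(root: Path, program: str) -> tuple[Path, Path]:
    source = root / "scopes" / f"{program}.yaml"
    copy = root / "workspaces" / program / "scope.yaml"
    return source, copy


def workspace_scope_is_stale(root: Path, program: str) -> bool:
    source, copy = scope_paths(root, program)
    return not copy.exists() or _digest(source) != _digest(copy)


def sync_scope(root: Path, program: str) -> bool:
    """Re-copy the source of truth into the workspace; return True if anything changed."""
    if not workspace_scope_is_stale(root, program):
        return False
    source, copy = scope_paths(root, program)
    copy.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, copy)
    return True


def require_fresh_scope(root: Path, program: str) -> None:
    """Run at scan startup: turn silent drift into a loud, local failure."""
    if workspace_scope_is_stale(root, program):
        raise RuntimeError(
            f"workspace scope for {program!r} differs from scopes/{program}.yaml; "
            f"run `ghostmap workspace sync-scope {program}` first"
        )
```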

Second, exact-string host matching is too strict and wildcard matching is too loose. remitly.com as an exact-match scope entry doesn't match www.remitly.com, even though they're the same site. *.remitly.com matches the main app but also careers.remitly.com and support.remitly.com, which may or may not be in the program's actual scope. The right answer is to list the in-scope hosts explicitly: both the apex and the www form for any site that uses both, and no wildcards unless the program's scope page genuinely covers the entire subdomain space. It's annoying, but it forces you to read the program's scope page carefully, which is the point.
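Listing hosts explicitly is easier to keep honest with a small check. The hypothetical helper below, assuming "*.domain" matches any subdomain but not the apex, reports which discovered hosts are in scope only because a wildcard entry matched them; those are the ones to verify against the program's scope page before scanning. The function names and the matching rule are assumptions.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Exact match, or "*.domain" matching any subdomain but not the apex."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return host == pattern


def wildcard_only_hosts(discovered, in_scope):
    """Hosts that are in scope solely because a wildcard entry matched them."""
    exact = {p.lower() for p in in_scope if not p.startswith("*.")}
    wildcards = [p for p in in_scope if p.startswith("*.")]
    return [
        h for h in discovered
        if h.lower() not in exact and any(host_matches(w, h) for w in wildcards)
    ]


discovered = ["remitly.com", "www.remitly.com", "careers.remitly.com", "support.remitly.com"]

# Exact entry only: too strict, www is rejected.
print([h for h in discovered if host_matches("remitly.com", h)])
# ['remitly.com']

# Wildcard entry: too loose, every subdomain is in scope by wildcard alone.
print(wildcard_only_hosts(discovered, ["remitly.com", "*.remitly.com"]))
# ['www.remitly.com', 'careers.remitly.com', 'support.remitly.com']

# Explicit apex + www: nothing rides on a wildcard; careers and support are simply out.
print(wildcard_only_hosts(discovered, ["remitly.com", "www.remitly.com"]))
# []
```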

The principle, generalized

The pattern that came out of building this: when a tool has decisions that affect different timescales, those decisions belong in different surfaces. CLI flags are for per-invocation choices that change every time you run the command. Configuration files are for engagement-level rules that stay constant across many invocations. Trying to do both with one surface forces you to either re-type rules constantly or build alias systems that hide the rules from version control.

GhostMap's split: flags for what changes per scan, scope files for what changes per program. If I'm ever tempted to add an --override-scope flag, the answer is no — the right move is to edit the scope file, re-sync the workspace, and re-run. The friction is the feature.

GhostMap is open source at github.com/andrewliera/ghostmap. MIT licensed.