# Testing

clawpatrol test is a regression-test CLI for policy changes. It replays recorded gateway actions against a candidate HCL policy and tells you whether any verdict drifted — a deny that’s now allow, a pg-reads rule that no longer fires, an endpoint default that quietly changed.

It’s a pure CLI: no gateway, no database, no auth. Drop the binary into CI and run it on every push.

clawpatrol test <config.hcl> <fixture.json | fixture-dir>

Exit 0 when every fixture matches; 1 on any mismatch or fixture load error; 2 on usage or config-load error.
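
If you wrap the CLI in your own tooling, the exit-code contract above is the only interface you need. The following is a minimal sketch, not part of clawpatrol itself: `describe_exit` and `run_policy_tests` are hypothetical helper names, and the sketch assumes the `clawpatrol` binary is on `PATH`.

```python
import subprocess
import sys

# Meanings copied from the exit-code contract described above.
EXIT_MEANINGS = {
    0: "every fixture matched",
    1: "verdict mismatch or fixture load error",
    2: "usage or config-load error",
}


def describe_exit(code: int) -> str:
    """Translate a `clawpatrol test` exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_policy_tests(config: str, fixtures: str) -> int:
    """Run `clawpatrol test <config> <fixtures>` and report the outcome.

    The runner's own output (ok / FAIL lines) passes straight through;
    this only adds a one-line summary on stderr.
    """
    proc = subprocess.run(["clawpatrol", "test", config, fixtures])
    print(f"clawpatrol test: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

A wrapper script would typically end with `sys.exit(run_policy_tests("github.hcl", "fixtures/"))` so the original exit code still fails the calling job.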

# A minimal end-to-end example

Drop these two files in a directory:

github.hcl — a tiny policy that allows GitHub reads and denies writes:

gateway {
  state_dir  = "/opt/clawpatrol"
  public_url = "https://gw.example.test"

  wireguard {
    subnet_cidr = "10.55.0.0/24"
  }
}

credential "bearer_token" "github_pat" {
  endpoint = http.github
}

endpoint "http" "github" {
  hosts = ["api.github.com"]
}

rule "github-reads" {
  endpoint  = http.github
  condition = "http.method in ['GET', 'HEAD']"
  verdict   = "allow"
}

rule "github-writes" {
  endpoint  = http.github
  condition = "http.method in ['POST', 'PATCH', 'PUT', 'DELETE']"
  verdict   = "deny"
  reason    = "writes go through PR review"
}

profile "default" { credentials = [bearer_token.github_pat] }

fixtures/get-user.json — assert that GET /user is allowed:

{
  "action": {
    "host": "api.github.com",
    "http": {
      "method": "GET",
      "path": "/user",
      "headers": { "Authorization": ["***"] }
    }
  },
  "match": {
    "verdict": "allow",
    "rule": "github-reads",
    "endpoint": "http.github"
  }
}

Run it:

$ clawpatrol test github.hcl fixtures/
ok   fixtures/get-user.json
1 action(s) checked, 0 mismatch(es)

Now break it. Edit github.hcl and flip github-reads' verdict from "allow" to "deny". Re-run:

$ clawpatrol test github.hcl fixtures/
FAIL fixtures/get-user.json
  want verdict="allow"      rule="github-reads"                 endpoint="http.github"
  got  verdict="deny"       rule="github-reads"                 endpoint="http.github"
1 action(s) checked, 1 mismatch(es)
$ echo $?
1

That’s the whole loop.

# Workflow

Authoring fixtures by hand is fine for the smoke-test corpus above, but in practice you record them from real traffic:

  1. Run a gateway locally against the policy you want to regression-test:

    clawpatrol gateway github.hcl
  2. Send real requests through it. Mix verdicts — drive the allow rules, drive the deny rules, drive any approver chains — so the corpus covers every comparison branch you care about.

    curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user
    curl -X DELETE -H "Authorization: Bearer $GITHUB_TOKEN" \
      https://api.github.com/repos/me/sandbox/issues/1
  3. Click "Download action" on each row’s detail page in the dashboard. The browser saves a single .json file per action, already in the right format.

    [Screenshot: Download action button on the request detail page]

  4. Drop the files into a fixtures directory and check them into your repo:

    .
    ├── github.hcl
    └── fixtures/
        ├── get-user.json
        └── delete-issue.json
  5. Run clawpatrol test and expect 0 mismatches.

  6. Make a policy change and re-run. If a verdict moved, the runner prints the affected fixture and the want / got diff (like the example above).

The same fixtures become CI’s regression set on every push.
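
To check that a recorded corpus really does mix verdicts, as step 2 suggests, you can tally `match.verdict` across the fixture files. This is a rough sketch under a few assumptions: `load_fixtures` and `verdict_coverage` are hypothetical helpers, and the fixtures are assumed to sit as `*.json` files directly inside one directory.

```python
import json
from collections import Counter
from pathlib import Path


def load_fixtures(directory: str) -> list[dict]:
    """Read every *.json file directly inside `directory` (no recursion)."""
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def verdict_coverage(fixtures: list[dict]) -> Counter:
    """Count how many fixtures assert each verdict."""
    return Counter(f["match"]["verdict"] for f in fixtures)


# Example with two in-memory fixtures shaped like the ones in this guide.
sample = [
    {"action": {"host": "api.github.com", "http": {"method": "GET", "path": "/user"}},
     "match": {"verdict": "allow", "rule": "github-reads"}},
    {"action": {"host": "api.github.com", "http": {"method": "DELETE", "path": "/repos/me/sandbox/issues/1"}},
     "match": {"verdict": "deny", "rule": "github-writes"}},
]
coverage = verdict_coverage(sample)
```

A verdict that your policy can produce but that never appears in the tally is a branch the regression set does not exercise.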

# CI integration

The simplest possible GitHub Actions step:

- name: Policy regression tests
  run: |
    curl -fsSL https://clawpatrol.dev/install.sh | sh
    clawpatrol test github.hcl fixtures/

The exit code does the work — non-zero fails the job and the diff shows up in the log.

# Fixture format

Each fixture has two top-level keys: action is the recorded request (what the agent did); match is the assertion (what the rule engine should produce for that action). Exactly one facet block (http / k8s / sql) lives under action, carrying that facet’s vocabulary — the same fields your CEL rule conditions read.
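
A quick structural check for hand-written fixtures can be sketched as follows. It only encodes the two rules stated above (an `action` and a `match` object, and exactly one facet block under `action`); `check_fixture` is a hypothetical helper, and the real runner may validate more or report problems differently.

```python
import json

# Facet block names as listed above.
FACETS = ("http", "k8s", "sql")


def check_fixture(fixture: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    for key in ("action", "match"):
        if not isinstance(fixture.get(key), dict):
            problems.append(f"missing or non-object top-level key: {key}")
    action = fixture.get("action")
    if isinstance(action, dict):
        present = [facet for facet in FACETS if facet in action]
        if len(present) != 1:
            problems.append(f"expected exactly one facet block, found {present}")
    return problems


good = json.loads("""
{
  "action": {"host": "api.github.com", "http": {"method": "GET", "path": "/user"}},
  "match": {"verdict": "allow", "rule": "github-reads", "endpoint": "http.github"}
}
""")
bad = {"action": {"host": "x", "http": {}, "sql": {"statement": "SELECT 1"}}}
```

Here `check_fixture(good)` returns an empty list, while `check_fixture(bad)` reports both the missing `match` key and the two facet blocks.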

# HTTPS

{
  "action": {
    "host": "api.github.com",
    "credential": "github_pat",
    "peer_ip": "100.64.0.7",
    "http": {
      "method": "DELETE",
      "path": "/repos/me/sandbox/issues/1",
      "headers": { "Authorization": ["***"] }
    }
  },
  "match": {
    "verdict": "deny",
    "rule": "github-writes",
    "endpoint": "http.github",
    "reason": "writes go through PR review"
  }
}

# Kubernetes

{
  "action": {
    "host": "10.0.0.7",
    "k8s": {
      "verb": "get",
      "resource": "secrets",
      "namespace": "default",
      "name": "ci-deploy-key"
    }
  },
  "match": {
    "verdict": "deny",
    "rule": "no-secrets",
    "endpoint": "kubernetes.k8s-dev"
  }
}

# SQL

{
  "action": {
    "host": "pg-staging.internal:5432",
    "sql": { "statement": "SELECT id, name FROM workflows WHERE id = 1" }
  },
  "match": {
    "verdict": "allow",
    "rule": "pg-reads",
    "endpoint": "postgres.pg-staging"
  }
}

For SQL, only statement is required — the runner derives verb, tables, and functions from the SQL the same way the live dispatch path does. You can override them by adding explicit fields if you want to test the matcher’s view directly.

# Shared hosts: pinning the endpoint

If two endpoints both claim the same host — common with api.anthropic.com, where you might route Claude Code and a custom agent through different rule sets — set match.endpoint explicitly:

{
  "action": {
    "host": "api.anthropic.com",
    "http": { "method": "POST", "path": "/v1/messages" }
  },
  "match": {
    "verdict": "approve",
    "rule": "anthropic-default",
    "endpoint": "http.anthropic-agent-A"
  }
}

Without match.endpoint, the runner sees an ambiguous host and errors:

FAIL fixtures/anthropic.json: host "api.anthropic.com" is claimed
by multiple endpoints [anthropic-agent-A anthropic-agent-B]; set
`match.endpoint` to disambiguate
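
If you want to catch unpinned fixtures before the runner does, the check can be approximated offline. The sketch below is illustrative only: it takes a hand-written endpoint-to-hosts mapping rather than parsing the HCL, compares hosts as exact strings, and uses hypothetical helper names (`ambiguous_hosts`, `needs_pin`). How the runner itself resolves hosts is not specified here.

```python
from collections import defaultdict


def ambiguous_hosts(endpoint_hosts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each host claimed by more than one endpoint to the claiming endpoints."""
    claims = defaultdict(list)
    for endpoint, hosts in endpoint_hosts.items():
        for host in hosts:
            claims[host].append(endpoint)
    return {host: sorted(eps) for host, eps in claims.items() if len(eps) > 1}


def needs_pin(fixture: dict, ambiguous: dict[str, list[str]]) -> bool:
    """True when the fixture targets a shared host but omits match.endpoint."""
    host = fixture["action"]["host"]
    return host in ambiguous and "endpoint" not in fixture.get("match", {})


# Hand-written mapping mirroring the example above (an assumption, not parsed from HCL).
endpoints = {
    "anthropic-agent-A": ["api.anthropic.com"],
    "anthropic-agent-B": ["api.anthropic.com"],
    "github": ["api.github.com"],
}
shared = ambiguous_hosts(endpoints)
unpinned = {
    "action": {"host": "api.anthropic.com", "http": {"method": "POST", "path": "/v1/messages"}},
    "match": {"verdict": "approve", "rule": "anthropic-default"},
}
```

With this data, `shared` contains only `api.anthropic.com`, and `needs_pin(unpinned, shared)` is true until the fixture gains a `match.endpoint`.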

# Reference

# match

approve is terminal: a rule routing to an approver chain records match.verdict = "approve". The human’s eventual allow/deny is out of scope for replay.

# action

# Facet vocabulary

| Block | Fields |
| --- | --- |
| http | method, path, query, headers, body, body_b64 |
| k8s | verb, resource, namespace, name, params |
| sql | statement (required); verb, tables, functions (optional, derived from statement if omitted) |

Every field is optional except SQL’s statement. Missing fields default to zero values — rules that match on them just return false. Fixtures that include the full struct are accepted; explicit values take precedence over derivation.
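
Because a missing field silently evaluates to a zero value, a misspelled field name in a hand-written fixture can make a rule quietly stop matching. A small lint against the vocabulary listed above can flag that early. This is a sketch with a hypothetical helper name (`vocabulary_problems`); it makes no claim about how the runner treats unknown fields.

```python
# Field names per facet block, copied from the facet vocabulary above.
FACET_FIELDS = {
    "http": {"method", "path", "query", "headers", "body", "body_b64"},
    "k8s": {"verb", "resource", "namespace", "name", "params"},
    "sql": {"statement", "verb", "tables", "functions"},
}


def vocabulary_problems(action: dict) -> list[str]:
    """List facet fields outside the documented vocabulary, plus a missing sql.statement."""
    problems = []
    for facet, allowed in FACET_FIELDS.items():
        block = action.get(facet)
        if block is None:
            continue
        for field in sorted(set(block) - allowed):
            problems.append(f"{facet}.{field} is not in the documented vocabulary")
        if facet == "sql" and "statement" not in block:
            problems.append("sql.statement is required")
    return problems


# "metod" is a deliberate typo for "method".
typo = {"host": "api.github.com", "http": {"metod": "GET", "path": "/user"}}
```

For the `typo` action the lint reports `http.metod`, which would otherwise leave `http.method` empty and make a condition such as `http.method in ['GET', 'HEAD']` evaluate to false.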

# Conventions

# Redaction

The exporter reads from the dashboard’s SQLite store. Whatever redaction the recording sink applied is what the fixture carries.
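
Since fixtures carry exactly what the recording sink stored, it can be worth scanning them for unredacted credentials before checking them in. The sketch below assumes the `"***"` placeholder seen in the examples on this page is the redaction marker, and uses a hypothetical list of sensitive header names; adjust both to match your sink's configuration.

```python
# Assumption: "***" is the redaction placeholder, as in the examples on this page.
REDACTED = "***"
# Hypothetical list of headers to treat as sensitive; extend as needed.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def unredacted_headers(fixture: dict) -> list[str]:
    """Return names of sensitive HTTP headers whose values are not redacted."""
    headers = fixture.get("action", {}).get("http", {}).get("headers", {})
    leaks = []
    for name, values in headers.items():
        # Header values are lists of strings in the fixture format shown above.
        if name.lower() in SENSITIVE_HEADERS and any(v != REDACTED for v in values):
            leaks.append(name)
    return sorted(leaks)


clean = {"action": {"host": "api.github.com",
                    "http": {"method": "GET", "path": "/user",
                             "headers": {"Authorization": ["***"]}}}}
leaky = {"action": {"host": "api.github.com",
                    "http": {"method": "GET", "path": "/user",
                             "headers": {"Authorization": ["Bearer not-a-real-token"]}}}}
```

`unredacted_headers(clean)` is empty, while `unredacted_headers(leaky)` names the `Authorization` header. Fixtures without an `http` block pass trivially, so SQL statements or request bodies containing secrets would need their own check.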