# Testing
clawpatrol test is a regression-test CLI for policy changes. It replays
recorded gateway actions against a candidate HCL policy and tells you whether
any verdict drifted — a deny that’s now allow, a pg-reads rule that no
longer fires, an endpoint default that quietly changed.
It’s a pure CLI: no gateway, no database, no auth. Drop the binary into CI and run it on every push.
clawpatrol test <config.hcl> <fixture.json | fixture-dir>
Exit 0 when every fixture matches; 1 on any mismatch or fixture load error; 2 on usage or config-load error.
# A minimal end-to-end example
Drop these two files in a directory:
github.hcl — a tiny policy that allows GitHub reads and denies writes:
gateway {
state_dir = "/opt/clawpatrol"
public_url = "https://gw.example.test"
wireguard {
subnet_cidr = "10.55.0.0/24"
}
}
credential "bearer_token" "github_pat" {
endpoint = http.github
}
endpoint "http" "github" {
hosts = ["api.github.com"]
}
rule "github-reads" {
endpoint = http.github
condition = "http.method in ['GET', 'HEAD']"
verdict = "allow"
}
rule "github-writes" {
endpoint = http.github
condition = "http.method in ['POST', 'PATCH', 'PUT', 'DELETE']"
verdict = "deny"
reason = "writes go through PR review"
}
profile "default" { credentials = [bearer_token.github_pat] }
fixtures/get-user.json — assert that GET /user is allowed:
{
"action": {
"host": "api.github.com",
"http": {
"method": "GET",
"path": "/user",
"headers": { "Authorization": ["***"] }
}
},
"match": {
"verdict": "allow",
"rule": "github-reads",
"endpoint": "http.github"
}
}
Run it:
$ clawpatrol test github.hcl fixtures/
ok fixtures/get-user.json
1 action(s) checked, 0 mismatch(es)
Now break it. Edit github.hcl and flip github-reads' verdict from "allow"
to "deny". Re-run:
$ clawpatrol test github.hcl fixtures/
FAIL fixtures/get-user.json
want verdict="allow" rule="github-reads" endpoint="http.github"
got verdict="deny" rule="github-reads" endpoint="http.github"
1 action(s) checked, 1 mismatch(es)
$ echo $?
1
That’s the whole loop.
# Workflow
Authoring fixtures by hand is fine for the smoke-test corpus above, but in practice you record them from real traffic:
Run a gateway locally against the policy you want to regression-test:
clawpatrol gateway github.hclSend real requests through it. Mix verdicts — drive the
allowrules, drive thedenyrules, drive any approver chains — so the corpus covers every comparison branch you care about.curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user curl -X DELETE -H "Authorization: Bearer $GITHUB_TOKEN" \ https://api.github.com/repos/me/sandbox/issues/1Click "Download action" on each row’s detail page in the dashboard. The browser saves a single
.jsonfile per action, already in the right format.
Drop the files into a fixtures directory and check them into your repo:
. ├── github.hcl └── fixtures/ ├── get-user.json └── delete-issue.jsonRun
clawpatrol testand expect0 mismatches.Make a policy change and re-run. If a verdict moved, the runner prints the affected fixture and the
want/gotdiff (like the example above).
The same fixtures become CI’s regression set on every push.
# CI integration
The simplest possible GitHub Actions step:
- name: Policy regression tests
run: |
curl -fsSL https://clawpatrol.dev/install.sh | sh
clawpatrol test github.hcl fixtures/
The exit code does the work — non-zero fails the job and the diff shows up in the log.
# Fixture format
Each fixture has two top-level keys: action is the recorded request (what the
agent did); match is the assertion (what the rule engine should produce for
that action). Exactly one facet block (http / k8s / sql) lives under
action, carrying that facet’s vocabulary — the same fields your CEL rule
conditions read.
# HTTPS
{
"action": {
"host": "api.github.com",
"credential": "github_pat",
"peer_ip": "100.64.0.7",
"http": {
"method": "DELETE",
"path": "/repos/me/sandbox/issues/1",
"headers": { "Authorization": ["***"] }
}
},
"match": {
"verdict": "deny",
"rule": "github-writes",
"endpoint": "http.github",
"reason": "writes go through PR review"
}
}
# Kubernetes
{
"action": {
"host": "10.0.0.7",
"k8s": {
"verb": "get",
"resource": "secrets",
"namespace": "default",
"name": "ci-deploy-key"
}
},
"match": {
"verdict": "deny",
"rule": "no-secrets",
"endpoint": "kubernetes.k8s-dev"
}
}
# SQL
{
"action": {
"host": "pg-staging.internal:5432",
"sql": { "statement": "SELECT id, name FROM workflows WHERE id = 1" }
},
"match": {
"verdict": "allow",
"rule": "pg-reads",
"endpoint": "postgres.pg-staging"
}
}
For SQL, only statement is required — the runner derives verb, tables, and
functions from the SQL the same way the live dispatch path does. You can
override them by adding explicit fields if you want to test the matcher’s view
directly.
# Shared hosts: pinning the endpoint
If two endpoints both claim the same host — common with api.anthropic.com,
where you might route Claude Code and a custom agent through different rule sets
— set match.endpoint explicitly:
{
"action": {
"host": "api.anthropic.com",
"http": { "method": "POST", "path": "/v1/messages" }
},
"match": {
"verdict": "approve",
"rule": "anthropic-default",
"endpoint": "http.anthropic-agent-A"
}
}
Without match.endpoint, the runner sees an ambiguous host and errors:
FAIL fixtures/anthropic.json: host "api.anthropic.com" is claimed
by multiple endpoints [anthropic-agent-A anthropic-agent-B]; set
`match.endpoint` to disambiguate
# Reference
#
match
verdict— required. One ofallow,deny,approve,passthrough.passthroughparses but the runner won’t replay it; pin to a terminal verdict or drop the fixture.rule— name of the rule that fired. Empty when no rule matched and the endpoint default was used.endpoint— optional. Typed reference of the formendpoint-type.endpoint-name(e.g.http.github,postgres.pg-staging) — the same addressing model HCL rules use. When set, pins dispatch and asserts the matched endpoint on replay (see "Shared hosts" above). Bare names are rejected because they collide across endpoint types.reason— informational only; the runner doesn’t compare it.
approve is terminal: a rule routing to an approver chain records
match.verdict = "approve". The human’s eventual allow/deny is out of scope for
replay.
#
action
host— the host the agent dialed. Used by the loader for endpoint resolution whenmatch.endpointis absent. Required for SQL (no URL at the wire level).credential,peer_ip— optional, mirror the gateway’s request-level scalars.- Exactly one facet block —
http,k8s, orsql.
# Facet vocabulary
| Block | Fields |
|---|---|
http | method, path, query, headers, body, body_b64 |
k8s | verb, resource, namespace, name, params |
sql | statement (required); verb, tables, functions (optional, derived from statement if omitted) |
Every field is optional except SQL’s statement. Missing fields default to zero
values — rules that match on them just return false. Fixtures that include the
full struct are accepted; explicit values take precedence over derivation.
# Conventions
bodyis raw UTF-8;body_b64is base64. Mutually exclusive.- Headers and query maps are
map<string, list<string>>so the format matches Go’shttp.Headerandurl.Values. - Unknown keys anywhere in the file are load errors. Typos in fixtures should fail loudly.
# Redaction
The exporter reads from the dashboard’s SQLite store. Whatever redaction the recording sink applied is what the fixture carries.
- Headers are redacted. Values of
Authorization,Cookie,X-Api-Key, and similar sensitive headers are replaced with"***"before being persisted, so they ship that way in fixtures too. - Bodies are not redacted. For well-behaved agents the body is what the
agent sent — typically a placeholder like
{{github_pat}}. For agents that inline secrets, the secret is what gets recorded. Review fixture files before committing them.