
# Architecture

See Glossary for definitions of gateway, endpoint, credential, rule, profile, plugin, runtime, and the rest of the vocabulary used below.

# Overview — actors

Five actors take part in a clawpatrol deployment:

# Process diagram

The gateway is drawn on a separate machine; the device runs only the client — it does not run policy logic, does not hold credentials, and does not know upstream secrets.

[Figure: process diagram]

- device: agent (claude/codex) → client (capture) → tunnel (WireGuard) → gateway
- gateway: intercept?
  - yes → endpoint plugin (http/k8s/sql/ssh) → rule plugin (match facets) → verdict (allow / deny / HITL approver / LLM proctor) → on allow: credential plugin (inject real secret) → upstream
  - no → transparent relay

The gateway pulls in three plugin families:

# Connection modes

clawpatrol join <gateway> enrolls the device. What the gateway mints and what the client installs depend on the gateway’s control mode.

# Tailscale mode

The gateway embeds tsnet; it joins the tailnet in-process and exposes only /api/onboard/{start,poll,claim} + /api/cred/* on :443 via Funnel. Every other route is tailnet-only. At onboard the gateway mints a Tailscale auth key (reusable=true, ephemeral=true for per-process; ephemeral=false for --whole-machine) via OAuth and the CA + api-token are delivered inside the approved Funnel response.

clawpatrol run -- <cmd> (Linux + macOS). Each invocation is its own ephemeral tailnet node. On Linux a new user + net + mnt namespace runs userspace wireguard-go inside tsnet.Server with MkdirTemp state (Ephemeral: true); on macOS the NETransparentProxyProvider extension hosts the tsnet stack and PPID-filters flows. Concurrent runs on one host don’t share state. Reference: run_tsnet_linux.go, run_tsnet_darwin.go, macos/netstack/wgnetstack.go.

The persisted tsnet auth key is hidden from agent processes:

Net effect: the bearer is bound to "code running on this physical machine," not "anyone who can copy the file off-box."

clawpatrol join --whole-machine (Linux). Installs system Tailscale (tailscale up --authkey=...), sets the gateway as the exit node, and routes the whole host through. The auth key for this path is minted with ephemeral=false so the node persists. Reference: setup.go:runLogin.

clawpatrol join --whole-machine (macOS). The NE owns whole-host routing; no system Tailscale is touched. macOS never runs system tailscaled.

# WireGuard mode

The gateway runs an in-process WireGuard server (wireguard-go + gVisor netstack). At onboard it mints a keypair, allocates a /32 from gateway.wireguard.subnet_cidr, and persists the wg-quick config at ~/.config/clawpatrol/wg.conf.

clawpatrol run -- <cmd> (Linux). Per-process ephemeral WG peer in a fresh netns. Reference: run_linux.go.

clawpatrol join --whole-machine (Linux). Kernel WireGuard via wg-quick up. Default route flips to the WG tunnel. Reference: setup.go:wgQuickUp, wireguard.go.

clawpatrol run -- <cmd> (macOS). WG userspace inside the NE, PPID-filtered. Reference: run_darwin.go, macos/ClawpatrolExtension/Provider.swift.

# Network traffic processing

Once a flow reaches the gateway over the tunnel, the gateway inspects the destination port (and, for some families, the SNI or the resolved hostname) to pick a handler.

A family is the protocol class an endpoint plugin advertises so the rule engine can target it: today the gateway ships http (the http endpoint), sql (postgres, clickhouse_native, clickhouse_https), and k8s (kubernetes). Rules are a single block kind; the family is inferred from the rule’s endpoint(s) at load time, and each family exposes its own CEL variable (http.*, sql.*, k8s.*) that the rule’s condition may reference. New protocols (e.g. ssh) ship with their own family identifier and CEL variable.

Anything the gateway has no opinion on splices to the real upstream byte-for-byte. There is no HTTPS_PROXY env var, no per-tool CA configuration, and no iptables rule on the gateway host: the WG netstack accepts SYNs to any destination IP/port and hands the dispatcher the original 4-tuple intact.

# Dispatch decision

The promiscuous WG forwarder picks one branch per inbound flow based on the destination port and IP:

[Figure: dispatch decision] An agent flow arrives and the gateway dispatches on dst port / IP:

- TCP :443: SNI peek; matched endpoint ⇒ MitM TLS (http / k8s family); no match ⇒ unknown_host policy (passthrough or close)
- TCP :5432: ConnIndex (DNS-resolved IP) → device profile picks one postgres endpoint ⇒ MitM (sql family); no match ⇒ relay
- UDP/TCP :53: DNS-VIP responder; a known VIP-bound host returns its allocated VIP, everything else is forwarded to the upstream resolver
- dst is allocated VIP: VIP table → endpoint runtime owning the VIP (today: ssh, clickhouse_native reached by hostname)
- dst IP in ConnIndex: direct-IP endpoint runtime (e.g. clickhouse_native bound to a literal cluster IP)
- otherwise: transparent relay (unknown_host = passthrough by default)

The branches are described below, with the summary table at the end of the section.

# TLS SNI

For TCP flows on :443, the gateway peeks the TLS ClientHello to recover the SNI hostname, then looks up the endpoint claiming that host within the device’s profile. If the endpoint is http or k8s, the gateway terminates TLS with a leaf cert minted on the fly (P-256, 30-day validity, in-memory cache, signed by the gateway’s CA), parses the request, runs it through the rule matcher and approve chain, asks the credential plugin to inject the real secret, and round-trips upstream. Endpoints whose family isn’t HTTPS-shaped (e.g. clickhouse_https, schema-only today) fall through to passthrough.

The CA cert is provisioned on the device during onboarding so the agent’s TLS clients trust the minted leaves; the agent never sees the upstream’s real cert.

# Postgres claiming

Postgres endpoints don’t have an SNI to peek, so the gateway claims them by destination IP. The mechanism is the ConnRouter interface in config/runtime/conn_route.go: an endpoint plugin’s body satisfies ConnRouter when it exposes ConnRouteHosts() []string, returning the host:port tuples it claims (db.example.com:5432, …). At policy load the gateway resolves each host via DNS and folds the answers into a ConnIndex keyed dstIP → endpoint(s).

When a TCP connection lands on :5432, the WG forwarder routes it into handlePostgresConn, which consults the index by the connection’s destination IP to pick the matching endpoint. When several endpoints share an IP (writer + readonly aimed at the same RDS instance) the lookup filters by the device’s profile so the right one wins; single-database profiles fall back to "first postgres in profile" without needing DNS at all. The postgres endpoint runtime then performs auth offload and runs the flow through sql-family rule matching with the right credential.

The same ConnRouter mechanism powers clickhouse_native (claimed by direct IP) and ssh (claimed by DNS-VIP); the plugin only has to declare its host tuples and the dispatcher does the rest without main.go having to learn about new families.

# DNS interception → VIP

Some families (ssh, clickhouse_native) have no SNI and no Host header, so the gateway can’t recover the agent-dialed hostname from the wire bytes alone. Their endpoint plugins flag RequiresVIP, and the dnsvip allocator assigns each hostname a stable virtual IP at policy build, persisted to disk so VIPs survive restart.

The gateway runs an in-process DNS responder on UDP/TCP :53. The WG netstack delivers all DNS queries here regardless of the agent’s resolver setting (any port-53 datagram reaches the gateway). For VIP-bound hostnames it returns the allocated VIP; for everything else it forwards the query to the upstream resolver and returns the real A/AAAA verbatim, so unrelated traffic flows unchanged.

When the agent dials the VIP, the WG forwarder routes any port on that IP into the matching endpoint runtime, which recovers the hostname from the VIP table and dispatches into the right plugin (SSH server-toward-agent / SSH client-toward-upstream with auth replay; ClickHouse Hello-packet placeholder swap; …).

# Direct IP

Endpoint plugins can also bind to literal IPs (hosts = ["172.17.0.1"] for an in-cluster ClickHouse). Those skip dnsvip entirely — the agent dials the IP without ever issuing a DNS query. The gateway maintains an index of IP-literal bindings and consults it in the catch-all branch of the dispatcher: if the destination IP claims an endpoint, the flow goes to that endpoint’s runtime; otherwise it falls through to transparent relay.

# Intercept-or-passthrough summary

With the branches explained, the dispatch table reads as a summary:

| dst port | handler |
| --- | --- |
| :443 | SNI peek, then HTTPS family dispatch (http / k8s) or passthrough |
| :5432 | postgres wire-protocol gateway (auth offload + sql-family rule matching) |
| :53 | DNS-VIP responder (UDP and TCP fallback) |
| any port, dst is VIP | VIP-bound endpoint runtime (today: ssh, clickhouse_native reached by hostname) |
| else | direct-IP endpoint lookup; falls through to transparent TCP relay when no plugin claims |

If no endpoint plugin claims the destination, the gateway falls back to a transparent relay: it dials the real destination IP and pipes bytes both ways. The top-level unknown_host setting in gateway.hcl (passthrough by default) decides what to do when an HTTPS SNI doesn’t match any configured endpoint — splice it unchanged or close it.

UDP dispatch is narrower: only :53 is handled today (DNS-VIP); other UDP datagrams are dropped.