Apr 4 2026
driftd: A Self-Hosted Terraform Drift Detection Daemon
How I built a single-binary Go daemon that detects configuration drift between Terraform state and live AWS/Cloudflare infrastructure, with an embedded React UI and scheduled scans.
Terraform is fantastic until reality diverges from state. Someone clicks through the AWS console. An auto-scaling policy changes an instance type. A DNS record gets edited manually. The state file says one thing, the actual infrastructure says another — and you only find out when something breaks.
I built driftd to catch this quietly before it becomes a problem. It's a single Go binary that runs as a daemon, periodically scans your Terraform state against live AWS and Cloudflare resources, and surfaces any drift through a web UI, REST API, and Slack notifications.
The Problem with Drift
If you've run Terraform in production for any length of time, you've seen it: the dreaded "changes detected on refresh" output that shows up unexpectedly. In a solo setup this is annoying. In a team environment with multiple engineers, multiple environments, and a mix of Terraform-managed and manually adjusted resources, it compounds fast.
The existing solutions either require a full Terraform Cloud subscription, a heavy observability stack, or a CI job that runs terraform plan against every workspace on a schedule (which means storing credentials, managing state backend access, and waiting on a plan run for every workspace).
I wanted something simpler: a daemon that understands Terraform state files and knows how to query the matching resources directly.
Architecture: One Binary, Everything Inside
The entire tool ships as a single statically-linked Go binary. There's no separate database server, no frontend deployment step, no sidecar. Everything is embedded:
driftd (single binary)
├── Cobra CLI — serve, scan, workspace, version commands
├── SQLite database — scan history, drift results, workspace config
├── HTTP API — REST endpoints at /api/v1
├── React frontend — embedded via go:embed from ui/dist
└── Scheduler — cron-based scan dispatcher
The internal structure follows a clean layered approach:
internal/
├── config/ Viper-based YAML configuration
├── database/ SQLite init + schema migrations
├── models/ Domain types (Workspace, ScanRun, DriftResult)
├── store/ CRUD layer over SQLite using sqlx
├── api/ HTTP handlers using stdlib net/http
├── scanner/ Core drift detection logic
│ ├── state.go State file readers (S3 + local)
│ ├── fetcher.go AWS resource fetchers
│ ├── diff.go Comparison with smart field ignoring
│ └── cloudflare_fetcher.go
├── scheduler/ Cron-based scan scheduling
└── notifier/ Slack + webhook notifications
Key Technology Decisions
A few choices that shaped the design:
Pure Go SQLite (modernc.org/sqlite): no CGO, no C compiler required, and clean cross-compilation. Note that the driver registers under the name "sqlite", not "sqlite3". It is paired with sqlx for hand-written SQL instead of an ORM, which gives full control over queries and keeps the data model explicit.
Single writer enforced: db.SetMaxOpenConns(1) plus WAL mode. SQLite permits only one writer at a time; capping the pool at a single connection serializes writes inside the application, so they queue instead of failing with "database is locked" (SQLITE_BUSY) errors.
ULIDs for IDs: sortable, URL-safe, and timestamp-prefixed. Because a ULID's leading characters encode its creation time, ORDER BY id returns rows in creation order (to millisecond precision) without needing a separate timestamp column.
Interface-based fetchers — each resource type implements:
type AWSFetcher interface {
	ResourceType() string
	Fetch(ctx context.Context, resourceID string, stateAttrs map[string]any) (map[string]any, error)
}
Adding a new resource type means implementing this interface and registering it — no changes to the core scanner loop.
Stdlib net/http — no Gin, no Echo, no Chi. For a daemon with a handful of REST endpoints the standard library is sufficient and removes a dependency.
How a Scan Works
The scan loop is straightforward:
- Read state file: from a local path or an S3 object, depending on workspace configuration
- Walk managed resources: iterate every resource block in the state
- Fetch live attributes: for each resource type driftd supports, query the cloud provider directly
- Compare: diff the state attributes against the live attributes, ignoring computed fields that are expected to vary
- Store results: persist each resource's status (in_sync, drifted, or deleted) with a timestamp and the diff payload
- Notify: if drift is found and notifications are configured, fan out to Slack and/or webhooks
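The first two steps can be sketched with encoding/json against the JSON layout Terraform uses for state format version 4 (the format written since Terraform 0.12). This is a sketch under assumptions, not driftd's state.go: the tfState types, managedInstance, and walkManaged are hypothetical names, and only the fields the scan needs are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Subset of the Terraform state (format version 4) that a scan needs.
type tfState struct {
	Version   int          `json:"version"`
	Resources []tfResource `json:"resources"`
}

type tfResource struct {
	Mode      string       `json:"mode"` // "managed" or "data"
	Type      string       `json:"type"`
	Name      string       `json:"name"`
	Instances []tfInstance `json:"instances"`
}

type tfInstance struct {
	Attributes map[string]any `json:"attributes"`
}

// managedInstance is one resource instance to compare against live state.
type managedInstance struct {
	Address string // e.g. "aws_vpc.main"; index_key (count/for_each) ignored for brevity
	Type    string
	ID      string
	Attrs   map[string]any
}

// walkManaged parses raw state JSON and returns every managed resource
// instance, skipping data sources, which Terraform only reads.
func walkManaged(raw []byte) ([]managedInstance, error) {
	var st tfState
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, fmt.Errorf("parse state: %w", err)
	}
	if st.Version != 4 {
		return nil, fmt.Errorf("unsupported state version %d", st.Version)
	}
	var out []managedInstance
	for _, r := range st.Resources {
		if r.Mode != "managed" {
			continue
		}
		for _, inst := range r.Instances {
			id, _ := inst.Attributes["id"].(string)
			out = append(out, managedInstance{
				Address: r.Type + "." + r.Name,
				Type:    r.Type,
				ID:      id,
				Attrs:   inst.Attributes,
			})
		}
	}
	return out, nil
}

func main() {
	state := []byte(`{"version":4,"resources":[
	  {"mode":"data","type":"aws_ami","name":"ubuntu","instances":[{"attributes":{"id":"ami-1"}}]},
	  {"mode":"managed","type":"aws_vpc","name":"main","instances":[{"attributes":{"id":"vpc-123","cidr_block":"10.0.0.0/16"}}]}
	]}`)
	res, err := walkManaged(state)
	if err != nil {
		panic(err)
	}
	for _, r := range res {
		fmt.Println(r.Address, r.ID) // aws_vpc.main vpc-123
	}
}
```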
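The comparison step uses per-resource-type ignore lists to avoid noise. For EC2 instances, fields like arn, public_dns, and public_ip are excluded: they are computed by AWS rather than declared in configuration, and the public address fields change whenever an instance without an Elastic IP is stopped and started, so comparing them would only produce false positives.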
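Here is a minimal sketch of the compare step with an ignore list, plus the mapping to the three stored statuses. It assumes each fetcher returns only the attributes it compares, that a nil live map means "not found", and that values are already normalized to the same Go types (JSON numbers decode as float64, for example). ignoredFields, FieldDiff, diffAttrs, and status are hypothetical names, not driftd's diff.go.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// ignoredFields lists computed attributes to skip per resource type.
// The aws_instance entry mirrors the fields named above.
var ignoredFields = map[string]map[string]bool{
	"aws_instance": {"arn": true, "public_dns": true, "public_ip": true},
}

// FieldDiff records one attribute whose state and live values differ.
type FieldDiff struct {
	Field string
	State any
	Live  any
}

// diffAttrs compares each attribute the fetcher returned against state,
// skipping ignored fields.
func diffAttrs(resourceType string, state, live map[string]any) []FieldDiff {
	ignore := ignoredFields[resourceType] // nil map: nothing ignored
	var diffs []FieldDiff
	for field, liveVal := range live {
		if ignore[field] {
			continue
		}
		if stateVal := state[field]; !reflect.DeepEqual(stateVal, liveVal) {
			diffs = append(diffs, FieldDiff{Field: field, State: stateVal, Live: liveVal})
		}
	}
	// Map iteration order is random; sort for a stable diff payload.
	sort.Slice(diffs, func(i, j int) bool { return diffs[i].Field < diffs[j].Field })
	return diffs
}

// status maps a comparison outcome to in_sync, drifted, or deleted.
func status(resourceType string, state, live map[string]any) string {
	switch {
	case live == nil:
		return "deleted"
	case len(diffAttrs(resourceType, state, live)) > 0:
		return "drifted"
	default:
		return "in_sync"
	}
}

func main() {
	state := map[string]any{"instance_type": "t3.micro", "public_ip": "1.2.3.4"}
	live := map[string]any{"instance_type": "t3.large", "public_ip": "5.6.7.8"}
	for _, d := range diffAttrs("aws_instance", state, live) {
		fmt.Printf("%s: %v -> %v\n", d.Field, d.State, d.Live) // instance_type: t3.micro -> t3.large
	}
	fmt.Println(status("aws_instance", state, live)) // drifted
}
```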
Supported Resources
AWS
| Terraform Type | Compared Attributes |
|---|---|
| aws_instance | instance_type, ami, availability_zone, tags |
| aws_s3_bucket | bucket, region |
| aws_security_group | name, description, vpc_id |
| aws_db_instance | instance_class, engine, engine_version, db_instance_status |
| aws_vpc | cidr_block, state, is_default |
Cloudflare
Cloudflare fetchers activate automatically when credentials are present in the environment:
export CLOUDFLARE_API_TOKEN=<scoped-token> # preferred
# or
export CLOUDFLARE_API_KEY=<global-key>
export CLOUDFLARE_EMAIL=<account-email>
| Terraform Type | Compared Attributes |
|---|---|
| cloudflare_record | name, type, content, proxied, ttl, comment |
| cloudflare_ruleset | name, kind, phase, description |
| cloudflare_zone_settings_override | full settings block |
Quick Start
# Build (requires Node.js for the embedded frontend)
make build
# Copy and edit the config
cp driftd.yaml.example driftd.yaml
# Start the server (binds to :8080 by default)
./driftd serve
# Open the UI
open http://localhost:8080
Add a Workspace
# Terraform state stored in S3
./driftd workspace add \
--name production \
--source-type s3 \
--source-config '{"bucket":"my-tfstate","key":"prod/terraform.tfstate","region":"us-east-1"}' \
--region us-east-1 \
--schedule "@every 6h"
# Local state file
./driftd workspace add \
--name local-env \
--source-type local \
--source-config '{"path":"/path/to/terraform.tfstate"}' \
--region us-east-1
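The --source-config value is a JSON object whose keys depend on --source-type. A hypothetical decoder for the two shapes used above could validate it like this; s3Source, localSource, and describeSource are illustrative names, and the required-field checks are assumptions rather than driftd's documented rules.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shapes of --source-config for each --source-type, using the keys above.
type s3Source struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
	Region string `json:"region"`
}

type localSource struct {
	Path string `json:"path"`
}

// describeSource decodes and validates a source config, returning a
// human-readable location for the state file.
func describeSource(sourceType string, raw []byte) (string, error) {
	switch sourceType {
	case "s3":
		var s s3Source
		if err := json.Unmarshal(raw, &s); err != nil {
			return "", err
		}
		if s.Bucket == "" || s.Key == "" {
			return "", fmt.Errorf("s3 source needs bucket and key")
		}
		return fmt.Sprintf("s3://%s/%s (%s)", s.Bucket, s.Key, s.Region), nil
	case "local":
		var l localSource
		if err := json.Unmarshal(raw, &l); err != nil {
			return "", err
		}
		if l.Path == "" {
			return "", fmt.Errorf("local source needs path")
		}
		return l.Path, nil
	default:
		return "", fmt.Errorf("unknown source type %q", sourceType)
	}
}

func main() {
	loc, err := describeSource("s3", []byte(`{"bucket":"my-tfstate","key":"prod/terraform.tfstate","region":"us-east-1"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(loc) // s3://my-tfstate/prod/terraform.tfstate (us-east-1)
}
```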
Run a One-Shot Scan
./driftd scan --workspace production
Configuration
server:
  port: 8080
  host: "0.0.0.0"
database:
  path: "./driftd.db"
log:
  level: "info" # debug, info, warn, error
aws:
  region: "us-east-1"
  # profile: "my-profile" # optional named profile
notifications:
  slack_webhook_url: "https://hooks.slack.com/services/..."
  webhook_url: "https://your-receiver.example.com/hook"
  on_drift: true # notify when drift is detected
  on_delete: true # notify when a resource disappears
  on_all_scans: false # notify after every scan regardless of result
AWS credentials follow the standard SDK chain: environment variables → ~/.aws/credentials → IAM instance profile. No credentials are ever stored in the config file or database.
Scan Scheduling
Each workspace can have its own schedule. driftd accepts standard cron expressions and the robfig/cron descriptors:
@every 15m every 15 minutes
@every 6h every 6 hours
@daily once a day at midnight
0 */6 * * * every 6 hours (standard cron)
An empty schedule means manual-only. The scheduler tracks running scans per workspace to avoid concurrent runs — if a scan is already in flight when the next trigger fires, the new invocation is skipped and logged.
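The overlap check can be sketched as a mutex-guarded set of in-flight workspaces. This is an assumption about how such a guard might look, not driftd's scheduler code; runGuard, tryStart, finish, and onTrigger are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// runGuard tracks which workspaces currently have a scan in flight.
type runGuard struct {
	mu      sync.Mutex
	running map[string]bool
}

func newRunGuard() *runGuard { return &runGuard{running: map[string]bool{}} }

// tryStart marks a workspace as running. It returns false if a scan is
// already in progress, in which case the trigger should be skipped.
func (g *runGuard) tryStart(workspace string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[workspace] {
		return false
	}
	g.running[workspace] = true
	return true
}

// finish clears the in-flight marker once the scan completes.
func (g *runGuard) finish(workspace string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.running, workspace)
}

// onTrigger is what a scheduled job might call; scan does the real work.
func (g *runGuard) onTrigger(workspace string, scan func()) {
	if !g.tryStart(workspace) {
		fmt.Printf("scan for %s already running, skipping\n", workspace)
		return
	}
	defer g.finish(workspace)
	scan()
}

func main() {
	g := newRunGuard()
	g.onTrigger("production", func() {
		// A trigger that fires while this scan is still running is skipped.
		g.onTrigger("production", func() { fmt.Println("never printed") })
		fmt.Println("scanning production")
	})
}
```

Because the check and the mark happen under the same lock, two triggers firing at the same instant cannot both start a scan for one workspace, while different workspaces never block each other.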
REST API
All endpoints are under /api/v1. Responses use a consistent envelope:
{ "data": { ... } }
Errors return:
{ "error": "descriptive message" }
| Method | Path | Description |
|---|---|---|
| GET | /workspaces | List workspaces |
| POST | /workspaces | Create workspace |
| GET | /workspaces/:id | Get workspace |
| PUT | /workspaces/:id | Update workspace |
| DELETE | /workspaces/:id | Delete workspace |
| GET | /workspaces/:id/scans | Scan history for a workspace |
| GET | /scans/:id | Get a specific scan run |
| GET | /scans/:id/results | Drift results for a scan |
| GET | /health | Health check |
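Deleting a workspace cascades: all scan runs and drift results for that workspace are removed automatically via SQLite foreign key constraints declared with ON DELETE CASCADE. SQLite only enforces foreign keys on connections where PRAGMA foreign_keys = ON has been set, so the cascade depends on that pragma being part of connection setup.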
Notifications
The notification pipeline uses a composite MultiNotifier that fans out to all configured receivers. A single notifier failing (e.g., Slack webhook returning 5xx) is logged as a warning and doesn't prevent other notifiers from firing.
Trigger conditions are evaluated once per scan:
- on_drift: true fires when at least one resource has drifted status
- on_delete: true fires when at least one resource has deleted status (it no longer exists in the live account)
- on_all_scans: true always notifies, including after clean scans
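Evaluated once per scan, the three conditions reduce to a single check over the statuses of the scan's resources. A sketch under the assumption that the YAML flags are loaded into a struct like NotifyConfig; both NotifyConfig and shouldNotify are hypothetical names.

```go
package main

import "fmt"

// NotifyConfig mirrors the on_drift, on_delete, and on_all_scans flags.
type NotifyConfig struct {
	OnDrift, OnDelete, OnAllScans bool
}

// shouldNotify evaluates the trigger conditions once for a whole scan,
// given every resource's status ("in_sync", "drifted", or "deleted").
func shouldNotify(cfg NotifyConfig, statuses []string) bool {
	if cfg.OnAllScans {
		return true
	}
	for _, s := range statuses {
		if (s == "drifted" && cfg.OnDrift) || (s == "deleted" && cfg.OnDelete) {
			return true
		}
	}
	return false
}

func main() {
	cfg := NotifyConfig{OnDrift: true, OnDelete: true}
	fmt.Println(shouldNotify(cfg, []string{"in_sync", "deleted"})) // true
	fmt.Println(shouldNotify(cfg, []string{"in_sync", "in_sync"})) // false
}
```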
The Slack payload includes the workspace name, scan timestamp, and a formatted list of drifted resources with their attribute differences.
What's Not Included (Intentionally)
driftd is deliberately scoped. It does not:
- Run terraform plan: no Terraform binary required, no state locking during scans
- Apply changes: read-only, never modifies infrastructure
- Store cloud credentials: uses the standard AWS SDK chain and environment variables for Cloudflare
- Require a cloud database: SQLite is sufficient for the scan volumes a typical team produces
The goal was a tool I could drop on a small VM or into a Kubernetes pod and forget about, not another service that needs its own managed database and credential rotation pipeline.
CLI Reference
driftd serve Start the HTTP server and scheduler
driftd scan --workspace <name> Run a one-shot scan
driftd workspace list List all workspaces
driftd workspace add --name ... Create a workspace
driftd workspace delete <name> Delete a workspace
driftd version Print version
Use --config / -c on any command to point at a non-default config file.
The repository is at github.com/georg-nikola/driftd. Contributions welcome — especially additional resource fetchers for other AWS resource types or cloud providers.