Runbook — Backups & Restore (stub)
This is a stub. The full runbook will be promoted to docs/operations/backups.md in Phase 7.
What this runbook covers
What we back up, where, on what cadence, how to restore, and how we verify backups actually work.
Outline
What is backed up
| Item | Source | Destination (local) | Destination (off-host) | Cadence | Retention |
|---|---|---|---|---|---|
| Postgres wa_mcp | pg_dump via docker compose exec | /var/lib/whatsapp-mcp/backups/db-YYYY-MM-DD.sql.gz | B2 / S3 via rclone | Nightly | 30 daily, 12 monthly |
| Media files | rsync with --link-dest snapshots | /var/lib/whatsapp-mcp/backups/media-YYYY-MM-DD/ | B2 / S3 | Nightly | 30 daily |
| Audit log archive | rsync from /var/lib/whatsapp-mcp/audit-archive/ | n/a (already on disk) | Off-host log server | Hourly | 5 years |
| Encrypted secrets | git (in the repo, encrypted) | repo | repo remote | On change | git history |
| Age private key | Manual | /etc/whatsapp-mcp/age.key | Offline only (paper / hardware token) | On generation | Forever |
Backup scripts
- ops/backups/backup-db.sh: wraps pg_dump + gzip; appends a timestamp; runs find ... -mtime +N -delete for retention.
- ops/backups/backup-media.sh: rsync -aH --link-dest=<yesterday>.
- ops/backups/sync-offsite.sh: rclone sync /var/lib/whatsapp-mcp/backups/ remote:wa-mcp-backups/. Encrypted with rclone's crypt backend OR age-encrypted before upload (pick one and document).
- ops/backups/sync-audit.sh: hourly rsync to the off-host log server.
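As an illustration of the first script, here is a minimal sketch of what backup-db.sh could look like. It is not the project's actual script. The compose service name (postgres) and database user (wa) are taken from the restore commands in this runbook; the BACKUP_DIR and RETENTION_DAYS variables and the function names are hypothetical. It only covers the daily retention, not the 12 monthly copies.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative assumptions.

BACKUP_DIR="${BACKUP_DIR:-/var/lib/whatsapp-mcp/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Dump wa_mcp to a dated, gzipped file. The dump is written to a temporary
# name first so a failed run never leaves a truncated db-*.sql.gz behind.
backup_db() (
  set -o pipefail   # make a pg_dump failure fail the whole pipeline
  out="$BACKUP_DIR/db-$(date +%F).sql.gz"
  docker compose exec -T postgres pg_dump -U wa wa_mcp | gzip > "$out.tmp" \
    && mv "$out.tmp" "$out" \
    || { rm -f "$out.tmp"; exit 1; }
)

# Delete daily dumps whose modification time is more than RETENTION_DAYS
# days in the past (the find ... -mtime +N -delete pattern).
prune_old_dumps() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'db-*.sql.gz' \
    -mtime +"$RETENTION_DAYS" -delete
}
```

A nightly cron entry would then call backup_db followed by prune_old_dumps, pruning only after a successful dump so a broken backup job never deletes the last good copies.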
Restore — Postgres
- Identify the desired db-YYYY-MM-DD.sql.gz.
- Stop the app: docker compose stop app.
- gunzip -c db-<date>.sql.gz | docker compose exec -T postgres psql -U wa -d wa_mcp.
- docker compose run --rm app pnpm db:migrate to apply any newer migrations. (Use run rather than exec here: the app container is stopped at this point, and exec only works against a running container.)
- Start the app: docker compose up -d app.
- Verify: curl https://wa.<yourdomain>/health → 200; row counts on clients, api_keys, messages match the dump's manifest.
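The restore steps above can be sketched as a single function. This is an illustration under assumptions, not the project's tooling: the DRY_RUN switch and the run helper are hypothetical, the ON_ERROR_STOP setting is an addition so that psql stops at the first SQL error, and the migration step uses a one-off container because the app container is stopped at that point. The health check and row-count comparison are left as manual steps.

```shell
#!/usr/bin/env bash
# Sketch only: DRY_RUN and run() are hypothetical helpers for illustration.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restore_db() (
  dump="$1"
  # Refuse to touch the app if the archive is missing or corrupt.
  gzip -t "$dump" || exit 1

  run docker compose stop app || exit 1

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "gunzip -c $dump | docker compose exec -T postgres psql -U wa -d wa_mcp -v ON_ERROR_STOP=1"
  else
    # ON_ERROR_STOP=1 (an addition in this sketch) makes psql exit non-zero
    # on the first SQL error instead of continuing.
    gunzip -c "$dump" \
      | docker compose exec -T postgres psql -U wa -d wa_mcp -v ON_ERROR_STOP=1 \
      || exit 1
  fi

  # The app container is stopped, so run the migration in a one-off container.
  run docker compose run --rm app pnpm db:migrate || exit 1
  run docker compose up -d app
)
```

Running it once with DRY_RUN=1 prints the exact command sequence, which is a cheap way to review a restore before executing it against a real host.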
Restore — Media
- From off-host: rclone copy remote:wa-mcp-backups/media-<date>/ /var/lib/whatsapp-mcp/media/.
- Verify a sample of media_objects.storage_path rows have corresponding files on disk.
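The sample check in the last step could be scripted roughly as follows. This sketch assumes that storage_path values are relative to the media root, which the runbook does not state; the function name check_media_files is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes storage_path is relative to the media root.

# Read storage paths (one per line) on stdin and report any that have no
# regular file under the media root. Returns non-zero if any are missing.
check_media_files() {
  local media_root="${1:-/var/lib/whatsapp-mcp/media}"
  local path missing=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip blank lines
    if [ ! -f "$media_root/$path" ]; then
      echo "MISSING: $path"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

One way to feed it a random sample is to pipe unaligned, tuples-only psql output into the function, for example: docker compose exec -T postgres psql -U wa -d wa_mcp -At -c "SELECT storage_path FROM media_objects ORDER BY random() LIMIT 100" | check_media_files. The sample size of 100 is arbitrary.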
Verification (the bit teams skip — don’t)
- Quarterly restore drill: spin up a staging compose, restore the latest backup, run the smoke checklist, decommission. Record the date and result in a backup-drills.md log.
- Monthly automated check: a cron on the off-host log server greps the latest pg_dump output for the expected table definitions (CREATE TABLE clients, CREATE TABLE messages, etc.) and pages on failure.
- Audit log tamper detection: hourly diff between local and off-host audit archive sizes; mismatch pages immediately.
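The monthly automated check could look something like the sketch below. The function name check_dump_tables is hypothetical, and paging is left out: the function only returns a non-zero status that a cron wrapper could turn into a page. Recent pg_dump versions typically schema-qualify table names (for example CREATE TABLE public.clients), so the pattern accepts an optional schema prefix; a plain grep for CREATE TABLE clients would miss those lines.

```shell
#!/usr/bin/env bash
# Sketch only: verifies that a gzipped SQL dump defines the expected tables.

# Usage: check_dump_tables <dump.sql.gz> <table>...
# Prints each table with no CREATE TABLE statement; returns 1 if any is missing.
check_dump_tables() {
  local dump="$1"
  shift
  local table rc=0
  for table in "$@"; do
    # Accept both "CREATE TABLE clients (" and "CREATE TABLE public.clients (".
    if ! gunzip -c "$dump" \
        | grep -Eq "^CREATE TABLE ([A-Za-z_]+\.)?${table} "; then
      echo "MISSING TABLE: $table"
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as check_dump_tables db-<date>.sql.gz clients api_keys messages covers the tables named in this runbook. A text match only shows that the dump contains the definitions, not that it restores cleanly, which is why the quarterly drill still matters.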
Age key handling
- Generate with age-keygen -o /etc/whatsapp-mcp/age.key.
- chmod 0400, chown root:whatsapp-mcp-secrets.
- Back up the private key offline: print the file contents on paper, store in a safe; or write to a YubiKey OpenPGP applet; or split with Shamir secret sharing across two trusted parties. Never put the age key in the cloud backups.
- The public key (in ops/sops/.sops.yaml) is safe to keep in the repo.
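The generation and permission steps above might be wrapped as follows. This is a sketch assuming a Linux host with GNU stat; the function names install_age_key and check_key_perms are hypothetical, and install_age_key needs root.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU coreutils (stat -c) and root privileges for install.

# Generate the key and lock it down as described in the runbook.
install_age_key() {
  local key="${1:-/etc/whatsapp-mcp/age.key}"
  age-keygen -o "$key" \
    && chmod 0400 "$key" \
    && chown root:whatsapp-mcp-secrets "$key"
}

# Return 0 if the key file has mode 400, 1 if the mode differs,
# 2 if the file cannot be inspected.
check_key_perms() {
  local key="$1" mode
  mode=$(stat -c '%a' "$key") || return 2
  if [ "$mode" != "400" ]; then
    echo "unexpected mode $mode on $key (want 400)"
    return 1
  fi
}
```

A periodic check_key_perms /etc/whatsapp-mcp/age.key is a cheap way to notice if a later deployment step loosens the permissions.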
Open items
The full runbook will include the exact rclone remote configuration (sanitised), the off-host log server bootstrap, the drill schedule, and the paging targets.