
Runbook — Backups & Restore (stub)

This is a stub. The full runbook will be promoted to docs/operations/backups.md in Phase 7.

What this runbook covers

What we back up, where, on what cadence, how to restore, and how we verify backups actually work.

Outline

What is backed up

| Item | Source | Destination (local) | Destination (off-host) | Cadence | Retention |
|---|---|---|---|---|---|
| Postgres wa_mcp | pg_dump via docker compose exec | /var/lib/whatsapp-mcp/backups/db-YYYY-MM-DD.sql.gz | B2 / S3 via rclone | Nightly | 30 daily, 12 monthly |
| Media files | rsync with --link-dest snapshots | /var/lib/whatsapp-mcp/backups/media-YYYY-MM-DD/ | B2 / S3 | Nightly | 30 daily |
| Audit log archive | rsync from /var/lib/whatsapp-mcp/audit-archive/ | n/a (already on disk) | Off-host log server | Hourly | 5 years |
| Encrypted secrets | git (in the repo, encrypted) | repo | repo remote | On change | git history |
| Age private key | Manual | /etc/whatsapp-mcp/age.key | Offline only (paper / hardware token) | On generation | Forever |

Backup scripts

  • ops/backups/backup-db.sh: wraps pg_dump + gzip; names the output db-YYYY-MM-DD.sql.gz; runs find ... -mtime +N -delete for retention.
  • ops/backups/backup-media.sh: rsync -aH --link-dest=<yesterday>, so files unchanged since the previous snapshot are hard-linked instead of copied again.
  • ops/backups/sync-offsite.sh: rclone sync /var/lib/whatsapp-mcp/backups/ remote:wa-mcp-backups/. Encrypt with either rclone's crypt backend or age before upload (pick one and document it). Because rclone sync deletes remote files that no longer exist locally, local pruning also sets off-host retention.
  • ops/backups/sync-audit.sh: hourly rsync to the off-host log server.
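The helpers below are a minimal sketch of what backup-db.sh and backup-media.sh might contain. They assume the scripts run from the compose project directory and use the service, role, and database names from the restore steps (postgres, wa, wa_mcp); the function names are hypothetical, and reading "12 monthly" as "keep the dump taken on the 1st of each month for 365 days" is an assumption. The dump is written under a temporary name and renamed only on success, so a failed pg_dump cannot leave a truncated file that looks like a good backup.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for ops/backups/backup-db.sh and backup-media.sh.
# Assumed: run from the compose project directory; compose service
# "postgres", role "wa", database "wa_mcp"; default paths from the table.
set -euo pipefail

# File name for a YYYY-MM-DD date, e.g. db-2024-01-31.sql.gz.
db_backup_name() {
  printf 'db-%s.sql.gz\n' "$1"
}

# Dump and gzip to a temporary name; rename only if the whole pipeline
# succeeded (pipefail makes a pg_dump failure fail the pipeline).
run_db_backup() {
  local dir="${1:-/var/lib/whatsapp-mcp/backups}" final tmp
  final="$dir/$(db_backup_name "$(date +%F)")"
  tmp="$final.partial"
  if ! docker compose exec -T postgres pg_dump -U wa -d wa_mcp | gzip > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$final"
}

# Assumed reading of "30 daily, 12 monthly": delete dumps older than 30 days,
# except dumps taken on the 1st of a month, which are kept for 365 days.
prune_db_backups() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name 'db-*.sql.gz' ! -name 'db-*-01.sql.gz' \
    -mtime +30 -delete
  find "$dir" -maxdepth 1 -type f -name 'db-*-01.sql.gz' -mtime +365 -delete
}

# Newest media-YYYY-MM-DD directory strictly older than $2 (YYYY-MM-DD), or
# nothing. Glob results come back sorted and ISO dates compare correctly as
# strings, so the last match is the newest.
latest_media_snapshot() {
  local root="$1" before="$2" d latest=""
  for d in "$root"/media-*; do
    [[ -d "$d" ]] || continue
    if [[ "${d##*/media-}" < "$before" ]]; then
      latest="$d"
    fi
  done
  printf '%s' "$latest"
}

# Snapshot media, hard-linking files unchanged since the newest earlier
# snapshot (not strictly "yesterday", so a missed night still links).
snapshot_media() {
  local src="${1:-/var/lib/whatsapp-mcp/media}"
  local root="${2:-/var/lib/whatsapp-mcp/backups}" today prev
  today="$(date +%F)"
  prev="$(latest_media_snapshot "$root" "$today")"
  if [[ -n "$prev" ]]; then
    rsync -aH --link-dest="$prev" "$src/" "$root/media-$today/"
  else
    rsync -aH "$src/" "$root/media-$today/"
  fi
}
```

A nightly job would call run_db_backup, prune_db_backups /var/lib/whatsapp-mcp/backups, and snapshot_media. Media snapshots are deliberately not pruned by mtime: rsync -a copies the source directory's modification time onto each media-YYYY-MM-DD/ directory, so the date in the directory name is the safer retention key.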

Restore — Postgres

  1. Identify the desired db-YYYY-MM-DD.sql.gz.
  2. Stop the app: docker compose stop app.
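  3. Load the dump into an empty database: drop and recreate wa_mcp first (for example with dropdb and createdb inside the postgres container), because the plain-SQL dump's CREATE statements fail against existing objects. Then run gunzip -c db-<date>.sql.gz | docker compose exec -T postgres psql -v ON_ERROR_STOP=1 -U wa -d wa_mcp.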
  4. docker compose run --rm app pnpm db:migrate to apply any newer migrations. Use run, not exec: exec needs a running container, and the app was stopped in step 2.
  5. Start the app: docker compose up -d app.
  6. Verify: curl https://wa.<yourdomain>/health → 200; row counts on clients, api_keys, messages match the dump’s manifest.
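A minimal sketch of the restore as a single function, assuming it runs from the compose project directory and that role wa is allowed to drop and recreate wa_mcp (for example because it is the image's superuser). restore_db and check_dump are hypothetical names, and the caller passes the health URL. Two details are deliberate: it starts from an empty database so the plain-SQL dump does not collide with existing objects, and it runs migrations with docker compose run, since docker compose exec needs a running container and the app is stopped at that point. The row-count comparison against the dump's manifest is left out because this runbook does not define the manifest yet.

```bash
#!/usr/bin/env bash
# Hypothetical one-shot version of the Postgres restore steps. Assumed: run
# from the compose project directory; services "app" and "postgres"; role
# "wa" may drop and recreate database "wa_mcp".
set -euo pipefail

# Fail early if the dump is missing or its gzip stream is corrupt.
check_dump() {
  if [[ ! -f "$1" ]]; then
    echo "no such dump: $1" >&2
    return 1
  fi
  gzip -t "$1"
}

restore_db() {
  local dump="$1" health_url="$2" code="" _
  check_dump "$dump"
  docker compose stop app
  # Start from an empty database so the dump's CREATE statements do not
  # collide with existing objects. dropdb fails if sessions remain open.
  docker compose exec -T postgres dropdb -U wa --if-exists wa_mcp
  docker compose exec -T postgres createdb -U wa wa_mcp
  # ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
  gunzip -c "$dump" \
    | docker compose exec -T postgres psql -v ON_ERROR_STOP=1 -U wa -d wa_mcp
  # The app container is stopped, so use a one-off "run" instead of "exec".
  docker compose run --rm app pnpm db:migrate
  docker compose up -d app
  # Poll /health for roughly 30 seconds, expecting HTTP 200.
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$health_url" || true)"
    if [[ "$code" == 200 ]]; then
      return 0
    fi
    sleep 3
  done
  echo "health check failed (last HTTP status: $code)" >&2
  return 1
}
```

Usage: restore_db /var/lib/whatsapp-mcp/backups/db-<date>.sql.gz https://wa.<yourdomain>/health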

Restore — Media

  1. From off-host: rclone copy remote:wa-mcp-backups/media-<date>/ /var/lib/whatsapp-mcp/media/.
  2. Verify a sample of media_objects.storage_path rows have corresponding files on disk.
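One way to do the spot check, sketched under assumptions this runbook does not confirm: storage_path values are either absolute or relative to /var/lib/whatsapp-mcp/media, and role wa can read media_objects. The function names and the default sample size of 200 are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical media spot check. Assumed: storage_path is absolute or relative
# to the media root; run from the compose project directory.
set -euo pipefail

# Random sample of storage paths, one per line (-A unaligned, -t no headers).
sample_storage_paths() {
  docker compose exec -T postgres psql -U wa -d wa_mcp -At \
    -c "SELECT storage_path FROM media_objects ORDER BY random() LIMIT ${1:-200}"
}

# Read paths on stdin, print those with no regular file on disk, and
# return non-zero if any are missing.
report_missing_media() {
  local root="$1" path full missing=0
  while IFS= read -r path; do
    if [[ -z "$path" ]]; then
      continue
    fi
    if [[ "$path" == /* ]]; then
      full="$path"
    else
      full="$root/$path"
    fi
    if [[ ! -f "$full" ]]; then
      printf '%s\n' "$path"
      missing=$((missing + 1))
    fi
  done
  [[ "$missing" -eq 0 ]]
}
```

Usage: sample_storage_paths 200 | report_missing_media /var/lib/whatsapp-mcp/media lists any missing paths and exits non-zero if there are some.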

Verification (the bit teams skip — don’t)

  • Quarterly restore drill: spin up a staging compose, restore the latest backup, run the smoke checklist, decommission. Document the date + result in a backup-drills.md log.
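  • Monthly automated check: a cron on the off-host log server decompresses the latest pg_dump, checks it for the expected table definitions, and pages on failure. Match the schema-qualified form that current pg_dump versions emit (CREATE TABLE public.clients, CREATE TABLE public.messages, etc.); a pattern for the bare CREATE TABLE clients would never match.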
  • Audit log tamper detection: hourly diff between local and off-host audit archive sizes; mismatch pages immediately.
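A sketch of both automated checks, assuming the tables live in the public schema, the dump file is readable on the checking host, the off-host archive is reachable over ssh, and GNU find is available on both ends; the host and remote path are placeholders. The table list reuses the tables this runbook already names. Sizes are summed from per-file byte counts rather than taken from du, because du reports allocated blocks, which can differ between filesystems even for identical files.

```bash
#!/usr/bin/env bash
# Hypothetical monthly dump check and hourly audit-size check. Assumed:
# tables live in the public schema; the off-host archive is reachable over
# ssh; GNU find is available on both hosts.
set -euo pipefail

REQUIRED_TABLES=(clients api_keys messages media_objects)

# Return non-zero (and name the table) if any expected CREATE TABLE is absent.
# Matches both "CREATE TABLE clients (" and "CREATE TABLE public.clients (".
check_dump_tables() {
  local dump="$1" listing table missing=0
  listing="$(gunzip -c "$dump" | grep -E '^CREATE TABLE ' || true)"
  for table in "${REQUIRED_TABLES[@]}"; do
    if ! grep -Eq "^CREATE TABLE (public\.)?${table} \(" <<<"$listing"; then
      echo "missing table in dump: $table" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Sum of regular-file sizes in bytes (apparent size, not allocated blocks).
archive_bytes() {
  find "$1" -type f -printf '%s\n' | awk '{ s += $1 } END { print s + 0 }'
}

# Compare local and off-host totals; host and remote path are placeholders.
audit_sizes_match() {
  local local_dir="$1" host="$2" remote_dir="$3" here there
  here="$(archive_bytes "$local_dir")"
  there="$(ssh "$host" "find '$remote_dir' -type f -printf '%s\n'" \
    | awk '{ s += $1 } END { print s + 0 }' || true)"
  if [[ "$here" != "$there" ]]; then
    echo "audit archive size mismatch: local=$here off-host=$there" >&2
    return 1
  fi
}
```

A non-zero return from either function is what the cron wrapper would turn into a page.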

Age key handling

  • Generate with age-keygen -o /etc/whatsapp-mcp/age.key.
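  • chown root:whatsapp-mcp-secrets, then chmod 0440 if processes in that group need to read the key. Plain 0400 gives the group no access, so use it only if root alone decrypts.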
  • Back up the private key offline — print the file contents on paper, store in a safe; or write to a YubiKey OpenPGP applet; or split with Shamir secret sharing across two trusted parties. Never put the age key in the cloud backups.
  • The public key (in ops/sops/.sops.yaml) is safe to keep in the repo.
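A sketch of the key steps, assuming it runs as root on a host with GNU stat. KEY_PATH and the group come from this runbook; KEY_MODE is a hypothetical knob so the same script works whether only root or also the whatsapp-mcp-secrets group must read the key. age-keygen -y prints the public key to paste into ops/sops/.sops.yaml.

```bash
#!/usr/bin/env bash
# Hypothetical age key setup. KEY_PATH and the group come from this runbook;
# KEY_MODE is an assumed knob (0400 = owner only, 0440 = owner + group).
set -euo pipefail

KEY_PATH="${KEY_PATH:-/etc/whatsapp-mcp/age.key}"
KEY_MODE="${KEY_MODE:-0400}"

# Never overwrite an existing key: without the offline copy, secrets
# encrypted only to it would become unreadable.
generate_age_key() {
  if [[ -e "$KEY_PATH" ]]; then
    echo "refusing to overwrite $KEY_PATH" >&2
    return 1
  fi
  umask 077  # note: also applies to the rest of this shell
  age-keygen -o "$KEY_PATH"
  chown root:whatsapp-mcp-secrets "$KEY_PATH"
  chmod "$KEY_MODE" "$KEY_PATH"
  # Print the public key (recipient) for ops/sops/.sops.yaml.
  age-keygen -y "$KEY_PATH"
}

# True if the file's permission bits equal $2, e.g. 400 (GNU stat).
key_mode_is() {
  [[ "$(stat -c '%a' "$1")" == "$2" ]]
}
```

After generating, key_mode_is /etc/whatsapp-mcp/age.key 400 (or 440) is a quick check worth repeating in the quarterly drill.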

Open items

The full runbook will include the exact rclone remote configuration (sanitised), the off-host log server bootstrap, the drill schedule, and the paging targets.