
Runbook — Incident Response (stub)

This is a stub. The full runbook is promoted to docs/operations/incident-runbook.md in Phase 7.

What this runbook covers

The standard incident playbooks. Each section starts with “symptoms” and ends with “verified resolved”.

Sections

1. Meta access token rotation (planned or reactive)

  • Generate a new System User token in the Meta dashboard.
  • Update wa_access_token_<phone_id> in ops/secrets/secrets.enc.yaml, then re-encrypt the file in place with sops -e -i.
  • systemctl restart whatsapp-mcp — tokens are read at startup.
  • Verify: audit_log shows successful send_attempt rows after restart.

2. Database restore from pg_dump

  • Identify the backup file to restore: /var/lib/whatsapp-mcp/backups/db-YYYY-MM-DD.sql.gz.
  • Stop the app: docker compose stop app.
  • Restore: gunzip -c db-<date>.sql.gz | docker compose exec -T postgres psql -U wa -d wa_mcp.
  • Run migrations: docker compose exec app pnpm db:migrate.
  • Start the app: docker compose up -d app.
  • Verify via /health and pnpm smoke.
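Choosing the backup file is the step most likely to go wrong under pressure, so here is a minimal sketch of it. It assumes only the db-YYYY-MM-DD.sql.gz naming shown above; the helper names backup_date and latest_backup are hypothetical and not part of the repo.

```python
import re
from datetime import date
from pathlib import Path
from typing import Iterable, Optional

# Filename pattern taken from the runbook: db-YYYY-MM-DD.sql.gz
BACKUP_NAME = re.compile(r"^db-(\d{4})-(\d{2})-(\d{2})\.sql\.gz$")


def backup_date(name: str) -> Optional[date]:
    """Return the date encoded in a backup filename, or None if it does not match."""
    match = BACKUP_NAME.match(name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # pattern matched but the date is impossible


def latest_backup(names: Iterable[str], not_after: Optional[date] = None) -> Optional[str]:
    """Pick the newest valid backup, optionally ignoring files dated after `not_after`."""
    candidates = []
    for name in names:
        stamp = backup_date(name)
        if stamp is None:
            continue
        if not_after is not None and stamp > not_after:
            continue
        candidates.append((stamp, name))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    backup_dir = Path("/var/lib/whatsapp-mcp/backups")
    names = [p.name for p in backup_dir.glob("db-*.sql.gz")] if backup_dir.is_dir() else []
    print(latest_backup(names))
```

Passing not_after is useful when the incident began on a known date and the restore must predate it.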

3. certbot renewal failure

  • Symptom: browsers warn about an expired certificate; openssl s_client shows an expired certificate.
  • Check /var/log/letsencrypt/letsencrypt.log for the failure cause.
  • Common causes: DNS misconfiguration, or a firewall blocking port 80 during the ACME challenge.
  • Manual force-renew: certbot renew --force-renewal --webroot -w /var/www/certbot --deploy-hook "docker compose exec nginx nginx -s reload".
  • Verify: openssl s_client -connect wa.<yourdomain>:443 shows new dates.
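To turn "shows new dates" into a pass/fail check, the sketch below computes the days left before expiry. It assumes the expiry line comes from openssl x509 -noout -enddate, which prints the form notAfter=Jun  1 12:00:00 2030 GMT; the function name days_until_expiry is hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` until the certificate expires (negative if already expired).

    Accepts either the bare timestamp ("Jun  1 12:00:00 2030 GMT") or the
    full line printed by openssl ("notAfter=Jun  1 12:00:00 2030 GMT").
    """
    timestamp = not_after.split("=", 1)[-1].strip()
    # ssl.cert_time_to_seconds parses the certificate date format into epoch seconds.
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(timestamp), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


if __name__ == "__main__":
    remaining = days_until_expiry("notAfter=Jun  1 12:00:00 2030 GMT", datetime.now(timezone.utc))
    print(f"{remaining:.1f} days remaining")
```

A freshly issued Let's Encrypt certificate should report close to 90 days; a negative value means the renewal did not take effect.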

4. API key pepper rotation

  • Generate new 32-byte pepper.
  • Move the current /run/secrets/api_key_pepper to /run/secrets/api_key_pepper.previous.
  • Write the new pepper to /run/secrets/api_key_pepper.
  • Do not rely on a one-shot admin keys re-hash-all script: re-computing hash = HMAC_SHA256(new_pepper, original_token) for each row needs the original token, which we do not have.
  • Key insight: because original tokens are never stored, rotating the pepper requires re-issuing every key. Two strategies:
    1. Soft rotation: keep the previous pepper accepted by the auth middleware for a grace window (≤ 30 days), giving every client time to rotate their key via the standard rotation flow.
    2. Hard rotation (emergency, suspected compromise): invalidate all keys immediately by switching peppers without the grace window. Clients are locked out until they receive new keys.
  • Document the choice and notification flow.
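The two strategies can be sketched as one verification function that differs only in whether a previous pepper is supplied. This is an illustration and not the actual auth middleware: the names new_pepper, key_hash, and verify, the hex encoding of the stored hash, and the rotated_at timestamp are assumptions. Only HMAC-SHA256, the 32-byte pepper, and the 30-day ceiling come from the steps above.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_WINDOW = timedelta(days=30)  # upper bound for soft rotation, per the runbook


def new_pepper() -> bytes:
    """Generate a fresh 32-byte pepper from the OS CSPRNG."""
    return secrets.token_bytes(32)


def key_hash(pepper: bytes, token: str) -> str:
    """hash = HMAC_SHA256(pepper, token), hex-encoded (encoding is an assumption)."""
    return hmac.new(pepper, token.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(
    token: str,
    stored_hash: str,
    current: bytes,
    previous: Optional[bytes],
    rotated_at: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Return "current" or "previous" depending on which pepper matched, else None.

    Soft rotation: pass the previous pepper and the rotation time.
    Hard rotation: pass previous=None, so every old key fails immediately.
    """
    if hmac.compare_digest(key_hash(current, token), stored_hash):
        return "current"
    in_grace = (
        previous is not None
        and rotated_at is not None
        and now - rotated_at <= GRACE_WINDOW
    )
    if in_grace and hmac.compare_digest(key_hash(previous, token), stored_hash):
        return "previous"  # caller should prompt this client to rotate its key
    return None


if __name__ == "__main__":
    old, new = new_pepper(), new_pepper()
    rotated = datetime.now(timezone.utc)
    stored = key_hash(old, "example-token")
    print(verify("example-token", stored, new, old, rotated, rotated + timedelta(days=1)))
```

Returning which pepper matched lets the middleware log or flag clients still on the old pepper, which helps track progress before the grace window closes.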

5. Full host re-bootstrap (host died, lost everything)

  1. Provision a new Ubuntu host per Phase 0 host-bootstrap.
  2. Install Docker, SOPS, age.
  3. Place the age private key at /etc/whatsapp-mcp/age.key (from the offline backup — paper or hardware token).
  4. Clone the repo into /opt/whatsapp-mcp.
  5. Restore the latest db-<date>.sql.gz and media/ from off-host backup.
  6. systemctl enable --now whatsapp-mcp-secrets whatsapp-mcp.
  7. Update DNS to point at the new host’s IP.
  8. Renew the TLS certificate with certbot if needed.
  9. Verify via /health and full smoke checklist.
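After a re-bootstrap the app may take a while to come up, so the /health check is better polled than run once. The sketch below assumes /health answers HTTP 200 when the service is healthy, which the runbook does not confirm; http_ok and wait_healthy are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with HTTP 200 (assumed meaning of 'healthy')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and non-2xx responses raised as HTTPError.
        return False


def wait_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 3.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call might look like wait_healthy(lambda: http_ok("https://wa.<yourdomain>/health")), with the host placeholder filled in for the deployment.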

Open items

Each section still needs concrete commands with expected output, escalation contacts, and a post-incident review template.