Runbook — Incident Response (stub)
This is a stub. The full runbook is promoted to docs/operations/incident-runbook.md in Phase 7.
What this runbook covers
The standard incident playbooks. Each section starts with “symptoms” and ends with “verified resolved”.
Sections
1. Meta access token rotation (planned or reactive)
- Generate a new System User token in the Meta dashboard.
- Update wa_access_token_<phone_id> in ops/secrets/secrets.enc.yaml; sops -e -i.
- systemctl restart whatsapp-mcp (tokens are read at startup).
- Verify: audit_log shows successful send_attempt rows after restart.
2. Database restore from pg_dump
- Identify the desired backup file in /var/lib/whatsapp-mcp/backups/db-YYYY-MM-DD.sql.gz.
- Stop the app: docker compose stop app.
- Restore: gunzip -c db-<date>.sql.gz | docker compose exec -T postgres psql -U wa -d wa_mcp.
- Run migrations: docker compose run --rm app pnpm db:migrate (the app container is stopped at this point, so docker compose exec app would fail).
- Start the app: docker compose up -d app.
- Verify via /health and pnpm smoke.
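The first step, picking the backup file, can be made less error-prone with a small helper. The following is a minimal sketch in Python, assuming backups follow the db-YYYY-MM-DD.sql.gz naming shown above; the function names backup_date and latest_backup are hypothetical and not part of the project.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Naming scheme taken from the runbook: db-YYYY-MM-DD.sql.gz
BACKUP_NAME = re.compile(r"^db-(\d{4})-(\d{2})-(\d{2})\.sql\.gz$")


def backup_date(path: Path) -> Optional[date]:
    """Return the date encoded in a backup file name, or None if it does not match."""
    match = BACKUP_NAME.match(path.name)
    if match is None:
        return None
    try:
        year, month, day = (int(part) for part in match.groups())
        return date(year, month, day)
    except ValueError:
        # Well-formed name but impossible date, e.g. month 13.
        return None


def latest_backup(backup_dir: Path, on_or_before: Optional[date] = None) -> Optional[Path]:
    """Pick the newest backup by the date in its name (hypothetical helper).

    If on_or_before is given, backups newer than that date are skipped, which
    helps when restoring to a point before an incident.
    """
    candidates = []
    for path in backup_dir.iterdir():
        stamp = backup_date(path)
        if stamp is None:
            continue
        if on_or_before is not None and stamp > on_or_before:
            continue
        candidates.append((stamp, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Selecting by the date in the file name rather than by modification time keeps the choice stable even if backups were copied or touched during recovery.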
3. certbot renewal failure
- Symptom: browser warning about expired cert; openssl s_client shows expired.
- Check /var/log/letsencrypt/letsencrypt.log for the failure cause.
- Common cause: DNS misconfiguration or firewall blocked port 80 during ACME challenge.
- Manual force-renew: certbot renew --force-renewal --webroot -w /var/www/certbot --deploy-hook "docker compose exec nginx nginx -s reload".
- Verify: openssl s_client -connect wa.<yourdomain>:443 shows new dates.
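The final verification step can also be scripted. Below is a minimal sketch in Python using only the standard library; fetch_not_after and days_remaining are hypothetical helper names, and the host name is passed in by the caller rather than assumed here.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the served certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'.

    The default context verifies the certificate, so an expired cert raises
    ssl.SSLCertVerificationError during the handshake. That failure is itself
    a sign that renewal has not taken effect.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: datetime) -> float:
    """Days between now (timezone-aware) and the certificate's notAfter time."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400
```

After a successful renewal, days_remaining(fetch_not_after(host), datetime.now(timezone.utc)) should be close to the full certificate lifetime rather than near zero.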
4. API key pepper rotation
- Generate a new 32-byte pepper.
- Move the current /run/secrets/api_key_pepper to /run/secrets/api_key_pepper.previous.
- Write the new pepper to /run/secrets/api_key_pepper.
- Run admin keys re-hash-all (a one-shot script that reads each row and re-computes hash = HMAC_SHA256(new_pepper, token), where token is the original token, which we DO NOT have).
- Key insight: we never store original tokens, so peppered rotation requires re-issuing all keys. Two strategies:
- Soft rotation: keep the previous pepper accepted by the auth middleware for a grace window (≤ 30 days), giving every client time to rotate their key via the standard rotation flow.
- Hard rotation (emergency, suspected compromise): invalidate all keys immediately by switching peppers without the grace window. Clients are locked out until they receive new keys.
- Document the choice and notification flow.
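The two strategies differ only in whether the auth check still accepts hashes made with the previous pepper. The following is a minimal sketch in Python, assuming stored hashes are hex-encoded HMAC-SHA256 digests of the token keyed by the pepper, as the formula above suggests. The names new_pepper, key_hash and verify_api_key are hypothetical, and the real auth middleware may be structured differently.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def new_pepper() -> bytes:
    """Generate a fresh 32-byte pepper from the OS CSPRNG."""
    return secrets.token_bytes(32)


def key_hash(pepper: bytes, token: str) -> str:
    """hash = HMAC_SHA256(pepper, token), hex-encoded (storage format is an assumption)."""
    return hmac.new(pepper, token.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_api_key(
    token: str,
    stored_hash: str,
    current_pepper: bytes,
    previous_pepper: Optional[bytes] = None,
) -> Optional[str]:
    """Return which pepper matched: 'current', 'previous', or None if neither did.

    Soft rotation: pass previous_pepper during the grace window. A 'previous'
    match means the client has not rotated its key yet.
    Hard rotation: pass previous_pepper=None so old keys stop working at once.
    """
    if hmac.compare_digest(key_hash(current_pepper, token), stored_hash):
        return "current"
    if previous_pepper is not None and hmac.compare_digest(
        key_hash(previous_pepper, token), stored_hash
    ):
        return "previous"
    return None
```

Returning which pepper matched, instead of a plain boolean, lets the caller log or count clients still on the old pepper, which is useful for deciding when the grace window can end.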
5. Full host re-bootstrap (host died, lost everything)
- Provision a new Ubuntu host per Phase 0 host-bootstrap.
- Install Docker, SOPS, age.
- Place the age private key at /etc/whatsapp-mcp/age.key (from the offline backup: paper or hardware token).
- Clone the repo into /opt/whatsapp-mcp.
- Restore the latest db-<date>.sql.gz and media/ from off-host backup.
- systemctl enable --now whatsapp-mcp-secrets whatsapp-mcp.
- Update DNS to point at the new host’s IP.
- Renew certbot if needed.
- Verify via /health and full smoke checklist.
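A freshly started service may need a little time before /health responds, so the final check is easier to run as a short polling loop. The following is a minimal sketch in Python, assuming /health answers with HTTP 200 when the service is healthy (the runbook does not specify the response). http_status and wait_healthy are hypothetical helpers, and the full URL is supplied by the caller.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if the request could not be completed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, TLS error, and similar.
        return 0


def wait_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 3.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The probe and sleep parameters exist only so the loop can be exercised without a network; in normal use the defaults are enough.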
Open items
Each section still needs concrete commands with expected output, escalation contacts, and a post-incident review template.