Phase 6 — Media + Interactive Messages

Effort: L

Goal

Send and receive image / document / audio media, and send interactive (button / list) messages and handle their replies. Media is stored on local disk with a signed-URL access path; Nginx never serves media directly.

Deliverables

Inbound media flow

  • src/inngest/functions/process-message.ts — when message.type ∈ {image, document, audio, video, sticker}, emit wa/media.download.requested with { clientId, phoneNumberId, mediaId, wamid }.
  • src/inngest/functions/download-media.ts:
    1. GET https://graph.facebook.com/v23.0/{mediaId} → temporary signed URL (Meta URL expires in ~5 min).
    2. Stream download into MEDIA_ROOT/<phone_number_id>/<yyyy>/<mm>/<uuid>.<ext> with mode 0640. The file UUID is generated locally; never use Meta’s id as a filename.
    3. Compute SHA-256 while streaming; record mime_type, size_bytes, sha256, storage_path (relative).
    4. Insert media_objects row.
    5. Update the messages row’s media_object_id.
    6. Idempotent on (wamid, wa_media_id) — re-runs are no-ops.
  • Mime sniff: validate the first N bytes against the Meta-reported mime_type (e.g. via the file-type package). On mismatch, log it, store the row with mime_type = sniffed, and set metadata.mime_mismatch = true for audit.
  • Size guards: enforce Meta’s documented inbound limits at the file-stream level; abort the download if size exceeds the type’s cap (image 5 MB, document 100 MB, audio 16 MB, video 16 MB, sticker 500 KB).
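The size guard and the streaming SHA-256 from the steps above can share one pass over the download. The following is a minimal sketch, not the project's implementation: createGuardedHasher and SizeLimitExceededError are hypothetical names, and the caps assume binary megabytes (1 MB = 1024 × 1024 bytes), which the plan does not specify.

```typescript
import { createHash } from "node:crypto";

type MediaKind = "image" | "document" | "audio" | "video" | "sticker";

// Caps taken from the size-guard bullet above; binary megabytes are an assumption.
const MAX_BYTES: Record<MediaKind, number> = {
  image: 5 * 1024 * 1024,
  document: 100 * 1024 * 1024,
  audio: 16 * 1024 * 1024,
  video: 16 * 1024 * 1024,
  sticker: 500 * 1024,
};

// Hypothetical error type used to abort the download.
class SizeLimitExceededError extends Error {
  constructor(kind: MediaKind, limit: number) {
    super(`${kind} exceeds the ${limit}-byte cap`);
    this.name = "SizeLimitExceededError";
  }
}

// Feed every downloaded chunk through update(); it throws as soon as the
// running total passes the cap, so an oversized file is never fully written.
function createGuardedHasher(kind: MediaKind) {
  const hash = createHash("sha256");
  const limit = MAX_BYTES[kind];
  let sizeBytes = 0;
  return {
    update(chunk: Uint8Array): void {
      sizeBytes += chunk.byteLength;
      if (sizeBytes > limit) throw new SizeLimitExceededError(kind, limit);
      hash.update(chunk);
    },
    finish(): { sizeBytes: number; sha256: string } {
      return { sizeBytes, sha256: hash.digest("hex") };
    },
  };
}
```

In a real download function, update() would be called from the stream's data handler (or a Transform) before each chunk is written to disk, and finish() would supply size_bytes and sha256 for the media_objects row.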

Outbound media flow

  • src/meta/upload-media.ts: POST /{phone_number_id}/media multipart upload. Returns Meta’s media_id.
  • src/meta/send-media.ts: POST /{phone_number_id}/messages with type ∈ {image, document, audio, video} and either { id: <meta_media_id> } (after upload) or { link: <url> }.
  • src/inngest/functions/upload-media.ts — Phase 3 stub fleshed out. Streams from localPath to Meta, returns media_id. Used by send_media when the source is a local file.
  • src/tools/send-media.ts:
    • Input zod:
      {
        to: string,
        kind: 'image' | 'document' | 'audio' | 'video',
        source: { url: string } | { localPath: string } | { mediaId: string },
        caption?: string,        // image/document/video only
        filename?: string,       // document only, max 240 chars
        phoneNumberId?: string
      }
    • Validates type-specific size + mime constraints before upload.
    • For localPath: streams via upload-media function first; for url: passes Meta a link (Meta does the fetch).
    • Persists the outbound messages row + media_objects row.
    • Requires scopes tools:send_media + media:write.
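As an illustration of the id / link choice and the caption and filename rules above, the request body for POST /{phone_number_id}/messages could be assembled roughly as follows. This is a sketch under assumptions: buildMediaMessage and its argument shape are hypothetical, and size and mime validation are left out.

```typescript
type MediaKind = "image" | "document" | "audio" | "video";

// Either a media_id returned by a prior upload, or a URL that Meta fetches itself.
type MediaRef = { id: string } | { link: string };

interface SendMediaArgs {
  to: string;
  kind: MediaKind;
  media: MediaRef;
  caption?: string;
  filename?: string;
}

// Hypothetical helper: applies the per-kind rules from the tool schema above
// and returns the JSON body for the messages endpoint.
function buildMediaMessage(args: SendMediaArgs): Record<string, unknown> {
  const { to, kind, media, caption, filename } = args;
  if (caption !== undefined && kind === "audio") {
    throw new Error("caption is only allowed for image, document and video");
  }
  if (filename !== undefined) {
    if (kind !== "document") throw new Error("filename is only allowed for document");
    if (filename.length > 240) throw new Error("filename exceeds 240 characters");
  }
  const body: Record<string, unknown> = { ...media };
  if (caption !== undefined) body.caption = caption;
  if (filename !== undefined) body.filename = filename;
  return {
    messaging_product: "whatsapp",
    recipient_type: "individual",
    to,
    type: kind,
    [kind]: body, // the media object is keyed by its type, e.g. "image"
  };
}
```

For a localPath source, the media argument would carry the id returned by the upload step; for a url source it would carry the link unchanged.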

Signed-URL access

  • src/media/signed-url.ts:
    • sign(mediaId) → https://wa.<yourdomain>/media/<mediaId>?exp=<unix>&sig=<hex> with sig = HMAC-SHA256(MEDIA_SIGNING_SECRET, mediaId|exp).
    • verify(url params) → boolean + structured error.
    • TTL from MEDIA_URL_TTL_SECONDS (default 300).
  • src/transport/http-media.ts: GET /media/:id:
    1. Verify HMAC signature.
    2. Check exp not in the past.
    3. Resolve media_objects row.
    4. Tenancy check: the caller’s auth context’s clientId must match the media’s owner (look up messages.client_id via the link). Even with a valid signature, cross-tenant access is refused.
    5. Stream the file from MEDIA_ROOT/<storage_path> with the right Content-Type, Content-Length, and Content-Disposition.
  • src/tools/get-media-url.ts — input { mediaId: string }. Verifies the caller’s grant on the media’s phone_number_id. Returns the signed URL.
  • Nginx (Phase 7) never has a location /media block. All access flows through Express → the auth + signature pipeline.
  • Signed URLs are NOT bearer-shareable. Even with a valid sig and exp, GET /media/:id still requires a Bearer API key in the Authorization header AND verifies that the caller’s clientId matches the media’s owning tenancy. A client cannot hand a signed URL to a teammate who has no key. The signature only proves “this URL was minted by us recently”; the auth header proves “this caller is allowed”. Document this in docs/api/mcp-tools.md so clients don’t expect link-shareability.
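The HMAC contract above (sig = HMAC-SHA256 over mediaId|exp, hex-encoded, with a TTL) can be sketched with Node's crypto module. The function signatures here are assumptions made for illustration: the plan's sign takes only mediaId, whereas this sketch passes the base URL, secret, TTL and clock explicitly so it can be exercised in isolation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type VerifyResult =
  | { ok: true }
  | { ok: false; reason: "malformed" | "expired" | "bad_signature" };

function computeSig(secret: string, mediaId: string, exp: number): string {
  return createHmac("sha256", secret).update(`${mediaId}|${exp}`).digest("hex");
}

// nowSeconds is injected so expiry behaviour is easy to test.
function sign(
  baseUrl: string,
  secret: string,
  mediaId: string,
  ttlSeconds: number,
  nowSeconds: number,
): string {
  const exp = nowSeconds + ttlSeconds;
  const sig = computeSig(secret, mediaId, exp);
  return `${baseUrl}/media/${encodeURIComponent(mediaId)}?exp=${exp}&sig=${sig}`;
}

function verify(
  secret: string,
  mediaId: string,
  expParam: string,
  sigParam: string,
  nowSeconds: number,
): VerifyResult {
  // Reject anything that is not a plain unix timestamp and a 64-char hex digest.
  if (!/^\d{1,12}$/.test(expParam) || !/^[0-9a-f]{64}$/.test(sigParam)) {
    return { ok: false, reason: "malformed" };
  }
  const exp = Number(expParam);
  const expected = Buffer.from(computeSig(secret, mediaId, exp), "hex");
  const given = Buffer.from(sigParam, "hex");
  // Both buffers are 32 bytes here, which timingSafeEqual requires.
  if (!timingSafeEqual(expected, given)) return { ok: false, reason: "bad_signature" };
  if (exp < nowSeconds) return { ok: false, reason: "expired" };
  return { ok: true };
}
```

Because exp is part of the signed string, changing exp in the query string invalidates the signature rather than extending the lifetime. A valid result still says nothing about who is calling; the route's auth and tenancy checks remain separate.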

Interactive messages

  • src/meta/send-interactive.ts — POSTs type: interactive with button or list shape per Meta spec.
  • src/tools/send-interactive-buttons.ts:
    • Input zod: { to, header?: string, body: string, footer?: string, buttons: Array<{ id: string, title: string }> (max 3), phoneNumberId? }.
    • Validates: ≤ 3 buttons; titles ≤ 20 chars; ids ≤ 256 chars and unique within the message.
  • src/tools/send-interactive-list.ts:
    • Input zod: { to, header?, body, footer?, buttonText: string, sections: Array<{ title?: string, rows: Array<{ id, title, description? }> }>, phoneNumberId? }.
    • Validates Meta’s structural limits (≤ 10 sections, ≤ 10 rows total).
  • src/webhook/normalise.ts — extended to handle interactive replies (button_reply, list_reply). Stores selected id in messages.payload.selectedId and a friendly preview in messages.body ("User picked: <title>").
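The reply normalisation described in the last bullet might look like the sketch below. The inbound shape follows Meta's webhook format for interactive replies (an interactive object whose type names the populated field); normaliseInteractiveReply and its return type are hypothetical.

```typescript
interface ReplyChoice {
  id: string;
  title: string;
  description?: string;
}

// Shape of message.interactive on an inbound webhook for a button or list reply.
interface InteractiveReply {
  type: "button_reply" | "list_reply";
  button_reply?: ReplyChoice;
  list_reply?: ReplyChoice;
}

interface NormalisedReply {
  body: string; // friendly preview stored in messages.body
  payload: { selectedId: string }; // stored in messages.payload
}

// Returns null when the field named by `type` is missing, so the caller can
// decide how to log or store a malformed reply.
function normaliseInteractiveReply(interactive: InteractiveReply): NormalisedReply | null {
  const choice =
    interactive.type === "button_reply" ? interactive.button_reply : interactive.list_reply;
  if (!choice) return null;
  return {
    body: `User picked: ${choice.title}`,
    payload: { selectedId: choice.id },
  };
}
```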

Retention

  • src/inngest/functions/prune-media.ts — daily cron. For media_objects rows older than 90 days (configurable):
    • Delete the file from disk.
    • Nullify storage_path, set metadata.pruned_at.
    • Row is kept so audit references still resolve.
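The pruning rule above (delete the file, null the path, stamp pruned_at, keep the row) can be expressed as a pure function over rows, with the file deletion injected. This is only a sketch: MediaObjectRow, its camelCase fields and pruneMedia are hypothetical stand-ins for the real table access.

```typescript
interface MediaObjectRow {
  id: string;
  createdAt: Date;
  storagePath: string | null;
  metadata: Record<string, unknown>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the rows as they should look after pruning. deleteFile is injected
// so the same logic can run against a real disk or an in-memory fake.
function pruneMedia(
  rows: MediaObjectRow[],
  now: Date,
  retentionDays: number,
  deleteFile: (storagePath: string) => void,
): MediaObjectRow[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return rows.map((row) => {
    // Skip rows that are already pruned or still inside the retention window.
    if (row.storagePath === null || row.createdAt.getTime() >= cutoff) return row;
    deleteFile(row.storagePath);
    return {
      ...row,
      storagePath: null,
      metadata: { ...row.metadata, pruned_at: now.toISOString() },
    };
  });
}
```

Skipping rows whose storagePath is already null makes a repeated cron run a no-op for media that was pruned earlier.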

Docs (extended)

  • docs/architecture/media.md — full design: inbound download flow, outbound upload, signed-URL contract, on-disk layout, retention, mime sniffing, why Nginx never serves media.
  • docs/api/mcp-tools.md — extended with send_media, send_interactive_buttons, send_interactive_list, get_media_url. Each with full input schemas and examples.
  • docs/api/webhook-payloads.md — extended with media and interactive payload shapes.

Critical files

Tests

Unit

  • tests/unit/media/signed-url.test.ts:
    • sign + verify round-trip.
    • Expired URL fails.
    • Tampered signature fails (constant-time).
    • Wrong exp fails.
    • 100% coverage required.
  • tests/unit/media/storage.test.ts:
    • Path construction MEDIA_ROOT/<phone>/<yyyy>/<mm>/<uuid>.<ext> is correct.
    • Path traversal: a malicious storage_path like ../../etc/passwd is rejected.
  • tests/unit/tools/interactive-validation.test.ts:
    • Button count > 3 rejected.
    • Title > 20 chars rejected.
    • List section count > 10 rejected.
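The validation rules these interactive tests exercise could be sketched as plain functions that return a list of error messages. The function names are hypothetical, and string length is measured in UTF-16 code units, which is a simplification of whatever counting Meta applies to titles.

```typescript
interface ReplyButton {
  id: string;
  title: string;
}

interface ListSection {
  title?: string;
  rows: Array<{ id: string; title: string; description?: string }>;
}

// Rules from the plan: at most 3 buttons, titles up to 20 chars,
// ids up to 256 chars and unique within the message.
function validateButtons(buttons: ReplyButton[]): string[] {
  const errors: string[] = [];
  if (buttons.length > 3) errors.push("at most 3 buttons are allowed");
  const seen = new Set<string>();
  for (const { id, title } of buttons) {
    if (title.length > 20) errors.push(`button title exceeds 20 chars: "${title}"`);
    if (id.length > 256) errors.push("button id exceeds 256 chars");
    if (seen.has(id)) errors.push(`duplicate button id: "${id}"`);
    seen.add(id);
  }
  return errors;
}

// Rules from the plan: at most 10 sections and at most 10 rows in total.
function validateListSections(sections: ListSection[]): string[] {
  const errors: string[] = [];
  if (sections.length > 10) errors.push("at most 10 sections are allowed");
  const totalRows = sections.reduce((n, s) => n + s.rows.length, 0);
  if (totalRows > 10) errors.push("at most 10 rows are allowed across all sections");
  return errors;
}
```

Returning every violation at once, rather than throwing on the first, lets a tool report all problems with a message in a single response.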

Integration (testcontainers Postgres + memfs for media)

  • tests/integration/media/download.test.ts:
    • Webhook inbound with image → wa/media.download.requested emitted → file lands at the expected path → row in media_objects → messages.media_object_id set.
    • Re-run with same wamid → no duplicate file, no duplicate row.
    • Meta-reported mime mismatching sniffed mime → row stored with metadata.mime_mismatch = true.
  • tests/integration/media/upload.test.ts:
    • send_media with source.localPath → multipart upload to Meta (mocked), media row inserted with both Meta media_id and local path (the local copy is kept for audit).
    • send_media with source.url → Meta link mode; no upload, no local copy.
  • tests/integration/media/signed-url.test.ts:
    • get_media_url returns URL; GET /media/:id?... 200 with correct content.
    • URL after MEDIA_URL_TTL_SECONDS → 403.
    • Tampered sig → 403.
    • Cross-tenant: client B with valid signature for client A’s media → 403 (tenancy check).
  • tests/integration/interactive/send-buttons.test.ts:
    • Send 3-button → Meta receives correct payload; row inserted.
    • Reply with button_reply → row inserted with payload.selectedId = the button id.
  • tests/integration/interactive/send-list.test.ts:
    • Similar to buttons; list_reply handling.
  • tests/integration/retention/prune-media.test.ts:
    • Seed 100d-old media → cron runs → file deleted from disk; row kept with nulled storage_path.

Coverage

  • src/media/signed-url.ts at 100%.
  • Phase total ≥ 80%.

Code documentation

  • TSDoc with @remarks on:
    • download-media.ts (idempotency, size guards, mime sniff, file naming policy).
    • signed-url.ts (HMAC contract, TTL, why tenancy check is still required).
    • http-media.ts (auth + signature + tenancy ordering, why no Nginx direct-serve).
    • send-media.ts, send-interactive-*.ts (Meta payload constraints, validation rules).
    • normalise.ts (interactive reply normalisation rules).
  • docs/architecture/media.md complete.
  • docs/api/mcp-tools.md, docs/api/webhook-payloads.md extended.
  • docs/reference/ regenerated.

Acceptance

  1. Inbound image — phone sends an image → messages row with media_object_id set; get_media_url returns a short-lived URL; curl of that URL fetches the image.
  2. Outbound document (localPath): send_media with a PDF on the host file system → recipient receives it on WhatsApp.
  3. Outbound image (url): send_media with a public URL → recipient receives it (Meta fetches).
  4. Interactive round trip: send_interactive_buttons with 3 options → button press on the phone → messages row inserted with payload.selectedId = the pressed button’s id.
  5. Cross-tenant block — client B with a valid signed URL constructed for client A’s media → 403 from GET /media/:id.
  6. Path traversal block — manually attempting GET /media/../../etc/passwd returns 403/404 with no filesystem read.
  7. Media retention — manually backdated media_objects row + dummy file → cron prunes file, keeps row.
  8. pnpm test:ci green; coverage gates met.

Notes

  • Files on disk live at /var/lib/whatsapp-mcp/media/<phone_number_id>/<yyyy>/<mm>/<uuid>.<ext>. Owner whatsapp-mcp, mode 0640. Directory never web-served. storage_path in DB is always relative.
  • Templates remain deferred to phase-8-future.md. send_message to a user outside the 24h window still returns OutOfSessionWindowError from Phase 2.
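The on-disk layout and the relative-only storage_path rule from the first note can be sketched with node:path. Both helper names are hypothetical, and using the UTC year and month for the directory is an assumption the plan does not state.

```typescript
import * as path from "node:path";

// Builds the relative storage_path: <phone_number_id>/<yyyy>/<mm>/<uuid>.<ext>.
// The uuid is generated locally by the caller, never taken from Meta's media id.
function buildStoragePath(
  phoneNumberId: string,
  receivedAt: Date,
  uuid: string,
  ext: string,
): string {
  const yyyy = String(receivedAt.getUTCFullYear());
  const mm = String(receivedAt.getUTCMonth() + 1).padStart(2, "0");
  return path.posix.join(phoneNumberId, yyyy, mm, `${uuid}.${ext}`);
}

// Resolves a stored relative path against MEDIA_ROOT and refuses anything
// that is absolute or would land outside the root (for example via "..").
function resolveStoragePath(mediaRoot: string, storagePath: string): string {
  if (path.isAbsolute(storagePath)) {
    throw new Error("storage_path must be relative");
  }
  const root = path.resolve(mediaRoot);
  const full = path.resolve(root, storagePath);
  if (!full.startsWith(root + path.sep)) {
    throw new Error("storage_path escapes MEDIA_ROOT");
  }
  return full;
}
```

Checking the resolved path against the root, instead of only scanning for "..", also covers inputs that normalise to a location outside MEDIA_ROOT in less obvious ways.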

Definition of Done

Inbound media

  • process-message emits wa/media.download.requested for media types.
  • download-media streams to MEDIA_ROOT/<phone>/<yyyy>/<mm>/<uuid>.<ext> with mode 0640.
  • Mime sniff against Meta-reported type; mismatch flagged in metadata.
  • Size guards per type enforced at stream level.
  • Idempotent on (wamid, wa_media_id).
  • media_objects row inserted; messages.media_object_id set.

Outbound media

  • src/meta/upload-media.ts multipart upload returning Meta media_id.
  • src/meta/send-media.ts supports id, link, and (via upload-then-send) localPath.
  • send_media tool with full zod input (kind / source / caption / filename / phoneNumberId).
  • Pre-upload size + mime validation.
  • Scopes tools:send_media + media:write enforced.

Signed URLs + media route

  • src/media/signed-url.ts sign/verify with HMAC + exp.
  • GET /media/:id runs auth + signature + tenancy check (in that order).
  • Path-traversal rejection (relative-path only; reject ..).
  • get_media_url tool returns a short-lived URL.
  • No location /media block planned in Nginx (Phase 7).
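The auth, signature, tenancy ordering for GET /media/:id can be modelled as a small decision function, independent of the HTTP framework. This is a sketch under assumptions: the context shape and function name are hypothetical, and the 401 and 404 codes are guesses, since the plan only fixes 403 for expired, tampered and cross-tenant requests.

```typescript
interface MediaRequestContext {
  // null when no valid Bearer API key was presented
  callerClientId: string | null;
  signatureValid: boolean;
  expired: boolean;
  // null when no media_objects row resolves for the id
  mediaOwnerClientId: string | null;
}

type MediaStatus = 200 | 401 | 403 | 404;

// Checks run in the order auth, signature, tenancy. The file is only
// streamed when this returns 200.
function decideMediaResponse(ctx: MediaRequestContext): MediaStatus {
  if (ctx.callerClientId === null) return 401; // assumed code for a missing or invalid key
  if (!ctx.signatureValid || ctx.expired) return 403;
  if (ctx.mediaOwnerClientId === null) return 404; // assumed code for an unknown media id
  if (ctx.mediaOwnerClientId !== ctx.callerClientId) return 403; // cross-tenant
  return 200;
}
```

Keeping the decision separate from the streaming code means no filesystem access happens until every check has passed.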

Interactive

  • send_interactive_buttons (≤3 buttons; title ≤20 chars).
  • send_interactive_list (≤10 sections; total ≤10 rows).
  • Webhook normaliser handles button_reply + list_reply → payload.selectedId + preview body.

Retention

  • prune-media cron deletes files > 90d; row kept with nulled storage_path.

Tests

  • tests/unit/media/signed-url.test.ts at 100% coverage.
  • tests/unit/media/storage.test.ts (path construction + traversal block).
  • tests/unit/tools/interactive-validation.test.ts.
  • tests/integration/media/download.test.ts (happy + replay + mime mismatch).
  • tests/integration/media/upload.test.ts (localPath + url).
  • tests/integration/media/signed-url.test.ts (happy + expired + tampered + cross-tenant).
  • tests/integration/interactive/send-buttons.test.ts.
  • tests/integration/interactive/send-list.test.ts.
  • tests/integration/retention/prune-media.test.ts.
  • Coverage: signed-url.ts = 100%; phase ≥ 80%.

Documentation

  • docs/architecture/media.md written.
  • docs/api/mcp-tools.md extended (send_media, send_interactive_*, get_media_url).
  • docs/api/webhook-payloads.md extended (media + interactive).
  • TSDoc @remarks on download-media, signed-url, http-media, send-media, send-interactive-*, normalise.
  • docs/reference/ regenerated cleanly.

Acceptance verified

  • Inbound image → messages row with media; get_media_url returns URL; curl downloads correctly.
  • Outbound localPath (PDF) → recipient receives.
  • Outbound url → recipient receives.
  • Interactive button round-trip records payload.selectedId.
  • Cross-tenant signed URL → 403.
  • Path traversal attempt → 403/404 with no fs read.
  • Backdated row → cron prunes file, keeps row.

Phase signoff

  • Phase 6 complete. README.md status table updated to ✅.