# Cache service

Source: `src/cache/` — `cache.service.ts`, `ttl-config.ts`, `metrics.ts`.

A TTL cache for adapter results, with `pg_advisory_xact_lock`-based thundering-herd protection and per-transaction RLS context.
## Public API

```ts
readOrFetch<T>(
  key: CacheKey,
  fetcher: () => Promise<T>,
  opts?: { force?: boolean },
): Promise<{ data: T; cache: 'hit' | 'miss' }>
```

`CacheKey` is `{ tenantId, platform, reportType, dateRangeKey }`. On a cache hit, the fetcher is never called. On a miss, exactly one fetcher invocation runs to completion; concurrent callers for the same key wait on the advisory lock and read the freshly written row once the lock is released.

`force: true` skips the read path entirely — the eager-sync worker (Phase 4) uses it to refresh a still-fresh row on schedule.
## Why open a transaction on every call

`metric_cache` is RLS-isolated; the runtime connects as `mcp_app` (`NOBYPASSRLS`). Every read or write needs `app.current_tenant_id` set first:

```sql
SELECT set_config('app.current_tenant_id', $1, true)
```

The `true` makes the setting transaction-local — pooled connections never leak tenant context across requests. Wrapping the whole `readOrFetch` in a transaction costs one extra `BEGIN`/`COMMIT` on cache hits (sub-millisecond locally); the alternative was to open the transaction only on a miss, but then the hit-path read would happen outside any tenant context and RLS would hide every row.
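The per-call flow can be sketched as a transaction wrapper with the database client injected as a plain query function, so the statement order is visible. `withTenantTx` and the `Query` type are hypothetical names; the `set_config` call and its transaction-local `true` flag are from the text above.

```typescript
type Query = (sql: string, params?: unknown[]) => Promise<unknown>;

export async function withTenantTx<T>(
  query: Query,
  tenantId: string,
  body: () => Promise<T>,
): Promise<T> {
  await query('BEGIN');
  try {
    // `true` => transaction-local setting: a pooled connection cannot
    // leak this tenant's context into the next request.
    await query(`SELECT set_config('app.current_tenant_id', $1, true)`, [tenantId]);
    const result = await body();
    await query('COMMIT'); // advisory locks taken inside release here
    return result;
  } catch (err) {
    await query('ROLLBACK'); // ...and here, so tenant context and locks never outlive the call
    throw err;
  }
}
```

The cache read, advisory lock, double-checked re-read, and upsert all run inside `body`, so every statement sees the tenant context and the lock is gone by the time the wrapper returns.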
## Thundering-herd protection

Inside the transaction, after the optional first read, the service acquires `pg_advisory_xact_lock(hashCacheKey(key))`. The hash is a 64-bit FNV-1a of `${tenantId}|${platform}|${reportType}|${dateRangeKey}`, reinterpreted into the signed `int8` range Postgres expects. Concurrent callers serialise on this lock; only one runs the fetcher. After acquiring the lock, the service re-reads the row (double-checked locking) — if a peer has already populated it, the current call exits as a cache hit.

Advisory locks auto-release at `COMMIT`/`ROLLBACK`, so they cannot leak.
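A sketch of the key hash, assuming the standard 64-bit FNV-1a constants; the function name `hashCacheKey` comes from the text, its internals are an illustration and may differ from `cache.service.ts`.

```typescript
const FNV_OFFSET = 0xcbf29ce484222325n; // standard 64-bit FNV-1a offset basis
const FNV_PRIME = 0x100000001b3n; //       standard 64-bit FNV-1a prime
const MASK_64 = 0xffffffffffffffffn;

export function hashCacheKey(key: {
  tenantId: string;
  platform: string;
  reportType: string;
  dateRangeKey: string;
}): bigint {
  const input = `${key.tenantId}|${key.platform}|${key.reportType}|${key.dateRangeKey}`;
  let h = FNV_OFFSET;
  for (const byte of Buffer.from(input, 'utf8')) {
    h ^= BigInt(byte);
    h = (h * FNV_PRIME) & MASK_64; // keep the running hash at 64 bits
  }
  // pg_advisory_xact_lock takes a signed bigint (int8), so reinterpret
  // the unsigned 64-bit value into the signed range.
  return BigInt.asIntN(64, h);
}
```

The result is passed as the single `bigint` argument to `pg_advisory_xact_lock`; because it is derived from the full key, two different cache keys only contend when their hashes collide.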
## TTL configuration

`ttl-config.ts` declares per-platform, per-report-type TTLs. The Google entries cover all seven Phase 2 reports; Meta and TikTok each have a single placeholder that will be filled in during Phase 3.

```ts
export const TTL_SECONDS = {
  google: {
    account_health: 3600,
    search_term_waste: 7200,
    quality_score: 7200,
    auction_insights: 14_400,
    pmax_breakdown: 7200,
    budget_optimizer: 3600,
    weekly_anomaly: 7200,
  },
  meta: { account_health: 3600 },
  tiktok: { account_health: 7200 },
} as const;
```

`ttlFor(platform, reportType)` throws on an unknown combination — a cheap fail-fast against typos.
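A minimal sketch of a fail-fast lookup like the `ttlFor` described above (table entries abridged here); the real implementation lives in `ttl-config.ts` and may differ.

```typescript
// Abridged copy of the TTL table for a self-contained example.
const TTL_SECONDS: Record<string, Record<string, number>> = {
  google: { account_health: 3600, search_term_waste: 7200 },
  meta: { account_health: 3600 },
  tiktok: { account_health: 7200 },
};

export function ttlFor(platform: string, reportType: string): number {
  const ttl = TTL_SECONDS[platform]?.[reportType];
  if (ttl === undefined) {
    // Throwing on an unknown combination surfaces typos at call time
    // instead of silently caching with a bogus TTL.
    throw new Error(`No TTL configured for ${platform}/${reportType}`);
  }
  return ttl;
}
```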
## Metrics

`metrics.ts` keeps a process-local `Map<"platform/reportType", { hit, miss }>`, populated by `recordCacheEvent` on every hit or miss. `snapshotCacheMetrics()` derives `hitRate = hit / (hit + miss)` per key.

Phase 2 PR-7 exposes this through `GET /admin/metrics/cache`. Phase 5 replaces the Map with a Prometheus / OTel exporter — until then, the counters are local to the process and reset on restart.
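The counter map can be sketched as follows. The function names match the text above; the field layout and the absence of divide-by-zero handling are assumptions about `metrics.ts`, not its actual code.

```typescript
type Counts = { hit: number; miss: number };

// Process-local: resets on restart, not shared across instances.
const counters = new Map<string, Counts>();

export function recordCacheEvent(
  platform: string,
  reportType: string,
  event: 'hit' | 'miss',
): void {
  const key = `${platform}/${reportType}`;
  const c = counters.get(key) ?? { hit: 0, miss: 0 };
  c[event] += 1;
  counters.set(key, c);
}

export function snapshotCacheMetrics(): Record<string, Counts & { hitRate: number }> {
  const out: Record<string, Counts & { hitRate: number }> = {};
  for (const [key, { hit, miss }] of counters) {
    out[key] = { hit, miss, hitRate: hit / (hit + miss) };
  }
  return out;
}
```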
## Required schema

The cache upsert needs a unique index on `(tenant_id, platform, report_type, date_range_key)`. Migration `0001_modern_sharon_carter.sql` adds `metric_cache_key_idx`. If you change the cache key shape, regenerate the migration.
## Tests

`tests/integration/cache.test.ts` covers:

- Miss → hit progression within TTL (fetcher invoked once).
- Expired row triggers a re-fetch and bumps `fetched_at`.
- Thundering herd: 4 concurrent misses produce exactly 1 fetcher call; 1 miss + 3 hits.
- Hit/miss counters appear in `snapshotCacheMetrics()` with the expected ratio.
- `force: true` re-runs the fetcher and overwrites the row even when fresh.
## Cross-references

- `database.md` — `metric_cache` RLS + unique index.
- `mcp-tools.md` — the seven tools (PR-6) drive `readOrFetch`.