
Pulp Engine — HA / Clustering Reference Architecture

Reference architecture for running Pulp Engine across multiple API replicas behind a load balancer. Pairs with deployment-guide.md (single-instance topology) and runbook.md (operational procedures).

This document is enterprise-oriented: it assumes a managed Postgres, an object store (S3 / MinIO / R2), and an HTTPS-terminating load balancer are available. For single-instance or evaluation deployments, use the simpler topologies in the deployment guide.


1. Topology

                      ┌───────────────────────┐
                      │   HTTPS Load Balancer │  (sticky sessions NOT required)
                      │   TLS termination     │
                      └───────────┬───────────┘

                ┌─────────────────┼─────────────────┐
                ▼                 ▼                 ▼
           ┌─────────┐       ┌─────────┐       ┌─────────┐
           │ API pod │       │ API pod │  ...  │ API pod │
           │   N=1   │       │   N=2   │       │   N=k   │
           └────┬────┘       └────┬────┘       └────┬────┘
                │                 │                 │
                ├─────────────────┴─────────────────┤
                ▼                                   ▼
        ┌───────────────┐                 ┌─────────────────┐
        │  Postgres     │                 │  Object store   │
        │  (primary +   │                 │  (S3 / MinIO /  │
        │   replicas)   │                 │   R2 — shared)  │
        └───────────────┘                 └─────────────────┘

Key properties:

  • Request handlers are stateless — no sticky sessions required.
  • Editor session tokens are HMAC-signed (apps/api/src/lib/editor-token.ts) — validated against the shared EDITOR_TOKEN_SECRET, no session store.
  • All durable state lives in Postgres + the object store. Nothing on local pod disk is authoritative.
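Because editor tokens are HMAC-signed rather than stored, a token minted by any pod verifies on any other pod that holds the same EDITOR_TOKEN_SECRET. A minimal sketch of the idea using Node's crypto module — this illustrates the signing scheme only, not the actual 5-part token format in editor-token.ts, and the payload fields are invented for the example:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative mint/verify pair. The real token in
// apps/api/src/lib/editor-token.ts has five parts; this sketch uses two.
function mint(payload: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(payload).digest("hex");
  return `${Buffer.from(payload).toString("base64url")}.${sig}`;
}

function verify(token: string, secret: string): string | null {
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const payload = Buffer.from(body, "base64url").toString();
  const expected = createHmac("sha256", secret).update(payload).digest("hex");
  // Constant-time comparison to avoid timing side channels.
  if (sig.length !== expected.length) return null;
  return timingSafeEqual(Buffer.from(sig), Buffer.from(expected)) ? payload : null;
}

// "Pod A" mints, "pod B" verifies — only the shared secret is needed.
const secret = "shared-EDITOR_TOKEN_SECRET";
const token = mint("tenant=acme;exp=1700000000", secret);
console.log(verify(token, secret) !== null);    // true — verifies on any pod
console.log(verify(token, "different-secret")); // null — wrong secret
```

This is why no session store is required: verification needs only the secret, so adding or removing API pods never invalidates outstanding tokens.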

2. Stateless vs Stateful Components

Stateless (scale horizontally without coordination)

  • HTTP request handlers
  • Editor session tokens (5-part HMAC-signed, no storage)
  • OIDC auth code flow (stateless completion-code delivery)
  • Capability responses
  • Template / asset / render routes

Shared state (authoritative — all pods read/write)

  • Postgres — templates, versions, labels, assets metadata, audit events, schedules + executions + DLQ, tenant registry, render usage. Schema: apps/api/src/prisma/schema.prisma.
  • Object store — asset binaries (ASSET_BINARY_STORE=s3).

Per-pod state (multi-instance safe — see notes below)

  • Schedule dispatcher (apps/api/src/lib/schedule-engine.ts) — Each pod polls independently; a DB row-level claim (INSERT … ON CONFLICT DO NOTHING) guarantees a given schedule execution fires exactly once across the cluster. No leader election required.
  • TenantStatusCache (apps/api/src/lib/tenant-status-cache.ts) — Per-pod TTL cache (default 10 s). Tenant archive operations have a ≤ TTL staleness window before all pods converge. Tune via TENANT_STATUS_CACHE_TTL_MS. Acceptable for typical workloads; lower the value if you need stricter archive propagation.
  • Audit-purge scheduler (apps/api/src/lib/audit-purge-scheduler.ts) — Runs per pod. Idempotent: all pods issue the same DELETE WHERE timestamp < cutoff, so duplicate work is harmless but wasteful. Consider disabling it on all but one pod in very large deployments (operator choice).
  • Render-usage-purge scheduler (apps/api/src/lib/render-usage-purge-scheduler.ts) — Same pattern as audit purge: idempotent, safe across pods.
  • Browser singleton, child-process render mode (apps/api/src/server.ts) — Chromium instance warmed per pod. Cannot be shared cross-process. See § 4.
  • Delivery dispatcher batch job store (apps/api/src/lib/delivery/dispatcher.ts) — Known limitation: in-flight batch jobs held in memory are lost if the pod restarts mid-batch. The DLQ is persisted to Postgres, so permanent failures are not lost. Treat batch deliveries as best-effort across pod restarts.
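The exactly-once guarantee for the schedule dispatcher rests on the database arbitrating competing claims: every pod attempts to record the same (schedule, tick) execution row, a uniqueness constraint lets only one attempt succeed, and only the winning pod fires the job. A minimal in-memory simulation of that claim pattern — the real implementation in schedule-engine.ts uses Postgres, not a Map, and the names below are illustrative:

```typescript
// Stand-in for a table with a UNIQUE (schedule_id, tick) constraint.
const executions = new Set<string>();

// Mimics INSERT … ON CONFLICT DO NOTHING: returns true only for the
// first caller to claim a given (schedule, tick) pair.
function claim(scheduleId: string, tick: string): boolean {
  const key = `${scheduleId}@${tick}`;
  if (executions.has(key)) return false; // another pod already claimed it
  executions.add(key);
  return true;
}

// Three pods observe the same tick; exactly one wins the claim and fires.
const pods = ["pod-1", "pod-2", "pod-3"];
const winners = pods.filter(() => claim("nightly-report", "2024-01-01T00:00Z"));
console.log(winners.length); // 1
```

The pattern is leaderless by design: any pod may win any tick, so losing a pod never orphans a schedule.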

Not applicable in HA

  • File storage modes (STORAGE_MODE=file, ASSET_BINARY_STORE=filesystem) — assume a single writer. Do not run multiple API pods against a shared filesystem; use Postgres + S3 instead.

3. Required Configuration

All pods must share the following values:

  • STORAGE_MODE — postgres (or sqlserver). File mode is not HA-safe.
  • DATABASE_URL — managed Postgres primary. Point all pods at the primary; Prisma does not currently split reads.
  • ASSET_BINARY_STORE — s3. Required — shared-volume NFS mode also works, but S3 is the reference.
  • S3_BUCKET, S3_REGION, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_ENDPOINT — shared across pods. See deployment-guide.md § Object Storage.
  • EDITOR_TOKEN_SECRET — identical across pods. HMAC key — a token minted by pod A must verify on pod B.
  • API_KEY_ADMIN, API_KEY_EDITOR, API_KEY_RENDER, API_KEY_PREVIEW — identical across pods.
  • TRUST_PROXY — true. The LB terminates TLS; the real client IP arrives in X-Forwarded-For.
  • REQUIRE_HTTPS — true. Enforces the LB redirect contract.
  • TENANT_STATUS_CACHE_TTL_MS — 10000 (default) or lower. See the staleness note in § 2.
  • APP_VERSION — same across pods. Prevents mixed-version surprises in /readyz and capability responses.
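Since every value above must match across pods, it is worth failing fast at startup when one is missing rather than surfacing the mismatch later as intermittent 401s or broken asset reads. A hypothetical startup guard — this helper is not part of the codebase; the variable list simply mirrors the table above:

```typescript
// Hypothetical startup guard — not part of the Pulp Engine codebase.
const REQUIRED_SHARED_VARS = [
  "STORAGE_MODE",
  "DATABASE_URL",
  "ASSET_BINARY_STORE",
  "EDITOR_TOKEN_SECRET",
  "API_KEY_ADMIN",
] as const;

function assertSharedConfig(env: Record<string, string | undefined>): void {
  const missing = REQUIRED_SHARED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Crash-loop visibly rather than serve with a partial config.
    throw new Error(`missing required shared config: ${missing.join(", ")}`);
  }
}
```

Calling `assertSharedConfig(process.env)` in each pod's entrypoint makes a misconfigured replica fail its readiness probe immediately instead of joining the pool half-configured.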

Rollout and rotation

  • API key rotation under HA — use the documented API_KEY_*_PREVIOUS verify-only rollover variables. Set the new key on all pods first, then *_PREVIOUS on all pods, then swap clients over, then remove *_PREVIOUS.
  • Editor token invalidation — EDITOR_TOKEN_ISSUED_AFTER is a shared cutover: set it on all pods at the same timestamp and all existing tokens are rejected cluster-wide on the next request.
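The *_PREVIOUS rollover works because verification accepts either key while clients are switched over one at a time. A sketch of the verify side under that assumption — the function and shape below are illustrative, not the codebase's actual auth middleware:

```typescript
// Sketch of verify-only rollover: a request authenticates against the
// current key or, during the rotation window, the previous one.
// (A real implementation should use constant-time comparison; plain ===
// is used here for brevity.)
function isValidAdminKey(
  presented: string,
  env: { API_KEY_ADMIN: string; API_KEY_ADMIN_PREVIOUS?: string },
): boolean {
  if (presented === env.API_KEY_ADMIN) return true;
  return (
    env.API_KEY_ADMIN_PREVIOUS !== undefined &&
    presented === env.API_KEY_ADMIN_PREVIOUS
  );
}

// Mid-rotation: both old and new keys authenticate, so clients can be
// switched one by one with zero 401s.
const midRotation = { API_KEY_ADMIN: "new-key", API_KEY_ADMIN_PREVIOUS: "old-key" };
console.log(isValidAdminKey("old-key", midRotation)); // true
console.log(isValidAdminKey("new-key", midRotation)); // true
```

Once *_PREVIOUS is removed on all pods, the old key stops verifying and the window closes.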

4. Render Isolation in HA

The rendering layer has three modes (child-process, container, socket). For HA:

  • child-process (default) — ✅ per-pod. Each pod warms its own Chromium. Safe and simple.
  • container — Each pod spawns a render container per request. Requires a Docker socket; use cautiously (privileged).
  • socket — ✅ (most isolated). The API pod has no Docker socket; a dedicated controller pod does. Best privilege separation.

Recommendation: start with child-process mode unless you have a specific privilege-separation requirement. Scale the API pods horizontally; render capacity scales with pod count.
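In child-process mode the per-pod browser is typically held as an async lazy singleton, so concurrent first requests share one launch instead of each spawning Chromium. A generic sketch of that pattern — `launchBrowser` is a stand-in, not the actual warm-up code in server.ts:

```typescript
// Generic async lazy-singleton pattern for a per-pod browser instance.
// `launchBrowser` stands in for the real Chromium launch in server.ts.
type Browser = { id: number };

let launches = 0;
async function launchBrowser(): Promise<Browser> {
  launches += 1; // expensive in the real system: spawns Chromium
  return { id: launches };
}

let browserPromise: Promise<Browser> | undefined;
function getBrowser(): Promise<Browser> {
  // Store the promise, not the resolved browser, so callers that arrive
  // before the launch completes still share the single instance.
  browserPromise ??= launchBrowser();
  return browserPromise;
}

// Two renders racing on a cold pod reuse one launch.
const p1 = getBrowser();
const p2 = getBrowser(); // arrives before the first launch completes
console.log(p1 === p2, launches); // true 1
```

Caching the promise rather than the browser is the key detail: it closes the race window between "launch started" and "launch finished" without any locking.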


5. Known Limitations

  1. Batch delivery jobs are in-memory per pod — pod restart mid-batch loses in-flight job state (DLQ still captures permanent failures).
  2. Audit and render-usage purge schedulers run per pod — harmless duplicate work. If this shows up in DB load metrics, an operator may disable them on all but one pod via env-var gating (not currently exposed; tracked as a follow-up).
  3. TenantStatusCache staleness window (default 10 s) — archive-a-tenant propagation is eventually consistent within TTL.
  4. No read replicas — Prisma is configured against a single DATABASE_URL. Under very high read load, scale Postgres vertically or add a read-replica-aware proxy (PgBouncer + per-query routing) in front of the database; the app does not partition reads itself.
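Limitation 3 follows directly from the cache shape: each pod answers tenant-status reads from its own entry until the TTL lapses, so an archive performed via pod A becomes visible on pod B only once pod B's entry expires. A minimal TTL-cache sketch with an injectable clock — the real class lives in tenant-status-cache.ts; this version is illustrative only:

```typescript
// Minimal per-pod TTL cache with an injectable clock for testing.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const e = this.entries.get(key);
    if (!e || e.expiresAt <= this.now()) return undefined; // expired or absent
    return e.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Pod B cached "active" just before pod A archived the tenant: the stale
// answer survives for at most TENANT_STATUS_CACHE_TTL_MS (10 s default).
let t = 0;
const cache = new TtlCache<string>(10_000, () => t);
cache.set("tenant-acme", "active");
t = 9_999;
console.log(cache.get("tenant-acme")); // active — still inside the window
t = 10_000;
console.log(cache.get("tenant-acme")); // undefined — entry expired
```

Lowering TENANT_STATUS_CACHE_TTL_MS narrows the window at the cost of more status reads against Postgres.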

6. Reference Compose

A reference docker-compose.ha.yml is provided at the repo root.

This is a demo / evaluation stack, not a production reference. It exists to make the validation exercise in § 7 reproducible on a single host and to show the wiring. For production:

  • Replace MinIO with managed S3.
  • Replace the Postgres container with a managed Postgres service (backups, HA, PITR).
  • Replace the simple LB container with your production ingress (ALB, GCLB, nginx, Traefik, etc.).
  • Store secrets in your platform’s secret manager, not the compose file.

See docker-compose.ha.yml for the stack and docs/ha-validation-report.md for the validation results.


7. Validation Checklist

Manual smoke test — not an automated regression gate. Rerun after major version upgrades or infrastructure changes. Results captured in ha-validation-report.md.

  1. Shared asset readability. Upload an asset via pod A, render a template referencing it via pod B. Expect: same asset bytes returned in the PDF.
  2. Schedule fires exactly once. Configure a cron schedule; start 2+ pods; wait for one tick. Query schedule_executions table — expect exactly one row per scheduled tick (not one per pod).
  3. Editor token cross-pod. Mint an editor token via pod A (POST /editor-token), submit a template mutation via pod B with that token. Expect: 200 + audit row attributed to the minter.
  4. Graceful degradation. Kill one pod mid-request. Expect: other pods continue to serve; LB routes around the failed pod.
  5. Tenant archive propagation. In multi-tenant mode, archive a tenant via pod A. Wait TENANT_STATUS_CACHE_TTL_MS. Expect: write attempts via pod B are rejected.
  6. Key rotation. Follow the API_KEY_*_PREVIOUS runbook. Expect: zero 401s during the rotation window when clients are switched one-by-one.

An automated harness for this checklist is tracked as a follow-up initiative.