
Pulp Engine — Operator Runbook

Operator reference. Steps in order. Run everything from the repo root unless noted.


Pre-deployment checklist

  • Node 22–24 — node --version
  • pnpm 10.32.1 — pnpm --version
  • jq installed — required by scripts/smoke-test.sh (jq --version)
  • .env created from .env.example with STORAGE_MODE set (or left unset for postgres default)
  • NODE_ENV=production set in .env
  • At least one scoped API credential set in .env:
    • API_KEY_ADMIN (required for template management and full access)
    • API_KEY_RENDER (optional — render-only integrations)
    • API_KEY_PREVIEW (optional — preview routes, when PREVIEW_ROUTES_ENABLED=true)
    • API_KEY_EDITOR (optional — visual editor; operators enter this value in the editor login form — no VITE_API_KEY needed)
    • Legacy API_KEY accepted during migration (treated as admin) but deprecated; cannot coexist with the new keys
  • Security hardening configured — enforced by default when NODE_ENV=production:
    • CORS_ALLOWED_ORIGINS — specific trusted origins (not *)
    • DOCS_ENABLED — explicitly set (false strongly recommended unless Swagger UI is needed)
    • METRICS_TOKEN — bearer token for GET /metrics (openssl rand -hex 32)
    • REQUIRE_HTTPS=true — rejects editor-token login over plain HTTP
    • TRUST_PROXY=true — required when behind a TLS-terminating reverse proxy
    • BLOCK_REMOTE_RESOURCES=true — prevents render pipeline from fetching arbitrary external resources
    • EDITOR_USERS_JSON configured for per-user identity, or ALLOW_SHARED_KEY_EDITOR=true to acknowledge shared-key mode
    • Startup fails with a combined error listing all violations if any are missing.
    • Evaluation posture: set HARDEN_PRODUCTION=false to temporarily disable enforcement while configuring controls.
  • Asset binary store configured (ASSET_BINARY_STORE — default is filesystem):
    • Filesystem mode (default): ASSETS_DIR set to an absolute path (e.g. /var/pulp-engine/assets) — directory is auto-created on startup. ASSETS_BASE_URL set if the default /assets does not match your reverse proxy configuration.
    • S3 mode: see S3 pre-flight checklist below.
  • Asset access mode configured (ASSET_ACCESS_MODE — default public):
    • Public mode (default): no additional config required. S3 bucket must be publicly readable (see S3 pre-flight).
    • Private mode: set ASSET_ACCESS_MODE=private. S3 bucket does NOT need public-read; add s3:GetObject to credentials. S3_PUBLIC_URL not required.
  • Named-user mode (if using EDITOR_USERS_JSON): verify each user’s id is URL-safe, key is unique and does not duplicate any API_KEY_* value, role is editor or admin. Startup exits immediately with a descriptive error if misconfigured.
  • On Linux: Chromium system libraries installed (see deployment-guide.md §1)
  • Preview route posture confirmed: PREVIEW_ROUTES_ENABLED absent (routes return 404 in production) or intentionally set to true with network restrictions in place

Postgres mode (STORAGE_MODE=postgres or unset):

  • DATABASE_URL set in .env and reachable — psql "$DATABASE_URL" -c "\conninfo"

SQL Server mode (STORAGE_MODE=sqlserver):

  • SQL_SERVER_URL set in .env and reachable

File mode (STORAGE_MODE=file):

  • TEMPLATES_DIR set in .env and the directory contains valid template JSON files
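Taken together, the checklist maps onto a .env along these lines. A sketch only — every value is a placeholder, and it assumes postgres mode with shared-key editor access explicitly acknowledged:

```
# Sketch .env for a hardened postgres deployment — all values are placeholders
NODE_ENV=production
STORAGE_MODE=postgres
DATABASE_URL=postgres://pulp:change-me@db:5432/pulp
API_KEY_ADMIN=change-me-admin-key
CORS_ALLOWED_ORIGINS=https://editor.example.com
DOCS_ENABLED=false
METRICS_TOKEN=change-me-metrics-token
REQUIRE_HTTPS=true
TRUST_PROXY=true
BLOCK_REMOTE_RESOURCES=true
ALLOW_SHARED_KEY_EDITOR=true
ASSETS_DIR=/var/pulp-engine/assets
```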

S3 asset binary storage pre-flight

Complete this checklist before starting the API with ASSET_BINARY_STORE=s3.

  • Bucket exists and is in the correct region.
  • Credentials (S3_ACCESS_KEY_ID / S3_SECRET_ACCESS_KEY) have object-write/delete access (s3:PutObject, s3:DeleteObject) and the bucket-level access required for the HeadBucket startup probe (see deployment-guide.md § Object Storage).
  • Access mode (ASSET_ACCESS_MODE):
    • Public mode (default): bucket and objects must be publicly readable at S3_PUBLIC_URL. Puppeteer fetches asset URLs without auth headers. S3_PUBLIC_URL required when using a custom endpoint or path-style.
    • Private mode: bucket does not need public-read. API credentials must have s3:GetObject in addition to s3:PutObject, s3:DeleteObject. S3_PUBLIC_URL not required.
  • S3_PUBLIC_URL set (public mode only) when using a custom endpoint (S3_ENDPOINT) or path-style (S3_PATH_STYLE=true). Verify the URL is reachable from Puppeteer’s perspective (same network as the API container).
  • CORS configured on the bucket if the editor (browser) loads images directly from S3_PUBLIC_URL (origin GET). Not required if images are only fetched server-side by Puppeteer.
  • Verify bucket access from the deployment host:
# Quick connectivity probe (requires AWS CLI or equivalent)
aws s3 ls s3://$S3_BUCKET --region $S3_REGION
  • API startup log shows: "Asset binary store: S3" with the correct bucket and region. GET /health/ready returns 200 with all checks "ok".
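If the editor does load images directly from the bucket, a minimal CORS policy can be applied with the AWS CLI. A sketch only — the origin is a placeholder for your editor's origin, and the apply step is shown commented because it needs live credentials with s3:PutBucketCORS:

```shell
# Write a minimal CORS policy allowing browser GETs from the editor origin.
# https://editor.example.com is a placeholder — substitute your editor's origin.
cat > /tmp/pulp-cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://editor.example.com"],
      "AllowedMethods": ["GET"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF
# Apply it (credentials need s3:PutBucketCORS):
# aws s3api put-bucket-cors --bucket "$S3_BUCKET" --cors-configuration file:///tmp/pulp-cors.json
echo "wrote /tmp/pulp-cors.json"
```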

Env vars required (S3 mode):

  • ASSET_BINARY_STORE=s3
  • S3_BUCKET=my-pulp-engine-assets
  • S3_REGION=us-east-1
  • S3_ACCESS_KEY_ID=AKIA...
  • S3_SECRET_ACCESS_KEY=(secret)
  • S3_ENDPOINT=https://minio.example.com (custom providers only)
  • S3_PATH_STYLE=true (MinIO only)
  • S3_PUBLIC_URL=https://assets.example.com (required with custom endpoint or path-style)

Deployment steps

Run these in order. Each step must succeed before continuing.

Postgres mode (default)

# 1. Install dependencies
pnpm install

# 2. Generate Prisma client
pnpm db:generate

# 3. Apply all migrations to the database
pnpm --filter @pulp-engine/api db:deploy
# Already-applied migrations are skipped; safe to re-run

# 4. Load sample templates
pnpm db:seed
# Expected output: "loan-approval-letter@1.0.0 seeded" and "sample-invoice@1.0.0 seeded"

# 5. Build all packages
pnpm build

# 6. Start the API
node apps/api/dist/index.js
# Expected: JSON log line with "Pulp Engine API running on http://..."

File mode

# 1. Install dependencies
pnpm install

# 2. Generate Prisma client (compiles types only; no DB connection made)
pnpm db:generate

# 3. Build all packages
pnpm build

# 4. Start the API
node apps/api/dist/index.js
# Expected: JSON log line with "Pulp Engine API running on http://..."

No migration or seed step required — the API reads templates directly from TEMPLATES_DIR on startup.
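Before starting in file mode, a quick sanity pass over TEMPLATES_DIR catches malformed files early. A sketch only — it checks JSON parseability with jq (already required by the smoke tests), not the template schema, which the API validates itself:

```shell
# Check that every .json file in TEMPLATES_DIR parses; jq's `empty` filter
# consumes the input and exits non-zero on invalid JSON.
TEMPLATES_DIR="${TEMPLATES_DIR:-/var/pulp-engine/templates}"
bad=0
for f in "$TEMPLATES_DIR"/*.json; do
  [ -e "$f" ] || { echo "no .json files found in $TEMPLATES_DIR"; break; }
  if ! jq empty "$f" >/dev/null 2>&1; then
    echo "invalid JSON: $f"
    bad=1
  fi
done
[ "$bad" -eq 0 ] && echo "template JSON check passed"
```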

SQL Server mode

# 1. Install dependencies
pnpm install

# 2. Generate Prisma client (compiles types only; no DB connection made)
pnpm db:generate

# 3. Apply SQL Server schema
pnpm --filter @pulp-engine/api db:migrate:sqlserver
# Creates the database if absent; idempotent — safe to re-run

# 4. Load sample templates
pnpm db:seed
# Expected output: "loan-approval-letter@1.0.0 seeded" and "sample-invoice@1.0.0 seeded"

# 5. Build all packages
pnpm build

# 6. Start the API
node apps/api/dist/index.js
# Expected: JSON log line with "Pulp Engine API running on http://..."

Register the process with your process manager after confirming step 6 works manually:

# PM2
pm2 start apps/api/dist/index.js --name pulp-engine-api
pm2 save
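If you run systemd instead of PM2, a minimal unit along these lines works. A sketch — the unit path, user, working directory, and .env location are assumptions to adapt:

```ini
# /etc/systemd/system/pulp-engine-api.service — paths and user are assumptions
[Unit]
Description=Pulp Engine API
After=network-online.target

[Service]
User=pulp
WorkingDirectory=/opt/pulp-engine
EnvironmentFile=/opt/pulp-engine/.env
ExecStart=/usr/bin/node apps/api/dist/index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable with systemctl daemon-reload && systemctl enable --now pulp-engine-api.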

Migrating from file mode to a database backend

One-time migration when promoting a deployment from STORAGE_MODE=file to postgres or SQL Server.

Stop the API first. Existing records in the target are skipped, not updated, so migrate into an empty target database for a clean result.

# 1. Apply the target schema (postgres example)
pnpm --filter @pulp-engine/api db:deploy
# SQL Server: pnpm --filter @pulp-engine/api db:migrate:sqlserver

# 2. Dry run — verify the startup lines show the correct paths and storage mode
STORAGE_MODE=postgres \
  TEMPLATES_DIR=/var/pulp-engine/templates \
  ASSETS_DIR=/var/pulp-engine/assets \
  DATABASE_URL="$DATABASE_URL" \
  pnpm --filter @pulp-engine/api db:migrate:file-to-db -- --dry-run

# 3. Run the migration
STORAGE_MODE=postgres \
  TEMPLATES_DIR=/var/pulp-engine/templates \
  ASSETS_DIR=/var/pulp-engine/assets \
  DATABASE_URL="$DATABASE_URL" \
  pnpm --filter @pulp-engine/api db:migrate:file-to-db
# Exit 0 = success; Exit 2 = partial (review warnings); Exit 1 = fatal

# 4. Update .env: set STORAGE_MODE=postgres; restart the API

Asset binaries are not moved by the script — ensure ASSETS_DIR is the same path in the target deployment, or copy binary files there first.

See deployment-guide.md §10 for full details, source-data error policy, and known limitations.
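The exit-code contract above can be wired into a deployment script. A sketch — run_migration is a stand-in for the real pnpm command so the branching is visible:

```shell
# Gate a pipeline on the migration's documented exit codes (0 ok, 2 partial, else fatal).
run_migration() {
  # Stand-in for: pnpm --filter @pulp-engine/api db:migrate:file-to-db
  return 0
}
run_migration; rc=$?
case "$rc" in
  0) mig_status=ok ;;        # proceed: flip STORAGE_MODE and restart
  2) mig_status=partial ;;   # review warnings before switching modes
  *) mig_status=fatal ;;     # abort the deployment
esac
echo "migration exit $rc => $mig_status"
```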


Smoke tests after deployment

Run the validation script immediately after the process starts:

# Runs liveness, readiness, metrics, auth, and (optionally) render checks
./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN loan-approval-letter

# Without a template key (skips render check — useful for fresh deployments pre-seed)
./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN

# Docker image deployment — also verify the bundled editor SPA is being served
EXPECT_EDITOR=true ./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN

The script exits 0 on success and 1 on any failure. Run it as part of your deployment pipeline or CI gate.

Bundled editor check (Docker image deployments)

When deploying the Docker image, verify the full editor path end-to-end:

# 1. Verify the editor SPA is served
curl -I http://localhost:3000/editor/
# Expected: HTTP/1.1 200 OK, Content-Type: text/html

# 2. Verify /editor redirects to /editor/
curl -I http://localhost:3000/editor
# Expected: HTTP/1.1 301 Moved Permanently, Location: /editor/

Then verify the editor can reach the API in a browser:

  1. Open http://[host]:3000/editor/ — the login screen should load (not an error or blank page)
  2. Enter API_KEY_EDITOR — the editor should load and /templates should be reachable
  3. If PREVIEW_ROUTES_ENABLED=true is set: open a template and click the preview button — it should render

Or use the validate script with EXPECT_EDITOR=true (checks 1–2 above automatically):

EXPECT_EDITOR=true ./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN

For live preview to work: PREVIEW_ROUTES_ENABLED=true must be set. The evaluator compose files set this automatically. See deployment-guide.md § Visual Editor for production guidance.

Detailed manual checks follow for diagnosis and additional coverage:

1. Health checks

# Liveness
curl -s http://localhost:3000/health
# Expected: { "status": "ok", "version": "0.51.0", "timestamp": "2026-..." }

# Readiness (verifies storage, asset binary store, and renderer are reachable)
curl -s http://localhost:3000/health/ready
# Expected: { "status": "ok", "version": "0.51.0", "timestamp": "2026-...", "checks": { "storage": "ok", "assetBinaryStore": "ok", "renderer": "ok" } }

A 503 from /health/ready means at least one subsystem check returned "error" or "timeout":

  • storage — check template store connectivity (database or file system)
  • assetBinaryStore — check the binary asset store (file system or S3)
  • renderer — check the Chromium browser process or render dispatcher

Any single failing check causes a 503. In API-only mode (no render dispatcher, preview disabled), the renderer check always reports "ok".
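When diagnosing a 503, it helps to list only the failing checks. A sketch using jq; the sample body is illustrative — in practice, pipe the output of curl -s http://localhost:3000/health/ready in instead:

```shell
# Extract subsystem checks that are not "ok" from a readiness payload.
body='{"status":"error","checks":{"storage":"ok","assetBinaryStore":"timeout","renderer":"ok"}}'
failing=$(printf '%s' "$body" | jq -r '.checks | to_entries[] | select(.value != "ok") | "\(.key)=\(.value)"')
echo "$failing"
# → assetBinaryStore=timeout
```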

# Metrics scrape (Prometheus format)
curl -s http://localhost:3000/metrics | head -20
# Expected: lines starting with # HELP and process_cpu_seconds_total

2. List templates

curl -s http://localhost:3000/templates \
  -H "X-Api-Key: $API_KEY_ADMIN"

Expected: a paginated envelope { "items": [...], "total": N, "limit": 50, "offset": 0 } with items containing at least two entries — loan-approval-letter and sample-invoice. If items is empty: postgres or sqlserver mode → re-run pnpm db:seed; file mode → verify TEMPLATES_DIR is set correctly and contains valid JSON files.

3. HTML render (fast — no Puppeteer)

curl -s -X POST http://localhost:3000/render/html \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_RENDER" \
  -d '{
    "template": "loan-approval-letter",
    "data": {
      "applicantName": "Smoke Test",
      "loanAmount": 10000,
      "interestRate": 5.0,
      "termMonths": 12,
      "requiresGuarantor": false,
      "items": []
    }
  }' | head -c 100

Expected: starts with <!DOCTYPE html>. Any 4xx or 5xx — check logs.

3b. CSV export (no Puppeteer)

curl -s -X POST http://localhost:3000/render/csv \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_RENDER" \
  -d '{
    "template": "loan-approval-letter",
    "data": {
      "applicantName": "Smoke Test",
      "loanAmount": 10000,
      "interestRate": 5.0,
      "termMonths": 12,
      "requiresGuarantor": false,
      "items": [{ "description": "Test", "amount": 100 }]
    }
  }' | head -c 200

Expected: CSV header row + data rows. 422 with no_rendered_tables means the template has no table nodes. Any 5xx — check logs.

4. Asset management

curl -s http://localhost:3000/assets \
  -H "X-Api-Key: $API_KEY_ADMIN"

Expected: a paginated envelope { "items": [...], "total": 0, "limit": 50, "offset": 0 } (empty items array is fine on a fresh deployment — no assets have been uploaded yet). A 4xx or 5xx response indicates a routing or startup problem.

5. PDF render (end-to-end)

curl -s -X POST http://localhost:3000/render \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_RENDER" \
  -d '{
    "template": "loan-approval-letter",
    "data": {
      "applicantName": "Smoke Test",
      "loanAmount": 10000,
      "interestRate": 5.0,
      "termMonths": 12,
      "requiresGuarantor": false,
      "items": []
    }
  }' --output /tmp/smoke.pdf && head -c 4 /tmp/smoke.pdf

Expected output: %PDF. This also warms up the Puppeteer browser singleton (first call takes ~2–3 s; subsequent calls are faster).

6. Confirm preview route gating (production only)

The key distinction: disabled preview routes return 404; enabled preview routes are registered and respond to the request (returning a validation error for invalid input, not 404).

If PREVIEW_ROUTES_ENABLED is not set (default — routes are disabled):

curl -s -o /dev/null -w "%{http_code}" -X POST \
  http://localhost:3000/render/preview/html \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_ADMIN" \
  -d '{"template":{},"data":{}}'
# Expected: 404

If PREVIEW_ROUTES_ENABLED=true (routes are enabled): the route is registered — an invalid body triggers template validation rather than returning 404.

STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
  http://localhost:3000/render/preview/html \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_ADMIN" \
  -d '{"template":{},"data":{}}')
[ "$STATUS" != "404" ] && echo "OK — route registered ($STATUS)" || echo "FAIL — route not found"

Verify production logging

curl -s http://localhost:3000/health > /dev/null

Check the process output (or log file if piped). You should see a JSON line like:

{"level":30,"time":1234567890,"reqId":"req-1","req":{"method":"GET","url":"/health"},"res":{"statusCode":200},"responseTime":3.2,"msg":"request completed"}

Key fields to confirm are present: level, time, reqId, res.statusCode, responseTime.

If you see pretty-printed output instead of JSON — confirm NODE_ENV=production is set in .env and the process was restarted after the change.

If you see no log output at all for requests — confirm level is info in the server config (this was fixed in the pre-deployment pass; rebuild if on an older artifact).


Audit log events (v0.20.0+)

Three structured log event types are emitted for operator accountability. All include actor (operator-supplied actor label or null) and credentialScope.

editor_token_minted

Emitted on every successful POST /auth/editor-token.

  • event — editor_token_minted
  • keyScope — scope of the key used to mint the token (admin or editor)
  • issuedAt — ISO-8601 timestamp
  • expiresAt — ISO-8601 timestamp
  • actor — operator-supplied actor label, or null if none was supplied

template_mutation

Emitted on every successful template write: POST /templates (create), PUT /templates/:key (update), DELETE /templates/:key (delete), POST /templates/:key/versions/:version/restore.

  • event — template_mutation
  • operation — create, update, delete, or restore
  • templateKey — the template key
  • credentialScope — admin or editor
  • actor — operator-supplied actor label, or null

asset_mutation

Emitted on every successful asset write: POST /assets/upload and DELETE /assets/:id.

  • event — asset_mutation
  • operation — upload or delete
  • assetId — the asset UUID
  • credentialScope — admin or editor
  • actor — operator-supplied actor label, or null

actor: null means the write was performed via direct X-Api-Key auth, or no actor label was supplied at login. Raw API key values and token strings are never included in log payloads.

Queryable audit endpoint

In addition to structured logs, all three event types are persisted to the database and queryable via GET /audit-events (admin scope required). See the API guide for filter parameters and response format.

# Example: all mutations by a specific actor in the last 7 days
curl -s "http://localhost:3000/audit-events?actor=alice&since=$(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ)" \
  -H "X-Api-Key: $API_KEY_ADMIN"

Audit events are stored in the same database as templates and assets.

Purging old events: Use DELETE /audit-events?before=<ISO 8601> (admin scope) to remove events older than a given timestamp. The endpoint returns { "deleted": N }.

# Example: purge events older than 90 days
CUTOFF=$(date -u -d '-90 days' +%Y-%m-%dT%H:%M:%SZ)
curl -s -X DELETE "http://localhost:3000/audit-events?before=$CUTOFF" \
  -H "X-Api-Key: $API_KEY_ADMIN"

For automated retention, schedule a cron job or Kubernetes CronJob that calls this endpoint periodically (e.g. nightly with a 90-day cutoff). A common convention is to record the chosen window as AUDIT_RETENTION_DAYS=90 in the deployment’s environment, so retention scripts and operators read it from one place.
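One way to package the retention call, as a sketch — the script path and schedule are assumptions, and the DELETE is shown commented because it needs a running API:

```shell
# Nightly purge sketch. AUDIT_RETENTION_DAYS and API_KEY_ADMIN come from the job environment.
RETENTION_DAYS="${AUDIT_RETENTION_DAYS:-90}"
CUTOFF=$(date -u -d "-${RETENTION_DAYS} days" +%Y-%m-%dT%H:%M:%SZ)
echo "purging audit events before $CUTOFF"
# curl -s -X DELETE "http://localhost:3000/audit-events?before=$CUTOFF" \
#   -H "X-Api-Key: $API_KEY_ADMIN"
```

Installed as, say, /usr/local/bin/pulp-audit-purge.sh (a hypothetical path), a crontab entry such as 15 3 * * * /usr/local/bin/pulp-audit-purge.sh runs it nightly.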


Request correlation with X-Request-ID (v0.54.0+)

Every API response includes an X-Request-ID header containing a server-generated UUID. The same value appears as reqId in all structured log entries for that request.

Correlating a client error with server logs:

# 1. Extract the request ID from the response header
curl -s -D - http://localhost:3000/templates \
  -H "X-Api-Key: $API_KEY_ADMIN" 2>&1 | grep -i x-request-id
# X-Request-ID: 3bcc2c16-228b-4b09-8181-347201942b11

# 2. Search structured logs for that request
cat logs/api.json | jq 'select(.reqId == "3bcc2c16-228b-4b09-8181-347201942b11")'

The request ID is always server-generated and cannot be overridden by clients. Reverse proxies should forward (not strip) the X-Request-ID response header to downstream clients.


Verify templates

# List all templates (admin or editor key)
curl -s http://localhost:3000/templates \
  -H "X-Api-Key: $API_KEY_ADMIN" | jq '.items[].key'
# Expected: "loan-approval-letter", "sample-invoice"

# Get a sample payload (admin or editor key)
curl -s http://localhost:3000/templates/loan-approval-letter/sample \
  -H "X-Api-Key: $API_KEY_ADMIN"

# Validate a payload without rendering (editor or admin key)
curl -s -X POST http://localhost:3000/templates/loan-approval-letter/validate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY_ADMIN" \
  -d "$(curl -s -H "X-Api-Key: $API_KEY_ADMIN" http://localhost:3000/templates/loan-approval-letter/sample)"
# Expected: { "valid": true, "issues": [] }

What to check if the API fails to start with HARDEN_PRODUCTION=true

When HARDEN_PRODUCTION=true, the API exits immediately with a combined error listing all violations. Example:

❌ HARDEN_PRODUCTION=true but required security controls are not configured:
   • CORS_ALLOWED_ORIGINS must be set to a comma-separated list of specific trusted origins ...
   • DOCS_ENABLED must be explicitly set. Use DOCS_ENABLED=false to disable the Swagger UI ...
   • METRICS_TOKEN must be set to protect GET /metrics with bearer authentication ...
Configure all required controls or unset HARDEN_PRODUCTION to disable enforcement.

Resolution — configure each listed control:

  • CORS_ALLOWED_ORIGINS violation — set to comma-separated specific origins, e.g. CORS_ALLOWED_ORIGINS=https://editor.example.com. Wildcard * is not accepted in hardened mode.
  • DOCS_ENABLED violation — explicitly set DOCS_ENABLED=false (recommended) or DOCS_ENABLED=true to acknowledge exposure. Leaving it unset (defaulting) is rejected.
  • METRICS_TOKEN violation — generate and set a token: METRICS_TOKEN=$(openssl rand -hex 32). Pass the same token to validate-deploy.sh as the 4th argument.
  • REQUIRE_HTTPS violation — set REQUIRE_HTTPS=true. Also set TRUST_PROXY=true (see below).
  • TRUST_PROXY violation — set TRUST_PROXY=true. Required when REQUIRE_HTTPS=true so Fastify can read X-Forwarded-Proto behind a TLS-terminating reverse proxy. Safe for direct-TLS deployments too.
  • BLOCK_REMOTE_RESOURCES violation — set BLOCK_REMOTE_RESOURCES=true to prevent the render pipeline from fetching resources from arbitrary public hosts during PDF generation. Optionally set ALLOWED_REMOTE_ORIGINS for trusted font/image CDNs.
  • Named-user registry violation — when editor login is capable (any of API_KEY_EDITOR, API_KEY_ADMIN, or API_KEY set): configure EDITOR_USERS_JSON for per-user identity (recommended), or set ALLOW_SHARED_KEY_EDITOR=true to explicitly accept shared-key identity.

All seven controls must be in place before restarting the API with HARDEN_PRODUCTION=true.


What to check if the editor login fails

Login gate always shows / token invalid

  • Login form appears even with correct key — confirm API_KEY_EDITOR is set on the API server; restart the API after changing it.
  • "Editor login is not configured" config-error card — API_KEY_EDITOR (or API_KEY_ADMIN) is not set; only render/preview keys are present. Set API_KEY_EDITOR.
  • "Invalid key" error after entering the correct value — the key was entered with leading/trailing whitespace, or API_KEY_EDITOR was changed since the editor was last used; re-enter the correct value.
  • Login succeeds but editor shows 401 immediately — the server-side API_KEY_EDITOR was rotated after the token was issued; all outstanding tokens are invalidated. Log in again with the new key.
  • Login gate blocks after API_KEY_EDITOR rotation — expected; the token was signed with the old key. Log in again with the new API_KEY_EDITOR value.
  • Session expires mid-session — token TTL is controlled by EDITOR_TOKEN_TTL_MINUTES (default 8 hours). After expiry the editor automatically returns to the login form; re-enter the key to continue.
  • Session token appears valid but 401 despite no rotation — EDITOR_TOKEN_ISSUED_AFTER may be set to a time after the token was minted. Tokens with an issued-at before this threshold are rejected even if not expired.

Verify the auth endpoints are reachable

# Should return 200 with authRequired and editorLoginAvailable fields
curl -s http://localhost:3000/auth/status
# Expected: {"authRequired":true,"editorLoginAvailable":true}

# Should return 200 with a token (replace <key> with API_KEY_EDITOR value)
curl -s -X POST http://localhost:3000/auth/editor-token \
  -H "Content-Type: application/json" \
  -d '{"key":"<key>"}'
# Expected: {"token":"...","expiresAt":"..."}

If authRequired is true but editorLoginAvailable is false: only render/preview keys are configured; set API_KEY_EDITOR or API_KEY_ADMIN and restart the API.

HTTPS reminder: POST /auth/editor-token transmits API_KEY_EDITOR over the network. In production, ensure the API is served behind HTTPS (TLS-terminating reverse proxy). On plain HTTP, a network observer can capture the key at login time.

Invalidate outstanding editor sessions without key rotation (v0.19.0+)

If you suspect a session token has been compromised but do not want to rotate API_KEY_EDITOR (which would also disrupt other integrations using that key directly), use the issued-after guard:

  1. Note the current UTC time: date -u +"%Y-%m-%dT%H:%M:%SZ"
  2. Set EDITOR_TOKEN_ISSUED_AFTER=<timestamp> in your environment (e.g. 2026-03-24T14:00:00Z).
  3. Restart the API.

All editor tokens with an issued-at timestamp before the configured value will be rejected with 401. Users will see the login form and must mint a fresh token. Tokens issued after the threshold are unaffected.

Requirements and caveats:

  • The value must be a UTC ISO-8601 datetime string with an explicit offset (e.g. Z or +00:00). An invalid format causes the API to exit at startup.
  • The guard takes effect only after a restart — it is pre-computed from config, not re-read per request.
  • Pre-v0.19.0 tokens have no issued-at claim and are treated as iat=0 — they are always rejected when this guard is set. This is intentional: if you are running a mixed deployment, outstanding old-format tokens will be invalidated on upgrade when the guard is active.
  • In multi-instance deployments, server clocks must be reasonably synchronised (NTP); a large clock skew between instances means the guard may fire at slightly different times across nodes.
  • To disable the guard, unset EDITOR_TOKEN_ISSUED_AFTER and restart.
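The three steps can be scripted; a sketch (the .env path and restart command are assumptions for your deployment):

```shell
# Mint a guard timestamp in the exact format the API accepts (UTC, explicit Z offset).
GUARD=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "EDITOR_TOKEN_ISSUED_AFTER=$GUARD"
# Append to the deployment env and restart (path is an assumption):
# echo "EDITOR_TOKEN_ISSUED_AFTER=$GUARD" >> /opt/pulp-engine/.env
# systemctl restart pulp-engine-api   # or your process manager's restart
```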

Auth secret rotation

How auth secrets work

All auth secrets are loaded once from environment variables at process startup. There is no hot reload — every rotation or guard change requires a process restart. The key structures are:

  • credentials map — built from API_KEY_ADMIN, API_KEY_EDITOR, API_KEY_RENDER, API_KEY_PREVIEW. Controls which keys are accepted as X-Api-Key and which keys can mint editor session tokens via POST /auth/editor-token.
  • editorCapableSecrets array — built from the admin and editor keys (plus any configured previous keys). Used by verifyEditorToken to validate X-Editor-Token headers. Supports multiple candidate secrets simultaneously, which enables the rollover window described in Procedures D and E below.
  • notBefore guard — pre-computed from EDITOR_TOKEN_ISSUED_AFTER. Rejects tokens with an issued-at timestamp before the configured value. See the “Invalidate outstanding editor sessions without key rotation” section above for the standalone procedure.

Clock synchronisation: In multi-instance deployments, server clocks must be NTP-synchronised. The notBefore guard and session token expiry checks are time-based; large clock skew between instances means the guard fires at inconsistent times across the fleet.


Procedure A — Invalidate sessions only (no key change)

If you need to force all active editor sessions to log in again without rotating API_KEY_EDITOR, use the EDITOR_TOKEN_ISSUED_AFTER guard described in the section above (“Invalidate outstanding editor sessions without key rotation”).


Procedure B — Rotate API_KEY_RENDER or API_KEY_PREVIEW

These keys have no session tokens. Near-zero-downtime rotation is not supported for them because there is no previous-key mechanism for render/preview keys.

Option 1 — Coordinated cutover (recommended):

  1. Generate new secret: openssl rand -base64 32
  2. Update API_KEY_RENDER (or API_KEY_PREVIEW) on all instances simultaneously.
  3. Restart all instances together.
  4. Callers must switch to the new key.

No mixed-auth window; brief downtime during restart.

Option 2 — Rolling rollout (accept temporary inconsistency):

  1. Update and restart instances one at a time.
  2. Restarted instances accept only the new key; pending instances accept only the old key.
  3. During the rollout, callers see intermittent 401 responses regardless of which key they use, because different instances disagree.

Use Option 2 only if temporary auth inconsistency is acceptable for these endpoints.


Procedure C — Rotate API_KEY_EDITOR with brief downtime

Use when session downtime during the restart window is acceptable (no active editor sessions, or coordinated with users).

Single instance:

  1. Generate new secret: openssl rand -base64 32
  2. Update API_KEY_EDITOR in .env.
  3. Restart. All tokens signed with the old key immediately fail HMAC verification. Users see 401 and must log in again with the new key.
  4. Validate: ./scripts/validate-deploy.sh

Multi-instance — coordinated restart:

  1. Update API_KEY_EDITOR on all instances simultaneously.
  2. Restart all instances. During the brief window when different instances hold different keys, tokens minted against old-key instances fail on new-key instances and vice versa.
  3. Once all instances are restarted with the new key, all pre-rotation sessions are invalidated.
  4. Keep the restart window short; perform during low-traffic periods.

Procedure D — Rotate API_KEY_EDITOR near-zero-downtime

Use API_KEY_EDITOR_PREVIOUS to allow tokens signed with the old key to continue verifying through the rollover window. This preserves existing editor sessions — it does not preserve direct X-Api-Key usage of the old key. Callers using the old key directly must coordinate a switch to the new key during the rollout.

Single instance:

  1. Generate new secret: openssl rand -base64 32
  2. Set API_KEY_EDITOR=<new> and API_KEY_EDITOR_PREVIOUS=<old> in .env.
  3. Restart. The startup log will emit a rollover warning — this is expected.
  4. Behaviour: new tokens are minted with the new key. Existing tokens signed with the old key continue to verify via API_KEY_EDITOR_PREVIOUS.
  5. Wait for the rollover window to close. Maximum wait is EDITOR_TOKEN_TTL_MINUTES (default 8 hours) after the restart.
  6. Remove API_KEY_EDITOR_PREVIOUS from .env and restart again.
  7. Validate: ./scripts/validate-deploy.sh

Multi-instance — rolling rotation (the primary use case):

  1. Generate new secret.
  2. For each instance in turn:
    • Set API_KEY_EDITOR=<new> and API_KEY_EDITOR_PREVIOUS=<old> on that instance.
    • Restart that instance.
    • Validate with ./scripts/validate-deploy.sh against that instance before continuing.
  3. After all instances are updated: every instance accepts both old-key tokens (via previous) and new-key tokens. Rolling restarts are safe for editor sessions.
  4. Wait for EDITOR_TOKEN_TTL_MINUTES to elapse since the first instance was restarted.
  5. Remove API_KEY_EDITOR_PREVIOUS from all instances and perform a second rolling restart.
  6. Final validation: ./scripts/validate-deploy.sh

Key invariants:

  • API_KEY_EDITOR_PREVIOUS must not equal API_KEY_EDITOR, API_KEY_ADMIN, API_KEY_RENDER, or API_KEY_PREVIEW. The server rejects the combination at startup.
  • The previous key cannot be submitted to POST /auth/editor-token (returns 401).
  • The previous key cannot be used as X-Api-Key (returns 401).
  • The rollover window is at most EDITOR_TOKEN_TTL_MINUTES. After that window, all old-key tokens have naturally expired and API_KEY_EDITOR_PREVIOUS can be removed.
  • Do not leave the previous key set indefinitely — it represents a second verifiable secret.
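Since the server rejects a colliding previous key only at restart, checking the values you are about to deploy can save a failed rollout. A sketch with placeholder values — in practice, read the variables from your .env:

```shell
# Verify the previous key does not duplicate any active key before restarting.
API_KEY_EDITOR="new-editor-secret"          # placeholders
API_KEY_EDITOR_PREVIOUS="old-editor-secret"
API_KEY_ADMIN="admin-secret"
API_KEY_RENDER=""
API_KEY_PREVIEW=""
collision=0
for k in "$API_KEY_EDITOR" "$API_KEY_ADMIN" "$API_KEY_RENDER" "$API_KEY_PREVIEW"; do
  [ -n "$k" ] && [ "$k" = "$API_KEY_EDITOR_PREVIOUS" ] && collision=1
done
[ "$collision" -eq 0 ] && echo "no key collision" || echo "collision: pick a distinct previous key"
```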

Procedure E — Rotate API_KEY_ADMIN

Identical to Procedure D but using API_KEY_ADMIN and API_KEY_ADMIN_PREVIOUS. Note that API_KEY_ADMIN can mint editor session tokens (in addition to its full admin scope), so the same TTL window reasoning applies.

Near-zero-downtime applies to existing editor sessions signed with the old admin key. It does not preserve direct X-Api-Key usage of the old API_KEY_ADMIN — callers using the admin key directly as X-Api-Key must coordinate a switch to the new key.


Caveats and checklist

  • Restart is always required. No env var change takes effect without restarting the process.
  • Verify the rollover warning in the startup log. When API_KEY_EDITOR_PREVIOUS or API_KEY_ADMIN_PREVIOUS is set, the server emits a warn-level log at startup confirming rollover mode is active. If you do not see this log, the previous key was not loaded.
  • Previous key TTL window. The safe time to remove API_KEY_EDITOR_PREVIOUS is ≥ EDITOR_TOKEN_TTL_MINUTES after the first instance was restarted with the new key. Removing it earlier may invalidate sessions on instances that have not yet restarted.
  • Do not leave previous keys in place indefinitely. Remove them once the rollover window is closed.
  • NTP sync required for EDITOR_TOKEN_ISSUED_AFTER. If you use the issued-after guard in a multi-instance deployment, server clocks must be NTP-synchronised. Large skew means the guard fires at inconsistent times across the fleet.
  • Multi-instance deployments require identical env at steady state. All instances must have the same active keys once the rollout is complete.

What to check if PDF rendering fails

1. Check the request log for the error.

Look for a log line where res.statusCode is 500 and follow the reqId. An err field will be present on the same or adjacent line:

{"level":50,"reqId":"req-4","err":{"message":"...","stack":"..."},"msg":"..."}

2. Common failure causes:

Symptom → Check

• err.message contains Could not find Chromium → Puppeteer install incomplete — re-run pnpm install
• err.message contains error while loading shared libraries → Missing Linux system libraries — see deployment-guide.md §1
• err.message contains Navigation timeout → Puppeteer setContent timed out (30 s limit) — template HTML may be too large or contain blocking resources
• err.message contains Target closed → Browser singleton crashed — restart the API process; the browser will re-launch on the next request
• err.message contains Cannot find template or 404 → Postgres or SQL Server mode: template not seeded — re-run pnpm db:seed or create/import templates via the API. File mode: verify the TEMPLATES_DIR path and that the target JSON file is valid.
• High memory usage before failure → PDF buffered in memory; large document — reduce template complexity or increase server RAM
• PDF requests queue and don’t respond immediately under load → Expected behaviour — the concurrency limiter allows at most 5 simultaneous Chrome pages. Requests beyond that wait in FIFO order and are served as slots free. If the queue never drains, check for hung pages by restarting the API.
• Batch requests (POST /render/batch) are slow → Each batch processes up to BATCH_CONCURRENCY items in parallel (default: 5). A 50-item batch runs 10 sequential waves. Reduce BATCH_MAX_ITEMS or increase BATCH_CONCURRENCY (but keep it ≤ MAX_CONCURRENT_PAGES to avoid starving single renders). Monitor pulp_engine_renders_total{type="batch-pdf"} for throughput.
• Server log shows ERR_STREAM_PREMATURE_CLOSE at info level → Normal — a client disconnected mid-stream. The error handler suppresses the cascade. No action needed; the page slot is released automatically.
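The 50-item / 10-wave figure above is just ceiling division; a quick sketch:

```shell
# Sequential waves for an async batch = ceil(items / BATCH_CONCURRENCY).
ITEMS=50
BATCH_CONCURRENCY=5
WAVES=$(( (ITEMS + BATCH_CONCURRENCY - 1) / BATCH_CONCURRENCY ))
echo "waves: $WAVES"
```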

3. Isolate with an HTML render first.

If POST /render/html succeeds but POST /render fails, the problem is in Puppeteer, not in the template or data pipeline.


Asset upload validation

Asset uploads are validated server-side at the store layer before any binary is written.

Accepted formats: PNG, JPEG, GIF, WebP. All other types — including SVG — are rejected.

Two-stage validation:

  1. Allowlist check — the declared MIME type must be one of the four accepted types. SVG (image/svg+xml) is rejected with an explicit error message citing script-injection risk.
  2. Magic-bytes check — the file’s actual content is inspected (first 4–12 bytes) and compared against the declared type. If they do not match, the upload is rejected even if the MIME type would otherwise be allowed.
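The magic-bytes idea can be demonstrated standalone; a sketch that checks the 8-byte PNG signature against a generated file (illustrative only, not the store's actual code):

```shell
# PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
# Write a fake file with a valid PNG header (octal escapes: \211 = 0x89, \032 = 0x1a).
printf '\211PNG\r\n\032\n....' > /tmp/sample.png
SIG=$(head -c 8 /tmp/sample.png | od -An -tx1 | tr -d ' \n')
if [ "$SIG" = "89504e470d0a1a0a" ]; then
  echo "magic bytes match image/png"
else
  echo "mismatch: declared image/png, content says otherwise"
fi
```

The server performs the equivalent comparison for all four accepted formats before writing any binary.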

HTTP 415 Unsupported Media Type is returned for all of these failure cases:

Case → Example

• Declared type not in allowlist → image/bmp, application/javascript
• SVG declared → image/svg+xml
• Content does not match declared type → JPEG file submitted with Content-Type: image/png
• File content unrecognized → Renamed script or binary with an image MIME type
• File too short to detect → Fewer than 4 bytes

MIME normalization: The declared MIME type is normalized (trimmed, lowercased, parameters stripped) before validation. image/PNG; charset=binary is treated as image/png. The normalized value is what is stored in metadata.
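The same normalization can be mimicked in shell; a sketch, not the server's implementation:

```shell
# Normalize a declared MIME type: strip parameters, trim whitespace, lowercase.
DECLARED='  image/PNG; charset=binary '
NORMALIZED=$(printf '%s' "$DECLARED" | cut -d';' -f1 | tr -d ' ' | tr 'A-Z' 'a-z')
echo "$NORMALIZED"
```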

Existing SVG assets (residual risk): SVG assets uploaded before v0.27.0 are not automatically migrated or removed. They continue to be served by all four serve paths:

  • Private-mode proxy (GET /assets/:filename): content type derived from file extension — existing .svg files served as image/svg+xml.
  • Private-mode inline rendering: MIME type derived from file extension for base64 data URIs — existing .svg files inlined as image/svg+xml data URIs.
  • Public-mode filesystem: @fastify/static serves by extension — unchanged.
  • Public-mode S3: files stored in S3 with the ContentType set at upload time — unchanged.

The API server logs a legacy_svg_detected warning at startup if assets matching either detection signal are present (declared mimeType: image/svg+xml or filename ending in .svg). This warning repeats on every restart until the assets are removed.

Remediation workflow:

  1. Enumerate legacy SVG candidates (admin credentials required):

    GET /assets?legacySvg=true

    Returns all assets matched by declared mimeType image/svg+xml or by .svg filename extension. This covers both correctly declared SVGs and extension-only mismatches from the pre-v0.27.0 MIME-trust era.

  2. Identify template references for each returned asset. Check template definitions or run a test render — templates that reference the SVG asset will break if it is deleted before replacement.

  3. Upload a raster replacement: POST /assets/upload with a PNG or WebP version of the image.

  4. Update template definitions to reference the new raster asset URL instead of the SVG.

  5. Delete the legacy SVG: DELETE /assets/:id (admin credentials required). Only do this after templates have been updated.

  6. Confirm remediation: restart the server — the legacy_svg_detected warning will not appear once all matching assets are removed.
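The listing from step 1 can be reduced to an id list for scripting the later steps; a sketch assuming a hypothetical response shape with a top-level assets array (the real field names may differ):

```shell
# Hypothetical response body from GET /assets?legacySvg=true (shape assumed).
RESPONSE='{"assets":[{"id":"a1","filename":"logo.svg","mimeType":"image/svg+xml"},{"id":"a2","filename":"seal.svg","mimeType":"image/png"}]}'
# Extract the ids to feed into the template-reference check and DELETE steps.
echo "$RESPONSE" | jq -r '.assets[].id'
```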

Note: Assets whose SVG content was mislabeled at upload time (e.g. stored as image/png with a non-.svg filename) are not detectable without binary inspection of every stored file. The workflow above covers the common case of correctly declared and extension-identified legacy SVGs.
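When filesystem access to ASSETS_DIR is available, a coarse binary sweep can surface that mislabeled case; a heuristic sketch (demo directory and file are hypothetical, and XML prologs and whitespace vary in real SVGs):

```shell
# Flag files whose first 256 bytes contain "<svg", regardless of extension.
ASSETS_DIR="${ASSETS_DIR:-/tmp/assets-demo}"
mkdir -p "$ASSETS_DIR"
# Demo: an SVG disguised with a .png extension.
printf '<svg xmlns="http://www.w3.org/2000/svg"></svg>' > "$ASSETS_DIR/disguised.png"
for f in "$ASSETS_DIR"/*; do
  if head -c 256 "$f" | grep -q '<svg'; then
    echo "possible SVG content: $f"
  fi
done
```

Treat hits as candidates for the remediation workflow above, not as definitive matches.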


Metrics-based alert definitions

The following PromQL expressions are recommended as starting-point alert rules. Adjust thresholds to match your traffic volume.

Render failure rate

# Alert if > 10% of PDF renders in the last 5 minutes are failures
rate(pulp_engine_render_requests_total{type="pdf",status="failure"}[5m])
  /
rate(pulp_engine_render_requests_total{type="pdf"}[5m])
  > 0.10

Runbook: Check API logs for reason="render_error" entries. Run the PDF smoke test manually (POST /render) to confirm — if it passes, the failures may be from a specific bad template. If Puppeteer is failing consistently, restart the API to force a browser re-launch.


Auth failure spike (possible credential scanning)

# Alert if invalid-key failures exceed 20 per minute
rate(pulp_engine_auth_failures_total{reason="invalid_key"}[1m]) * 60 > 20

Runbook: Check access logs for the originating IP. If traffic is from an unexpected source, apply a rate limit or IP block at the reverse proxy.


Storage readiness degraded

# Proxy signal: the scrape target's state changed in the last 2 minutes
changes(up{job="pulp-engine"}[2m]) > 0

Note that this expression tracks Prometheus scrape availability, not the readiness endpoint itself. Polling /health/ready directly from your uptime monitor is recommended — simpler and more accurate than a scrape-based alert for storage checks.

Runbook: Check database / file system availability. For postgres: psql "$DATABASE_URL" -c "\conninfo". For file mode: confirm TEMPLATES_DIR is mounted and readable. Once storage recovers, the readiness probe automatically returns 200.


High P99 PDF render latency

# Alert if P99 PDF render latency exceeds 25 seconds
histogram_quantile(0.99,
  rate(pulp_engine_http_request_duration_seconds_bucket{route="render_pdf"}[5m])
) > 25

Runbook: PDF render time depends on template complexity and Puppeteer browser health. Check for hung Chrome processes. If the API is under sustained load (>5 concurrent render requests), queue back-pressure is expected — alert may be a false positive during traffic spikes.


Elevated version conflicts

# Alert if optimistic-concurrency conflicts exceed 5 per minute
rate(pulp_engine_template_mutations_total{status="conflict"}[1m]) * 60 > 5

Runbook: Multiple concurrent editor sessions updating the same template. This is expected at low rates. At elevated rates it may indicate a runaway automation loop or a UI bug. Check which template is causing conflicts via API logs.


Dead-letter queues

Two persistent DLQs exist. Both require admin scope, SCHEDULE_ENABLED=true, and Postgres storage.

/admin/schedule-dlq — failed scheduled deliveries

A schedule execution lands in this queue once every retry for a delivery target has exhausted its backoff.

# List pending entries (paginated, filter by status/scheduleId)
curl -s "http://localhost:3000/admin/schedule-dlq?status=pending&limit=50" \
  -H "X-Api-Key: $API_KEY_ADMIN" | jq .

# Replay — rehydrates the CURRENT schedule config (operator fixes picked up)
curl -X POST "http://localhost:3000/admin/schedule-dlq/<id>/replay" \
  -H "X-Api-Key: $API_KEY_ADMIN"

# Abandon — mark terminal without delivering
curl -X POST "http://localhost:3000/admin/schedule-dlq/<id>/abandon" \
  -H "X-Api-Key: $API_KEY_ADMIN"

Replay can refuse with one of these 409 Conflict codes:

  • schedule_gone — underlying schedule deleted (entry marked orphaned)
  • schedule_mutated — delivery target changed or removed (orphaned)
  • render_artifact_expired — rendered artefact purged; abandon and retrigger via POST /schedules/:id/trigger
  • dispatcher_unavailable — SCHEDULE_ENABLED=false
  • already_terminal — already replayed/abandoned/orphaned

Secrets are never echoed — the DLQ stores references only.
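An operator script can branch on the refusal code; a sketch assuming the 409 body exposes the code in a top-level code field (shape assumed):

```shell
# Hypothetical 409 body from POST /admin/schedule-dlq/<id>/replay.
BODY='{"statusCode":409,"code":"render_artifact_expired"}'
CODE=$(echo "$BODY" | jq -r '.code')
ACTION=$(
  case "$CODE" in
    schedule_gone|schedule_mutated) echo "entry orphaned; nothing to replay" ;;
    render_artifact_expired)        echo "abandon entry, then POST /schedules/:id/trigger" ;;
    dispatcher_unavailable)         echo "enable SCHEDULE_ENABLED and retry" ;;
    already_terminal)               echo "entry already resolved" ;;
    *)                              echo "unexpected code: $CODE" ;;
  esac
)
echo "$ACTION"
```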

/admin/batch-dlq — failed async batch webhooks

Same shape (GET /, GET /:id, POST /:id/replay) for callback webhooks from POST /render/batch/async jobs.


Multi-tenant operations

Routes below require super-admin (scope=admin with tenantId=null); tenant-bound admins are rejected with 403 super_admin_only. The routes are active only when MULTI_TENANT_ENABLED=true and STORAGE_MODE=postgres; otherwise every route returns 503 unavailable.

# Create a tenant (slug is immutable)
curl -X POST http://localhost:3000/admin/tenants \
  -H "X-Api-Key: $API_KEY_SUPER_ADMIN" \
  -H "Content-Type: application/json" \
  -d '{"id":"acme-corp","name":"ACME Corporation"}'

# Soft-archive (blocks writes; reads continue for audit export)
curl -X POST http://localhost:3000/admin/tenants/acme-corp/archive \
  -H "X-Api-Key: $API_KEY_SUPER_ADMIN"

# Restore
curl -X POST http://localhost:3000/admin/tenants/acme-corp/unarchive \
  -H "X-Api-Key: $API_KEY_SUPER_ADMIN"

Archive vs delete. Archive is the supported tenant-offboarding primitive. DELETE /admin/tenants/:id returns 501 Not Implemented until a cascade policy lands for audit events, versions, DLQ history, and scheduled deliveries.

Cache TTL. Archive/unarchive bust the handling pod’s TenantStatusCache immediately; other pods catch up within TENANT_STATUS_CACHE_TTL_MS (default 10 s). Writes to an archived tenant return 409 tenant_archived.


Runtime named-user management

Named-user mode is enabled when either EDITOR_USERS_JSON or EDITOR_USERS_FILE is set. Runtime CRUD via /admin/users mutates the in-memory registry; with EDITOR_USERS_FILE configured, mutations persist to disk via atomic write. Without it, changes are lost on restart.

# List users (key redacted to last 4 chars as keyHint)
curl -s http://localhost:3000/admin/users -H "X-Api-Key: $API_KEY_ADMIN" | jq .

# Add a user
curl -X POST http://localhost:3000/admin/users \
  -H "X-Api-Key: $API_KEY_ADMIN" \
  -H "Content-Type: application/json" \
  -d '{"id":"alice","displayName":"Alice","key":"<strong-random>","role":"editor"}'

# Revoke all sessions for one user (effective immediately on next auth check)
curl -X PUT http://localhost:3000/admin/users/alice \
  -H "X-Api-Key: $API_KEY_ADMIN" \
  -H "Content-Type: application/json" \
  -d "{\"tokenIssuedAfter\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}"

# Delete (existing tokens remain valid until expiry — no active revocation
# list; set tokenIssuedAfter first if you need immediate cutoff)
curl -X DELETE http://localhost:3000/admin/users/alice \
  -H "X-Api-Key: $API_KEY_ADMIN"

# Re-read EDITOR_USERS_FILE after editing externally; OIDC-auto-provisioned
# users are merged in, not clobbered
curl -X POST http://localhost:3000/admin/users/reload \
  -H "X-Api-Key: $API_KEY_ADMIN"

POST /admin/users/reload returns 404 if EDITOR_USERS_FILE does not exist on disk. When DELETE drops the registry to zero, the response carries X-PulpEngine-Warning — editor login will be unavailable until at least one user exists.


Rollback

Rollback is straightforward when running the Docker image — the image tag is the artifact version. Each released version maps 1-to-1 with a git tag (e.g., v0.PREV.Y).

# 1. Stop and remove the current container
docker stop pulp-engine && docker rm pulp-engine

# 2. Start the previous image tag (no build required)
docker run -d --name pulp-engine \
  [same -p, -e, and -v flags as the original deployment] \
  ghcr.io/OWNER/pulp-engine:v0.PREV.Y

# 3. Validate the rollback
./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN

The previous image is already in the registry — docker run pulls it if it’s not cached locally.

Postgres schema rollback: The previous image’s migrations are already applied — no schema rollback is needed unless you ran forward-only schema changes that break the old code. If that is the case, restore from a database backup taken before the migration ran, then start the rollback image.

Database unavailability (postgres): The API refuses to start if Prisma cannot connect at boot. Restore the PostgreSQL instance and restart the container — no application changes needed.

SQL Server unavailability: Same pattern — restore the SQL Server instance, restart the container.

File mode: If TEMPLATES_DIR is unreachable, the API will fail to start. Verify the volume mount and restart.

Bare-metal rollback (non-Docker deployments)

# 1. Stop the running process
pm2 stop pulp-engine-api
# or: kill -SIGTERM <pid>   (graceful shutdown — waits for in-flight requests)

# 2. Check out the previous release tag and rebuild
git checkout v0.PREV.Y
pnpm install
pnpm db:generate
pnpm build

# 3. Restart
pm2 start pulp-engine-api

# 4. Validate
./scripts/validate-deploy.sh http://localhost:3000 $API_KEY_ADMIN

Upgrading from v0.22.0 to v0.23.0

No breaking changes to the API surface. The upgrade adds two nullable columns to the database schema and two new optional environment variables.

Postgres

A new Prisma migration (add_created_by) is included in v0.23.0. Run as part of the normal upgrade procedure:

pnpm --filter @pulp-engine/api db:deploy

This adds created_by to template_versions and assets. The columns are nullable — safe to apply on a live database with no downtime.

SQL Server

Run the migration runner before starting v0.23.0:

pnpm --filter @pulp-engine/api db:migrate:sqlserver

This applies 002_add_created_by.sql, which adds the same nullable created_by columns. The runner handles both fresh installs and upgrades from v0.22.0 automatically — no manual DDL required.

File mode

No schema changes needed. createdBy is carried in the in-memory asset index; existing records report createdBy: null.

New optional environment variables

Variable → Default → Notes

• ASSET_ACCESS_MODE — default: public. Absent → public mode (unchanged behaviour).
• EDITOR_USERS_JSON — default: (absent). Absent → shared-key mode (unchanged behaviour).

No existing environment variables were renamed or removed.