Pulp Engine v0.17.0 — Release Notes

Baseline Observability Layer

Summary

Prior to this release, Pulp Engine had no application-level metrics and almost no targeted structured logging. Auth failures, render failures, version conflicts, and preview-vs-production traffic were invisible to operators beyond raw HTTP status codes. Operators had to rely entirely on reverse-proxy access logs for monitoring.

v0.17.0 adds a complete baseline observability layer:

  • Prometheus metrics via prom-client — five metrics (four counters and one histogram) covering HTTP traffic, render outcomes, template mutations, and auth failures
  • Targeted structured logging — warn-level logs with bounded fields for previously silent failure modes
  • Split health endpoints — liveness (/health) and readiness (/health/ready with storage ping)
  • LOG_LEVEL config — adjustable log verbosity without rebuilding
  • Updated operator documentation — deployment guide, runbook (with alert definitions), and API guide

New endpoints

GET /health/ready — readiness probe

Verifies that the configured storage backend (postgres, SQL Server, or file) is reachable within 2 seconds.

200 OK — storage reachable:

{
  "status": "ok",
  "timestamp": "2026-03-23T10:00:00.000Z",
  "checks": { "storage": "ok" }
}

503 Service Unavailable — storage unreachable or timed out:

{
  "status": "degraded",
  "timestamp": "2026-03-23T10:00:00.000Z",
  "checks": { "storage": "error" }
}

GET /health is unchanged — it remains a pure liveness probe with no dependency checks.
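
The readiness behaviour described above (a storage ping bounded by a 2-second timeout, mapped to 200 or 503) can be sketched roughly as follows. This is an illustrative sketch, not the Pulp Engine implementation: `checkReadiness`, the `ping` callback, and the type names are hypothetical, and the real storage check may differ per backend.

```typescript
// Hypothetical sketch: all names here are assumptions, not Pulp Engine source.
type ReadinessBody = {
  status: "ok" | "degraded";
  timestamp: string;
  checks: { storage: "ok" | "error" };
};

type ReadinessResult = { statusCode: 200 | 503; body: ReadinessBody };

// `ping` stands in for whatever cheap round-trip the storage backend offers.
async function checkReadiness(
  ping: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<ReadinessResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("storage ping timed out")), timeoutMs);
  });

  let storageOk = false;
  try {
    // Whichever settles first wins: the ping or the timeout.
    await Promise.race([ping(), timeout]);
    storageOk = true;
  } catch {
    // Ping rejected or timed out: storageOk stays false.
  } finally {
    // Avoid leaving a pending timer behind once the check has settled.
    clearTimeout(timer);
  }

  return {
    statusCode: storageOk ? 200 : 503,
    body: {
      status: storageOk ? "ok" : "degraded",
      timestamp: new Date().toISOString(),
      checks: { storage: storageOk ? "ok" : "error" },
    },
  };
}
```

Racing the ping against a timer keeps a hung storage connection from stalling the probe, so an orchestrator sees a prompt 503 rather than a probe timeout of its own.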

GET /metrics — Prometheus metrics

Returns current application and process metrics in Prometheus text format 0.0.4. No authentication required; rate limiting disabled.

Restrict access at the network layer in production (reverse proxy IP allow-list or firewall).


Metrics exposed

| Metric | Type | Labels |
|---|---|---|
| pulp_engine_http_requests_total | Counter | method, route, status_class |
| pulp_engine_http_request_duration_seconds | Histogram | method, route |
| pulp_engine_render_requests_total | Counter | type (pdf/html), source (production/preview), status (success/failure) |
| pulp_engine_template_mutations_total | Counter | operation (create/update/delete/restore), status (success/conflict/not_found/duplicate/failure) |
| pulp_engine_auth_failures_total | Counter | reason (missing_key/invalid_key/insufficient_scope/invalid_token) |
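
As one way to consume these counters outside Prometheus, the sketch below parses a scrape body in the text format and computes a render failure ratio for one traffic source. The sample values are made up for illustration, and `parseSamples` and `renderFailureRatio` are hypothetical helper names. The parser handles only simple label values without escaped quotes, which is sufficient for the bounded label vocabulary listed above.

```typescript
// Illustrative scrape excerpt. The numbers are invented for this example.
const sampleScrape = `
# TYPE pulp_engine_render_requests_total counter
pulp_engine_render_requests_total{type="pdf",source="production",status="success"} 90
pulp_engine_render_requests_total{type="pdf",source="production",status="failure"} 10
pulp_engine_render_requests_total{type="html",source="preview",status="success"} 40
`;

type Sample = { name: string; labels: Record<string, string>; value: number };

// Minimal parser for `name{label="value",...} number` lines.
// Comment lines (# HELP / # TYPE) and blank lines are skipped.
function parseSamples(text: string): Sample[] {
  const samples: Sample[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") continue;
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(trimmed);
    if (!match) continue;

    const labels: Record<string, string> = {};
    const pairRe = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
    let pair: RegExpExecArray | null;
    while ((pair = pairRe.exec(match[2] || "")) !== null) {
      labels[pair[1]] = pair[2];
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}

// Share of render requests for one source that ended in failure.
// Returns 0 when there is no traffic for that source.
function renderFailureRatio(samples: Sample[], source: string): number {
  let total = 0;
  let failures = 0;
  for (const s of samples) {
    if (s.name !== "pulp_engine_render_requests_total") continue;
    if (s.labels.source !== source) continue;
    total += s.value;
    if (s.labels.status === "failure") failures += s.value;
  }
  return total === 0 ? 0 : failures / total;
}
```

With the sample above, `renderFailureRatio(parseSamples(sampleScrape), "production")` evaluates to 0.1. Counters are cumulative since process start, so an alert would normally compare rates over a time window in Prometheus rather than lifetime totals.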

Default prom-client process metrics are also exported (CPU, memory, GC, event-loop lag).

Infra routes (/health, /health/ready, /metrics) are excluded from HTTP request counters and the duration histogram — scrape and probe traffic does not inflate application metrics.

Label cardinality discipline: route is normalised to a bounded set (≤ 12 values). Template keys, raw URLs, IP addresses, and free-form error text never appear in any metric label or structured log field.
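
To illustrate how labels can be kept bounded, here is a minimal sketch of infra-route exclusion, status-class bucketing, and route normalisation. The route patterns (`/templates`, `/templates/:key`, `/render`) and all function names are hypothetical placeholders chosen for the example. The actual normalised route set in Pulp Engine is not listed in these notes.

```typescript
// Infra routes named in these release notes; excluded from HTTP metrics.
const INFRA_ROUTES = ["/health", "/health/ready", "/metrics"];

function stripQuery(url: string): string {
  return url.split("?")[0];
}

function isInfraRoute(url: string): boolean {
  return INFRA_ROUTES.indexOf(stripQuery(url)) !== -1;
}

// Collapse a status code into a bounded class label: 200 -> "2xx", 404 -> "4xx".
function statusClass(statusCode: number): string {
  return `${Math.floor(statusCode / 100)}xx`;
}

// HYPOTHETICAL patterns, for illustration only. The point is that every
// concrete URL maps to one of a small, fixed set of label values.
const ROUTE_PATTERNS: Array<{ pattern: RegExp; label: string }> = [
  { pattern: /^\/templates\/?$/, label: "/templates" },
  { pattern: /^\/templates\/[^/]+$/, label: "/templates/:key" },
  { pattern: /^\/render\/?$/, label: "/render" },
];

// Unknown paths fall into a single "other" bucket so that arbitrary URLs
// (scanners, typos) cannot create new label values.
function normaliseRoute(url: string): string {
  const path = stripQuery(url);
  for (const { pattern, label } of ROUTE_PATTERNS) {
    if (pattern.test(path)) return label;
  }
  return "other";
}
```

Under these assumed patterns, a request to `/templates/invoice-2024?draft=1` is recorded with the route label `/templates/:key`, so the template key never reaches a metric label.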


Structured logging additions

Previously silent failure modes now emit warn-level log entries with bounded operational fields. Every field value comes from a fixed vocabulary; no template key, raw error message, or user-provided string is logged.

| Failure | Log fields |
|---|---|
| Auth — missing key | { reason: "missing_key" } |
| Auth — invalid key | { reason: "invalid_key" } |
| Auth — insufficient scope | { reason: "insufficient_scope" } |
| Auth — invalid editor token | { reason: "invalid_token" } |
| Render — template not found | { source, type, reason: "template_not_found", outcome: "failure" } |
| Render — render error | { source, type, reason: "render_error", outcome: "failure" } |
| Template — version conflict (412) | { reason: "version_conflict" } |
| Template — duplicate key (409) | { reason: "duplicate_key" } |
| Template/asset — not found (404) | { reason: "not_found" } |
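
One way to enforce a fixed vocabulary like this is to encode it in the type system, so that a free-form string cannot be passed as a log field. The sketch below is an assumption about how that could look, not the engine's code: the type and function names are hypothetical, and `warnLine` is a plain JSON stand-in rather than the real logger, whose output shape will differ.

```typescript
// Closed vocabularies taken from the table above.
type AuthFailureReason =
  | "missing_key"
  | "invalid_key"
  | "insufficient_scope"
  | "invalid_token";
type RenderFailureReason = "template_not_found" | "render_error";
type RenderSource = "production" | "preview";
type RenderType = "pdf" | "html";

interface RenderFailureFields {
  source: RenderSource;
  type: RenderType;
  reason: RenderFailureReason;
  outcome: "failure";
}

// Hypothetical helpers: callers can only supply values from the unions,
// so template keys or raw error text cannot be passed by accident.
function authFailureFields(reason: AuthFailureReason): { reason: AuthFailureReason } {
  return { reason };
}

function renderFailureFields(
  source: RenderSource,
  type: RenderType,
  reason: RenderFailureReason,
): RenderFailureFields {
  return { source, type, reason, outcome: "failure" };
}

// Stand-in for a structured logger call at warn level (illustration only).
function warnLine(fields: object, msg: string): string {
  return JSON.stringify({ level: "warn", ...fields, msg });
}
```

For example, `warnLine(renderFailureFields("preview", "pdf", "template_not_found"), "render failed")` produces a line carrying only the four bounded fields plus the message, while a call such as `authFailureFields("bad password for bob")` fails to compile.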

New environment variable

| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | info | Pino log level: trace, debug, info, warn, error. Adjust without rebuilding. |
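
A minimal sketch of resolving this variable at startup is shown below. `resolveLogLevel` is a hypothetical helper, and falling back to `info` for an unrecognised value is an assumption made for the example. The engine may instead reject an invalid value at startup.

```typescript
// Levels listed in the table above.
const LOG_LEVELS = ["trace", "debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Returns the configured level, or the documented default ("info") when the
// variable is unset. ASSUMPTION: unrecognised values also fall back to "info".
function resolveLogLevel(raw: string | undefined): LogLevel {
  const candidate = (raw || "").trim().toLowerCase();
  for (const level of LOG_LEVELS) {
    if (level === candidate) return level;
  }
  return "info";
}

// Typical use at startup (Node.js):
//   const level = resolveLogLevel(process.env.LOG_LEVEL);
```

Because the level is read from the environment, an operator can switch to `debug` during an incident by changing the variable and restarting the process, with no rebuild.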

Upgrade notes

This release adds the prom-client dependency. Run pnpm install before starting the upgraded API.

No database schema changes. No API breaking changes.

/health behaviour is unchanged — existing liveness probes continue to work without modification. Add /health/ready as a separate readiness probe to benefit from storage health detection.


What is not in scope (planned follow-up)

  • Distributed tracing (OpenTelemetry) — cross-service request correlation
  • Puppeteer pool state metrics (pulp_engine_puppeteer_active_pages gauge)
  • Auth success audit log — currently only failures are logged
  • Slow-query detection — DB/file I/O latency is not directly measured