Pulp Engine v0.17.0 — Release Notes

Baseline Observability Layer

Summary

Prior to this release, Pulp Engine had no application-level metrics and almost no targeted structured logging. Auth failures, render failures, version conflicts, and preview-vs-production traffic were invisible to operators beyond raw HTTP status codes, so monitoring depended entirely on reverse-proxy access logs.

v0.17.0 adds a complete baseline observability layer:

Prometheus metrics via prom-client — five counters/histograms covering HTTP traffic, render outcomes, template mutations, and auth failures
Targeted structured logging — warn-level logs with bounded fields for previously-silent failure modes
Split health endpoints — liveness (/health) and readiness (/health/ready with storage ping)
LOG_LEVEL config — adjustable log verbosity without rebuilding
Updated operator documentation — deployment guide, runbook (with alert definitions), and API guide

New endpoints

`GET /health/ready` — readiness probe

Verifies that the configured storage backend (postgres, SQL Server, or file) is reachable within 2 seconds.

200 OK — storage reachable:

{
  "status": "ok",
  "timestamp": "2026-03-23T10:00:00.000Z",
  "checks": { "storage": "ok" }
}

503 Service Unavailable — storage unreachable or timed out:

{
  "status": "degraded",
  "timestamp": "2026-03-23T10:00:00.000Z",
  "checks": { "storage": "error" }
}

GET /health is unchanged — it remains a pure liveness probe with no dependency checks.
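The readiness behaviour described above (a storage ping bounded by a 2-second budget, mapped to 200 or 503) can be sketched roughly as follows. This is an illustrative sketch, not the actual handler: `StoragePing`, `checkReadiness`, and `ReadinessResult` are hypothetical names, and racing the ping against a timer is one common way to enforce such a budget rather than necessarily the approach Pulp Engine takes.

```typescript
// Hypothetical names: StoragePing, checkReadiness, and ReadinessResult are
// illustrative only and are not taken from the Pulp Engine source.
type StoragePing = () => Promise<void>;

interface ReadinessResult {
  statusCode: 200 | 503;
  body: {
    status: "ok" | "degraded";
    timestamp: string;
    checks: { storage: "ok" | "error" };
  };
}

async function checkReadiness(
  pingStorage: StoragePing,
  timeoutMs = 2000, // the 2-second budget from the release notes
): Promise<ReadinessResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Rejects once the budget is spent, so a hung backend cannot stall the probe.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("storage ping timed out")), timeoutMs);
  });

  let storageOk = true;
  try {
    await Promise.race([pingStorage(), deadline]);
  } catch {
    // Unreachable and timed-out backends are reported identically.
    storageOk = false;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }

  const timestamp = new Date().toISOString();
  return storageOk
    ? { statusCode: 200, body: { status: "ok", timestamp, checks: { storage: "ok" } } }
    : { statusCode: 503, body: { status: "degraded", timestamp, checks: { storage: "error" } } };
}
```

Passing the ping in as a function keeps the sketch independent of the storage backend, so the same check would apply to any of the supported backends.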

`GET /metrics` — Prometheus metrics

Returns current application and process metrics in Prometheus text format 0.0.4. No authentication required; rate limiting disabled.

Because the endpoint is unauthenticated, restrict access to it at the network layer in production, for example with a reverse-proxy IP allow-list or a firewall rule.

Metrics exposed

Metric	Type	Labels
`pulp_engine_http_requests_total`	Counter	`method`, `route`, `status_class`
`pulp_engine_http_request_duration_seconds`	Histogram	`method`, `route`
`pulp_engine_render_requests_total`	Counter	`type` (pdf/html), `source` (production/preview), `status` (success/failure)
`pulp_engine_template_mutations_total`	Counter	`operation` (create/update/delete/restore), `status` (success/conflict/not_found/duplicate/failure)
`pulp_engine_auth_failures_total`	Counter	`reason` (missing_key/invalid_key/insufficient_scope/invalid_token)

Default prom-client process metrics are also exported (CPU, memory, GC, event-loop lag).
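For a quick smoke test without a Prometheus server, the text output of /metrics can be read directly. The sketch below parses sample lines and computes the share of failed renders from `pulp_engine_render_requests_total`. The helper names `parseSample` and `renderFailureRatio` are hypothetical, and the parser is deliberately simplified (no escaped quotes in label values, and non-numeric values such as `+Inf` are skipped). Counters are cumulative, so the ratio covers the whole lifetime of the process rather than a recent window.

```typescript
interface MetricSample {
  metric: string;
  labels: Record<string, string>;
  value: number;
}

// Parses one sample line of the Prometheus text format, for example:
//   pulp_engine_auth_failures_total{reason="invalid_key"} 3
// Returns null for comment lines (# HELP / # TYPE), blank lines, and
// anything this simplified parser does not understand.
function parseSample(line: string): MetricSample | null {
  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)/.exec(line.trim());
  if (match === null) return null;

  const labels: Record<string, string> = {};
  const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
  let pair: RegExpExecArray | null;
  while ((pair = labelPattern.exec(match[2] ?? "")) !== null) {
    labels[pair[1]] = pair[2];
  }

  const value = Number(match[3]);
  return Number.isNaN(value) ? null : { metric: match[1], labels, value };
}

// Share of render requests that failed, optionally narrowed to one `source`.
function renderFailureRatio(exposition: string, source?: "production" | "preview"): number {
  let failed = 0;
  let total = 0;
  for (const line of exposition.split("\n")) {
    const sample = parseSample(line);
    if (sample === null || sample.metric !== "pulp_engine_render_requests_total") continue;
    if (source !== undefined && sample.labels["source"] !== source) continue;
    total += sample.value;
    if (sample.labels["status"] === "failure") failed += sample.value;
  }
  return total === 0 ? 0 : failed / total;
}
```

In a real deployment the equivalent calculation would normally be a rate-based query in Prometheus itself; this sketch is only meant for ad hoc checks against a single scrape.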

Infra routes (/health, /health/ready, /metrics) are excluded from HTTP request counters and the duration histogram — scrape and probe traffic does not inflate application metrics.
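The exclusion rule above can be pictured with a small in-memory stand-in for the request counter. This is a sketch under assumptions: `recordHttpRequest`, `toStatusClass`, and the `Map`-based counter are hypothetical, and the `2xx`/`4xx` format for `status_class` is an assumed encoding that these notes do not spell out.

```typescript
// Infra paths named in the release notes; probe and scrape traffic is skipped.
const INFRA_ROUTES: ReadonlySet<string> = new Set(["/health", "/health/ready", "/metrics"]);

// Assumed encoding of the status_class label: 200 -> "2xx", 503 -> "5xx".
// Grouping by class keeps the label to a handful of values.
function toStatusClass(statusCode: number): string {
  return `${Math.floor(statusCode / 100)}xx`;
}

// In-memory stand-in for pulp_engine_http_requests_total, keyed by label set.
const httpRequestCounts = new Map<string, number>();

function recordHttpRequest(method: string, route: string, statusCode: number): void {
  if (INFRA_ROUTES.has(route)) return; // probes and scrapes never reach the counter
  const key = `${method} ${route} ${toStatusClass(statusCode)}`;
  httpRequestCounts.set(key, (httpRequestCounts.get(key) ?? 0) + 1);
}
```

With this shape, a scraper polling /metrics every few seconds leaves the counter untouched, so request-rate alerts reflect application traffic only.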

Label cardinality discipline: route is normalised to a bounded set (≤ 12 values). Template keys, raw URLs, IP addresses, and free-form error text never appear in any metric label or structured log field.
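One way to keep the `route` label bounded is to match each raw URL against a fixed pattern table and collapse everything else into a single bucket. The sketch below illustrates that idea only: the patterns, the `other` bucket, and the name `normaliseRouteLabel` are placeholders, since the actual set of route labels is not listed in these notes.

```typescript
// Hypothetical route table: the real set of (at most 12) route labels is not
// listed in the release notes, so these patterns are placeholders.
const ROUTE_PATTERNS: ReadonlyArray<{ pattern: RegExp; label: string }> = [
  { pattern: /^\/templates\/?$/, label: "/templates" },
  { pattern: /^\/templates\/[^/]+\/?$/, label: "/templates/:key" },
  { pattern: /^\/render\/?$/, label: "/render" },
];

// Maps a raw request URL onto a fixed label so that template keys and query
// strings never become label values. Unknown paths share one bucket, which
// keeps the label set bounded even under scanning or malformed traffic.
function normaliseRouteLabel(rawUrl: string): string {
  const queryStart = rawUrl.indexOf("?");
  const path = queryStart === -1 ? rawUrl : rawUrl.slice(0, queryStart);
  for (const { pattern, label } of ROUTE_PATTERNS) {
    if (pattern.test(path)) return label;
  }
  return "other";
}
```

Because every return value comes from the table or the fallback, the number of distinct label values can never exceed the table size plus one, whatever URLs clients send.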

Structured logging additions

Previously-silent failure modes now emit warn-level log entries with bounded operational fields. Every field value is drawn from a fixed vocabulary; no template key, raw error message, or user-provided string is logged.

Failure	Log fields
Auth — missing key	`{ reason: "missing_key" }`
Auth — invalid key	`{ reason: "invalid_key" }`
Auth — insufficient scope	`{ reason: "insufficient_scope" }`
Auth — invalid editor token	`{ reason: "invalid_token" }`
Render — template not found	`{ source, type, reason: "template_not_found", outcome: "failure" }`
Render — render error	`{ source, type, reason: "render_error", outcome: "failure" }`
Template — version conflict (412)	`{ reason: "version_conflict" }`
Template — duplicate key (409)	`{ reason: "duplicate_key" }`
Template/asset — not found (404)	`{ reason: "not_found" }`
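The fixed vocabulary in the table above lends itself to closed union types, so that a free-form string cannot reach a log call without a compile error. The following is a sketch of that idea, not the engine's actual code: the type and function names are hypothetical, while the field names, reason values, and status codes are the ones listed in the table.

```typescript
type RenderFailureReason = "template_not_found" | "render_error";
type TemplateFailureReason = "version_conflict" | "duplicate_key" | "not_found";
type RenderSource = "production" | "preview";
type RenderType = "pdf" | "html";

interface RenderFailureFields {
  source: RenderSource;
  type: RenderType;
  reason: RenderFailureReason;
  outcome: "failure";
}

// Builds the bounded field set for a failed render. There is no parameter
// for an error message or template key, so neither can leak into the log.
function renderFailureFields(
  source: RenderSource,
  type: RenderType,
  reason: RenderFailureReason,
): RenderFailureFields {
  return { source, type, reason, outcome: "failure" };
}

// Maps the HTTP statuses from the table to their log reason.
// Any other status returns null and produces no targeted log entry here.
function templateFailureReason(statusCode: number): TemplateFailureReason | null {
  switch (statusCode) {
    case 412:
      return "version_conflict";
    case 409:
      return "duplicate_key";
    case 404:
      return "not_found";
    default:
      return null;
  }
}
```

With Pino, an object like this would typically be passed as the first argument to `logger.warn`, followed by a constant message string; the message wording is left open here because the notes do not specify it.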

New environment variable

Variable	Default	Description
`LOG_LEVEL`	`info`	Pino log level: `trace`, `debug`, `info`, `warn`, `error`. Adjust without rebuilding.
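Reading `LOG_LEVEL` at startup might look like the sketch below. The name `resolveLogLevel` is hypothetical, and the fallback is an assumption: these notes give `info` as the default but do not say how an unrecognised value is handled, so the real service may reject it at startup instead.

```typescript
// The levels listed in the table above.
const LOG_LEVELS = ["trace", "debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Assumption: unset or unrecognised values fall back to the documented
// default, "info". The env map is a parameter so the function is easy to test.
function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const raw = env["LOG_LEVEL"];
  const match = LOG_LEVELS.find((level) => level === raw);
  return match ?? "info";
}
```

In a Node.js process the function would be called with `process.env`, and the result passed as the `level` option when the Pino logger is created.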

Upgrade notes

This release adds the prom-client dependency. Run pnpm install before starting the upgraded API.

No database schema changes. No API breaking changes.

/health behaviour is unchanged — existing liveness probes continue to work without modification. Add /health/ready as a separate readiness probe to benefit from storage health detection.

What is not in scope (planned follow-up)

Distributed tracing (OpenTelemetry) — cross-service request correlation
Puppeteer pool state metrics (pulp_engine_puppeteer_active_pages gauge)
Auth success audit log — currently only failures are logged
Slow-query detection — DB/file I/O latency is not directly measured
