Pulp Engine v0.17.0 — Release Notes
Baseline Observability Layer
Summary
Before this release, Pulp Engine had no application-level metrics and almost no targeted structured logging. Auth failures, render failures, version conflicts, and the split between preview and production traffic were invisible to operators beyond raw HTTP status codes, so monitoring relied entirely on reverse-proxy access logs.
v0.17.0 adds a complete baseline observability layer:
- Prometheus metrics via prom-client: five counters/histograms covering HTTP traffic, render outcomes, template mutations, and auth failures
- Targeted structured logging: warn-level logs with bounded fields for previously-silent failure modes
- Split health endpoints: liveness (/health) and readiness (/health/ready with storage ping)
- LOG_LEVEL config: adjustable log verbosity without rebuilding
- Updated operator documentation: deployment guide, runbook (with alert definitions), and API guide
New endpoints
GET /health/ready — readiness probe
Verifies that the configured storage backend (PostgreSQL, SQL Server, or file) is reachable within 2 seconds.
200 OK — storage reachable:
{
"status": "ok",
"timestamp": "2026-03-23T10:00:00.000Z",
"checks": { "storage": "ok" }
}
503 Service Unavailable — storage unreachable or timed out:
{
"status": "degraded",
"timestamp": "2026-03-23T10:00:00.000Z",
"checks": { "storage": "error" }
}
GET /health is unchanged — it remains a pure liveness probe with no dependency checks.
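
The readiness behaviour described above (a storage ping bounded by a 2-second timeout, mapped to 200 or 503) can be sketched roughly as follows. This is a minimal illustration, not the engine's actual handler: `checkReadiness`, `withTimeout`, and the `ping` callback are hypothetical names, and the sketch assumes `ping` resolves when storage answers and rejects otherwise.

```typescript
// Hypothetical shape of the readiness response, mirroring the JSON bodies above.
interface ReadinessResult {
  statusCode: 200 | 503;
  body: {
    status: "ok" | "degraded";
    timestamp: string;
    checks: { storage: "ok" | "error" };
  };
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Assumption: `ping` is whatever lightweight check the storage backend offers.
async function checkReadiness(
  ping: () => Promise<void>,
  timeoutMs: number = 2000,
): Promise<ReadinessResult> {
  let reachable = true;
  try {
    await withTimeout(ping(), timeoutMs);
  } catch {
    // Both a failed ping and a timeout count as "storage unreachable".
    reachable = false;
  }
  return {
    statusCode: reachable ? 200 : 503,
    body: {
      status: reachable ? "ok" : "degraded",
      timestamp: new Date().toISOString(),
      checks: { storage: reachable ? "ok" : "error" },
    },
  };
}
```

Treating a timeout the same as a failed ping keeps the probe's answer binary, which is what an orchestrator's readiness check expects.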
GET /metrics — Prometheus metrics
Returns current application and process metrics in Prometheus text format 0.0.4. The endpoint requires no authentication and is exempt from rate limiting.
Restrict access at the network layer in production (reverse proxy IP allow-list or firewall).
Metrics exposed
| Metric | Type | Labels |
|---|---|---|
| pulp_engine_http_requests_total | Counter | method, route, status_class |
| pulp_engine_http_request_duration_seconds | Histogram | method, route |
| pulp_engine_render_requests_total | Counter | type (pdf/html), source (production/preview), status (success/failure) |
| pulp_engine_template_mutations_total | Counter | operation (create/update/delete/restore), status (success/conflict/not_found/duplicate/failure) |
| pulp_engine_auth_failures_total | Counter | reason (missing_key/invalid_key/insufficient_scope/invalid_token) |
Default prom-client process metrics are also exported (CPU, memory, GC, event-loop lag).
Infra routes (/health, /health/ready, /metrics) are excluded from HTTP request counters and the duration histogram — scrape and probe traffic does not inflate application metrics.
Label cardinality discipline: route is normalised to a bounded set (≤ 12 values). Template keys, raw URLs, IP addresses, and free-form error text never appear in any metric label or structured log field.
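
To illustrate how labels can be kept bounded, the sketch below maps raw request paths onto a small fixed set of route labels, skips the infra routes, and collapses status codes into classes. The route patterns, the "other" fallback, and the "2xx"-style class format are assumptions made for illustration; the notes do not list the real route table or label values.

```typescript
// Infra routes named in the notes above; these are excluded from HTTP metrics.
const INFRA_ROUTES = new Set(["/health", "/health/ready", "/metrics"]);

// Hypothetical patterns: each raw path collapses to one fixed label.
const ROUTE_PATTERNS: Array<[RegExp, string]> = [
  [/^\/templates\/[^/]+\/versions\/[^/]+$/, "/templates/:key/versions/:version"],
  [/^\/templates\/[^/]+$/, "/templates/:key"],
  [/^\/templates$/, "/templates"],
  [/^\/render\/(pdf|html)$/, "/render/:type"],
];

function isInfraRoute(path: string): boolean {
  return INFRA_ROUTES.has(path);
}

// Returns a label from a bounded set; unknown paths never leak into labels.
function normaliseRoute(path: string): string {
  const clean = path.replace(/\?.*$/, ""); // drop the query string
  for (const [pattern, label] of ROUTE_PATTERNS) {
    if (pattern.test(clean)) return label;
  }
  return "other";
}

// Assumed label format: 200 -> "2xx", 404 -> "4xx", and so on.
function statusClass(status: number): string {
  return `${Math.floor(status / 100)}xx`;
}
```

Under these assumptions a request for /templates/invoice-2024?draft=1 is recorded as /templates/:key, so the template key never reaches a metric label.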
Structured logging additions
Previously silent failure modes now emit warn-level log entries with bounded operational fields. Every field value comes from a fixed vocabulary: no template key, raw error message, or user-provided string is logged.
| Failure | Log fields |
|---|---|
| Auth — missing key | { reason: "missing_key" } |
| Auth — invalid key | { reason: "invalid_key" } |
| Auth — insufficient scope | { reason: "insufficient_scope" } |
| Auth — invalid editor token | { reason: "invalid_token" } |
| Render — template not found | { source, type, reason: "template_not_found", outcome: "failure" } |
| Render — render error | { source, type, reason: "render_error", outcome: "failure" } |
| Template — version conflict (412) | { reason: "version_conflict" } |
| Template — duplicate key (409) | { reason: "duplicate_key" } |
| Template/asset — not found (404) | { reason: "not_found" } |
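
One way to keep both the metric label and the log field inside a fixed vocabulary is to classify each template mutation by its HTTP status before anything is recorded. The sketch below is hypothetical: `classifyMutation` is not a name from the codebase, and pairing 412, 409, and 404 with the conflict, duplicate, and not_found metric statuses is an assumption inferred from the two tables in these notes.

```typescript
// Metric label values for pulp_engine_template_mutations_total (from the table above).
type MutationStatus = "success" | "conflict" | "not_found" | "duplicate" | "failure";

// Log reasons for template failures (from the table above).
type MutationLogReason = "version_conflict" | "duplicate_key" | "not_found";

interface MutationOutcome {
  status: MutationStatus; // value for the metric's status label
  logFields?: { reason: MutationLogReason }; // warn-level log fields, when applicable
}

// Assumed mapping from HTTP status to bounded metric and log values.
function classifyMutation(httpStatus: number): MutationOutcome {
  switch (httpStatus) {
    case 412:
      return { status: "conflict", logFields: { reason: "version_conflict" } };
    case 409:
      return { status: "duplicate", logFields: { reason: "duplicate_key" } };
    case 404:
      return { status: "not_found", logFields: { reason: "not_found" } };
    default:
      return httpStatus < 400 ? { status: "success" } : { status: "failure" };
  }
}
```

Because the return type is a closed union, the compiler rejects any attempt to pass a raw error message or template key through as a reason.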
New environment variable
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | info | Pino log level: trace, debug, info, warn, error. Adjust without rebuilding. |
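
As a rough sketch of how the variable might be read, the helper below validates a raw LOG_LEVEL value against the five levels listed in the table and falls back to the documented default of info. The name `resolveLogLevel` is hypothetical, and falling back silently on an unrecognised value (rather than failing at startup) is an assumption; the notes do not say which the engine does.

```typescript
// The five levels listed in the table above.
const LOG_LEVELS = ["trace", "debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Returns a valid level; unset or unrecognised input yields the default "info".
function resolveLogLevel(raw: string | undefined): LogLevel {
  const candidate = (raw ?? "").trim().toLowerCase();
  const match = LOG_LEVELS.find((level) => level === candidate);
  return match ?? "info";
}
```

In a Node.js process the result would typically be computed from `process.env.LOG_LEVEL` and passed to Pino as its `level` option.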
Upgrade notes
This release adds the prom-client dependency. Run pnpm install before starting the upgraded API.
No database schema changes. No API breaking changes.
/health behaviour is unchanged — existing liveness probes continue to work without modification. Add /health/ready as a separate readiness probe to benefit from storage health detection.
What is not in scope (planned follow-up)
- Distributed tracing (OpenTelemetry): cross-service request correlation
- Puppeteer pool state metrics (pulp_engine_puppeteer_active_pages gauge)
- Auth success audit log: currently only failures are logged
- Slow-query detection: DB/file I/O latency is not directly measured