Pulp Engine Document Rendering

Release v0.85.0 — Evidence & Resilience (2026-06-11)

Audit remediation release 2 of 2. It closes the five high-value findings that the 2026-06-10 fresh-pass audit left open after v0.84.x (“operator truth”): render-pool saturation handling (H3), the HA failover check that had been red for a month (H4), render-isolation suites that had never executed (H5), template format versioning (H7), and trust gaps in the editor publish gate (H8). Shipped across PRs #86–#100. No breaking changes: every contract addition is additive, and all new env vars have backward-compatible defaults.

Highlights

Saturation sheds instead of cascading (H3)

Before: an overloaded render pool queued unboundedly; the child-process dispatcher’s wall-clock deadline then killed workers mid-render, orphaning every sibling render (one slow burst could cascade into total render-path failure). Now:

  • The browser-pool queue is bounded (RENDER_MAX_QUEUE_DEPTH, default 2× pool). Beyond it, renders shed immediately with the new render_saturated code.
  • POST /render/pdf, POST /render/preview/pdf, and the sandbox render return 503 + Retry-After: 30 (body carries retryAfter: 30) — clients get an actionable backpressure signal instead of a slow timeout.
  • Dispatch is two-phase: queue-wait expiry sheds only the waiting render; a render that has started keeps its full execution budget.
  • An API-side admission gate (at most RENDER_MAX_CONCURRENT_PAGES + RENDER_MAX_QUEUE_DEPTH renders in flight) sheds before any HTML-generation cost is sunk, and in container mode it also bounds concurrent docker run spawns.
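The shedding behaviour above can be sketched as a bounded pool. This is a minimal illustration under assumptions, not Pulp Engine's implementation: BoundedPool, SaturatedError, and toHttpReply are hypothetical names, queue-wait expiry is assumed to shed with the same render_saturated code, and the code field in the reply body is assumed (the notes only confirm retryAfter: 30).

```typescript
// Hypothetical sketch: a render pool with a bounded wait queue that sheds instead of cascading.
class SaturatedError extends Error {
  readonly code = "render_saturated";
  constructor() {
    super("render pool saturated");
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

interface Waiter {
  resolve: () => void;
  timer: ReturnType<typeof setTimeout>;
}

class BoundedPool {
  private active = 0;
  private readonly waiters: Waiter[] = [];
  private readonly maxConcurrent: number;
  private readonly maxQueueDepth: number;
  private readonly queueWaitMs: number;

  constructor(maxConcurrent: number, maxQueueDepth: number, queueWaitMs: number) {
    this.maxConcurrent = maxConcurrent; // e.g. RENDER_MAX_CONCURRENT_PAGES
    this.maxQueueDepth = maxQueueDepth; // e.g. RENDER_MAX_QUEUE_DEPTH (0 = shed as soon as the pool is busy)
    this.queueWaitMs = queueWaitMs;     // phase 1 budget: time allowed waiting for a slot
  }

  // Phase 1: take a free slot, wait in the bounded queue, or shed immediately.
  acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    if (this.waiters.length >= this.maxQueueDepth) {
      return Promise.reject(new SaturatedError()); // queue full: shed now
    }
    return new Promise<void>((resolve, reject) => {
      const waiter: Waiter = {
        resolve: () => resolve(),
        timer: setTimeout(() => {
          // Queue-wait expiry sheds only this waiter; renders already running are untouched.
          const i = this.waiters.indexOf(waiter);
          if (i !== -1) this.waiters.splice(i, 1);
          reject(new SaturatedError());
        }, this.queueWaitMs),
      };
      this.waiters.push(waiter);
    });
  }

  // Called when a render finishes: hand the slot to the oldest waiter, or free it.
  release(): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      clearTimeout(next.timer);
      next.resolve(); // slot transfers directly, so the active count is unchanged
    } else {
      this.active--;
    }
  }
}

interface HttpReply {
  status: number;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

// Map a shed render to the backpressure response described in these notes.
function toHttpReply(err: unknown): HttpReply | undefined {
  if (!(err instanceof SaturatedError)) return undefined;
  return { status: 503, headers: { "Retry-After": "30" }, body: { code: err.code, retryAfter: 30 } };
}
```

Only phase 1 is timed here. In the two-phase design described above, a separate execution timeout (plausibly the one governed by RENDER_WORKER_TIMEOUT_MS) would start only after acquire() resolves, so queue-wait expiry never eats into a started render's budget.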

New env vars (see deployment guide § “Sizing the render pool”): RENDER_MAX_CONCURRENT_PAGES (1–50, default 5), RENDER_MAX_QUEUE_DEPTH (0–1000, default 2× pool), RENDER_WORKER_TIMEOUT_MS (5000–600000, default 65000). RENDER_PREVIEW_RESERVED_SLOTS may now go up to 49, with a boot-time rule that it leaves at least one batch slot.
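A boot-time validator for these variables might look like the following sketch. The ranges and defaults are taken from these notes; the function names are hypothetical, and two readings are assumptions: that RENDER_PREVIEW_RESERVED_SLOTS defaults to 0 with a lower bound of 0, and that "at least one batch slot" means reserved slots must be fewer than RENDER_MAX_CONCURRENT_PAGES.

```typescript
// Hypothetical boot-time validation for the render-pool env vars.
interface RenderPoolConfig {
  maxConcurrentPages: number;
  maxQueueDepth: number;
  workerTimeoutMs: number;
  previewReservedSlots: number;
}

type Env = Record<string, string | undefined>;

// Read an integer env var, falling back to a default, and fail fast if it is out of range.
function readInt(env: Env, name: string, min: number, max: number, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer in [${min}, ${max}], got "${raw}"`);
  }
  return value;
}

function loadRenderPoolConfig(env: Env): RenderPoolConfig {
  const maxConcurrentPages = readInt(env, "RENDER_MAX_CONCURRENT_PAGES", 1, 50, 5);
  // The queue default tracks the configured pool size (2x pool).
  const maxQueueDepth = readInt(env, "RENDER_MAX_QUEUE_DEPTH", 0, 1000, 2 * maxConcurrentPages);
  const workerTimeoutMs = readInt(env, "RENDER_WORKER_TIMEOUT_MS", 5000, 600000, 65000);
  // Assumed lower bound and default of 0; the notes only give the new upper bound of 49.
  const previewReservedSlots = readInt(env, "RENDER_PREVIEW_RESERVED_SLOTS", 0, 49, 0);
  // Boot-time rule: preview reservations must leave at least one slot for batch renders.
  if (previewReservedSlots >= maxConcurrentPages) {
    throw new Error(
      `RENDER_PREVIEW_RESERVED_SLOTS (${previewReservedSlots}) must be less than ` +
        `RENDER_MAX_CONCURRENT_PAGES (${maxConcurrentPages})`,
    );
  }
  return { maxConcurrentPages, maxQueueDepth, workerTimeoutMs, previewReservedSlots };
}
```

In a Node service this would typically be called once at startup with process.env, so a bad value stops the boot instead of surfacing under load.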

HA Check 7 fixed at the root (H4)

The single-replica outage failover check had failed every nightly since it landed (~95.2 % availability vs the 99 % threshold). Root cause: docker compose stop blackholes new connections (no RST), so nginx’s default 60 s proxy_connect_timeout let probes hang until the client aborted. Client aborts never count toward max_fails, so the dead upstream was never benched, and a serial probe loop recorded ~3 failures per outage window. The LB now fails connects after 1 s, retries the surviving peer on error/timeout/502/503 (bounded: 2 tries, 4 s), and benches the failed upstream explicitly.
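To see why a short connect timeout plus a bounded retry fixes the probe loop, here is a toy model of the load-balancer policy. It is neither nginx configuration nor nginx's algorithm: route, Outcome, and Peer are hypothetical, and the 10 s bench window and the 502 returned when no peer answers are assumptions.

```typescript
// Toy model of the LB policy (hypothetical; not nginx code or configuration).
type Outcome =
  | { kind: "connected"; status: number } // upstream accepted the connection and answered
  | { kind: "refused" }                   // connect failed fast (RST)
  | { kind: "blackholed" };               // no RST: the connect attempt just hangs

interface Peer {
  name: string;
  benchedUntil: number; // model time (ms) until which the peer receives no traffic
}

const CONNECT_TIMEOUT_MS = 1000; // previously nginx's 60 s default
const MAX_TRIES = 2;
const RETRY_BUDGET_MS = 4000;
const BENCH_MS = 10000; // assumed bench window

function route(
  peers: Peer[],
  now: number,
  attempt: (peer: Peer) => Outcome,
): { peer?: string; status: number; elapsedMs: number } {
  let elapsed = 0;
  let tries = 0;
  for (const peer of peers) {
    if (peer.benchedUntil > now + elapsed) continue; // benched peers are skipped
    if (tries >= MAX_TRIES || elapsed >= RETRY_BUDGET_MS) break; // bounded retry
    tries++;
    const outcome = attempt(peer);
    if (outcome.kind === "connected" && outcome.status !== 502 && outcome.status !== 503) {
      return { peer: peer.name, status: outcome.status, elapsedMs: elapsed };
    }
    // error, timeout, 502 and 503 are retryable; a blackholed connect costs one connect timeout
    if (outcome.kind === "blackholed") elapsed += CONNECT_TIMEOUT_MS;
    peer.benchedUntil = now + elapsed + BENCH_MS; // bench the failed peer explicitly
  }
  return { status: 502, elapsedMs: elapsed }; // assumed status when no peer answers
}
```

With the old 60 s connect timeout, the client gives up long before the attempt fails, so nothing is benched; in this model the first blackholed request costs 1 s and lands on the surviving peer, and later requests skip the dead one entirely.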

A red nightly can no longer rot silently: any failing check creates or comments on a single pinned ha-nightly-failure issue.
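The "create or comment on a single issue" rule reduces to a small decision function. In this hypothetical sketch, ha-nightly-failure is assumed to be a label, the types are illustrative rather than the GitHub API, and the returned plan would be carried out by whatever client the workflow actually uses.

```typescript
// Hypothetical planner: one open tracking issue per failure streak, never a new issue per night.
interface IssueSummary {
  number: number;
  title: string;
  labels: string[];
  state: "open" | "closed";
}

type IssueAction =
  | { action: "comment"; issue: number; body: string }
  | { action: "create"; title: string; labels: string[]; body: string };

const FAILURE_LABEL = "ha-nightly-failure"; // assumed to be a label

function planFailureReport(existing: IssueSummary[], failedChecks: string[], runUrl: string): IssueAction | undefined {
  if (failedChecks.length === 0) return undefined; // green nightly: nothing to report
  const body = `Failing checks: ${failedChecks.join(", ")}\nRun: ${runUrl}`;
  const open = existing.find((i) => i.state === "open" && i.labels.includes(FAILURE_LABEL));
  if (open !== undefined) return { action: "comment", issue: open.number, body };
  return { action: "create", title: "HA nightly failure", labels: [FAILURE_LABEL], body };
}
```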

Render isolation modes are now proven in CI (H5)

The container- and socket-mode streaming suites (env-gated since v0.46.0) had never executed in CI — RENDER_MODE=container/socket shipped on unit coverage alone. The push-gated docker job now builds the worker image, probes a Chromium launch inside it under the dispatcher’s exact hardening flags, and runs both suites against it: real docker run dispatch, and a real render-controller on a Unix socket.

The gate caught a production bug on its first execution: Chromium’s crashpad handler crashed the entire browser launch inside hardened containers (--read-only + --cap-drop ALL) on common kernels — container/socket isolation was broken on such hosts despite working on Docker Desktop. Fixed by disabling the crash machinery in the launch args and pointing the worker image’s HOME at the tmpfs.
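The hardened launch can be pictured as two argument lists. The docker run flags used here (--rm, --read-only, --cap-drop ALL, --tmpfs, --env) and the Chromium switches --disable-crash-reporter and --disable-breakpad exist, but this exact set and the /tmp mount point are assumptions; the release's actual flag list is not given in these notes, and the notes say HOME is set in the worker image rather than passed at run time.

```typescript
// Hypothetical sketch of a hardened worker launch; not the dispatcher's real argument list.
const WORKER_TMPFS = "/tmp"; // assumed tmpfs mount point

function dockerRunArgs(image: string): string[] {
  return [
    "run",
    "--rm",
    "--read-only",           // immutable root filesystem
    "--cap-drop", "ALL",     // no Linux capabilities
    "--tmpfs", WORKER_TMPFS, // the only writable location
    // Chromium writes profile state under HOME; the release sets this in the image instead.
    "--env", `HOME=${WORKER_TMPFS}`,
    image,
  ];
}

function chromiumArgs(): string[] {
  return [
    "--headless",
    "--disable-crash-reporter", // assumed sufficient to keep crash reporting from starting
    "--disable-breakpad",
  ];
}
```

Whether these two switches alone suppress the crashpad handler depends on the Chromium build, which is exactly why the CI probe launches Chromium inside the image under the real flags rather than trusting a flag list.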

Template format versioning (H7)

Definitions now carry formatVersion: 1, stamped at the save boundary (absence still means 1, so nothing existing changes; an unknown future format is rejected loudly instead of misread). template-compatibility.md documents the additive-only format promise and the pulp validate upgrade pre-flight. Two new permanent gates enforce it: a frozen compat corpus (including a v0.18.0 production-era template, asserted parseable forever) and an editor↔server parity test (every starter pack must build a definition the server’s save-boundary schema accepts).
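The versioning rule fits in a few lines. A minimal sketch with hypothetical names (readFormatVersion, stampOnSave, TemplateDefinition); the real save-boundary schema is not shown in these notes.

```typescript
// Hypothetical sketch of the formatVersion rule.
const CURRENT_FORMAT_VERSION = 1;

interface TemplateDefinition {
  formatVersion?: number;
  [key: string]: unknown;
}

// Absent means 1; any other value is an unknown format and is rejected loudly.
function readFormatVersion(def: TemplateDefinition): number {
  const version = def.formatVersion ?? 1; // definitions saved before the field existed
  if (version !== CURRENT_FORMAT_VERSION) {
    throw new Error(
      `Unsupported template formatVersion ${String(version)}; this server understands ${CURRENT_FORMAT_VERSION}`,
    );
  }
  return version;
}

// Save boundary: validate, then stamp the version explicitly on the stored copy.
function stampOnSave(def: TemplateDefinition): TemplateDefinition {
  readFormatVersion(def);
  return { ...def, formatVersion: CURRENT_FORMAT_VERSION };
}
```

Because stamping happens only on save, existing stored templates stay byte-identical until their next save, which matches the upgrade note below.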

Editor publish-gate trust (H8)

  • Global shortcuts no longer act on the canvas underneath an open dialog (Delete could remove canvas nodes mid-publish-review), and Escape with a dialog open belongs to the dialog.
  • A stale “All checks passed.” verdict can no longer publish unchecked content: if the template mutated after the verdict, Publish Now re-runs the checks against the mutated template.
  • A new e2e spec exercises the REAL /render/validate (no route stubs): fail-closed failure path with the machine code surfaced, and a full-format success path through to publish.
  • FormApp embed test debt resolved with no silent skips: the ready-message test is un-skipped and green in CI; the submit-path test’s deterministic ubuntu-only failure is tracked in #96.
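One way to make a verdict go stale is to tie it to a template revision. In this hypothetical sketch (PublishGate and its revision counter are illustrative, not the editor's code), every mutation bumps the revision, and Publish Now re-runs the checks whenever the stored verdict belongs to an older revision.

```typescript
// Hypothetical publish gate: a verdict only counts for the revision it checked.
interface Verdict {
  ok: boolean;
  revision: number;
}

class PublishGate {
  private revision = 0;
  private verdict: Verdict | undefined;
  private readonly runChecks: (revision: number) => boolean;

  constructor(runChecks: (revision: number) => boolean) {
    this.runChecks = runChecks;
  }

  // Any template edit bumps the revision, which invalidates the current verdict.
  mutate(): void {
    this.revision++;
  }

  check(): Verdict {
    this.verdict = { ok: this.runChecks(this.revision), revision: this.revision };
    return this.verdict;
  }

  // Publish only on a passing verdict for the current revision; re-check if it is stale.
  publishNow(): boolean {
    const v = this.verdict;
    const current = v !== undefined && v.revision === this.revision ? v : this.check();
    return current.ok;
  }
}
```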

Evidence

  • Full CI matrix green on the release line, including both isolation suites executing real renders: run 27280878065 (commit 6402f9c)
  • HA nightly green incl. Check 7 (dispatch): run 27278656273; all 4 checks green; Check 7 = 282/282 requests, 100.00 % availability
  • HA nightly green incl. Check 7 (real scheduled run): run 27286219985; all 4 checks green; Check 7 again 282/282, 100.00 %
  • Check 7 fix reproduced locally before CI: 280/280 (100.00 %) against the same stack via pnpm ha:check-7
  • Saturation contract: a route-level test holds the real pool, asserts 503 + Retry-After: 30 + envelope, then proves recovery with a live Chromium render
  • Crashpad fix: CI probe CHROMIUM_PDF_OK + container suite 3/3 in CI (previously an instant launch failure with chrome_crashpad_handler: --database is required)

Upgrade notes

  • No action required. All new env vars have backward-compatible defaults: a pool of 5 and a 65 s watchdog as before, plus a queue now bounded at 10 (2× pool) where it was previously unbounded.
  • Operators who want immediate shedding under load can set RENDER_MAX_QUEUE_DEPTH=0; clients should treat 503 + Retry-After on render routes as retryable backpressure.
  • RENDER_MODE=container/socket operators should pull the v0.85.0 worker image: older images can fail Chromium launch on kernels where crashpad misbehaves under the hardened flags (see “Render isolation modes are now proven in CI” above).
  • Stored templates are untouched; formatVersion: 1 appears on definitions the next time they are saved. Restores remain byte-faithful clones.
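On the client side, treating 503 + Retry-After as retryable backpressure might look like this sketch. renderWithBackpressure and its send/sleep callbacks are hypothetical; it only retries a 503 that carries Retry-After, understands the delta-seconds form only (the HTTP-date form falls back to the documented 30 s), and gives up after a fixed number of attempts.

```typescript
// Hypothetical client helper for render routes that shed with 503 + Retry-After.
interface RenderResponse {
  status: number;
  headers: { get(name: string): string | null };
}

type Send = () => Promise<RenderResponse>;
type Sleep = (ms: number) => Promise<void>;

async function renderWithBackpressure(send: Send, sleep: Sleep, maxAttempts = 3): Promise<RenderResponse> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    const retryAfter = res.headers.get("Retry-After");
    // Anything other than a 503 with Retry-After, or the last attempt, goes back to the caller.
    if (res.status !== 503 || retryAfter === null || attempt >= maxAttempts) return res;
    const seconds = /^\d+$/.test(retryAfter) ? Number(retryAfter) : 30; // HTTP-date form not parsed here
    await sleep(seconds * 1000);
  }
}
```

Against a live server, send could be () => fetch(renderUrl, init) (renderUrl and init being the caller's own values) and sleep could wrap setTimeout; a fetch Response already provides the status and headers.get members this sketch relies on.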

Deferred / known items

  • npm/PyPI SDK publishing remains deferred (backlog PCR-2); the two publish workflows are disabled until registry credentials exist, so tag pushes no longer produce guaranteed-red runs.
  • FormApp submit-path embed test: deterministic ubuntu-only failure under jsdom, tracked in #96 (runtime behaviour unaffected; header wiring is covered by the un-skipped sibling test).
  • HA Check 6 rolling-replacement variant and the multi-instance S3 read-after-write check remain deferred (see ha-validation-report.md).