Pulp Engine Document Rendering
Get started
Release v0.78.0

Release v0.78.0 — format-aware publish validation, HA drivers, error-contract normalization

Date: 2026-05-07 Tag: v0.78.0

Summary

Three coordinated upgrades that move Pulp Engine from “commercially credible self-hosted software” toward “broadly production-grade”, per the post-v0.77.1 audit follow-ups:

  1. Format-aware /render/validate. The publish-gate validate route now optionally exercises the real per-format renderer for HTML/PPTX/CSV/XLSX/DOCX, attributes issues by format, and reports any disabled formats explicitly via skippedFormats[] so editors stop showing “All checks passed” for formats that were never actually checked. The legacy default-shape (no formats array) is preserved byte-for-byte.
  2. Two new HA nightly drivers. Check 6 (API key rotation, modeling the API_KEY_EDITOR_PREVIOUS verify-only-for-existing-tokens contract end-to-end across both replicas) and Check 7 (outage failover under sustained load, asserting ≥99% availability while each replica is docker compose stop/start’ed in turn) are now automated under .github/workflows/ha-nightly.yml. The HA reference architecture’s “manual” markers move to “automated nightly” for those two items; rolling-replacement (--force-recreate semantics), Redis/rate-limit, and multi-instance asset/S3 stay in the documented “Still deferred” batch.
  3. Error-contract normalization. Every HTTP JSON error envelope now carries a stable machine-readable code (closed-enum literal union from a 50-entry registry) and a requestId (UUID matching the X-Request-ID response header). All four envelope shapes (standard, validation, rate-limit, render) declare both fields as required at the schema level; SDKs surface them on the thrown error. Existing wire shapes are unchanged — this is purely additive on code/requestId plus a closed-enum tightening.

Also rolls in three release-tooling fixes that landed on main after v0.77.1 (TS SDK Trusted Publishing actually attempted before token fallback; manual SDK reruns now build the requested tag’s SHA; trusted-publisher setup docs use the real TroyCoderBoy/pulpengine repo slug).

What landed

/render/validate format-aware (section 2.1)

  • New optional formats?: ("html"|"pdf"|"docx"|"pptx"|"csv"|"xlsx")[] on POST /render/validate. Default (omitted) = today’s HTML-only behavior, byte-for-byte response shape unchanged.
  • When supplied, the route runs structural checks once and then loops the requested formats through the real renderer with typed-error catch; issues are tagged with format and severity.
  • New top-level skippedFormats: string[] (only present when formats was supplied) lists formats that were requested but disabled via server capability gating. The publish-gate UI now surfaces these as “not validated”, not as success.
  • PDF policy locked: requesting pdf returns valid: true with an info-severity issue code: "PDF_VALIDATED_VIA_HTML" — no Chromium boot during validation. The editor’s “ready” copy enumerates only the formats that actually ran a renderer; PDF appears on a separate transitive-coverage line.
  • Editor publish-gate dialog updated to derive its requested-formats set from server capabilities, group failure issues by format (with structural issues at the top), and render the new skipped-formats line below “ready” copy.
  • Companion test surface: 11+ new cases on apps/api/src/__tests__/render-validate.test.ts and the PublishGateDialog.test.tsx extension covering grouped UI + skipped-formats copy.

HA Checks 6 & 7 automated (section 2.2)

  • scripts/ha/check-6-api-key-rotation.mjs — four-stage rotation lifecycle. Boots the 2-replica HA compose, mints an editor session token under the current key, then drives API_KEY_EDITOR_PREVIOUS rollover end-to-end across both replicas via docker compose --no-deps recreate. Asserts the asymmetry the rotation contract documents: the previously-minted session token stays valid through the rolling key change, the old raw key cannot mint new tokens or authenticate fresh requests, and removing _PREVIOUS ends the grace window. Probes hit each replica directly via docker compose exec wget, with a restart-window log scan matching both "statusCode":401 and "msg":"Auth failure" patterns.
  • scripts/ha/check-7-outage-failover.mjs — sustained-load outage harness. Holds steady request load against the LB and asserts ≥99% availability while each replica is docker compose stop/start’ed in turn. The driver explicitly scopes coverage to outage retry on stable container identities — --force-recreate rolling-replacement semantics are out of scope and stay in the documented “Still deferred” subsection.
  • .github/workflows/ha-nightly.yml adds two parallel jobs mirroring the existing ha-check-2 job structure (image build with shared GHA cache, ephemeral secrets, stack boot, per-replica readiness probe, log upload, teardown).
  • docs/ha-reference-architecture.md Validation Checklist now marks each item as automated vs manual; Item 7 is retitled “outage failover” with a scope-honest non-coverage note; a new “Still deferred” subsection lists rolling-replacement, Redis/rate-limit, and multi-instance asset/S3.
  • New package.json scripts: ha:check-6, ha:check-7.

Error-contract normalization (section 2.3, two commits)

  • Commit A (090cba6) — additive enrichment sweep. New helper module apps/api/src/lib/error-codes.ts (50-entry ERROR_CODES registry, isErrorCode() runtime guard) + apps/api/src/lib/error-envelope.ts (buildErrorEnvelope generic helper + buildRenderErrorEnvelope thin wrapper preserving location/suggestion). Migrated ~258 emitter sites across 28 files (central handler, auth/tenancy plugins, every route file under admin/, audit-events, auth/, nodes/, render/, schedules/, templates/, sandbox/, usage/, assets/). Existing error strings preserved verbatim; only code/requestId are net-new on the wire. Permissive schema declarations on the four shared schemas + the local DLQ + ServiceUnavailable schemas so fast-json-stringify does not strip the new fields. Spec-drift fixes pulled forward where touched (DLQ schemas + ServiceUnavailableSchema declare requestId; templates/index.ts 429 slot switched from ErrorResponseSchema to RateLimitErrorResponseSchema; schedules.routes.ts 9 response blocks backfilled with the 503/409 statuses they actually emit).
  • Commit B (cfc6b17) — schema tightening + closed-enum + SDK pass. New ErrorCodeSchema = Type.Union(ERROR_CODES.map(c => Type.Literal(c))) derived from the registry — single source of truth for OpenAPI, runtime serializer, and SDK codegen. RequestIdFieldSchema lifted from optional to required. code field on ErrorResponseSchema, ValidationErrorResponseSchema, RateLimitErrorResponseSchema, RenderErrorResponseSchema widened from loose Type.Optional(Type.String()) to required ErrorCodeSchema. All four shared error schemas now require requestId. Local ServiceUnavailableSchema and DLQ schemas updated to match. openapi.json regenerated (closed-enum inlined at every error response slot). @pulp-engine/sdk PulpEngineError gains a requestId: string | undefined field; toString() includes code and requestId so support workflows can capture the request id from a stack trace; pulp-engine (Python) PulpEngineError gains request_id: str | None (snake_case Python convention; wire field stays requestId); __repr__ surfaces it.
  • Cross-route contract guards at apps/api/src/__tests__/error-contract.test.ts: every 4xx/5xx response schema in the OpenAPI spec that emits an error envelope must declare code and requestId as required (skipping non-envelope shapes like /health/ready’s health-probe schema); every literal that appears as a code enum value across the spec is in ERROR_CODES. Catches drift in either direction.
  • docs/api-errors.md pins the contract: universal vs conditional fields, all four envelope shapes with worked examples, full registry table, SDK examples in TS and Python, operational guidance for using requestId in support workflows.

Release-tooling fixes (carrying over from [Unreleased])

  • TypeScript SDK publishing actually attempts npm Trusted Publishing before falling back to NPM_TOKEN. The workflow packs workspace-aware tarballs with pnpm pack, publishes those exact bytes with npm publish --provenance, and only retries with NPM_TOKEN if OIDC publish fails.
  • Manual SDK publish reruns build the requested tag commit, not the branch head that launched workflow_dispatch. Both SDK workflows now check out the resolved release SHA in their build jobs.
  • Trusted-publisher setup docs use the real repository slug (TroyCoderBoy/pulpengine). The Python and TypeScript SDK runbooks/comments previously said pulp-engine, which would produce invalid-publisher failures on PyPI or npm if copied into the registry-side trusted-publisher settings.

Pre-tag dependency-advisory bumps

Caught by ci.yml’s pnpm audit --prod --audit-level=moderate gate on the first release-prep SHA (977fccf); patched on the fix-forward commit before tagging.

  • basic-ftp >= 5.3.1 — root pnpm.overrides bumped from >=5.3.0. GHSA-rpmf-866q-6p89 — DoS via unbounded multiline FTP control-response buffering. Transitive via puppeteer > @puppeteer/browsers > proxy-agent > pac-proxy-agent > get-uri > basic-ftp.
  • ip-address >= 10.1.1 — new entry in root pnpm.overrides. GHSA-v2v4-37r5-5v8g — XSS in Address6 HTML-emitting methods. Transitive via puppeteer > @puppeteer/browsers > proxy-agent > socks-proxy-agent > socks > ip-address. Functionally inert here (the renderer never invokes Address6 HTML serialization), but the audit gate is right to flag it.
  • @anthropic-ai/sdk 0.95.0 — direct dep in apps/api/package.json bumped from 0.87.0. GHSA-p7fg-763f-g4gf — insecure default file permissions in the SDK’s Local Filesystem Memory Tool, a feature this codebase does not exercise (AI generation routes use client.messages.create() exclusively). Bump is to clear the audit signal, not to address an exploitable surface.

Operational posture

  • v0.78.0 is fully additive on the wire for existing SDK consumers. Pre-v0.78.0 SDKs still parse v0.78.0 server responses correctly — they ignore the new code/requestId fields. v0.78.0+ SDKs gracefully tolerate pre-v0.78.0 servers — code and requestId surface as undefined/None.
  • Recommended SDK upgrade for support workflows: capture err.requestId (TS) / err.request_id (Python) when filing tickets — it pairs the body 1:1 with the matching server log entry’s reqId and the response’s X-Request-ID header.
  • The publish-gate dialog now reports per-format coverage honestly. Operators upgrading from v0.77.x will see “PPTX was not validated (disabled on this server)” instead of a misleading “All checks passed” when PPTX_ENABLED=false.
  • HA Checks 6 and 7 run on the existing ha-nightly.yml schedule. The total nightly runtime budget is now ~25 min (Check 7’s load step is the longest single step; Checks 2/6 are quicker).

Verified before tagging

Locally verified

  • pnpm --filter @pulp-engine/api typecheck — clean.
  • pnpm --filter @pulp-engine/sdk typecheck — clean.
  • pnpm extract-openapi -- --check — clean (spec matches the source schemas).
  • pnpm --filter @pulp-engine/api test:file1237 passed / 97 skipped / 0 failed on the file-mode suite.
  • pnpm --filter @pulp-engine/sdk test — TypeScript SDK smoke 8/8 passed, including the new requestId enrichment guard against a real server boot.
  • python -m pytest packages/sdk-python/tests/test_errors.py — Python SDK error tests 8/8 passed (including two new request_id cases on build_error()).
  • node scripts/check-version.mjs — green on the prepared release commit (CI opt-in mode for the pre-tag head).

CI-verified

The release-prep SHA was iterated three times before reaching CI green; the SDK registry-publish workflows on the post-tag side are tracked separately under “Known residual” below.

ci.yml on the tagged SHA 2c3d447 (run 25465600701) — all 9 jobs green:

JobStatus
ci (lint + build + typecheck)
test-file-mode
test-sqlserver
test-e2e
test-e2e-auth
Docker build + smoke test
CI — Windows (file-mode + installer smoke)
Windows installer validation
Evaluation bundle validation

Release workflow on v0.78.0 push (run 25467053341) — all 7 jobs green:

JobStatus
CI gate — verify CI passed for release commit
docker (GHCR push)
eval-bundle
windows-installer
scan (Trivy / supply-chain)
release (GitHub Release publish)
Mirror Windows installer to public repo

Release-prep iteration history:

SHACI conclusionReason
977fccffailurepnpm audit --prod --audit-level=moderate flagged three advisories: basic-ftp ≤5.3.0 (GHSA-rpmf-866q-6p89), ip-address ≤10.1.0 (GHSA-v2v4-37r5-5v8g), @anthropic-ai/sdk 0.87.0 (GHSA-p7fg-763f-g4gf). Fixed in 258a3bf.
258a3bffailure (auto-cancelled by next push)Audit gate cleared, but e2e regression in editor-workflows.spec.ts surfaced for the first time — section 2.1’s format-aware preflight had been bundled into the v0.78.0 push and never had its own CI run. Initial diagnosis (tableless fixture causing CSV/XLSX no_rendered_tables) was wrong; attempted fix in c05efce.
c05efcesuccess (flake)Added a stub table to the rich-text fixture on the wrong-diagnosis theory. CI happened to pass on this run — Playwright’s retries: 1 and capability-resolution timing make the underlying e2e race intermittent — but the fix did not address the actual cause, so it was not trusted as the tag target.
2c3d447successStubbed /render/validate at the Playwright layer in both publish-flow e2e tests so the dialog reaches phase === 'ready' deterministically regardless of how format-aware preflight evaluates the rich-text fixture. The fixture-table change from c05efce is harmless and stays in the tagged tree. Tag target.

HA nightly (Checks 2/6/7): intentionally run on the nightly schedule, not on the release-prep SHA. Their first observed-green run covers a wider time window than any individual release; that is the documented signal we wait for, not a release-blocking gate.

Not verified

  • Registry publication (npm, PyPI), GHCR images, GitHub Release assets, public mirror sync, Windows installer smoke, and signed-licence end-to-end smoke remain tag-time/post-tag checks per docs/release-checklist.md.

Known residual

  • pdf-transform malformed-base64 catches are dead. Buffer.from(s, "base64") does not throw on malformed input — the four catch blocks at apps/api/src/routes/render/pdf-transform.ts lines 124, 236, 248, and 347 cannot fire as the routes are written. Malformed payloads currently fall through to PDF parsing and surface as invalid_pdf/unsupported_image 422s rather than the intended generic 400. Behaviorally separate from the error-contract sweep; logged as its own follow-up.
  • Python SDK packaging-name mismatch. The PyPI package name is pulp-engine but the importable module name is docuforge. The pre-v0.78.0 test infra had from pulp-engine import ... (literal hyphen, invalid Python) at the top of every test file, which prevented the Python SDK test suite from loading. Section 2.3’s Commit B fixed only the two files needed to validate the error-contract changes (tests/conftest.py, tests/test_errors.pyfrom docuforge import ...). The broader rename — docuforge/pulp_engine/ matching the PyPI package name with hyphen→underscore convention — is its own follow-up. docs/api-errors.md documents the current from docuforge import ... reality with a forward-looking note.
  • OpenAPI spec inlines the closed ErrorCodeSchema union at every error response slot. Spec growth (~5x) is the cost of inlining; SDK codegen handles it without complaint. Compaction via Type.Ref() into a shared OpenAPI component is a separate refactor logged as a future spec-cleanliness pass — not a contract change.
  • HA scope. Checks 6 and 7 close the rotation + outage-failover slots. Rolling-replacement (--force-recreate semantics), Redis/rate-limit, and multi-instance asset/S3 read-after-write stay in the documented “Still deferred” batch; that’s a separate follow-up batch and was intentional Release 2 scope.
  • Trusted-publisher residual. PyPI Trusted Publishing may still fail loudly if the one-time PyPI trust configuration is incomplete; that is independent of this release. The intentionally-untracked Fly files (fly.toml, .continue/) remain out of scope.