Release v0.78.0 — format-aware publish validation, HA drivers, error-contract normalization
Date: 2026-05-07
Tag: v0.78.0
Summary
Three coordinated upgrades that move Pulp Engine from “commercially credible self-hosted software” toward “broadly production-grade”, per the post-v0.77.1 audit follow-ups:
- Format-aware `/render/validate`. The publish-gate validate route can now exercise the real per-format renderer for HTML/PPTX/CSV/XLSX/DOCX. It attributes issues by `format` and reports any disabled formats explicitly via `skippedFormats[]`, so editors stop showing “All checks passed” for formats that were never actually checked. The legacy default shape (no `formats` array) is preserved byte-for-byte.
- Two new HA nightly drivers. Check 6 and Check 7 are now automated under `.github/workflows/ha-nightly.yml`.
  - Check 6 covers API key rotation. It models the `API_KEY_EDITOR_PREVIOUS` verify-only-for-existing-tokens contract end-to-end across both replicas.
  - Check 7 covers outage failover under sustained load. It asserts ≥99% availability while each replica is `docker compose stop`/`start`’ed in turn.
  
  The HA reference architecture’s “manual” markers move to “automated nightly” for those two items. Rolling replacement (`--force-recreate` semantics), Redis/rate-limit, and multi-instance asset/S3 stay in the documented “Still deferred” batch.
- Error-contract normalization. Every HTTP JSON error envelope now carries two new fields. `code` is a stable machine-readable value, a closed-enum literal union drawn from a 50-entry registry. `requestId` is a UUID matching the `X-Request-ID` response header. All four envelope shapes (standard, validation, rate-limit, render) declare both fields as required at the schema level, and the SDKs surface them on the thrown error. Existing wire shapes are unchanged. The change is purely additive on `code`/`requestId`, plus a closed-enum tightening.
Also rolls in three release-tooling fixes that landed on `main` after v0.77.1:
- The TS SDK workflow now actually attempts Trusted Publishing before the token fallback.
- Manual SDK reruns now build the requested tag’s SHA.
- The trusted-publisher setup docs use the real `TroyCoderBoy/pulpengine` repo slug.
What landed
/render/validate format-aware (section 2.1)
- New optional `formats?: ("html"|"pdf"|"docx"|"pptx"|"csv"|"xlsx")[]` on `POST /render/validate`. When omitted, the route keeps today’s HTML-only behavior, and the response shape is unchanged byte-for-byte.
- When `formats` is supplied, the route runs structural checks once. It then loops the requested formats through the real renderer with a typed-error catch. Issues are tagged with `format` and `severity`.
- New top-level `skippedFormats: string[]` lists formats that were requested but are disabled via server capability gating. It is only present when `formats` was supplied. The publish-gate UI now surfaces these formats as “not validated”, not as success.
- PDF policy is locked. Requesting `pdf` returns `valid: true` with an info-severity issue `code: "PDF_VALIDATED_VIA_HTML"`, so validation never boots Chromium. The editor’s “ready” copy enumerates only the formats that actually ran a renderer. PDF appears on a separate transitive-coverage line.
- The editor publish-gate dialog now derives its requested-formats set from server capabilities. It groups failure issues by `format`, with structural issues at the top, and renders the new skipped-formats line below the “ready” copy.
- Companion test surface:
  - 11+ new cases in `apps/api/src/__tests__/render-validate.test.ts`.
  - A `PublishGateDialog.test.tsx` extension covering the grouped UI and the skipped-formats copy.
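The client-side half of this contract can be sketched as a pure function. It turns a validate response into the dialog's three coverage lines and its grouped failure list.

This is a minimal sketch, not the editor's actual code. The following are assumptions modeled on the fields named above:
- the `ValidateIssue`/`ValidateResponse` types;
- the `summarizeCoverage` name;
- the sample issues, including the made-up `EXAMPLE_STRUCTURAL` code.

```typescript
// Hypothetical response types: field names follow the release notes
// (valid, issues[].format, issues[].severity, skippedFormats); the real
// API types may carry more fields or differ in detail.
type Format = "html" | "pdf" | "docx" | "pptx" | "csv" | "xlsx";
type Severity = "error" | "warning" | "info";

interface ValidateIssue {
  code: string;
  severity: Severity;
  format?: Format; // absent on structural (format-independent) issues
  message: string;
}

interface ValidateResponse {
  valid: boolean;
  issues: ValidateIssue[];
  skippedFormats?: Format[]; // only present when `formats` was sent
}

interface CoverageSummary {
  rendered: Format[];   // formats that actually ran a renderer
  transitive: Format[]; // PDF, covered via the HTML pass
  skipped: Format[];    // requested but disabled on the server
  groups: Map<string, ValidateIssue[]>; // "structural" first, then per format
}

function summarizeCoverage(requested: Format[], res: ValidateResponse): CoverageSummary {
  const skipped = res.skippedFormats ?? [];
  const ran = requested.filter((f) => !skipped.includes(f));
  const transitive = ran.filter((f) => f === "pdf");
  const rendered = ran.filter((f) => f !== "pdf");

  // Seeding "structural" first makes Map insertion order keep it at the top.
  const groups = new Map<string, ValidateIssue[]>([["structural", []]]);
  for (const issue of res.issues) {
    if (issue.severity === "info") continue; // e.g. PDF_VALIDATED_VIA_HTML is not a failure
    const key = issue.format ?? "structural";
    const bucket = groups.get(key) ?? [];
    bucket.push(issue);
    groups.set(key, bucket);
  }
  if (groups.get("structural")?.length === 0) groups.delete("structural");
  return { rendered, transitive, skipped, groups };
}

// Illustrative input only; severities and messages are made up.
const summary = summarizeCoverage(["html", "pdf", "pptx", "csv"], {
  valid: false,
  issues: [
    { code: "PDF_VALIDATED_VIA_HTML", severity: "info", format: "pdf", message: "PDF covered by HTML" },
    { code: "no_rendered_tables", severity: "error", format: "csv", message: "No tables to export" },
    { code: "EXAMPLE_STRUCTURAL", severity: "error", message: "Template structure problem" },
  ],
  skippedFormats: ["pptx"],
});

console.log(summary.rendered, summary.transitive, summary.skipped, Array.from(summary.groups.keys()));
```

With PPTX disabled, the sketch reports HTML and CSV as rendered, PDF as transitive, and PPTX as skipped. Structural failures are grouped ahead of the CSV bucket.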
HA Checks 6 & 7 automated (section 2.2)
- `scripts/ha/check-6-api-key-rotation.mjs` drives a four-stage rotation lifecycle.
  - Setup: it boots the 2-replica HA compose stack, mints an editor session token under the current key, and drives the `API_KEY_EDITOR_PREVIOUS` rollover end-to-end across both replicas via `docker compose --no-deps` recreate.
  - Assertions: it checks the asymmetry the rotation contract documents. The previously minted session token stays valid through the rolling key change. The old raw key can neither mint new tokens nor authenticate fresh requests. Removing `_PREVIOUS` ends the grace window.
  - Probes hit each replica directly via `docker compose exec wget`. A restart-window log scan matches both the `"statusCode":401` and `"msg":"Auth failure"` patterns.
- `scripts/ha/check-7-outage-failover.mjs` is a sustained-load outage harness. It holds steady request load against the load balancer and asserts ≥99% availability while each replica is `docker compose stop`/`start`’ed in turn. The driver explicitly scopes coverage to outage retry on stable container identities. `--force-recreate` rolling-replacement semantics are out of scope and stay in the documented “Still deferred” subsection.
- `.github/workflows/ha-nightly.yml` adds two parallel jobs that mirror the existing `ha-check-2` job structure: image build with shared GHA cache, ephemeral secrets, stack boot, per-replica readiness probe, log upload, and teardown.
- `docs/ha-reference-architecture.md` is updated in three places:
  - The Validation Checklist now marks each item as automated or manual.
  - Item 7 is retitled “outage failover”, with a scope-honest non-coverage note.
  - A new “Still deferred” subsection lists rolling replacement, Redis/rate-limit, and multi-instance asset/S3.
- New `package.json` scripts: `ha:check-6`, `ha:check-7`.
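The asymmetry Check 6 asserts can be modeled in a few lines. This is a hypothetical, in-memory sketch, not Pulp Engine's implementation. The HMAC token format, `KeyConfig`, and the function names are illustrative assumptions. The real driver exercises live replicas rather than a model.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical model: the real session-token format is not described in
// these notes. Tokens here are "<subject>.<hex HMAC-SHA256(key, subject)>".
interface KeyConfig {
  current: string;   // plays the role of the current editor key
  previous?: string; // plays the role of API_KEY_EDITOR_PREVIOUS (verify-only)
}

const sign = (key: string, subject: string): string =>
  createHmac("sha256", key).update(subject).digest("hex");

// Minting a new token accepts only the current raw key.
function mintToken(cfg: KeyConfig, rawKey: string, subject: string): string | null {
  if (rawKey !== cfg.current) return null; // plain compare: fine for a model, not for production
  return `${subject}.${sign(cfg.current, subject)}`;
}

// Fresh requests authenticated by raw key also accept only the current key.
function authenticateRawKey(cfg: KeyConfig, rawKey: string): boolean {
  return rawKey === cfg.current;
}

// Existing tokens verify against the current key, then the previous key.
function verifyToken(cfg: KeyConfig, token: string): boolean {
  const dot = token.lastIndexOf(".");
  if (dot < 0) return false;
  const subject = token.slice(0, dot);
  const mac = Buffer.from(token.slice(dot + 1), "hex");
  const keys = [cfg.current, cfg.previous].filter((k): k is string => k !== undefined);
  return keys.some((key) => {
    const expected = Buffer.from(sign(key, subject), "hex");
    return expected.length === mac.length && timingSafeEqual(expected, mac);
  });
}

const before: KeyConfig = { current: "key-v1" };
const token = mintToken(before, "key-v1", "editor-session");
if (token === null) throw new Error("mint failed");

const rotated: KeyConfig = { current: "key-v2", previous: "key-v1" };
const graceOver: KeyConfig = { current: "key-v2" };

console.log(verifyToken(rotated, token));           // true: grace window
console.log(mintToken(rotated, "key-v1", "other"));  // null: old key cannot mint
console.log(authenticateRawKey(rotated, "key-v1")); // false: old key rejected
console.log(verifyToken(graceOver, token));         // false: previous key removed
```

The previous key appears only in `verifyToken`, never in `mintToken` or `authenticateRawKey`. That placement is what makes it verify-only in this model.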
Error-contract normalization (section 2.3, two commits)
- Commit A (`090cba6`): additive enrichment sweep.
  - New helper modules:
    - `apps/api/src/lib/error-codes.ts` holds the 50-entry `ERROR_CODES` registry and the `isErrorCode()` runtime guard.
    - `apps/api/src/lib/error-envelope.ts` holds the generic `buildErrorEnvelope` helper and a thin `buildRenderErrorEnvelope` wrapper that preserves `location`/`suggestion`.
  - Migrated ~258 emitter sites across 28 files: the central handler, the auth/tenancy plugins, and every route file under `admin/`, `audit-events`, `auth/`, `nodes/`, `render/`, `schedules/`, `templates/`, `sandbox/`, `usage/`, and `assets/`.
  - Existing `error` strings are preserved verbatim. Only `code`/`requestId` are net-new on the wire.
  - Permissive schema declarations were added so that `fast-json-stringify` does not strip the new fields. They cover the four shared schemas plus the local DLQ and `ServiceUnavailable` schemas.
  - Spec-drift fixes were pulled forward where touched:
    - The DLQ schemas and `ServiceUnavailableSchema` declare `requestId`.
    - The `templates/index.ts` 429 slot switched from `ErrorResponseSchema` to `RateLimitErrorResponseSchema`.
    - 9 response blocks in `schedules.routes.ts` were backfilled with the 503/409 statuses they actually emit.
- Commit B (`cfc6b17`): schema tightening, closed enum, and SDK pass.
  - New `ErrorCodeSchema = Type.Union(ERROR_CODES.map(c => Type.Literal(c)))` is derived from the registry. It is the single source of truth for OpenAPI, the runtime serializer, and SDK codegen.
  - `RequestIdFieldSchema` moved from optional to required.
  - The `code` field was tightened from a loose `Type.Optional(Type.String())` to a required `ErrorCodeSchema`. This applies to `ErrorResponseSchema`, `ValidationErrorResponseSchema`, `RateLimitErrorResponseSchema`, and `RenderErrorResponseSchema`.
  - All four shared error schemas now require `requestId`. The local `ServiceUnavailableSchema` and DLQ schemas were updated to match.
  - `openapi.json` was regenerated, with the closed enum inlined at every error response slot.
  - In `@pulp-engine/sdk`, `PulpEngineError` gains a `requestId: string | undefined` field. `toString()` includes `code` and `requestId`, so support workflows can capture the request id from a stack trace.
  - In `pulp-engine` (Python), `PulpEngineError` gains `request_id: str | None`, and `__repr__` surfaces it. The attribute follows Python's snake_case convention; the wire field stays `requestId`.
- Cross-route contract guards in `apps/api/src/__tests__/error-contract.test.ts` catch drift in either direction:
  - Every 4xx/5xx response schema in the OpenAPI spec that emits an error envelope must declare `code` and `requestId` as required. Non-envelope shapes, such as the `/health/ready` health-probe schema, are skipped.
  - Every literal that appears as a `code` enum value anywhere in the spec must be in `ERROR_CODES`.
- `docs/api-errors.md` pins the contract. It covers:
  - universal vs conditional fields;
  - all four envelope shapes, with worked examples;
  - the full registry table;
  - SDK examples in TS and Python;
  - operational guidance for using `requestId` in support workflows.
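The registry-driven pattern from Commits A and B can be sketched without TypeBox. A readonly array yields both the closed literal union and the runtime guard, and a builder attaches `code` and `requestId` to every envelope. The real `ErrorCodeSchema` is built with `Type.Union`/`Type.Literal` over the same kind of array.

This is a hedged illustration. The following are assumptions, not the project's code:
- the three placeholder codes;
- the `extra` parameter;
- the `missingEnvelopeFields` helper.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical excerpt: the real registry has 50 entries whose names are
// not listed in these notes; these three are placeholders.
const ERROR_CODES = ["NOT_FOUND", "VALIDATION_FAILED", "RATE_LIMITED"] as const;
type ErrorCode = (typeof ERROR_CODES)[number]; // closed literal union

const CODE_SET: ReadonlySet<string> = new Set<string>(ERROR_CODES);

// Runtime guard mirroring the compile-time union.
function isErrorCode(value: unknown): value is ErrorCode {
  return typeof value === "string" && CODE_SET.has(value);
}

interface ErrorEnvelope {
  error: string;     // legacy human-readable message, kept verbatim
  code: ErrorCode;   // required, closed enum
  requestId: string; // required, same value as the X-Request-ID header
}

// Generic builder; `extra` carries shape-specific fields (for example a
// render error's location/suggestion). Core fields are spread last, so
// `extra` cannot override them.
function buildErrorEnvelope(
  code: ErrorCode,
  error: string,
  requestId: string,
  extra: Record<string, unknown> = {},
): ErrorEnvelope & Record<string, unknown> {
  return { ...extra, error, code, requestId };
}

// One piece of a cross-route guard: which required envelope fields does a
// response schema fail to declare?
function missingEnvelopeFields(schema: { required?: readonly string[] }): string[] {
  const required = schema.required ?? [];
  return ["code", "requestId"].filter((field) => !required.includes(field));
}

const requestId = randomUUID();
const envelope = buildErrorEnvelope("NOT_FOUND", "Template not found", requestId);
console.log(envelope.code, envelope.requestId === requestId); // NOT_FOUND true
console.log(isErrorCode("RATE_LIMITED"), isErrorCode("TEAPOT")); // true false
console.log(missingEnvelopeFields({ required: ["error", "code"] })); // [ 'requestId' ]
```

Deriving the type, the guard, and (in the real code) the schema from one array is what makes drift detectable. A code added to the registry shows up everywhere at once.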
Release-tooling fixes (carrying over from [Unreleased])
- TypeScript SDK publishing now actually attempts npm Trusted Publishing before falling back to `NPM_TOKEN`. The workflow packs workspace-aware tarballs with `pnpm pack` and publishes those exact bytes with `npm publish --provenance`. It only retries with `NPM_TOKEN` if the OIDC publish fails.
- Manual SDK publish reruns build the requested tag commit, not the branch head that launched `workflow_dispatch`. Both SDK workflows now check out the resolved release SHA in their build jobs.
- Trusted-publisher setup docs use the real repository slug (`TroyCoderBoy/pulpengine`). The Python and TypeScript SDK runbooks and comments previously said `pulp-engine`. Copying that slug into the registry-side trusted-publisher settings would produce `invalid-publisher` failures on PyPI or npm.
Pre-tag dependency-advisory bumps
Caught by the `ci.yml` `pnpm audit --prod --audit-level=moderate` gate on the first release-prep SHA (`977fccf`). Patched on the fix-forward commit (`258a3bf`) before tagging.
- `basic-ftp >= 5.3.1`: the root `pnpm.overrides` entry is bumped from `>=5.3.0`.
  - Advisory: GHSA-rpmf-866q-6p89, a DoS via unbounded multiline FTP control-response buffering.
  - Transitive via `puppeteer > @puppeteer/browsers > proxy-agent > pac-proxy-agent > get-uri > basic-ftp`.
- `ip-address >= 10.1.1`: new entry in the root `pnpm.overrides`.
  - Advisory: GHSA-v2v4-37r5-5v8g, an XSS in the `Address6` HTML-emitting methods.
  - Transitive via `puppeteer > @puppeteer/browsers > proxy-agent > socks-proxy-agent > socks > ip-address`.
  - Functionally inert here, since the renderer never invokes `Address6` HTML serialization. The audit gate is still right to flag it.
- `@anthropic-ai/sdk 0.95.0`: direct dependency in `apps/api/package.json`, bumped from `0.87.0`.
  - Advisory: GHSA-p7fg-763f-g4gf, insecure default file permissions in the SDK’s Local Filesystem Memory Tool.
  - This codebase does not exercise that feature; the AI generation routes use `client.messages.create()` exclusively.
  - The bump clears the audit signal; it does not address an exploitable surface.
Operational posture
- v0.78.0 is fully additive on the wire for existing SDK consumers. Pre-v0.78.0 SDKs still parse v0.78.0 server responses correctly; they ignore the new `code`/`requestId` fields. v0.78.0+ SDKs gracefully tolerate pre-v0.78.0 servers, where `code` and `requestId` surface as `undefined`/`None`.
- Recommended SDK upgrade for support workflows: capture `err.requestId` (TS) or `err.request_id` (Python) when filing tickets. It pairs the response body 1:1 with the matching server log entry’s `reqId` and the response’s `X-Request-ID` header.
- The publish-gate dialog now reports per-format coverage honestly. Suppose an operator upgrading from `v0.77.x` runs with `PPTX_ENABLED=false`. They will now see “PPTX was not validated (disabled on this server)” instead of a misleading “All checks passed”.
- HA Checks 6 and 7 run on the existing `ha-nightly.yml` schedule. The total nightly runtime budget is now ~25 min. Check 7’s load step is the longest single step; Checks 2 and 6 are quicker.
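A support-friendly client can read the new fields while tolerating older servers. The sketch below is hypothetical and is not the `@pulp-engine/sdk` implementation. `parseApiError`, `supportLine`, and the fallback to the `X-Request-ID` header are assumptions of this example. The `NOT_FOUND` code and the sample UUID are placeholders.

```typescript
// Hypothetical helper, not the @pulp-engine/sdk implementation: it shows
// how a client can read the new fields while tolerating older servers.
interface ParsedApiError {
  status: number;
  message: string;
  code: string | undefined;      // undefined against pre-v0.78.0 servers
  requestId: string | undefined; // body field first, then X-Request-ID
}

type HeaderLookup = (name: string) => string | null; // e.g. Headers#get

function parseApiError(status: number, body: unknown, getHeader: HeaderLookup): ParsedApiError {
  const obj: Record<string, unknown> =
    typeof body === "object" && body !== null ? (body as Record<string, unknown>) : {};
  const str = (v: unknown): string | undefined => (typeof v === "string" ? v : undefined);
  return {
    status,
    message: str(obj["error"]) ?? `HTTP ${status}`,
    code: str(obj["code"]),
    // Header fallback is an assumption of this sketch, not documented SDK behavior.
    requestId: str(obj["requestId"]) ?? getHeader("x-request-id") ?? undefined,
  };
}

// One line suitable for pasting into a support ticket.
function supportLine(e: ParsedApiError): string {
  return `HTTP ${e.status} ${e.code ?? "(no code)"}: ${e.message} [requestId=${e.requestId ?? "unknown"}]`;
}

const noHeaders: HeaderLookup = () => null;
const oldServer = parseApiError(404, { error: "Not found" }, noHeaders);
const newServer = parseApiError(
  404,
  { error: "Not found", code: "NOT_FOUND", requestId: "3f0c9a52-6b1e-4c1a-9d3e-2a7b8c9d0e1f" },
  noHeaders,
);
console.log(supportLine(oldServer)); // HTTP 404 (no code): Not found [requestId=unknown]
console.log(supportLine(newServer));
```

Treating both fields as optional on the client mirrors the compatibility note above. A newer client never assumes an older server sent them.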
Verified before tagging
Locally verified
- `pnpm --filter @pulp-engine/api typecheck`: clean.
- `pnpm --filter @pulp-engine/sdk typecheck`: clean.
- `pnpm extract-openapi -- --check`: clean (the spec matches the source schemas).
- `pnpm --filter @pulp-engine/api test:file`: 1237 passed / 97 skipped / 0 failed on the file-mode suite.
- `pnpm --filter @pulp-engine/sdk test`: TypeScript SDK smoke 8/8 passed, including the new `requestId` enrichment guard against a real server boot.
- `python -m pytest packages/sdk-python/tests/test_errors.py`: Python SDK error tests 8/8 passed, including two new `request_id` cases on `build_error()`.
- `node scripts/check-version.mjs`: green on the prepared release commit (CI opt-in mode for the pre-tag head).
CI-verified
The release-prep SHA was iterated three times before reaching CI green; the SDK registry-publish workflows on the post-tag side are tracked separately under “Known residual” below.
`ci.yml` on the tagged SHA `2c3d447` (run 25465600701): all 9 jobs green.

| Job | Status |
|---|---|
| ci (lint + build + typecheck) | ✅ |
| test-file-mode | ✅ |
| test-sqlserver | ✅ |
| test-e2e | ✅ |
| test-e2e-auth | ✅ |
| Docker build + smoke test | ✅ |
| CI — Windows (file-mode + installer smoke) | ✅ |
| Windows installer validation | ✅ |
| Evaluation bundle validation | ✅ |
Release workflow on the `v0.78.0` push (run 25467053341): all 7 jobs green.

| Job | Status |
|---|---|
| CI gate — verify CI passed for release commit | ✅ |
| docker (GHCR push) | ✅ |
| eval-bundle | ✅ |
| windows-installer | ✅ |
| scan (Trivy / supply-chain) | ✅ |
| release (GitHub Release publish) | ✅ |
| Mirror Windows installer to public repo | ✅ |
Release-prep iteration history:

| SHA | CI conclusion | Reason |
|---|---|---|
| `977fccf` | failure | `pnpm audit --prod --audit-level=moderate` flagged three advisories: `basic-ftp` ≤5.3.0 (GHSA-rpmf-866q-6p89), `ip-address` ≤10.1.0 (GHSA-v2v4-37r5-5v8g), and `@anthropic-ai/sdk` 0.87.0 (GHSA-p7fg-763f-g4gf). Fixed in `258a3bf`. |
| `258a3bf` | failure (auto-cancelled by next push) | The audit gate cleared, but an e2e regression in `editor-workflows.spec.ts` surfaced for the first time. Section 2.1’s format-aware preflight had been bundled into the v0.78.0 push and never had its own CI run. The initial diagnosis (a tableless fixture causing CSV/XLSX `no_rendered_tables`) was wrong; the attempted fix landed in `c05efce`. |
| `c05efce` | success (flake) | Added a stub table to the rich-text fixture on the wrong-diagnosis theory. CI happened to pass on this run, because Playwright’s `retries: 1` and capability-resolution timing make the underlying e2e race intermittent. The fix did not address the actual cause, so it was not trusted as the tag target. |
| `2c3d447` | success | Stubbed `/render/validate` at the Playwright layer in both publish-flow e2e tests. The dialog now reaches `phase === 'ready'` deterministically, regardless of how format-aware preflight evaluates the rich-text fixture. The fixture-table change from `c05efce` is harmless and stays in the tagged tree. Tag target. |
HA nightly (Checks 2/6/7): intentionally run on the nightly schedule, not on the release-prep SHA. Their first observed-green run covers a wider time window than any individual release; that is the documented signal we wait for, not a release-blocking gate.
Not verified
- The following remain tag-time/post-tag checks per `docs/release-checklist.md`:
  - registry publication (npm, PyPI);
  - GHCR images;
  - GitHub Release assets;
  - public mirror sync;
  - Windows installer smoke;
  - signed-licence end-to-end smoke.
Known residual
- `pdf-transform` malformed-base64 catches are dead code.
  - Cause: `Buffer.from(s, "base64")` does not throw on malformed input. As the routes are written, the four catch blocks in `apps/api/src/routes/render/pdf-transform.ts` (lines 124, 236, 248, and 347) cannot fire.
  - Effect: malformed payloads currently fall through to PDF parsing. They surface as `invalid_pdf`/`unsupported_image` 422s rather than the intended generic 400.
  - This is behaviorally separate from the error-contract sweep and is logged as its own follow-up.
- Python SDK packaging-name mismatch. The PyPI package name is `pulp-engine`, but the importable module name is `docuforge`.
  - Before v0.78.0, every test file began with `from pulp-engine import ...`. A literal hyphen is invalid Python, so the Python SDK test suite could not load.
  - Section 2.3’s Commit B fixed only the two files needed to validate the error-contract changes. `tests/conftest.py` and `tests/test_errors.py` now use `from docuforge import ...`.
  - The broader rename, from `docuforge/` to `pulp_engine/`, is its own follow-up. It would match the PyPI package name under the hyphen-to-underscore convention.
  - `docs/api-errors.md` documents the current `from docuforge import ...` reality, with a forward-looking note.
- The OpenAPI spec inlines the closed `ErrorCodeSchema` union at every error response slot.
  - Spec growth (~5x) is the cost of inlining; SDK codegen handles it without complaint.
  - Compaction via `Type.Ref()` into a shared OpenAPI component is a separate refactor. It is logged as a future spec-cleanliness pass, not a contract change.
- HA scope. Checks 6 and 7 close the rotation and outage-failover slots. Three items stay in the documented “Still deferred” batch: rolling replacement (`--force-recreate` semantics), Redis/rate-limit, and multi-instance asset/S3 read-after-write. That batch is a separate follow-up and was intentional Release 2 scoping.
- Trusted-publisher residual. PyPI Trusted Publishing may still fail loudly if the one-time PyPI trust configuration is incomplete. That is independent of this release. The intentionally untracked local files remain out of scope: the Fly config `fly.toml` and the `.continue/` directory.
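The first residual is easy to reproduce. A strict decoder shows one way the intended 400 path could become reachable.

This is a hedged sketch. `decodeStrictBase64` is a hypothetical helper. Rejecting unpadded or base64url input is a design choice of this example, not a statement about what the follow-up will do.

```typescript
import { Buffer } from "node:buffer";

// Node's base64 decoder does not throw on malformed input, so a try/catch
// around this call never fires; it simply returns a Buffer.
const lenient = Buffer.from("not*base64!", "base64");

// Hypothetical strict decoder a route could use to map bad input to a 400.
// It accepts padded, standard-alphabet base64 only (no base64url, no bare
// unpadded tails) and tolerates embedded whitespace such as line wraps.
const BASE64_RE = /^[A-Za-z0-9+\/]*={0,2}$/;

function decodeStrictBase64(input: string): Buffer | null {
  const s = input.replace(/\s+/g, "");
  if (s.length === 0 || s.length % 4 !== 0 || !BASE64_RE.test(s)) return null;
  const buf = Buffer.from(s, "base64");
  // Re-encoding catches non-canonical trailing bits (e.g. "QR==" vs "QQ==").
  return buf.toString("base64") === s ? buf : null;
}

console.log(Buffer.isBuffer(lenient));                        // true
console.log(decodeStrictBase64("aGVsbG8=")?.toString("utf8")); // hello
console.log(decodeStrictBase64("not*base64!"));               // null
```

Returning `null` instead of throwing makes the failure explicit at the call site. A route handler could then emit its generic 400 without relying on an exception that `Buffer.from` never raises.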