Release v0.84.0 — “Operator Truth” (2026-06-10)

Publication note: v0.84.0 was tagged but never published — the release pipeline failed twice on runner disk exhaustion in the docker job’s SBOM steps before any artifact reached the GitHub Release, the public mirror, or the latest aliases. The tag is preserved at afd599f as unpublished (tags are immutable). The identical product shipped as v0.84.1, which adds only the release-workflow disk fix.

Audit remediation release 1 of 2, from the 2026-06-10 fresh-pass quality audit (repo at v0.83.0). This release closes all three of the audit’s release blockers plus four of its high-value findings, shipped as PRs #80–#85. The companion release (v0.85.0, “evidence & resilience”) covers the remaining high-value items: HA nightly Check 7, render saturation behaviour, container/socket isolation-mode CI execution, template format versioning, and the editor publish-gate seams.

The unifying theme: every operator-facing document now matches shipped behaviour, and the two correctness gaps that contradicted published guarantees (tenant-scoped composition, retention-bounded rendered output) are fixed in code with regression tests.

⚠ BREAKING changes — upgrade paths

1. Hardened file-mode requires an explicit async-batch durability choice (audit H9a)

The documented contract of BATCH_ASYNC_DURABILITY is that required (the hardened default) refuses boot on any nondurable backend, but that contract was only enforced on SQL Server. Hardened STORAGE_MODE=file deployments silently ran nondurable async batch. The file branch now applies the same tri-state gate (FILE_BATCH_NONDURABLE).

Who is affected: deployments with NODE_ENV=production (or HARDEN_PRODUCTION=true) and STORAGE_MODE=file and no explicit BATCH_ASYNC_DURABILITY — they will refuse to boot after upgrading.

Upgrade path: add one env var before upgrading:

# accept in-memory async batch quietly:
BATCH_ASYNC_DURABILITY=allow-nondurable
# …or keep a startup reminder:
BATCH_ASYNC_DURABILITY=warn
# …or move to Postgres for durable async batch.

The README and deployment-guide hardened examples now include the line.
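
The tri-state gate can be pictured roughly as below. This is a minimal sketch, not the shipped code: the function name hardenedFileBatchGate and the GateResult shape are hypothetical, and it assumes that an unrecognised value is treated like required and that warn reports the same FILE_BATCH_NONDURABLE code. Only the three values, the hardened default, and the code name come from this release note.

```typescript
type Durability = "required" | "warn" | "allow-nondurable";

type GateResult =
  | { action: "boot" }
  | { action: "boot-with-warning"; code: string }
  | { action: "refuse"; code: string };

// Hypothetical helper: models only a hardened STORAGE_MODE=file deployment,
// where async batch is in-memory and therefore nondurable.
function hardenedFileBatchGate(
  env: Record<string, string | undefined>
): GateResult {
  const raw = env["BATCH_ASYNC_DURABILITY"];
  // Unset (or, by assumption, unrecognised) falls back to the hardened default.
  const mode: Durability =
    raw === "warn" || raw === "allow-nondurable" ? raw : "required";

  switch (mode) {
    case "allow-nondurable":
      return { action: "boot" };
    case "warn":
      return { action: "boot-with-warning", code: "FILE_BATCH_NONDURABLE" };
    case "required":
      return { action: "refuse", code: "FILE_BATCH_NONDURABLE" };
  }
}
```

Under these assumptions, an upgraded hardened file-mode deployment with no explicit setting lands in the refuse branch, which is the breaking behaviour described above.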

2. Hardened named-user gate covers JSON credentials (audit H2)

editorLoginCapable previously consulted only the legacy API_KEY_EDITOR/API_KEY_ADMIN/API_KEY env vars, so editor/admin credentials supplied via API_KEYS_JSON(_FILE) or API_KEY_SUPER_ADMIN bypassed the hardened named-user-or-ALLOW_SHARED_KEY_EDITOR enforcement entirely.

Who is affected: hardened deployments whose only editor/admin credentials come from API_KEYS_JSON/API_KEYS_JSON_FILE/API_KEY_SUPER_ADMIN with no named-user registry — they will refuse to boot after upgrading (which is exactly the conscious choice hardened mode was supposed to force).

Upgrade path: configure a named-user registry (EDITOR_USERS_JSON/_FILE/_DB) for per-user audit attribution (recommended), or set ALLOW_SHARED_KEY_EDITOR=true to explicitly accept shared-key identity.
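
The corrected gate logic might look roughly like this. It is a sketch under stated assumptions: the Credential and GateInput shapes, the role names, and namedUserGate are hypothetical, and the real editorLoginCapable signature is not shown in these notes. The point it illustrates is that every credential source is consulted, not only the legacy env vars.

```typescript
// Hypothetical role names, for illustration only.
type Role = "viewer" | "editor" | "admin" | "super-admin";

interface Credential {
  source: string; // e.g. "API_KEY_EDITOR", "API_KEYS_JSON", "API_KEY_SUPER_ADMIN"
  role: Role;
}

// True when any credential, from any source, can log in to the editor.
function editorLoginCapable(credentials: Credential[]): boolean {
  return credentials.some((c) => c.role !== "viewer");
}

interface GateInput {
  hardened: boolean;
  credentials: Credential[];
  hasNamedUserRegistry: boolean;
  allowSharedKeyEditor: boolean; // ALLOW_SHARED_KEY_EDITOR=true
}

function namedUserGate(input: GateInput): "boot" | "refuse" {
  if (!input.hardened) return "boot";
  if (!editorLoginCapable(input.credentials)) return "boot";
  // Hardened mode forces a conscious choice: named users or explicit opt-in.
  if (input.hasNamedUserRegistry || input.allowSharedKeyEditor) return "boot";
  return "refuse";
}
```

Before the fix, a credential whose source was API_KEYS_JSON or API_KEY_SUPER_ADMIN would not have been counted, so the gate never reached the refuse branch for those deployments.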

Behavioural note (non-breaking)

Preview dry-run requests now perform tenant resolution before the dry-run branch (part of the B3 fix — dry-run resolves templateRef nodes from the store, so it must be tenant-scoped too). Single-tenant deployments are unaffected (resolveTenant returns default).

Release blockers closed

B1 — Backup/restore documentation failed as written (PR #83)

Four compounding defects: the runbook’s restore SQL used an unquoted hyphenated database name (invalid Postgres) plus an undefined env var; the CLI sections documented capabilities that do not exist (assets.tar.gz, binary checksums); the file-mode recipe silently lost all asset data (it never backed up ASSETS_DIR); and the deployment guide’s “recommended” 3-table pg_dump dropped 11 of 14 durable tables (schedules, tenants, audit history, editor users, labels, sample data, render usage, batch jobs, both DLQs).

All four are corrected, and the restore path is now continuously rehearsed: scripts/restore-rehearsal.sh runs the exact runbook sequence (seed via the real storage context → pg_dump → DROP/CREATE → pg_restore → prisma migrate deploy → per-table row-count diff) as the final step of the core CI job on every push to main.
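
The final step of that sequence, the per-table row-count diff, can be sketched as follows. The actual check lives in scripts/restore-rehearsal.sh; this TypeScript version is only an illustration, and diffRowCounts, RowCounts, and TableDiff are hypothetical names.

```typescript
type RowCounts = Record<string, number>;

interface TableDiff {
  table: string;
  before: number | null; // null: table absent before the dump
  after: number | null;  // null: table absent after the restore
}

// Compares row counts taken before pg_dump with counts taken after
// pg_restore + migrate. An empty result means every table survived intact.
function diffRowCounts(before: RowCounts, after: RowCounts): TableDiff[] {
  const tables = Object.keys({ ...before, ...after }).sort();
  const diffs: TableDiff[] = [];
  for (const table of tables) {
    const b = before[table] ?? null;
    const a = after[table] ?? null;
    if (a !== b) diffs.push({ table, before: b, after: a });
  }
  return diffs;
}
```

A gate built this way fails on a missing table as well as on a changed count, which is the failure mode the old 3-table pg_dump recipe would have produced.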

B2 — Compliance doc misstated rendered-output persistence; restart-orphaned blobs (PRs #82, #84)

data-residency-gdpr.md affirmatively claimed render inputs/outputs are never persisted server-side — contradicted by async-batch result blobs (full rendered documents, gzipped), node-local scheduled-render artifacts, and Schedule rows persisting staticData plus recipient email addresses. The inventory, retention table, and erasure playbook now cover all three.

The code half: result blobs for jobs completed shortly before a restart were orphaned forever. Completed jobs are never rehydrated, so the sweep’s blob-delete loop never saw them, which meant retention did not actually bound rendered-output persistence. IJobResultBlobStore.list() plus a reconciliation pass (at boot and hourly) now deletes rowless blobs older than retention. Safety: rows always precede blobs, young/unknown-age blobs are grace-skipped, deletes are idempotent. S3 deployments: the job-result blob-store credentials now also need ListBucket.
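
The selection rule of the reconciliation pass can be sketched as a pure function. This is an illustration only: BlobEntry and selectOrphanBlobs are hypothetical, the real IJobResultBlobStore.list() return shape is not given in these notes, and the sketch assumes each blob maps to exactly one job row.

```typescript
interface BlobEntry {
  jobId: string;       // assumption: one blob per job row
  lastModified?: Date; // may be unknown on some backends
}

// Returns the job IDs whose blobs are safe to delete: no matching row,
// a known age, and that age beyond the retention window.
function selectOrphanBlobs(
  blobs: BlobEntry[],
  liveJobIds: string[],
  retentionMs: number,
  now: Date
): string[] {
  const orphans: string[] = [];
  for (const blob of blobs) {
    if (liveJobIds.indexOf(blob.jobId) !== -1) continue; // row exists
    if (blob.lastModified === undefined) continue;       // unknown age: grace-skip
    const ageMs = now.getTime() - blob.lastModified.getTime();
    if (ageMs <= retentionMs) continue;                  // young: grace-skip
    orphans.push(blob.jobId);
  }
  return orphans;
}
```

Because the function only selects and never deletes, running it twice over the same listing yields the same answer, which fits the idempotent-delete property stated above.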

B3 — `templateRef` resolved templates from the wrong tenant (PR #81)

The composition resolver was bound once at route registration with tenantId='default'. Under MULTI_TENANT_ENABLED, a tenant-X render containing a templateRef node read the referenced template from the default tenant — and the anonymous public sandbox shared the same resolver. tenantId is now a required parameter, bound per request (render, validate, preview incl. dry-run), per batch item, per schedule execution (schedule.tenantId), and to SANDBOX_TENANT_ID in the sandbox. The check-tenant-propagation CI gate statically enforces the sites, and cross-tenant-leak.test.ts gains two templateRef red-line tests (same-key isolation; no fallback to default or any other tenant). docs/tenant-isolation-guarantees.md documents the new enforcement point and the pre-v0.84.0 behaviour.
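
The difference between binding once and binding per request can be sketched as below. The names StoredTemplate, TemplateResolver, and bindResolver are hypothetical and the in-memory array stands in for the real store; the sketch only illustrates the rule that a templateRef lookup is scoped to the caller's tenant with no fallback.

```typescript
interface StoredTemplate {
  tenantId: string;
  key: string;
  body: string;
}

type TemplateResolver = (key: string) => StoredTemplate | undefined;

// tenantId is a required parameter: there is no default to fall back to.
function bindResolver(
  store: StoredTemplate[],
  tenantId: string
): TemplateResolver {
  return (key) =>
    store.filter((t) => t.tenantId === tenantId && t.key === key)[0];
}

// Illustrative data: the same key exists in two tenants.
const exampleStore: StoredTemplate[] = [
  { tenantId: "default", key: "header", body: "default header" },
  { tenantId: "tenant-x", key: "header", body: "X header" },
  { tenantId: "default", key: "footer", body: "default footer" },
];
```

With this shape, a handler would call bindResolver(store, tenantId) inside each request, batch item, or schedule execution rather than once at route registration. A tenant-x lookup of footer returns undefined instead of the default tenant's copy, mirroring the two red-line tests.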

High-value findings closed

H2 / H9a — the two hardening-truth gates above (PR #80).
H6 — operator docs-truth sweep (PR #84): the ALLOW_NO_AUTH first-run story corrected in all four first-touch docs (and .env.example no longer ships an uncommented empty API_KEY_ADMIN= that failed a verbatim first boot); SCHEDULE_ENABLED semantics corrected (CRUD vs execution split); ~60 missing env rows added to the deployment guide; the async batch API fully documented with auth-matrix rows; plus a new CI env-docs drift guard (scripts/check-env-docs.mjs, 148/148 schema keys documented).
H1 / H9b — evaluator link integrity + claim truth (PR #85): the docs pipeline no longer emits private-repo blob URLs (251 dead links degraded to plain text; verified zero pulpengine/blob URLs in the built site); the evaluator guide’s evidence list is split into externally-verifiable vs CI-verified-on-source; /docs/windows-installer hosts the installer known-issues content; SDK claims carry the “npm/PyPI publish pending” caveat (publication itself deferred by decision); XLSX/CSV are presented as table/pivot exports, not full-template “editable” outputs; all 51 CHANGELOG version anchors point at public release pages.
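
The env-docs drift guard from H6 can be sketched as a set difference between schema keys and documented keys. The real guard is scripts/check-env-docs.mjs, whose internals are not shown here; documentedKeys and findUndocumentedKeys are hypothetical names, and the sketch assumes documented keys appear in backticks in the deployment guide.

```typescript
// Collects every backticked UPPER_SNAKE_CASE token in a markdown document.
function documentedKeys(markdown: string): string[] {
  const pattern = /`([A-Z][A-Z0-9_]+)`/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const key = match[1];
    if (key !== undefined && found.indexOf(key) === -1) found.push(key);
  }
  return found;
}

// Returns schema keys with no documentation row, sorted for stable output.
// An empty result corresponds to a green guard.
function findUndocumentedKeys(
  schemaKeys: string[],
  markdown: string
): string[] {
  const documented = documentedKeys(markdown);
  return schemaKeys.filter((k) => documented.indexOf(k) === -1).sort();
}
```

A CI step built on this idea would fail whenever the returned list is non-empty, so a new schema key cannot ship without a documentation row.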

Validation

Locally verified (Windows dev box, clean-env recipe):

PR #80: the 4 affected suites — 172/172 tests; repo-wide typecheck green.
PR #81: cross-tenant red-line suite 8/8 against a disposable Postgres 16 (migrate deploy + run), including both new templateRef leak tests; 78/78 across the six affected render/preview/dry-run/batch/schedule suites; check-tenant-propagation zero violations.
PR #82: 35/35 across job-store, blob-list, batch-async persistence and orphan-recovery suites, including the end-to-end restart-orphan-window regression.
PR #83: restore-rehearsal.sh run live against a disposable postgres:16 — 15 tables / 142 rows survived dump → drop → restore → migrate (the per-table diff also caught and killed a stdin-slurp bug in the docker-exec tool path during development, which is the gate working).
PR #84: check-env-docs green (148/148 keys).
PR #85: full website build — 251 links degraded at source, 0 post-build fallbacks, zero private blob URLs in dist/.

CI-verified: the push-gated full matrix on this release SHA — CI run 27252758423 (all 9 executed jobs green; the two PR smokes correctly skipped on push), including the first executions of the new restore-rehearsal step and the env-docs gate.

Not verified in this release: live HA evidence (HA nightly is schedule-disabled pending the v0.85.0 Check 7 fix — its availability assertion has been failing); container/socket isolation-mode execution in CI (v0.85.0); npm/PyPI SDK publication (deferred — claims softened instead).

Actions-budget notes

PR-time smokes for the six PRs were cancelled once the PRs merged, and the superseded per-merge push runs were cancelled in favour of the single release-SHA matrix. The HA nightly schedule is disabled: it had been failing its Check 7 availability assertion nightly since 2026-05-06, burning ~25–40 min/night. Re-enable it with the v0.85.0 fix: gh workflow enable "HA nightly".
