Release v0.84.0 — “Operator Truth” (2026-06-10)
Publication note: v0.84.0 was tagged but never published. The release pipeline failed twice on runner disk exhaustion in the docker job’s SBOM steps before any artifact reached the GitHub Release, the public mirror, or the latest aliases. The tag is preserved at afd599f as unpublished (tags are immutable). The identical product shipped as v0.84.1, which adds only the release-workflow disk fix.
Audit remediation release 1 of 2, from the 2026-06-10 fresh-pass quality audit (repo at v0.83.0). This release closes all three of the audit’s release blockers plus four of its high-value findings, shipped as PRs #80–#85. The companion release (v0.85.0, “evidence & resilience”) covers the remaining high-value items: HA nightly Check 7, render saturation behaviour, container/socket isolation-mode CI execution, template format versioning, and the editor publish-gate seams.
The unifying theme: every operator-facing document now matches shipped behaviour, and the two correctness gaps that contradicted published guarantees (tenant-scoped composition, retention-bounded rendered output) are fixed in code with regression tests.
⚠ BREAKING changes — upgrade paths
1. Hardened file-mode requires an explicit async-batch durability choice (audit H9a)
BATCH_ASYNC_DURABILITY’s documented contract — required (the hardened
default) refuses boot on any nondurable backend — was only enforced on SQL
Server. Hardened STORAGE_MODE=file deployments silently ran nondurable
async batch. The file branch now applies the same tri-state gate
(FILE_BATCH_NONDURABLE).
Who is affected: deployments with NODE_ENV=production (or
HARDEN_PRODUCTION=true) and STORAGE_MODE=file and no explicit
BATCH_ASYNC_DURABILITY — they will refuse to boot after upgrading.
Upgrade path: add one env var before upgrading:
# accept in-memory async batch quietly:
BATCH_ASYNC_DURABILITY=allow-nondurable
# …or keep a startup reminder:
BATCH_ASYNC_DURABILITY=warn
# …or move to Postgres for durable async batch.
The README and deployment-guide hardened examples now include the line.
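As a rough illustration of the gate described in this section, the decision can be sketched as a pure function over the environment. The env var names, the three values, and the FILE_BATCH_NONDURABLE code come from this note; the types, the function names, the non-hardened default, and the fail-closed handling of unrecognised values are assumptions made for the sketch, not the shipped implementation.

```typescript
// Illustrative sketch only: every type and function name here is hypothetical.
type Env = Record<string, string | undefined>;

type GateResult =
  | { action: "boot" }
  | { action: "warn"; code: string }
  | { action: "refuse"; code: string };

function isHardened(env: Env): boolean {
  return env["NODE_ENV"] === "production" || env["HARDEN_PRODUCTION"] === "true";
}

function fileBatchDurabilityGate(env: Env): GateResult {
  // This sketch models only the file branch, whose async batch is nondurable.
  if (env["STORAGE_MODE"] !== "file") return { action: "boot" };

  // Hardened default is "required" (from the note).
  // Assumption: outside hardened mode an unset value behaves like "warn".
  const mode =
    env["BATCH_ASYNC_DURABILITY"] ?? (isHardened(env) ? "required" : "warn");

  switch (mode) {
    case "allow-nondurable":
      return { action: "boot" };
    case "warn":
      return { action: "warn", code: "FILE_BATCH_NONDURABLE" };
    default:
      // "required", plus (by assumption) any unrecognised value, fails closed.
      return { action: "refuse", code: "FILE_BATCH_NONDURABLE" };
  }
}
```

Under these assumptions, a hardened file-mode environment with no BATCH_ASYNC_DURABILITY yields a refuse result, which is the boot refusal affected deployments will see after upgrading.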
2. Hardened named-user gate covers JSON credentials (audit H2)
editorLoginCapable previously consulted only the legacy
API_KEY_EDITOR/API_KEY_ADMIN/API_KEY env vars, so editor/admin
credentials supplied via API_KEYS_JSON(_FILE) or API_KEY_SUPER_ADMIN
bypassed the hardened named-user-or-ALLOW_SHARED_KEY_EDITOR enforcement
entirely.
Who is affected: hardened deployments whose only editor/admin
credentials come from API_KEYS_JSON/API_KEYS_JSON_FILE/API_KEY_SUPER_ADMIN
with no named-user registry — they will refuse to boot after upgrading
(which is exactly the conscious choice hardened mode was supposed to force).
Upgrade path: configure a named-user registry
(EDITOR_USERS_JSON/_FILE/_DB) for per-user audit attribution
(recommended), or set ALLOW_SHARED_KEY_EDITOR=true to explicitly accept
shared-key identity.
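The corrected gate can be pictured roughly as follows. This is a sketch under stated assumptions: the role names, the Credential shape, and both function names are hypothetical, and the named-user registry is reduced to a boolean rather than modelling its configuration sources. The point it illustrates is that the capability check looks at every credential regardless of which env var supplied it.

```typescript
// Illustrative sketch: roles, types and function names are hypothetical.
type Role = "viewer" | "editor" | "admin";

interface Credential {
  source: string; // e.g. "API_KEY_EDITOR", "API_KEYS_JSON", "API_KEY_SUPER_ADMIN"
  role: Role;
}

interface GateInput {
  hardened: boolean;
  credentials: Credential[];
  hasNamedUserRegistry: boolean;
  allowSharedKeyEditor: boolean; // ALLOW_SHARED_KEY_EDITOR=true
}

// True when any configured credential, from any source, can log in to the editor.
function canLogInToEditor(credentials: Credential[]): boolean {
  return credentials.some((c) => c.role === "editor" || c.role === "admin");
}

function namedUserGate(input: GateInput): "boot" | "refuse" {
  if (!input.hardened) return "boot";
  if (!canLogInToEditor(input.credentials)) return "boot";
  // Hardened mode forces a conscious choice: named users or explicit opt-in.
  if (input.hasNamedUserRegistry || input.allowSharedKeyEditor) return "boot";
  return "refuse";
}
```

In this model, an admin credential supplied only through API_KEYS_JSON with no registry and no opt-in is refused, which is the case the pre-v0.84.0 check missed.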
Behavioural note (non-breaking)
Preview dry-run requests now perform tenant resolution before the
dry-run branch (part of the B3 fix — dry-run resolves templateRef nodes
from the store, so it must be tenant-scoped too). Single-tenant deployments
are unaffected (resolveTenant returns default).
Release blockers closed
B1 — Backup/restore documentation failed as written (PR #83)
Four compounding defects: the runbook’s restore SQL used an unquoted
hyphenated database name (invalid Postgres) plus an undefined env var; the
CLI sections documented capabilities that do not exist (assets.tar.gz,
binary checksums); the file-mode recipe silently lost all asset data
(it never backed up ASSETS_DIR); and the deployment guide’s “recommended”
3-table pg_dump dropped 11 of 14 durable tables (schedules, tenants,
audit history, editor users, labels, sample data, render usage, batch jobs,
both DLQs).
All four are corrected, and the restore path is now continuously
rehearsed: scripts/restore-rehearsal.sh runs the exact runbook sequence
(seed via the real storage context → pg_dump → DROP/CREATE →
pg_restore → prisma migrate deploy → per-table row-count diff) as the
final step of the core CI job on every push to main.
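The last step of that sequence, the per-table row-count diff, is simple enough to sketch. The version below is a minimal illustration and not the contents of scripts/restore-rehearsal.sh: it assumes row counts have already been collected into plain maps before the dump and after the restore, and the function and type names are hypothetical.

```typescript
// Illustrative sketch: compares row counts captured before dump and after restore.
type RowCounts = Record<string, number>;

interface TableDiff {
  table: string;
  before: number;
  after: number | null; // null when the table is missing after restore
}

function diffRowCounts(before: RowCounts, after: RowCounts): TableDiff[] {
  const diffs: TableDiff[] = [];
  for (const [table, count] of Object.entries(before)) {
    const restored = after[table] ?? null;
    if (restored !== count) {
      diffs.push({ table, before: count, after: restored });
    }
  }
  // Tables that exist only after the restore are deliberately not flagged here.
  return diffs;
}
```

An empty result means every seeded table came back with the same number of rows; any entry is a reason to fail the job.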
B2 — Compliance doc misstated rendered-output persistence; restart-orphaned blobs (PRs #82, #84)
data-residency-gdpr.md affirmatively claimed render inputs/outputs are
never persisted server-side — contradicted by async-batch result blobs
(full rendered documents, gzipped), node-local scheduled-render artifacts,
and Schedule rows persisting staticData plus recipient email addresses.
The inventory, retention table, and erasure playbook now cover all three.
The code half: result blobs for jobs completed shortly before a restart
were orphaned forever (completed jobs are never rehydrated, so the
sweep’s blob-delete loop never saw them — retention did not actually bound
rendered-output persistence). IJobResultBlobStore.list() plus a
reconciliation pass (boot + hourly) now deletes rowless blobs older than
retention. Safety: rows always precede blobs, young/unknown-age blobs are
grace-skipped, deletes are idempotent. S3 deployments: the job-result
blob-store credentials now also need the s3:ListBucket permission, because the reconciliation pass enumerates blobs.
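The selection rule behind the reconciliation pass can be sketched as a pure function. This is an illustration under assumptions, not the shipped code: the BlobEntry shape and the function name are hypothetical, the actual listing and deleting (which are I/O) are left out, and "older than retention" is taken to mean strictly older.

```typescript
// Illustrative sketch: decides which listed blobs are safe to delete.
interface BlobEntry {
  jobId: string;
  lastModified?: Date; // undefined when the backend cannot report an age
}

function selectOrphanBlobs(
  blobs: BlobEntry[],
  jobIdsWithRows: Set<string>,
  retentionMs: number,
  now: Date,
): string[] {
  const orphans: string[] = [];
  for (const blob of blobs) {
    // A row exists, so the regular sweep remains responsible for this blob.
    if (jobIdsWithRows.has(blob.jobId)) continue;
    // Unknown age: grace-skip rather than risk deleting a live result.
    if (blob.lastModified === undefined) continue;
    const ageMs = now.getTime() - blob.lastModified.getTime();
    // Young blob: grace-skip.
    if (ageMs <= retentionMs) continue;
    orphans.push(blob.jobId);
  }
  return orphans;
}
```

Because rows always precede blobs, a rowless blob older than retention cannot belong to a job that is still being written, so deleting it is safe and repeatable.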
B3 — templateRef resolved templates from the wrong tenant (PR #81)
The composition resolver was bound once at route registration with
tenantId='default'. Under MULTI_TENANT_ENABLED, a tenant-X render
containing a templateRef node read the referenced template from the
default tenant — and the anonymous public sandbox shared the same
resolver. tenantId is now a required parameter, bound per request (render,
validate, preview incl. dry-run), per batch item, per schedule execution
(schedule.tenantId), and to SANDBOX_TENANT_ID in the sandbox. The
check-tenant-propagation CI gate statically enforces the sites, and
cross-tenant-leak.test.ts gains two templateRef red-line tests
(same-key isolation; no fallback to default or any other tenant).
docs/tenant-isolation-guarantees.md documents the new enforcement point
and the pre-v0.84.0 behaviour.
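The shape of the fix can be illustrated with a resolver factory that takes the tenant as a required argument and never falls back. This is a minimal sketch: the in-memory store, the factory name, and the error text are hypothetical, and only the behaviour (per-request binding, no fallback to default or any other tenant) is taken from the description above.

```typescript
// Illustrative sketch: tenantId -> template key -> template body.
type TemplateStore = Map<string, Map<string, string>>;

// tenantId is a required parameter, so a caller cannot bind the resolver
// once at registration time with an implicit 'default'.
function makeTemplateRefResolver(store: TemplateStore, tenantId: string) {
  return (key: string): string => {
    const template = store.get(tenantId)?.get(key);
    if (template === undefined) {
      // No fallback to 'default' or to any other tenant.
      throw new Error(`templateRef "${key}" not found for tenant "${tenantId}"`);
    }
    return template;
  };
}

// Usage: build one resolver per request from the resolved tenant.
const exampleStore: TemplateStore = new Map([
  ["default", new Map([["header", "default header"], ["footer", "default footer"]])],
  ["tenant-x", new Map([["header", "tenant-x header"]])],
]);
const resolveForTenantX = makeTemplateRefResolver(exampleStore, "tenant-x");
```

Here resolveForTenantX("header") returns tenant-x's own template even though default holds the same key, and resolveForTenantX("footer") throws instead of reading the default tenant's copy.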
High-value findings closed
- H2 / H9a: the two hardening-truth gates above (PR #80).
- H6: operator docs-truth sweep (PR #84). The ALLOW_NO_AUTH first-run story is corrected in all four first-touch docs (and .env.example no longer ships an uncommented empty API_KEY_ADMIN= that failed a verbatim first boot); SCHEDULE_ENABLED semantics are corrected (CRUD vs execution split); ~60 missing env rows are added to the deployment guide; the async batch API is fully documented with auth-matrix rows; and a new CI env-docs drift guard is in place (scripts/check-env-docs.mjs, 148/148 schema keys documented).
- H1 / H9b: evaluator link integrity + claim truth (PR #85). The docs pipeline no longer emits private-repo blob URLs (251 dead links degraded to plain text; verified zero pulpengine/blob URLs in the built site); the evaluator guide’s evidence list is split into externally-verifiable vs CI-verified-on-source; /docs/windows-installer hosts the installer known-issues content; SDK claims carry the “npm/PyPI publish pending” caveat (publication itself deferred by decision); XLSX/CSV are presented as table/pivot exports, not full-template “editable” outputs; all 51 CHANGELOG version anchors point at public release pages.
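For readers curious what an env-docs drift guard amounts to, the core check can be sketched in a few lines. This is not the contents of scripts/check-env-docs.mjs: the function name is hypothetical, and it assumes the schema keys and the documentation text have already been loaded into memory.

```typescript
// Illustrative sketch: report schema keys that no document mentions.
function findUndocumentedKeys(schemaKeys: string[], docText: string): string[] {
  return schemaKeys.filter((key) => {
    // Escape regex metacharacters, then require whole-word matches so that
    // API_KEY is not counted as documented just because API_KEY_ADMIN is.
    const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return !new RegExp(`\\b${escaped}\\b`).test(docText);
  });
}
```

A CI step built on this idea would fail whenever the returned list is non-empty, so a new env var cannot ship without a documentation row.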
Validation
Locally verified (Windows dev box, clean-env recipe):
- PR #80: the 4 affected suites — 172/172 tests; repo-wide typecheck green.
- PR #81: cross-tenant red-line suite 8/8 against a disposable Postgres 16 (migrate deploy + run), including both new templateRef leak tests; 78/78 across the six affected render/preview/dry-run/batch/schedule suites; check-tenant-propagation zero violations.
- PR #82: 35/35 across job-store, blob-list, batch-async persistence and orphan-recovery suites, including the end-to-end restart-orphan-window regression.
- PR #83: restore-rehearsal.sh run live against a disposable postgres:16; 15 tables / 142 rows survived dump → drop → restore → migrate (the per-table diff also caught and killed a stdin-slurp bug in the docker-exec tool path during development, which is the gate working).
- PR #84: check-env-docs green (148/148 keys).
- PR #85: full website build: 251 links degraded at source, 0 post-build fallbacks, zero private blob URLs in dist/.
CI-verified: the push-gated full matrix on this release SHA — CI run 27252758423 (all 9 executed jobs green; the two PR smokes correctly skipped on push), including the first executions of the new restore-rehearsal step and the env-docs gate.
Not verified in this release: live HA evidence (HA nightly is schedule-disabled pending the v0.85.0 Check 7 fix — its availability assertion has been failing); container/socket isolation-mode execution in CI (v0.85.0); npm/PyPI SDK publication (deferred — claims softened instead).
Actions-budget notes
PR-time smokes for the six PRs were cancelled once the PRs merged; the
superseded per-merge push runs were cancelled in favour of the single
release-SHA matrix; and the HA nightly schedule is disabled (it had
been failing its Check 7 availability assertion nightly since 2026-05-06,
burning ~25–40 min/night) — re-enable with the v0.85.0 fix:
gh workflow enable "HA nightly".