Release v0.67.0

Date: 2026-04-12

Theme

Phase C.0 Stage 2 — Multi-tenant mode.

This release lights up the tenant primitive that shipped behind the scenes in v0.66.0. An operator running Postgres can now set MULTI_TENANT_ENABLED=true, provision a super-admin credential, create real tenants via the new admin API, issue tenant-bound keys via API_KEYS_JSON, mint editor tokens carrying a signed tenant claim, archive a tenant, and have every data-access request correctly isolated at the store layer.

Zero-change upgrade for single-tenant deployments. The entire multi-tenant path is behind a single feature flag. Leave MULTI_TENANT_ENABLED unset (or false), restart, and the server behaves exactly like v0.66.0 — the 59-site tenant coercion rewrite preserves stage 1 semantics because resolveTenant() short-circuits to 'default' in single-tenant mode.

At a glance

Area	What shipped	New env / API surface
Feature flag	Master `MULTI_TENANT_ENABLED` gate	`MULTI_TENANT_ENABLED` (default false)
Credentials	Rich credential map with tenant binding	`API_KEYS_JSON`, `API_KEYS_JSON_FILE`, `API_KEY_SUPER_ADMIN`
Tenant CRUD	Super-admin-only admin routes	`POST/GET/PATCH /admin/tenants`, `/:id/archive`, `/:id/unarchive`, `DELETE /:id?purge=true` (→ 501)
Editor tokens	5-part format with signed tenant claim	`EditorTokenResponse.tenantId` field
OIDC	Default tenant for OIDC sessions	`OIDC_DEFAULT_TENANT` (default `'default'`)
Archive lifecycle	Soft-archive with 409 on writes	`TENANT_STATUS_CACHE_TTL_MS` (default 10s), `tenant_archive_rejections_total` metric
Plugin system	Reject identity/storage plugins in MT mode	`plugin-api` 0.1.0 → 0.2.0 (render-hook context optional `tenantId`)
Binary store	Non-default tenants get `{tenantId}/{uuid}` prefix	`/assets/*` wildcard route
Error wire	`code: 'tenant_unknown' / 'tenant_archived'`	Optional `ErrorResponseSchema.code` field
Grep gate	Ban `?? 'default'` in route/lib; allowlist shrunk	`scripts/check-tenant-propagation.mjs`

What changed

Credential shape and super-admin scope

The credential map at auth.plugin.ts grew from Map<string, Scope> to Map<string, { scope, tenantId: string | null }>. Three env-var sources fill it:

Legacy env vars (API_KEY_ADMIN, API_KEY_RENDER, API_KEY_PREVIEW, API_KEY_EDITOR) always bind to tenantId: 'default'. No change in behavior for single-tenant operators.
API_KEYS_JSON / API_KEYS_JSON_FILE — JSON array of [{ key, scope, tenantId }] where tenantId: string | null. Parsing produces redacted errors that never echo the key value. The two sources are mutually exclusive.
API_KEY_SUPER_ADMIN — single-string shortcut for { scope: 'admin', tenantId: null }. Super-admin credentials are the only ones allowed to operate on /admin/tenants/*. A tenant-bound admin credential hitting those routes gets 403 super_admin_only.
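A minimal sketch of how the three sources could fill the credential map. It assumes a `Scope` union inferred from the legacy variable names and a plain env record; `buildCredentialMap` is a hypothetical name, file reading for API_KEYS_JSON_FILE is omitted, and the error wording is illustrative. The point it shows is that error messages name the entry index, never the key value.

```typescript
// Hypothetical sketch: Scope, CredentialEntry and buildCredentialMap are illustrative.
type Scope = "admin" | "render" | "preview" | "editor";
interface CredentialEntry { scope: Scope; tenantId: string | null }

const SCOPES: readonly Scope[] = ["admin", "render", "preview", "editor"];

function buildCredentialMap(env: Record<string, string | undefined>): Map<string, CredentialEntry> {
  const map = new Map<string, CredentialEntry>();

  // Legacy single-scope vars always bind to the 'default' tenant.
  const legacy: Array<[string, Scope]> = [
    ["API_KEY_ADMIN", "admin"], ["API_KEY_RENDER", "render"],
    ["API_KEY_PREVIEW", "preview"], ["API_KEY_EDITOR", "editor"],
  ];
  for (const [name, scope] of legacy) {
    const key = env[name];
    if (key) map.set(key, { scope, tenantId: "default" });
  }

  // Super-admin shortcut: admin scope with no tenant binding.
  const superKey = env.API_KEY_SUPER_ADMIN;
  if (superKey) map.set(superKey, { scope: "admin", tenantId: null });

  if (env.API_KEYS_JSON && env.API_KEYS_JSON_FILE) {
    throw new Error("API_KEYS_JSON and API_KEYS_JSON_FILE are mutually exclusive");
  }
  if (env.API_KEYS_JSON) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(env.API_KEYS_JSON);
    } catch {
      // Never include the raw text: it contains key material.
      throw new Error("API_KEYS_JSON is not valid JSON");
    }
    if (!Array.isArray(parsed)) throw new Error("API_KEYS_JSON must be an array");
    parsed.forEach((raw: unknown, i: number) => {
      const { key, scope, tenantId } = (raw ?? {}) as { key?: unknown; scope?: unknown; tenantId?: unknown };
      // Errors reference the entry index only, never the key value.
      if (typeof key !== "string" || key === "") throw new Error(`API_KEYS_JSON[${i}]: missing key`);
      if (typeof scope !== "string" || !SCOPES.includes(scope as Scope)) throw new Error(`API_KEYS_JSON[${i}]: invalid scope`);
      if (tenantId !== null && typeof tenantId !== "string") throw new Error(`API_KEYS_JSON[${i}]: tenantId must be a string or null`);
      map.set(key, { scope: scope as Scope, tenantId: tenantId as string | null });
    });
  }
  return map;
}
```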

A super-admin credential operating on a data-access route (templates, assets, schedules, render, audit-events, schedule-dlq) must supply an X-PulpEngine-Tenant-Id header — otherwise resolveTenant() returns 400 tenant_required. The header is trimmed, lowercased, and validated against the slug regex before being passed to the store.

`resolveTenant` — the single legal coercion site

Stage 1 had 59 request.tenantId ?? 'default' sites across 12 files. Every one of them was a cross-tenant leak waiting to happen as soon as a super-admin credential (with tenantId: null) reached it, because the null would silently coerce to 'default'. Stage 2 introduces one helper at apps/api/src/lib/tenant-resolution.ts:

```typescript
export async function resolveTenant(
  request: FastifyRequest,
  reply: FastifyReply,
): Promise<string | null>
```

The mechanical rewrite replaced every call site:

```typescript
// Before
const tenantId = request.tenantId ?? 'default'
const result = await service.getByKey(tenantId, key)

// After
const tenantId = await resolveTenant(request, reply)
if (tenantId === null) return // resolveTenant has already sent the error reply
const result = await service.getByKey(tenantId, key)
```

The helper handles the single-tenant default, the tenant-bound credential path, and the super-admin header path uniformly. For super-admin headers it runs assertKnown(headerValue) so a typo’d or deleted tenant fails with 403 tenant_unknown before any store call.
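The three paths can be sketched as a small decision function. This is a hedged illustration, not the code in tenant-resolution.ts: `RequestLike`, `ReplyLike` and `TenantCheck` stand in for FastifyRequest, FastifyReply and the tenant status cache, and `SLUG_RE` is a hypothetical slug pattern since the real regex is not given here.

```typescript
// Hypothetical stand-ins for FastifyRequest / FastifyReply / TenantStatusCache.
interface RequestLike {
  tenantId: string | null; // set by the auth hook; null means super-admin
  headers: Record<string, string | string[] | undefined>;
}
interface ReplyLike {
  code(status: number): ReplyLike;
  send(body: unknown): void;
}
interface TenantCheck {
  assertKnown(tenantId: string): Promise<void>; // throws for unknown tenants
}

const SLUG_RE = /^[a-z0-9][a-z0-9-]{0,62}$/; // hypothetical slug pattern

function makeResolveTenant(multiTenant: boolean, cache: TenantCheck) {
  return async function resolveTenant(request: RequestLike, reply: ReplyLike): Promise<string | null> {
    // Single-tenant mode: everything resolves to 'default' (stage 1 semantics).
    if (!multiTenant) return "default";

    // Tenant-bound credential: the binding is authoritative.
    if (request.tenantId !== null) return request.tenantId;

    // Super-admin: the tenant must come from the header (Node lowercases header names).
    const raw = request.headers["x-pulpengine-tenant-id"];
    const header = (Array.isArray(raw) ? raw[0] : raw)?.trim().toLowerCase();
    if (!header) {
      reply.code(400).send({ error: "Bad Request", code: "tenant_required" });
      return null;
    }
    if (!SLUG_RE.test(header)) {
      reply.code(400).send({ error: "Bad Request", message: "Malformed tenant id" });
      return null;
    }
    // Unknown tenants throw here; the global error handler maps that to 403.
    await cache.assertKnown(header);
    return header;
  };
}
```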

The grep gate now bans ?? 'default' in route/lib files (except the provider trio, tenant-resolution.ts, super-admin.ts, and a small allowlist of boundary files). Inline tenant-propagation-allow-default directives are the only remaining escape hatch, and each one requires a reviewer-visible comment.

Editor token 5-part format (B1 blocker — verify-first)

The stage 1 verifyEditorToken had switch (parts.length) { case 2|3|4: ...; default: return null }, so any 5-part token was unconditionally rejected. Stage 2 adds case 5 to the verifier BEFORE touching any mint-side code, so a partial rollout never opens an outage window in which some pods mint 5-part tokens that other pods still reject.

New format: {iat}.{expiry}.{tenantId_b64url}.{actor_b64url}.{sig}

HMAC payload: editor:${iat}:${expiry}:${tenantId}:${actorRaw}
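A sketch of minting and verifying the 5-part format, assuming HMAC-SHA256 with a hex signature, a caller-supplied `secret`, and `expiry` in seconds; only the segment order and the payload layout come from these notes. `mintEditorToken5` and `verifyEditorToken5` are hypothetical names, and the real verifier also keeps its 3- and 4-part cases. Because the base64url alphabet has no `.`, splitting on dots is safe even when the tenant or actor contains one.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed: HMAC-SHA256, hex digest. Only the layout below is from the release notes.
const b64url = (s: string) => Buffer.from(s, "utf8").toString("base64url");
const unb64url = (s: string) => Buffer.from(s, "base64url").toString("utf8");

function sign(secret: string, payload: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

function mintEditorToken5(secret: string, iat: number, expiry: number, tenantId: string, actor: string): string {
  const sig = sign(secret, `editor:${iat}:${expiry}:${tenantId}:${actor}`);
  return [iat, expiry, b64url(tenantId), b64url(actor), sig].join(".");
}

interface VerifiedToken { tenantId: string; actor: string; expiry: number }

function verifyEditorToken5(secret: string, token: string, nowSec: number): VerifiedToken | null {
  const parts = token.split(".");
  if (parts.length !== 5) return null; // 3-/4-part handling omitted in this sketch
  const [iatRaw, expRaw, tenantB64, actorB64, sig] = parts;
  const tenantId = unb64url(tenantB64);
  const actor = unb64url(actorB64);
  // The tenant is inside the signed payload, so it cannot be swapped.
  const expected = sign(secret, `editor:${iatRaw}:${expRaw}:${tenantId}:${actor}`);
  const a = Buffer.from(sig, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const expiry = Number(expRaw);
  if (!Number.isFinite(expiry) || expiry <= nowSec) return null;
  return { tenantId, actor, expiry };
}
```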

Backward-compat: 3-part and 4-part tokens still verify; they resolve to tenantId: 'default'. No forced re-auth on upgrade.

extractTokenActor() is the other load-bearing edit: it is the pre-verification parse helper used by the named-user auth path. Stage 1 hardcoded a parts.length !== 4 check, so without the fix every named-user 5-part token would have been rejected at the first step of auth, before verifyEditorToken ever ran. Stage 2 supports both 4-part and 5-part tokens via parts.length - 2 indexing (the actor slot is always the second-to-last segment).
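The indexing trick in isolation, as a sketch that assumes the actor segment is base64url-encoded UTF-8 in both formats. The value it returns is unauthenticated; only the subsequent signature check makes it trustworthy.

```typescript
// Sketch of the pre-verification actor parse (assumed base64url actor encoding).
// Works for {iat}.{expiry}.{actor}.{sig} and {iat}.{expiry}.{tenant}.{actor}.{sig}.
function extractTokenActor(token: string): string | null {
  const parts = token.split(".");
  if (parts.length !== 4 && parts.length !== 5) return null;
  // The actor always sits just before the signature.
  const actorSeg = parts[parts.length - 2];
  if (!actorSeg) return null;
  return Buffer.from(actorSeg, "base64url").toString("utf8");
}
```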

The mintEditorToken signature grew a required tenantId: string parameter. All four mint call sites were updated: auth.ts named-users, auth.ts shared-key + super-admin, oidc.ts callback, oidc.ts exchange. Every mint path runs await app.tenantStatusCache.assertKnown(mintTenantId) immediately before the mint — the universal “last gate before mint” rule.

`/auth/editor-token` super-admin mint flow

/auth/* routes bypass the onRequest hook, so the auth-hook assertKnown check does NOT fire for mint routes. Stage 2 adds explicit assertKnown calls at every mint path.

The super-admin mint flow on /auth/editor-token:

Branch	Behavior
Tenant-bound credential + header present	400 — header not accepted when credential is tenant-bound
Tenant-bound credential + no header	`assertKnown(entry.tenantId)` → mint with that tenant
Super-admin credential + header missing	400 `tenant_required`
Super-admin credential + header slug-invalid	400 malformed-slug
Super-admin credential + valid header + unknown tenant	403 `tenant_unknown` (via global error handler)
Super-admin credential + valid header + active/archived tenant	mint with that tenant
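The table can be read as one pure decision function. This sketch is illustrative: `MintDecision`, `decideMintTenant`, the `tenantExists` callback (true for active and archived tenants, mirroring assertKnown) and the slug pattern are hypothetical. The real route throws TenantUnknownError and lets the global error handler produce the 403.

```typescript
// Hypothetical encoding of the /auth/editor-token tenant table.
type MintDecision =
  | { kind: "mint"; tenantId: string }
  | { kind: "reject"; status: 400 | 403; code?: string };

const SLUG = /^[a-z0-9][a-z0-9-]*$/; // hypothetical slug pattern

function decideMintTenant(
  credentialTenantId: string | null, // null = super-admin
  header: string | undefined,
  tenantExists: (id: string) => boolean, // true for active AND archived tenants
): MintDecision {
  if (credentialTenantId !== null) {
    // Tenant-bound credentials may not pick a tenant via the header.
    if (header !== undefined) return { kind: "reject", status: 400 };
    return tenantExists(credentialTenantId)
      ? { kind: "mint", tenantId: credentialTenantId }
      : { kind: "reject", status: 403, code: "tenant_unknown" };
  }
  if (header === undefined || header.trim() === "") {
    return { kind: "reject", status: 400, code: "tenant_required" };
  }
  const slug = header.trim().toLowerCase();
  if (!SLUG.test(slug)) return { kind: "reject", status: 400 }; // malformed slug
  if (!tenantExists(slug)) return { kind: "reject", status: 403, code: "tenant_unknown" };
  return { kind: "mint", tenantId: slug }; // archived tenants still mint
}
```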

Archived tenants mint successfully. A freshly-minted session on an archived tenant succeeds on read routes (per the stage 1 soft-archive rule) and fails with 409 tenant_archived at the store-write boundary. This is deliberate — blocking mint on archive would contradict the audit/export access path.

OIDC via completion codes + default tenant

The repo uses oidc_code= query-param completion codes, not redirect fragments. Stage 2 threads tenantId through the flow concretely:

CompletionTokenData at lib/oidc/completion-codes.ts grows a required tenantId: string field. Every completionCodeStore.create({...}) call site passes the resolved tenantId.
/oidc/complete response body carries tenantId through to the editor.
/oidc/exchange response body carries tenantId.
OIDC_DEFAULT_TENANT env var (default 'default') — auto-provisioned OIDC users pick this up as their tenantId. For stored users, user.tenantId wins. Per-group tenant binding deferred to C.0b.
Startup validation: if OIDC_ENABLED && MULTI_TENANT_ENABLED, the configured default tenant must exist and must not be archived; otherwise the server throws at boot.
Mint-time recheck: the OIDC callback path runs assertKnown(mintTenantId) before the mint. Failure surfaces via the errorPage helper as an HTML page (not JSON) because the callback returns HTML.
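The tenant choice and the boot-time check are small enough to sketch directly. `OidcUser`, `TenantRow`, `pickOidcTenant` and `validateOidcDefaultTenant` are hypothetical shapes and names; the sketch only encodes the two rules above (stored user's tenant wins, otherwise OIDC_DEFAULT_TENANT; the default must exist and be active when both flags are on).

```typescript
// Hypothetical shapes; only the rules come from the release notes.
interface OidcUser { id: string; tenantId?: string } // tenantId set for stored users
interface TenantRow { id: string; status: "active" | "archived" }

function pickOidcTenant(user: OidcUser | null, defaultTenant: string): string {
  // Stored users keep their own tenant; auto-provisioned users get the default.
  return user?.tenantId ?? defaultTenant;
}

function validateOidcDefaultTenant(
  oidcEnabled: boolean,
  multiTenant: boolean,
  defaultTenant: string,
  lookupTenant: (id: string) => TenantRow | undefined,
): void {
  if (!(oidcEnabled && multiTenant)) return;
  const row = lookupTenant(defaultTenant);
  if (!row) throw new Error(`OIDC_DEFAULT_TENANT "${defaultTenant}" does not exist`);
  if (row.status === "archived") throw new Error(`OIDC_DEFAULT_TENANT "${defaultTenant}" is archived`);
}
```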

`TenantStatusCache`: archive enforcement without Prisma middleware

An earlier draft of the plan used a Prisma extension for archive enforcement. That approach hit two dead-ends:

Schedule engine + audit purge writes happen outside any HTTP request context, so a per-request cache can’t see them.
Prisma’s legacy $use middleware doesn’t fire reliably inside $transaction(...) callbacks (hazard B3).

Stage 2 drops the extension entirely. Every Postgres store write method calls await this.tenantStatusCache?.assertActive(tenantId, operation) as its first line, BEFORE any $transaction begins. The cache is a process-scoped Map<tenantId, {status, expiresAt}> with a 10-second default TTL (tunable via TENANT_STATUS_CACHE_TTL_MS). One DB query per cache miss. The shared instance is injected into both the Postgres stores and the schedule engine so non-HTTP paths hit the same cache.

Two distinct guards:

assertKnown(tenantId) — auth-boundary. Throws TenantUnknownError → 403 tenant_unknown on unknown tenants. Archived tenants PASS — reads on archived tenants are allowed per stage 1 soft-archive.
assertActive(tenantId, operation) — store-write-boundary. Throws TenantUnknownError → 403 on unknown (defensive), TenantArchivedError → 409 tenant_archived on archived. Increments tenant_archive_rejections_total{operation}.
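A sketch of a process-scoped TTL cache with the two guards, assuming an async `load` callback that performs the single DB query per miss and an in-memory counter standing in for the Prometheus metric. The class and error names mirror the notes, but the shapes are illustrative. Unknown results are cached too, which is why tenant creation must bust the entry.

```typescript
// Illustrative TTL cache; load(), the counter map and error shapes are assumptions.
type TenantStatus = "active" | "archived" | "unknown";

class TenantUnknownError extends Error {
  readonly statusCode = 403;
  readonly code = "tenant_unknown";
}
class TenantArchivedError extends Error {
  readonly statusCode = 409;
  readonly code = "tenant_archived";
}

class TenantStatusCache {
  private entries = new Map<string, { status: TenantStatus; expiresAt: number }>();
  readonly archiveRejections = new Map<string, number>(); // operation -> count

  constructor(
    private readonly load: (tenantId: string) => Promise<TenantStatus>, // one DB query per miss
    private readonly ttlMs = 10_000,
    private readonly now: () => number = Date.now,
  ) {}

  private async status(tenantId: string): Promise<TenantStatus> {
    const hit = this.entries.get(tenantId);
    if (hit && hit.expiresAt > this.now()) return hit.status;
    const status = await this.load(tenantId);
    this.entries.set(tenantId, { status, expiresAt: this.now() + this.ttlMs });
    return status;
  }

  /** Auth boundary: unknown tenants fail; archived tenants pass (reads allowed). */
  async assertKnown(tenantId: string): Promise<void> {
    if ((await this.status(tenantId)) === "unknown") {
      throw new TenantUnknownError(`Tenant "${tenantId}" is not known to this server.`);
    }
  }

  /** Store-write boundary: unknown -> 403, archived -> 409 plus a counter bump. */
  async assertActive(tenantId: string, operation: string): Promise<void> {
    const status = await this.status(tenantId);
    if (status === "unknown") throw new TenantUnknownError(`Tenant "${tenantId}" is not known to this server.`);
    if (status === "archived") {
      this.archiveRejections.set(operation, (this.archiveRejections.get(operation) ?? 0) + 1);
      throw new TenantArchivedError(`Tenant "${tenantId}" is archived and rejects new writes. Reads still work.`);
    }
  }

  /** Called after create/archive/unarchive so this process sees the change at once. */
  invalidate(tenantId: string): void {
    this.entries.delete(tenantId);
  }
}
```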

Terminal / observability writes intentionally skip the guard: audit.record, dlq.insert, dlq.markAbandoned, dlq.markOrphaned, execution.updateStatus. Archived tenants still get audit rows and DLQ terminal transitions — otherwise an archive mid-execution would leave orphan rows forever.

Schedule engine dispatch-time recheck: after findDueSchedules returns (which now JOINs tenants and excludes archived ones), the engine re-checks the cache per schedule before inserting an execution row. An archive happening between the query and the per-schedule dispatch still blocks in-flight work.
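The recheck amounts to a guard call between the query result and each insert. In this sketch `DueSchedule`, `WriteGuard`, `insertExecution`, `dispatchDue` and the `"execution.create"` operation label are all hypothetical stand-ins for the engine's internals; skipping on any guard error is the sketch's own choice.

```typescript
// Hypothetical stand-ins for the schedule engine's internals.
interface DueSchedule { id: string; tenantId: string }
interface WriteGuard { assertActive(tenantId: string, operation: string): Promise<void> }

async function dispatchDue(
  due: DueSchedule[], // already filtered to non-archived tenants by the SQL JOIN
  guard: WriteGuard,
  insertExecution: (s: DueSchedule) => Promise<void>,
): Promise<{ dispatched: string[]; skipped: string[] }> {
  const dispatched: string[] = [];
  const skipped: string[] = [];
  for (const schedule of due) {
    try {
      // Catches an archive that landed after findDueSchedules ran.
      await guard.assertActive(schedule.tenantId, "execution.create");
    } catch {
      skipped.push(schedule.id);
      continue;
    }
    await insertExecution(schedule);
    dispatched.push(schedule.id);
  }
  return { dispatched, skipped };
}
```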

Cache invalidation: POST /admin/tenants, /:id/archive, and /:id/unarchive all bust the relevant cache entry, so subsequent requests on the pod that served the admin call observe the fresh state immediately. Other pods observe the change within one TTL; this is the documented stale window.

Wire contract for `tenant_unknown` / `tenant_archived`

Stage 2 adds an optional code field to ErrorResponseSchema. Existing clients that parse only error + message continue to work; SDKs and tests that need precise classification read code.

```jsonc
// tenant_unknown: 403 Forbidden
{ "error": "Forbidden", "code": "tenant_unknown", "message": "Tenant \"future\" is not known to this server." }

// tenant_archived: 409 Conflict
{ "error": "Conflict", "code": "tenant_archived", "message": "Tenant \"acme\" is archived and rejects new writes. Reads still work." }
```

The auth-hook catch path at auth.plugin.ts:450-453 is rewritten to derive the error label from err.statusCode (403 → Forbidden, 401 → Unauthorized) and pass through the optional code. The global error-handler in error-handler.plugin.ts gains parallel branches for AuthError (and its TenantUnknownError subclass) and TenantArchivedError so non-auth-hook throw sites (resolveTenant inside routes, /auth/editor-token mint) produce identical wire bodies. Defense-in-depth: both paths converge.
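The convergence can be sketched as one mapping function that both paths call. The class shapes and `toWireBody` are illustrative assumptions; the status labels, the optional `code`, and the message texts come from the samples above. When `code` is absent the key is omitted entirely, so clients that read only `error` and `message` see the old shape.

```typescript
// Illustrative error classes and mapper; not the plugin's actual source.
class AuthError extends Error {
  constructor(message: string, readonly statusCode: 401 | 403, readonly code?: string) {
    super(message);
  }
}
class TenantUnknownError extends AuthError {
  constructor(tenantId: string) {
    super(`Tenant "${tenantId}" is not known to this server.`, 403, "tenant_unknown");
  }
}
class TenantArchivedError extends Error {
  readonly statusCode = 409;
  readonly code = "tenant_archived";
  constructor(tenantId: string) {
    super(`Tenant "${tenantId}" is archived and rejects new writes. Reads still work.`);
  }
}

interface WireError { error: string; code?: string; message: string }

const LABELS: Record<number, string> = { 401: "Unauthorized", 403: "Forbidden", 409: "Conflict" };

// Shared by the auth-hook catch path and the global error handler, so the
// same error yields the same body wherever it was thrown.
function toWireBody(err: { statusCode: number; code?: string; message: string }): { status: number; body: WireError } {
  const error = LABELS[err.statusCode] ?? "Error";
  const body: WireError =
    err.code !== undefined ? { error, code: err.code, message: err.message } : { error, message: err.message };
  return { status: err.statusCode, body };
}
```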

Metrics discipline:

auth_failures_total{reason="tenant_unknown"}: new reason bucket (adds one value to the existing reason label)
tenant_archive_rejections_total{operation}: new counter with an enumerated, low-cardinality operation label (18 values, one per tenant-scoped store write method). tenantId is NOT a metric label because its cardinality is unbounded, which Prometheus labeling guidance rules out; tenantId appears only on the structured log line.

Binary store tenant-prefixing

Non-default tenants produce tenant-prefixed filenames at upload time: postgres-asset.store.ts generates {tenantId}/{uuid}.{ext} instead of the flat {uuid}.{ext}. IAssetBinaryStore stays tenant-agnostic — it sees the opaque filename and never learns the tenant directly.
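The upload-time naming rule, as a minimal sketch that assumes `ext` arrives without a leading dot and has already been validated; `assetFilename` is a hypothetical name.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: 'default' keeps the flat v0.66.0 layout, others get a prefix.
function assetFilename(tenantId: string, ext: string): string {
  const flat = `${randomUUID()}.${ext}`;
  return tenantId === "default" ? flat : `${tenantId}/${flat}`;
}
```

For example, `assetFilename("acme", "png")` yields something like `acme/3f0c...e1.png`; the binary store receives that string as an opaque filename.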

Two knock-on fixes:

FsAssetBinaryStore.save() adds fs.mkdirSync(path.dirname(target), { recursive: true }) before the write. Without this, the first upload for any new tenant would throw ENOENT because the {tenantId}/ subdirectory doesn’t exist yet.
Private-mode asset proxy route changes from /assets/:filename to /assets/* so tenant-prefixed filenames match as a two-segment wildcard. Handler reads request.params['*'] and relaxes the path-traversal guard to allow forward slashes (but still reject .. segments and leading dots). OpenAPI param name changes from filename to * — downstream SDK regeneration is batched after stage 2.
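Both knock-on fixes in one sketch. `isSafeAssetPath` and `saveAsset` are hypothetical names: the real guard lives in the /assets/* handler and the real write in FsAssetBinaryStore.save(). The guard accepts forward slashes but rejects empty segments (which also covers a leading `/`), any segment starting with a dot (covering `..`), backslashes, and NUL bytes.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Accepts "uuid.png" and "acme/uuid.png"; rejects "..", dot-led or empty segments, "\\". */
function isSafeAssetPath(wildcard: string): boolean {
  if (wildcard === "" || wildcard.includes("\\") || wildcard.includes("\0")) return false;
  return wildcard.split("/").every((seg) => seg !== "" && !seg.startsWith("."));
}

function saveAsset(assetsDir: string, filename: string, data: Buffer): string {
  if (!isSafeAssetPath(filename)) throw new Error("unsafe asset filename");
  const target = path.join(assetsDir, filename);
  // First upload for a new tenant: create {tenantId}/ before writing.
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, data);
  return target;
}
```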

The public-mode static mount stays rooted at assetsDir and serves tenant-prefixed filenames directly, e.g. /assets/acme/uuid.png. Filenames remain globally unique through their UUID component, so cross-tenant collisions are impossible.

`asset-inline.ts` regex update

The stage 1 regex /src=(["'])\/assets\/([^"'?#/\\]+)\1/g explicitly excluded forward slashes — a tenant-prefixed reference like <img src="/assets/acme/uuid.png"> never matched and never got inlined. Private-mode HTML and PDF renders for non-default tenants would have broken silently. Stage 2 removes the / exclusion; the capture now accepts multi-segment filenames. Traversal protection moves to the tenant-scoped binary store wrapper’s metadata lookup.
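The difference is easy to see side by side. `STAGE1` is the pattern quoted above; `STAGE2` is reconstructed from the description (the same pattern with `/` dropped from the character class), not copied from asset-inline.ts, and `inlineCandidates` is a hypothetical helper. Note that `STAGE2` alone would also capture `acme/../x`, which is why traversal protection has to live in the store lookup.

```typescript
// STAGE1 is quoted from the notes; STAGE2 is a reconstruction of the described change.
const STAGE1 = /src=(["'])\/assets\/([^"'?#/\\]+)\1/g;
const STAGE2 = /src=(["'])\/assets\/([^"'?#\\]+)\1/g;

function inlineCandidates(html: string, re: RegExp): string[] {
  // matchAll needs a global regex; group 2 is the asset filename.
  return [...html.matchAll(re)].map((m) => m[2]);
}
```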

Plugin system rejection policy

Under MULTI_TENANT_ENABLED=true, plugin identity providers and plugin storage backends are rejected at registration time:

```text
[Plugin <name>] identity provider 'X' cannot be registered in multi-tenant mode.
The plugin-api 0.2.x contract does not carry a tenant binding...
```

Plugin render hooks and custom renderers still load; a one-time warning logged at activation tells them to read ctx.tenantId if they need per-tenant behavior. The plugin bridge shims at plugin-system.plugin.ts:161-171 keep their hardcoded 'default': under multi-tenant mode the plugin never activates (the rejection above fires first), and under single-tenant mode they work as before.

@pulp-engine/plugin-api bumps 0.1.0 → 0.2.0 — PreRenderHookContext and PostRenderHookContext gain an optional readonly tenantId?: string field. Optional (not required) preserves source compatibility with 0.1.x plugins that construct contexts as object literals in their test suites; the RenderHookRunner fills 'default' when the field is absent.
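Why an optional field keeps 0.1.x object literals compiling, in a sketch: `templateKey` and `runPreRenderHooks` are hypothetical (consult @pulp-engine/plugin-api for the real context fields and runner), and only the optional readonly `tenantId` plus the fill-with-'default' behavior come from the notes.

```typescript
// Hypothetical context shape; only the optional tenantId is from the release notes.
interface PreRenderHookContext {
  readonly templateKey: string; // hypothetical field
  readonly tenantId?: string; // new in 0.2.0; optional for 0.1.x source compatibility
}

type PreRenderHook = (ctx: PreRenderHookContext) => void;

function runPreRenderHooks(hooks: PreRenderHook[], ctx: PreRenderHookContext): void {
  // Fill 'default' only when the caller omitted the field.
  const filled: PreRenderHookContext = { ...ctx, tenantId: ctx.tenantId ?? "default" };
  for (const hook of hooks) hook(filled);
}
```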

Editor side: 7 `setStoredToken` callers + embed protocol + silent callback

setStoredToken(token, expiresAt, actor, displayName, scope, tenantId) gained a 6th argument. The editor’s session-storage layer adds a pulp-engine.editorTenantId key and a getStoredTenant() helper mirroring getStoredScope(). All seven call sites were updated:

#	File	Trigger
1	`apps/editor/src/components/auth/LoginGate.tsx:129`	Shared-key login form submit
2	`apps/editor/src/embed-main.tsx:44`	Host posts pre-minted `msg.token` in embed init
3	`apps/editor/src/embed-main.tsx:48`	Host posts `msg.oidcToken` → `/oidc/exchange`
4	`apps/editor/src/embed-main.tsx:57`	Host posts `msg.apiKey` → `/auth/editor-token`
5	`apps/editor/src/embed-main.tsx:121`	Ongoing `pulp-engine:token-refresh` host message
6	`apps/editor/src/hooks/use-token-refresh.ts:80`	Silent iframe refresh response
7	`apps/editor/src/lib/auth.ts:222`	Internal `exchangeOidcCode()` caller
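A sketch of the storage-layer change behind all seven callers. `StorageLike` stands in for window.sessionStorage so the example runs outside a browser, the argument types are assumptions, and every key except `pulp-engine.editorTenantId` is hypothetical.

```typescript
// StorageLike mimics the subset of the Web Storage API used here.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const KEYS = {
  token: "pulp-engine.editorToken", // hypothetical
  expiresAt: "pulp-engine.editorTokenExpiresAt", // hypothetical
  actor: "pulp-engine.editorActor", // hypothetical
  displayName: "pulp-engine.editorDisplayName", // hypothetical
  scope: "pulp-engine.editorScope", // hypothetical
  tenantId: "pulp-engine.editorTenantId", // from the release notes
} as const;

function makeTokenStore(storage: StorageLike) {
  return {
    setStoredToken(token: string, expiresAt: string, actor: string, displayName: string, scope: string, tenantId: string): void {
      storage.setItem(KEYS.token, token);
      storage.setItem(KEYS.expiresAt, expiresAt);
      storage.setItem(KEYS.actor, actor);
      storage.setItem(KEYS.displayName, displayName);
      storage.setItem(KEYS.scope, scope);
      storage.setItem(KEYS.tenantId, tenantId); // the new 6th argument
    },
    getStoredScope: (): string | null => storage.getItem(KEYS.scope),
    getStoredTenant: (): string | null => storage.getItem(KEYS.tenantId),
  };
}
```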

Two additional files were updated that don’t call setStoredToken but are part of the embed / refresh plumbing:

apps/editor/src/embed/post-message-protocol.ts — InboundMessage init shape and TokenRefreshMessage shape both grow an optional tenantId?: string. These are the TypeScript types host integrations compile against.
apps/editor/public/oidc-silent-callback.html — static HTML outside tsc and lint. Its postMessage payload at lines 28-36 now forwards tenantId: data.tenantId from the /oidc/complete response body. Without this update, silent refresh would reset the editor’s stored tenantId to 'default' on every refresh cycle, silently downgrading multi-tenant sessions.

Migration notes

Single-tenant upgrade (the common case)

Do nothing. Upgrade to v0.67.0, redeploy, and everything behaves exactly like v0.66.0. The 59-site tenant coercion rewrite preserves stage 1 semantics because resolveTenant() short-circuits to 'default' in single-tenant mode. Existing 3-part and 4-part editor tokens continue to verify — they resolve to tenantId: 'default'.

Rolling out multi-tenant mode

Upgrade to v0.67.0 WITHOUT MULTI_TENANT_ENABLED. Deploy and verify single-tenant still works.
Set MULTI_TENANT_ENABLED=true + API_KEY_SUPER_ADMIN=<key> + restart. Legacy API_KEY_* credentials continue to work — they all bind to 'default'.
Create your first non-default tenant via POST /admin/tenants {"id":"acme","name":"Acme Corp"} using the super-admin key.
Add API_KEYS_JSON with a tenant-bound entry for that tenant. Restart.
Verify isolation — a request with the acme-bound key should only see acme’s data; a request with API_KEY_ADMIN should only see default’s data; the super-admin key must pass X-PulpEngine-Tenant-Id on every data-access request.

Rollback

Flip MULTI_TENANT_ENABLED=false and restart. Behavior reverts to stage 1 immediately. No data migration, no schema rollback — stage 2 adds no new Prisma migrations. Non-default tenant rows in the tenants table stay dormant; tenant-prefixed filenames in assets/<tenantId>/... stay on disk.

Application-level rollback to v0.66.0 is also safe in pure single-tenant mode (no non-default tenants ever created). If non-default tenants have been used, pre-v0.67.0 code does not handle tenant-prefixed filenames (its /assets/:filename route and asset-inline regex match only single-segment names), so those asset fetches will miss. Recommended: stay on v0.67.0 and toggle the flag.

Verification

cd apps/api && pnpm exec tsc --noEmit — zero errors
cd apps/editor && pnpm exec tsc --noEmit — zero errors
cd packages/plugin-api && pnpm build — zero errors, outputs 0.2.0 types
node scripts/check-tenant-propagation.mjs — clean under strict multi-tenant-mode rule
node scripts/check-template-resolution.mjs — clean
pnpm --filter @pulp-engine/api test — 975 passing; pre-existing Windows Fastify-startup timeouts (10-15s) are documented in project_flaky_tests_v1.md, and all affected tests were verified to pass in isolation
node scripts/check-version.mjs — 9 lockstep files aligned at 0.67.0

Follow-ups for C.0b / C.1 / C.2

Follow-up	Notes
C.0b SQL Server multi-tenant	Requires ITenantStore implementation against raw mssql + archive guards in SqlServerTemplateStore/AssetStore/AuditStore. Postgres ships first as the reference.
C.0b plugin-api tenant-aware contracts	Semver-major: `PluginIdentityProvider.tenantId`, `PluginTemplateStore.list({tenantId})`, etc. Requires a codemod release for plugin authors.
C.0b per-group OIDC tenant binding	`OIDC_TENANT_GROUP_MAPPING_JSON` or equivalent.
C.0b pre-activation plugin rejection via manifest capabilities field	More informative than registration-time rejection.
C.0b startup audit loop iterating all tenants	Currently hardcoded to `'default'` for the legacy-SVG scan.
C.0b Postgres RLS as second-layer defense	Explicit store-layer guards are the primary gate.
C.0b `UserTenantMembership` table	Cross-tenant users (consultant case). Stage 2 keeps globally-unique user IDs.
C.0b `ApiCredential` CRUD table	Env-var only for stage 2.
C.1 per-tenant schedule engine sharding	Stage 2 ships one global engine with archive filter + dispatch recheck.
C.1 tenant usage analytics	`RenderUsage` table, `GET /usage` route.
C.2 per-tenant rate limits	`@fastify/rate-limit` keyGenerator using `(tenantId
SDK regeneration	Python/.NET/Go/Java — the `/assets/*` wildcard and `tenantId` on editor-token responses both require codegen.
Tenant hard-delete / purge	Stage 2 returns 501; cascade policy requires design work.
Execution-state granular archive policy	Currently `updateStatus` skips the archive guard entirely so in-flight executions can reach terminal state. A future release may add per-terminal-state guards using the `TenantArchiveOperation` label already present in the metrics enum.

Hazard register (closed in stage 2)

Every hazard from the planning stress-test is closed concretely:

B1: verifyEditorToken case 5: added FIRST; extractTokenActor supports both 4-part and 5-part
B2: 59 ?? 'default' sites rewritten to resolveTenant + grep gate ban
B3: Prisma extension dropped entirely; explicit assertActive guards run BEFORE $transaction
B4: /assets/* wildcard + globally-unique UUID filenames preserve public-mode compatibility
B5: FsAssetBinaryStore.save() adds mkdirSync for tenant subdirectories
H1: no Prisma extension, no recursion problem
H2: per-request cache design dropped in favor of process-scoped TTL cache; stale window documented
H3: dispatch-time recheck in schedule engine catches mid-tick archives
H4: tenantId is OPTIONAL on hook contexts (semver-minor)
H5: startup validation warns, not throws, for API_KEYS_JSON referencing unknown tenants
H6: redacted parse errors; API_KEYS_JSON_FILE alternative
H7: tenantStore.* exempt from grep-gate store-call check
R1: OIDC via completion codes (no fragment parser); CompletionTokenData.tenantId required
R2: asset-inline.ts regex updated + regression test
R3: tenant CRUD wiring traced through storage-factory → storage.plugin → server.ts explicitly
R4: explicit store-layer guards (no Prisma middleware)
R5: SQL Server + multi-tenant rejected at startup alongside file mode
R6: render.ts ?? 'default' carveout removed; hook contexts use resolved variable
R7: /auth/editor-token super-admin branch documented with explicit flow
R8: extractTokenActor() supports both 4-part and 5-part
R9: TenantStatusCache.assertKnown() covers auth hook + resolveTenant + all 4 mint paths
R10: 7 setStoredToken callers enumerated and updated; embed protocol types updated; silent-callback HTML updated
R11: apps/editor/src/embed/post-message-protocol.ts + public/oidc-silent-callback.html listed explicitly
R12: tenant_unknown wire contract specified (status, body shape, code field); tenant_archived parallel
R13: auth-hook catch path rewrite stated explicitly with sample code
R14: universal “last gate before mint” rule covers all 4 mint paths
R15: tenant_archive_rejections_total{operation} counter — NOT under auth_failures_total
R16: archived tenants mint successfully (reads allowed); only unknown tenants 403 at mint
R17: global error-handler branch covers non-auth-hook throw sites
R18: resolveTenant is async, every call site uses await
