Release v0.67.0
Date: 2026-04-12
Theme
Phase C.0 Stage 2 — Multi-tenant mode.
This release lights up the tenant primitive that shipped behind the scenes in v0.66.0. An operator running Postgres can now set MULTI_TENANT_ENABLED=true, provision a super-admin credential, create real tenants via the new admin API, issue tenant-bound keys via API_KEYS_JSON, mint editor tokens carrying a signed tenant claim, archive a tenant, and have every data-access request correctly isolated at the store layer.
Zero-change upgrade for single-tenant deployments. The entire multi-tenant path is behind a single feature flag. Leave MULTI_TENANT_ENABLED unset (or false), restart, and the server behaves exactly like v0.66.0 — the 59-site tenant coercion rewrite preserves stage 1 semantics because resolveTenant() short-circuits to 'default' in single-tenant mode.
At a glance
| Area | What shipped | New env / API surface |
|---|---|---|
| Feature flag | Master MULTI_TENANT_ENABLED gate | MULTI_TENANT_ENABLED (default false) |
| Credentials | Rich credential map with tenant binding | API_KEYS_JSON, API_KEYS_JSON_FILE, API_KEY_SUPER_ADMIN |
| Tenant CRUD | Super-admin-only admin routes | POST/GET/PATCH /admin/tenants, /:id/archive, /:id/unarchive, DELETE /:id?purge=true (→501) |
| Editor tokens | 5-part format with signed tenant claim | EditorTokenResponse.tenantId field |
| OIDC | Default tenant for OIDC sessions | OIDC_DEFAULT_TENANT (default 'default') |
| Archive lifecycle | Soft-archive with 409 on writes | TENANT_STATUS_CACHE_TTL_MS (default 10s), tenant_archive_rejections_total metric |
| Plugin system | Reject identity/storage plugins in MT mode | plugin-api 0.1.0 → 0.2.0 (render-hook context optional tenantId) |
| Binary store | Non-default tenants get {tenantId}/{uuid} prefix | /assets/* wildcard route |
| Error wire | code: 'tenant_unknown' / 'tenant_archived' | Optional ErrorResponseSchema.code field |
| Grep gate | Ban ?? 'default' in route/lib; allowlist shrunk | scripts/check-tenant-propagation.mjs |
What changed
Credential shape and super-admin scope
The credential map at auth.plugin.ts grew from Map<string, Scope> to Map<string, { scope, tenantId: string | null }>. Three env-var sources fill it:
- Legacy env vars (`API_KEY_ADMIN`, `API_KEY_RENDER`, `API_KEY_PREVIEW`, `API_KEY_EDITOR`) always bind to `tenantId: 'default'`. No change in behavior for single-tenant operators.
- `API_KEYS_JSON` / `API_KEYS_JSON_FILE`: JSON array of `[{ key, scope, tenantId }]` where `tenantId: string | null`. Parsing produces redacted errors that never echo the key value. The two sources are mutually exclusive.
- `API_KEY_SUPER_ADMIN`: single-string shortcut for `{ scope: 'admin', tenantId: null }`.

Super-admin credentials are the only ones allowed to operate on `/admin/tenants/*`. A tenant-bound admin credential hitting those routes gets 403 `super_admin_only`.
A super-admin credential operating on a data-access route (templates, assets, schedules, render, audit-events, schedule-dlq) must supply an X-PulpEngine-Tenant-Id header — otherwise resolveTenant() returns 400 tenant_required. The header is trimmed, lowercased, and validated against the slug regex before being passed to the store.
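As an illustration of the credential shape described above, here is a minimal sketch of an `API_KEYS_JSON` parser. The function name `parseApiKeysJson`, the scope names (inferred from the legacy env var names), and the error wording are assumptions, not the shipped implementation. The point it shows is that errors identify the entry by index and never echo the key.

```typescript
type Scope = 'admin' | 'render' | 'preview' | 'editor'; // assumed scope names

interface CredentialEntry {
  scope: Scope;
  tenantId: string | null; // null marks a super-admin credential
}

const KNOWN_SCOPES: readonly string[] = ['admin', 'render', 'preview', 'editor'];

// Hypothetical parser: builds the credential map from an API_KEYS_JSON string.
// Every error message refers to the entry index only, never to the key value.
function parseApiKeysJson(raw: string): Map<string, CredentialEntry> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    throw new Error('API_KEYS_JSON is not valid JSON');
  }
  if (!Array.isArray(parsed)) {
    throw new Error('API_KEYS_JSON must be a JSON array');
  }
  const credentials = new Map<string, CredentialEntry>();
  parsed.forEach((item: unknown, index: number) => {
    const entry = (item ?? {}) as Record<string, unknown>;
    const key = entry.key;
    const scope = entry.scope;
    const tenantId = entry.tenantId;
    if (typeof key !== 'string' || key.length === 0) {
      throw new Error(`API_KEYS_JSON[${index}]: key must be a non-empty string`);
    }
    if (typeof scope !== 'string' || KNOWN_SCOPES.indexOf(scope) === -1) {
      throw new Error(`API_KEYS_JSON[${index}]: unknown scope`);
    }
    if (tenantId !== null && typeof tenantId !== 'string') {
      throw new Error(`API_KEYS_JSON[${index}]: tenantId must be a string or null`);
    }
    credentials.set(key, { scope: scope as Scope, tenantId: tenantId as string | null });
  });
  return credentials;
}
```

An entry with `tenantId: null` behaves like `API_KEY_SUPER_ADMIN` when its scope is `admin`; any other entry is bound to exactly one tenant.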
resolveTenant — the single legal coercion site
Stage 1 had 59 request.tenantId ?? 'default' sites across 12 files. Every one of them was a cross-tenant leak waiting to happen as soon as a super-admin credential (with tenantId: null) hit the codebase. Stage 2 introduces one helper at apps/api/src/lib/tenant-resolution.ts:
export async function resolveTenant(
request: FastifyRequest,
reply: FastifyReply,
): Promise<string | null>
The mechanical rewrite replaced every call site:
// Before
const tenantId = request.tenantId ?? 'default'
const result = await service.getByKey(tenantId, key)
// After
const tenantId = await resolveTenant(request, reply)
if (tenantId === null) return
const result = await service.getByKey(tenantId, key)
The helper handles the single-tenant default, the tenant-bound credential path, and the super-admin header path uniformly. For super-admin headers it runs assertKnown(headerValue) so a typo’d or deleted tenant fails with 403 tenant_unknown before any store call.
The grep gate now bans ?? 'default' in route/lib files (except the provider trio, tenant-resolution.ts, super-admin.ts, and a small allowlist of boundary files). Inline tenant-propagation-allow-default directives are the only remaining escape hatch, and each one requires a reviewer-visible comment.
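The three paths the helper handles can be sketched as a pure decision function. This is a simplified model under stated assumptions: the real helper is async and takes Fastify's request and reply, while `resolveTenantSketch`, the slug regex, the `malformed_slug` code, and the `isKnownTenant` callback (standing in for `assertKnown`) are hypothetical.

```typescript
// Hypothetical slug rule; the actual regex is not shown in these notes.
const SLUG_RE = /^[a-z0-9][a-z0-9-]*$/;

interface CredentialSketch {
  tenantId: string | null; // null = super-admin
}

type Resolution =
  | { ok: true; tenantId: string }
  | { ok: false; status: 400 | 403; code: string };

function resolveTenantSketch(
  multiTenantEnabled: boolean,
  credential: CredentialSketch,
  headerValue: string | undefined, // X-PulpEngine-Tenant-Id, if sent
  isKnownTenant: (tenantId: string) => boolean, // stands in for assertKnown
): Resolution {
  // Single-tenant mode short-circuits to 'default'.
  if (!multiTenantEnabled) return { ok: true, tenantId: 'default' };
  // Tenant-bound credentials carry their tenant; the header is not consulted here.
  if (credential.tenantId !== null) return { ok: true, tenantId: credential.tenantId };
  // Super-admin credentials must name a tenant explicitly.
  if (headerValue === undefined) return { ok: false, status: 400, code: 'tenant_required' };
  const slug = headerValue.trim().toLowerCase();
  if (!SLUG_RE.test(slug)) return { ok: false, status: 400, code: 'malformed_slug' };
  if (!isKnownTenant(slug)) return { ok: false, status: 403, code: 'tenant_unknown' };
  return { ok: true, tenantId: slug };
}
```

Returning a result object instead of writing to the reply keeps the sketch testable in isolation; in the shipped helper a `null` return signals that the reply has already been sent.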
Editor token 5-part format (B1 blocker — verify-first)
The stage 1 verifyEditorToken had switch (parts.length) { case 2|3|4: ...; default: return null }, so a 5-part token was always rejected. Stage 2 adds case 5: BEFORE touching any mint-side code so partial rollouts don’t create an outage window.
New format: {iat}.{expiry}.{tenantId_b64url}.{actor_b64url}.{sig}
HMAC payload: editor:${iat}:${expiry}:${tenantId}:${actorRaw}
Backward-compat: 3-part and 4-part tokens still verify; they resolve to tenantId: 'default'. No forced re-auth on upgrade.
extractTokenActor(), the other load-bearing edit, is the pre-verification parse helper used by the named-user auth path. Stage 1 hardcoded a parts.length !== 4 check, so without the fix every named-user 5-part token would have been rejected at the first step of auth, before verifyEditorToken ran. Stage 2 supports both 4-part and 5-part tokens via parts.length - 2 indexing (the actor slot is always the second-to-last segment).
The mintEditorToken signature grew a required tenantId: string parameter. All four mint call sites were updated: auth.ts named-users, auth.ts shared-key + super-admin, oidc.ts callback, oidc.ts exchange. Every mint path runs await app.tenantStatusCache.assertKnown(mintTenantId) immediately before the mint — the universal “last gate before mint” rule.
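To make the 5-part format concrete, the following sketch mints and verifies a token with Node's `crypto` module. Several details are assumptions rather than facts from the codebase: HMAC-SHA256, a base64url-encoded signature, `actorRaw` meaning the decoded actor string, and the name `verifyFivePart`. The parameter order of `mintEditorToken` is also illustrative, and the legacy 2/3/4-part branches are omitted.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const toB64url = (s: string): string => Buffer.from(s, 'utf8').toString('base64url');
const fromB64url = (s: string): string => Buffer.from(s, 'base64url').toString('utf8');

// Assumed algorithm and signature encoding.
function signPayload(secret: string, payload: string): string {
  return createHmac('sha256', secret).update(payload).digest('base64url');
}

// {iat}.{expiry}.{tenantId_b64url}.{actor_b64url}.{sig}
function mintEditorToken(secret: string, iat: number, expiry: number, tenantId: string, actor: string): string {
  const sig = signPayload(secret, `editor:${iat}:${expiry}:${tenantId}:${actor}`);
  return [String(iat), String(expiry), toB64url(tenantId), toB64url(actor), sig].join('.');
}

// Pre-verification parse: the actor slot is always the second-to-last segment.
function extractTokenActor(token: string): string | null {
  const parts = token.split('.');
  if (parts.length !== 4 && parts.length !== 5) return null;
  return fromB64url(parts[parts.length - 2]);
}

// Verifies only the 5-part shape; `now` uses the same unit as iat/expiry.
function verifyFivePart(secret: string, token: string, now: number): { tenantId: string; actor: string } | null {
  const parts = token.split('.');
  if (parts.length !== 5) return null;
  const [iat, expiry, tenantB64, actorB64, sig] = parts;
  if (!/^\d+$/.test(expiry) || Number(expiry) <= now) return null;
  const tenantId = fromB64url(tenantB64);
  const actor = fromB64url(actorB64);
  const expected = Buffer.from(signPayload(secret, `editor:${iat}:${expiry}:${tenantId}:${actor}`));
  const given = Buffer.from(sig);
  // Compare in constant time; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return { tenantId, actor };
}
```

Because the tenant is part of the signed payload, swapping the tenant segment of a valid token invalidates the signature, which is what makes the claim tenant-binding rather than advisory.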
/auth/editor-token super-admin mint flow
/auth/* routes bypass the onRequest hook, so the auth-hook assertKnown check does NOT fire for mint routes. Stage 2 adds explicit assertKnown calls at every mint path.
The super-admin mint flow on /auth/editor-token:
| Branch | Behavior |
|---|---|
| Tenant-bound credential + header present | 400 — header not accepted when credential is tenant-bound |
| Tenant-bound credential + no header | assertKnown(entry.tenantId) → mint with that tenant |
| Super-admin credential + header missing | 400 tenant_required |
| Super-admin credential + header slug-invalid | 400 malformed-slug |
| Super-admin credential + valid header + unknown tenant | 403 tenant_unknown (via global error handler) |
| Super-admin credential + valid header + active/archived tenant | mint with that tenant |
Archived tenants mint successfully. A freshly-minted session on an archived tenant succeeds on read routes (per the stage 1 soft-archive rule) and fails with 409 tenant_archived at the store-write boundary. This is deliberate — blocking mint on archive would contradict the audit/export access path.
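The branch table can be read as a small decision function. The sketch below is a model of the table, not the route handler: `decideEditorTokenMint`, the slug regex, the `tenant_header_not_accepted` code, and the assumption that the header is trimmed and lowercased on this route as it is on data-access routes are all illustrative.

```typescript
type MintTenantStatus = 'active' | 'archived' | 'unknown';

type MintOutcome =
  | { status: 200; tenantId: string }
  | { status: 400 | 403; code: string };

// Hypothetical slug rule; the actual regex is not shown in these notes.
const MINT_SLUG_RE = /^[a-z0-9][a-z0-9-]*$/;

function decideEditorTokenMint(
  credentialTenantId: string | null, // null = super-admin
  header: string | undefined, // X-PulpEngine-Tenant-Id, if sent
  statusOf: (tenantId: string) => MintTenantStatus,
): MintOutcome {
  let tenantId: string;
  if (credentialTenantId !== null) {
    // Tenant-bound credentials may not override their tenant via the header.
    if (header !== undefined) return { status: 400, code: 'tenant_header_not_accepted' };
    tenantId = credentialTenantId;
  } else {
    if (header === undefined) return { status: 400, code: 'tenant_required' };
    const slug = header.trim().toLowerCase();
    if (!MINT_SLUG_RE.test(slug)) return { status: 400, code: 'malformed-slug' };
    tenantId = slug;
  }
  // Last gate before mint: unknown tenants fail, archived tenants still mint.
  if (statusOf(tenantId) === 'unknown') return { status: 403, code: 'tenant_unknown' };
  return { status: 200, tenantId };
}
```

Note that the final check only distinguishes known from unknown tenants; the archived case is deliberately left to the store-write boundary.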
OIDC via completion codes + default tenant
The repo uses oidc_code= query-param completion codes, not redirect fragments. Stage 2 threads tenantId through the flow concretely:
- `CompletionTokenData` at `lib/oidc/completion-codes.ts` grows a required `tenantId: string` field. Every `completionCodeStore.create({...})` call site passes the resolved tenantId.
- `/oidc/complete` response body carries `tenantId` through to the editor.
- `/oidc/exchange` response body carries `tenantId`.
- `OIDC_DEFAULT_TENANT` env var (default `'default'`): auto-provisioned OIDC users pick this up as their `tenantId`. For stored users, `user.tenantId` wins. Per-group tenant binding deferred to C.0b.
- Startup validation: if `OIDC_ENABLED && MULTI_TENANT_ENABLED`, the configured default tenant must exist and not be archived; otherwise the server throws at boot.
- Mint-time recheck: the OIDC callback path runs `assertKnown(mintTenantId)` before the mint. Failure surfaces via the `errorPage` helper as an HTML page (not JSON) because the callback returns HTML.
TenantStatusCache: archive enforcement without Prisma middleware
The earlier-draft plan used a Prisma extension for archive enforcement. That approach hit two dead-ends:
- Schedule engine + audit purge writes happen outside any HTTP request context, so a per-request cache can’t see them.
- Prisma’s legacy `$use` middleware doesn’t fire reliably inside `$transaction(...)` callbacks (hazard B3).
Stage 2 drops the extension entirely. Every Postgres store write method calls await this.tenantStatusCache?.assertActive(tenantId, operation) as its first line, BEFORE any $transaction begins. The cache is a process-scoped Map<tenantId, {status, expiresAt}> with a 10-second default TTL (tunable via TENANT_STATUS_CACHE_TTL_MS). One DB query per cache miss. The shared instance is injected into both the Postgres stores and the schedule engine so non-HTTP paths hit the same cache.
Two distinct guards:
- `assertKnown(tenantId)`: auth-boundary. Throws `TenantUnknownError` → 403 `tenant_unknown` on unknown tenants. Archived tenants PASS: reads on archived tenants are allowed per stage 1 soft-archive.
- `assertActive(tenantId, operation)`: store-write-boundary. Throws `TenantUnknownError` → 403 on unknown (defensive), `TenantArchivedError` → 409 `tenant_archived` on archived. Increments `tenant_archive_rejections_total{operation}`.
Terminal / observability writes intentionally skip the guard: audit.record, dlq.insert, dlq.markAbandoned, dlq.markOrphaned, execution.updateStatus. Archived tenants still get audit rows and DLQ terminal transitions — otherwise an archive mid-execution would leave orphan rows forever.
Schedule engine dispatch-time recheck: after findDueSchedules returns (which now JOINs tenants and excludes archived ones), the engine re-checks the cache per schedule before inserting an execution row. An archive happening between the query and the per-schedule dispatch still blocks in-flight work.
Cache invalidation: POST /admin/tenants, /:id/archive, /:id/unarchive all bust the relevant cache entry so subsequent requests observe the fresh state immediately on this pod. Other pods observe the change within TTL — documented as the stale window.
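The cache and its two guards can be modelled in a few lines. The sketch below makes simplifying assumptions: the lookup is synchronous (the real one is a DB query behind `await`), a single `TenantGuardError` class stands in for `TenantUnknownError` and `TenantArchivedError`, unknown results are cached like any other status, and a plain `Map` stands in for the Prometheus counter.

```typescript
type CachedStatus = 'active' | 'archived' | 'unknown';

// Stand-in for TenantUnknownError / TenantArchivedError.
class TenantGuardError extends Error {
  statusCode: number;
  code: string;
  constructor(statusCode: number, code: string, message: string) {
    super(message);
    this.statusCode = statusCode;
    this.code = code;
  }
}

class TenantStatusCacheSketch {
  private entries = new Map<string, { status: CachedStatus; expiresAt: number }>();
  // Stand-in for tenant_archive_rejections_total{operation}.
  rejections = new Map<string, number>();
  private lookup: (tenantId: string) => CachedStatus;
  private ttlMs: number;
  private now: () => number;

  constructor(lookup: (tenantId: string) => CachedStatus, ttlMs: number = 10000, now: () => number = () => Date.now()) {
    this.lookup = lookup;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // One lookup per cache miss; entries expire after ttlMs.
  private statusOf(tenantId: string): CachedStatus {
    const hit = this.entries.get(tenantId);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.status;
    const status = this.lookup(tenantId);
    this.entries.set(tenantId, { status, expiresAt: this.now() + this.ttlMs });
    return status;
  }

  // Auth boundary: archived tenants pass, unknown tenants fail with 403.
  assertKnown(tenantId: string): void {
    if (this.statusOf(tenantId) === 'unknown') {
      throw new TenantGuardError(403, 'tenant_unknown', `Tenant "${tenantId}" is not known to this server.`);
    }
  }

  // Store-write boundary: unknown fails with 403, archived with 409 plus a counter bump.
  assertActive(tenantId: string, operation: string): void {
    this.assertKnown(tenantId);
    if (this.statusOf(tenantId) === 'archived') {
      this.rejections.set(operation, (this.rejections.get(operation) ?? 0) + 1);
      throw new TenantGuardError(409, 'tenant_archived', `Tenant "${tenantId}" is archived and rejects new writes.`);
    }
  }

  // Called by the admin routes so this process sees fresh state immediately.
  invalidate(tenantId: string): void {
    this.entries.delete(tenantId);
  }
}
```

Injecting the clock and the lookup keeps the TTL behavior observable in tests; other processes holding their own instance would still see stale entries until their TTL elapses, which matches the documented stale window.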
Wire contract for tenant_unknown / tenant_archived
Stage 2 adds an optional code field to ErrorResponseSchema. Existing clients that parse only error + message continue to work; SDKs and tests that need precise classification read code.
// tenant_unknown — 403 Forbidden
{ "error": "Forbidden", "code": "tenant_unknown", "message": "Tenant \"future\" is not known to this server." }
// tenant_archived — 409 Conflict
{ "error": "Conflict", "code": "tenant_archived", "message": "Tenant \"acme\" is archived and rejects new writes. Reads still work." }
The auth-hook catch path at auth.plugin.ts:450-453 is rewritten to derive the error label from err.statusCode (403 → Forbidden, 401 → Unauthorized) and pass through the optional code. The global error-handler in error-handler.plugin.ts gains parallel branches for AuthError (and its TenantUnknownError subclass) and TenantArchivedError so non-auth-hook throw sites (resolveTenant inside routes, /auth/editor-token mint) produce identical wire bodies. Defense-in-depth: both paths converge.
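A minimal sketch of the convergence described above: one function that both the auth-hook catch path and the global error handler could share to build the wire body. The name `toErrorBody` and the fallback label `'Error'` are assumptions; the 401, 403, and 409 labels follow the samples in this section.

```typescript
interface ErrorBody {
  error: string;
  code?: string; // optional, so clients reading only error + message are unaffected
  message: string;
}

const STATUS_LABELS: Record<number, string> = {
  401: 'Unauthorized',
  403: 'Forbidden',
  409: 'Conflict',
};

// Derives the label from statusCode and passes the optional code through.
function toErrorBody(err: { statusCode: number; message: string; code?: string }): ErrorBody {
  const label = STATUS_LABELS[err.statusCode];
  const error = label === undefined ? 'Error' : label; // fallback label is an assumption
  if (err.code === undefined) return { error, message: err.message };
  return { error, code: err.code, message: err.message };
}
```

Keeping the mapping in one place is what guarantees that a `tenant_unknown` thrown inside a route and one thrown in the auth hook serialize identically.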
Metrics discipline:
- `auth_failures_total{reason="tenant_unknown"}`: new reason bucket (one additional label value).
- `tenant_archive_rejections_total{operation}`: new counter with an enumerated low-cardinality `operation` label (18 values matching the tenant-scoped store write methods).
- `tenantId` is NOT a metric label because it is unbounded by Prometheus standards. `tenantId` appears on the structured log line only.
Binary store tenant-prefixing
Non-default tenants produce tenant-prefixed filenames at upload time: postgres-asset.store.ts generates {tenantId}/{uuid}.{ext} instead of the flat {uuid}.{ext}. IAssetBinaryStore stays tenant-agnostic — it sees the opaque filename and never learns the tenant directly.
Two knock-on fixes:
- `FsAssetBinaryStore.save()` adds `fs.mkdirSync(path.dirname(target), { recursive: true })` before the write. Without this, the first upload for any new tenant would throw ENOENT because the `{tenantId}/` subdirectory doesn’t exist yet.
- Private-mode asset proxy route changes from `/assets/:filename` to `/assets/*` so tenant-prefixed filenames match as a two-segment wildcard. Handler reads `request.params['*']` and relaxes the path-traversal guard to allow forward slashes (but still reject `..` segments and leading dots). OpenAPI param name changes from `filename` to `*`; downstream SDK regeneration is batched after stage 2.
The public-mode static mount stays on assetsDir root and serves tenant-prefixed filenames directly via /assets/acme/uuid.png. Filenames stay globally unique via the UUID component so cross-tenant collisions are impossible.
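The filename rule and the relaxed traversal guard can be sketched as two pure functions. Both names are hypothetical, and rejecting backslashes and empty segments is an extra assumption beyond what these notes state (they only mention `..` segments and leading dots).

```typescript
// Default tenant keeps the flat stage 1 name; other tenants get a prefix.
function assetFilename(tenantId: string, uuid: string, ext: string): string {
  const flat = `${uuid}.${ext}`;
  return tenantId === 'default' ? flat : `${tenantId}/${flat}`;
}

// Guard for the wildcard route parameter: forward slashes are allowed,
// but no segment may be empty or start with a dot (which also rules out '..').
function isSafeAssetPath(requested: string): boolean {
  if (requested.length === 0 || requested.indexOf('\\') !== -1) return false;
  const segments = requested.split('/');
  return segments.every((segment) => segment.length > 0 && segment.charAt(0) !== '.');
}
```

Checking each segment, rather than searching the whole string for `..`, keeps names such as `a..b.png` valid while still blocking traversal.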
asset-inline.ts regex update
The stage 1 regex /src=(["'])\/assets\/([^"'?#/\\]+)\1/g explicitly excluded forward slashes — a tenant-prefixed reference like <img src="/assets/acme/uuid.png"> never matched and never got inlined. Private-mode HTML and PDF renders for non-default tenants would have broken silently. Stage 2 removes the / exclusion; the capture now accepts multi-segment filenames. Traversal protection moves to the tenant-scoped binary store wrapper’s metadata lookup.
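The difference between the two patterns is easy to demonstrate. The stage 1 regex below is copied from the text; the relaxed pattern is one plausible reading of "removes the `/` exclusion" and may differ from the shipped expression. `assetRefs` is a hypothetical helper for the demonstration.

```typescript
// Stage 1: forward slash is excluded from the captured filename.
const stage1AssetRe = /src=(["'])\/assets\/([^"'?#/\\]+)\1/g;
// Assumed stage 2 shape: same pattern with '/' removed from the exclusion set.
const relaxedAssetRe = /src=(["'])\/assets\/([^"'?#\\]+)\1/g;

// Collects the captured filenames; a fresh RegExp avoids shared lastIndex state.
function assetRefs(html: string, pattern: RegExp): string[] {
  const found: string[] = [];
  const re = new RegExp(pattern.source, pattern.flags);
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    found.push(match[2]);
  }
  return found;
}

const sampleHtml = `<img src="/assets/acme/uuid.png"><img src='/assets/flat.png'>`;
const stage1Refs = assetRefs(sampleHtml, stage1AssetRe);
const relaxedRefs = assetRefs(sampleHtml, relaxedAssetRe);
```

With the sample markup, the stage 1 pattern finds only the flat reference, while the relaxed pattern also captures `acme/uuid.png`.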
Plugin system rejection policy
Under MULTI_TENANT_ENABLED=true, plugin identity providers and plugin storage backends are rejected at registration time:
[Plugin <name>] identity provider 'X' cannot be registered in multi-tenant mode.
The plugin-api 0.2.x contract does not carry a tenant binding...
Plugin render hooks and custom renderers load with a single-shot warning logged at activation time — they’re told to read ctx.tenantId if they need per-tenant behavior. The plugin bridge shims at plugin-system.plugin.ts:161-171 keep their hardcoded 'default' because under multi-tenant mode the plugin never activates (the rejection above fires first); under single-tenant mode they work as before.
@pulp-engine/plugin-api bumps 0.1.0 → 0.2.0 — PreRenderHookContext and PostRenderHookContext gain an optional readonly tenantId?: string field. Optional (not required) preserves source compatibility with 0.1.x plugins that construct contexts as object literals in their test suites; the RenderHookRunner fills 'default' when the field is absent.
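A small sketch of why the optional field is source-compatible. The interface name, the `templateKey` field, and `effectiveTenantId` are illustrative only; the real context types carry other fields not listed in these notes.

```typescript
interface PreRenderHookContextSketch {
  readonly templateKey: string; // hypothetical field for illustration
  readonly tenantId?: string; // optional as of plugin-api 0.2.0
}

// Mirrors the described runner behavior: fill 'default' when the field is absent.
function effectiveTenantId(ctx: PreRenderHookContextSketch): string {
  return ctx.tenantId ?? 'default';
}

// A 0.1.x-style object literal still type-checks because tenantId is optional.
const legacyCtx: PreRenderHookContextSketch = { templateKey: 'invoice' };
const tenantCtx: PreRenderHookContextSketch = { templateKey: 'invoice', tenantId: 'acme' };
```

Had the field been required, the `legacyCtx` literal would fail to compile, which is the breakage the semver-minor choice avoids.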
Editor side: 7 setStoredToken callers + embed protocol + silent callback
setStoredToken(token, expiresAt, actor, displayName, scope, tenantId) gained a 6th argument. The editor’s session-storage layer adds a pulp-engine.editorTenantId key and a getStoredTenant() helper mirroring getStoredScope(). All seven call sites were updated:
| # | File | Trigger |
|---|---|---|
| 1 | apps/editor/src/components/auth/LoginGate.tsx:129 | Shared-key login form submit |
| 2 | apps/editor/src/embed-main.tsx:44 | Host posts pre-minted msg.token in embed init |
| 3 | apps/editor/src/embed-main.tsx:48 | Host posts msg.oidcToken → /oidc/exchange |
| 4 | apps/editor/src/embed-main.tsx:57 | Host posts msg.apiKey → /auth/editor-token |
| 5 | apps/editor/src/embed-main.tsx:121 | Ongoing pulp-engine:token-refresh host message |
| 6 | apps/editor/src/hooks/use-token-refresh.ts:80 | Silent iframe refresh response |
| 7 | apps/editor/src/lib/auth.ts:222 | Internal exchangeOidcCode() caller |
Two additional files were updated that don’t call setStoredToken but are part of the embed / refresh plumbing:
- `apps/editor/src/embed/post-message-protocol.ts`: `InboundMessage` init shape and `TokenRefreshMessage` shape both grow an optional `tenantId?: string`. These are the TypeScript types host integrations compile against.
- `apps/editor/public/oidc-silent-callback.html`: static HTML outside `tsc` and lint. Its `postMessage` payload at lines 28-36 now forwards `tenantId: data.tenantId` from the `/oidc/complete` response body. Without this update, silent refresh would reset the editor’s stored `tenantId` to `'default'` on every refresh cycle, silently downgrading multi-tenant sessions.
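The downgrade hazard is easiest to see with a small model of the storage layer. This sketch uses an in-memory key-value store in place of `sessionStorage`; `storeTenant`, `memoryStore`, and the fallback to `'default'` when no tenant is supplied are assumptions inferred from the behavior described above, not the editor's actual code.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const TENANT_KEY = 'pulp-engine.editorTenantId';

// Assumed behavior: a caller that omits tenantId writes 'default'.
function storeTenant(store: KeyValueStore, tenantId: string | undefined): void {
  store.setItem(TENANT_KEY, tenantId ?? 'default');
}

function getStoredTenant(store: KeyValueStore): string {
  return store.getItem(TENANT_KEY) ?? 'default';
}

// In-memory stand-in for sessionStorage so the sketch runs outside a browser.
function memoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key: string): string | null => data.get(key) ?? null,
    setItem: (key: string, value: string): void => {
      data.set(key, value);
    },
  };
}
```

If a refresh path calls `storeTenant(store, undefined)` after a login that stored `'acme'`, the next `getStoredTenant` returns `'default'`, which is why every caller and both message shapes had to carry `tenantId`.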
Migration notes
Single-tenant upgrade (the common case)
Do nothing. Upgrade to v0.67.0, redeploy, and everything behaves exactly like v0.66.0. The 59-site tenant coercion rewrite preserves stage 1 semantics because resolveTenant() short-circuits to 'default' in single-tenant mode. Existing 3-part and 4-part editor tokens continue to verify — they resolve to tenantId: 'default'.
Rolling out multi-tenant mode
- Upgrade to v0.67.0 WITHOUT `MULTI_TENANT_ENABLED`. Deploy and verify single-tenant still works.
- Set `MULTI_TENANT_ENABLED=true` + `API_KEY_SUPER_ADMIN=<key>` + restart. Legacy `API_KEY_*` credentials continue to work; they all bind to `'default'`.
- Create your first non-default tenant via `POST /admin/tenants {"id":"acme","name":"Acme Corp"}` using the super-admin key.
- Add `API_KEYS_JSON` with a tenant-bound entry for that tenant. Restart.
- Verify isolation: a request with the acme-bound key should only see acme’s data; a request with `API_KEY_ADMIN` should only see default’s data; the super-admin key must pass `X-PulpEngine-Tenant-Id` on every data-access request.
Rollback
Flip MULTI_TENANT_ENABLED=false and restart. Behavior reverts to stage 1 immediately. No data migration, no schema rollback — stage 2 adds no new Prisma migrations. Non-default tenant rows in the tenants table stay dormant; tenant-prefixed filenames in assets/<tenantId>/... stay on disk.
Application-level rollback to v0.66.0 is also safe in pure single-tenant mode (no non-default tenants ever created). If non-default tenants have been used, the pre-v0.67.0 code will read tenant-prefixed filenames as flat filenames — the asset fetches will miss. Recommended: stay on v0.67.0 and toggle the flag.
Verification
- `cd apps/api && pnpm exec tsc --noEmit`: zero errors
- `cd apps/editor && pnpm exec tsc --noEmit`: zero errors
- `cd packages/plugin-api && pnpm build`: zero errors, outputs 0.2.0 types
- `node scripts/check-tenant-propagation.mjs`: clean under strict multi-tenant-mode rule
- `node scripts/check-template-resolution.mjs`: clean
- `pnpm --filter @pulp-engine/api test`: 975 passing; pre-existing Windows Fastify-startup 10-15s timeouts are documented in `project_flaky_tests_v1.md` (all verified to pass in isolation)
- `node scripts/check-version.mjs`: 9 lockstep files aligned at 0.67.0
Follow-ups for C.0b / C.1 / C.2
| Follow-up | Notes |
|---|---|
| C.0b SQL Server multi-tenant | Requires ITenantStore implementation against raw mssql + archive guards in SqlServerTemplateStore/AssetStore/AuditStore. Postgres ships first as the reference. |
| C.0b plugin-api tenant-aware contracts | Semver-major: PluginIdentityProvider.tenantId, PluginTemplateStore.list({tenantId}), etc. Requires a codemod release for plugin authors. |
| C.0b per-group OIDC tenant binding | OIDC_TENANT_GROUP_MAPPING_JSON or equivalent. |
| C.0b pre-activation plugin rejection via manifest capabilities field | More informative than registration-time rejection. |
| C.0b startup audit loop iterating all tenants | Currently hardcoded to 'default' for the legacy-SVG scan. |
| C.0b Postgres RLS as second-layer defense | Explicit store-layer guards are the primary gate. |
| C.0b UserTenantMembership table | Cross-tenant users (consultant case). Stage 2 keeps globally-unique user IDs. |
| C.0b ApiCredential CRUD table | Env-var only for stage 2. |
| C.1 per-tenant schedule engine sharding | Stage 2 ships one global engine with archive filter + dispatch recheck. |
| C.1 tenant usage analytics | RenderUsage table, GET /usage route. |
| C.2 per-tenant rate limits | @fastify/rate-limit keyGenerator using `(tenantId |
| SDK regeneration | Python/.NET/Go/Java — the /assets/* wildcard and tenantId on editor-token responses both require codegen. |
| Tenant hard-delete / purge | Stage 2 returns 501; cascade policy requires design work. |
| Execution-state granular archive policy | Currently updateStatus skips the archive guard entirely so in-flight executions can reach terminal state. A future release may add per-terminal-state guards using the TenantArchiveOperation label already present in the metrics enum. |
Hazard register (closed in stage 2)
Every hazard from the planning stress-test is closed concretely:
- B1: `verifyEditorToken` `case 5:` added FIRST; `extractTokenActor` supports both 4-part and 5-part
- B2: 59 `?? 'default'` sites rewritten to `resolveTenant` + grep gate ban
- B3: Prisma extension dropped entirely; explicit `assertActive` guards run BEFORE `$transaction`
- B4: `/assets/*` wildcard + globally-unique UUID filenames preserve public-mode compatibility
- B5: `FsAssetBinaryStore.save()` adds `mkdirSync` for tenant subdirectories
- H1: no Prisma extension, no recursion problem
- H2: per-request cache design dropped in favor of process-scoped TTL cache; stale window documented
- H3: dispatch-time recheck in schedule engine catches mid-tick archives
- H4: `tenantId` is OPTIONAL on hook contexts (semver-minor)
- H5: startup validation warns, not throws, for `API_KEYS_JSON` referencing unknown tenants
- H6: redacted parse errors; `API_KEYS_JSON_FILE` alternative
- H7: `tenantStore.*` exempt from grep-gate store-call check
- R1: OIDC via completion codes (no fragment parser); `CompletionTokenData.tenantId` required
- R2: `asset-inline.ts` regex updated + regression test
- R3: tenant CRUD wiring traced through storage-factory → storage.plugin → server.ts explicitly
- R4: explicit store-layer guards (no Prisma middleware)
- R5: SQL Server + multi-tenant rejected at startup alongside file mode
- R6: `render.ts` `?? 'default'` carveout removed; hook contexts use resolved variable
- R7: `/auth/editor-token` super-admin branch documented with explicit flow
- R8: `extractTokenActor()` supports both 4-part and 5-part
- R9: `TenantStatusCache.assertKnown()` covers auth hook + resolveTenant + all 4 mint paths
- R10: 7 `setStoredToken` callers enumerated and updated; embed protocol types updated; silent-callback HTML updated
- R11: `apps/editor/src/embed/post-message-protocol.ts` + `public/oidc-silent-callback.html` listed explicitly
- R12: `tenant_unknown` wire contract specified (status, body shape, `code` field); `tenant_archived` parallel
- R13: auth-hook catch path rewrite stated explicitly with sample code
- R14: universal “last gate before mint” rule covers all 4 mint paths
- R15: `tenant_archive_rejections_total{operation}` counter, NOT under `auth_failures_total`
- R16: archived tenants mint successfully (reads allowed); only unknown tenants 403 at mint
- R17: global error-handler branch covers non-auth-hook throw sites
- R18: `resolveTenant` is async, every call site uses `await`