Pulp Engine — Tenant Isolation Guarantees

This document states what Pulp Engine guarantees (and does not guarantee) about isolation between tenants in multi-tenant mode. It describes the C.0 implementation: the tenant primitive (stage 1) and multi-tenant mode itself, enabled with MULTI_TENANT_ENABLED=true (stage 2).

1. Scope

This document describes tenant isolation as shipped in v0.67.0+. Isolation is enforced only when MULTI_TENANT_ENABLED=true. In single-tenant mode (the default), every row is stamped with tenantId = 'default' and the enforcement paths below collapse to a no-op: there is only one tenant, so the isolation contract does not apply.

2. What “tenant isolation” means here

Tenant isolation is a row-level data boundary: data written under tenant A cannot be read or mutated through requests authenticated as tenant B. It is not:

- A compute boundary (all tenants share the same API pods, the same Node.js process, and the same Chromium render process).
- A network boundary (all tenants reach the same ingress and emit from the same egress).
- A resource-quota boundary (noisy-neighbor starvation is possible across tenants; per-tenant rate limits are out of scope).

Operators needing compute or network isolation per tenant should run separate deployments per tenant (one database, one S3 bucket, one set of API pods). Pulp Engine’s multi-tenant mode is designed for cost-efficient SaaS multi-tenancy, not for hostile-tenant sandboxing.

3. Enforcement points

Every durable row is stamped with tenantId and every access path flows through a central tenant resolver before touching a store.
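
As a minimal sketch of what "tenantId on every access path" looks like at the store level, the following in-memory store takes `tenantId` as the first argument of every method and filters on it. The `TemplateRow`, `TemplateStore`, and `InMemoryTemplateStore` names are hypothetical and exist only for this illustration; the real ports are defined in apps/api/src/storage/types.ts and may look different.

```typescript
// Hypothetical shapes for illustration only; not the real store ports.
interface TemplateRow {
  tenantId: string;
  id: string;
  body: string;
}

interface TemplateStore {
  // tenantId is always the first argument and has no default value.
  get(tenantId: string, id: string): TemplateRow | undefined;
  put(tenantId: string, id: string, body: string): void;
}

class InMemoryTemplateStore implements TemplateStore {
  private rows: TemplateRow[] = [];

  get(tenantId: string, id: string): TemplateRow | undefined {
    // In-memory equivalent of `WHERE tenant_id = @tenantId AND id = @id`.
    return this.rows.find((r) => r.tenantId === tenantId && r.id === id);
  }

  put(tenantId: string, id: string, body: string): void {
    // The lookup is tenant-scoped, so a matching id under another tenant
    // is never found and never overwritten.
    const existing = this.get(tenantId, id);
    if (existing) {
      existing.body = body;
    } else {
      this.rows.push({ tenantId, id, body });
    }
  }
}
```

Because the tenant filter is part of every lookup, two tenants can hold rows with the same id without seeing or overwriting each other's data.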

| Layer | Mechanism | File |
| --- | --- | --- |
| Request → tenantId | `resolveTenant(request, reply)`: single function used by every route. Reads the authenticated credential (API key or editor token) and emits the caller’s tenantId (or 403 for cross-tenant attempts). | apps/api/src/lib/tenant-resolution.ts |
| Store methods | Every data-access method on every store port takes `tenantId` as its first argument. No overloads, no defaults. | apps/api/src/storage/types.ts |
| Query builders | Every Postgres and SQL Server query includes `tenant_id = @tenantId` in its `WHERE` clause. Every Prisma call includes `tenantId` in its `where`. | apps/api/src/storage/postgres/, apps/api/src/storage/sqlserver/ |
| Asset binaries | Object-store keys are prefixed by `tenantId/` in S3 mode. Filesystem mode is single-tenant only (rejected under `MULTI_TENANT_ENABLED=true`). | apps/api/src/storage/storage-factory.ts |
| Editor tokens | 5-part token `iat.expiry.tenantId.actor.sig`. The tenantId is in the signed payload, so a token minted for tenant A cannot be used against tenant B. | apps/api/src/lib/editor-token.ts |
| Archive enforcement | `TenantStatusCache.assertActive(tenantId)` runs on every write path. Archived tenants fail-closed (409). | apps/api/src/lib/tenant-status-cache.ts |
| Plugin rejection | Plugin-provided storage and identity providers are rejected in multi-tenant mode (they have not been audited for tenant-awareness). | apps/api/src/plugins/plugin-system.plugin.ts |
| CI grep gate | Strict error on literal `'default'` as a tenantId outside a tiny allowlist, and on `?? 'default'` coercion shortcuts in route/lib. | scripts/check-tenant-propagation.mjs |
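
To illustrate why a signed tenantId claim prevents token reuse across tenants, here is a sketch of a 5-part `iat.expiry.tenantId.actor.sig` token. It assumes HMAC-SHA256 over the first four parts with hex encoding, and that neither the tenantId nor the actor contains a dot; the function names `mintToken` and `verifyToken` are hypothetical, and the actual scheme in apps/api/src/lib/editor-token.ts may differ on all of these points.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256, hex-encoded. Not necessarily the real scheme.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

function mintToken(
  tenantId: string,
  actor: string,
  ttlSec: number,
  secret: string,
  nowSec: number,
): string {
  const payload = [nowSec, nowSec + ttlSec, tenantId, actor].join(".");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the tenantId claim if the token is authentic and unexpired, else null.
function verifyToken(token: string, secret: string, nowSec: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 5) return null;
  const [iat, expiry, tenantId, actor, sig] = parts as [string, string, string, string, string];
  const expected = sign([iat, expiry, tenantId, actor].join("."), secret);
  const given = Buffer.from(sig, "utf8");
  const wanted = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== wanted.length || !timingSafeEqual(given, wanted)) return null;
  if (Number(expiry) <= nowSec) return null;
  return tenantId;
}
```

Rewriting the tenantId segment of a valid token changes the signed payload, so the signature no longer verifies and the token is rejected.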

4. The super-admin escape hatch

A credential marked superAdmin: true in API_KEYS_JSON may target any tenant via the X-PulpEngine-Tenant-Id request header. The tenant resolver:

- Requires the header to be present (no implicit tenant for super-admins).
- Validates the slug format and that the tenant exists and is active.
- Emits the requested tenantId if valid; 400/404 otherwise.
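
The three resolver steps above can be sketched as a pure function. Several details here are assumptions made for illustration: the slug pattern, the choice of 400 for a missing or malformed header and 404 for an unknown or inactive tenant, and the `resolveSuperAdminTenant` name itself. The header key is lowercased because Node.js lowercases incoming header names.

```typescript
type TenantStatus = "active" | "archived";

type Resolution =
  | { ok: true; tenantId: string }
  | { ok: false; status: 400 | 404 };

// Assumption: slugs are lowercase alphanumerics and hyphens. The real rule may differ.
const SLUG_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

function resolveSuperAdminTenant(
  headers: Record<string, string | undefined>,
  tenants: Map<string, TenantStatus>,
): Resolution {
  const requested = headers["x-pulpengine-tenant-id"];
  // No implicit tenant: a missing or malformed header is treated as a client error.
  if (requested === undefined || !SLUG_PATTERN.test(requested)) {
    return { ok: false, status: 400 };
  }
  // Assumption: unknown and archived tenants both map to 404 in this sketch.
  if (tenants.get(requested) !== "active") {
    return { ok: false, status: 404 };
  }
  return { ok: true, tenantId: requested };
}
```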

Super-admin credentials are operator infrastructure, not application identities. Issue them narrowly (operations, migrations, cross-tenant reporting) and audit their use — every super-admin request produces an audit event with the tenant header recorded.

5. Known gaps and caveats

These limitations are stated explicitly rather than left for operators to discover:

- Shared compute. A bug in the render pipeline that leaks state across requests (e.g. a global variable in a Handlebars helper) would cross tenants. Mitigation: render isolation modes (RENDER_MODE=container or socket) put each render in a fresh process. Consider these for stronger per-render isolation.
- Shared Chromium (child-process render). Chromium runs as a singleton within the API pod. No guarantee is made that two concurrent renders cannot observe each other’s DOM via a browser-engine bug. This is a theoretical, not observed, risk; container/socket render mode eliminates it.
- TenantStatusCache staleness window. Archiving a tenant propagates with eventual consistency, bounded by TENANT_STATUS_CACHE_TTL_MS (default 10 s). During this window, a replica may still serve writes to a just-archived tenant. Lower the TTL if you need a shorter window.
- Audit-store purge schedulers are tenant-scoped. Each purge runs as `DELETE FROM audit_events WHERE tenant_id = $1 AND timestamp < $2`, so no tenant can purge another tenant’s trail.
- Cross-tenant reporting is out of scope. Pulp Engine has no API for “list all audit events across all tenants” without a super-admin credential. This is deliberate: such an API is a privileged-data pane and is the operator’s responsibility to build carefully (direct DB access works).
- Storage and identity plugins are rejected in multi-tenant mode. Until plugin tenant-awareness is audited end-to-end, the plugin system refuses to activate storage or identity plugins when MULTI_TENANT_ENABLED=true. Renderer and event plugins are safe because they do not touch tenant-scoped state.
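
The TenantStatusCache staleness window in the list above follows directly from how a TTL cache behaves. The sketch below is not the real `TenantStatusCache`; the `TtlStatusCache` class, its constructor arguments, and the injected clock are hypothetical and only show why a cached "active" entry can outlive an archive operation by up to the TTL.

```typescript
type Status = "active" | "archived";

class TtlStatusCache {
  private entries = new Map<string, { status: Status; fetchedAt: number }>();
  private ttlMs: number;
  private lookup: (tenantId: string) => Status; // hypothetical source-of-truth read
  private now: () => number;

  constructor(
    ttlMs: number,
    lookup: (tenantId: string) => Status,
    now: () => number = () => Date.now(),
  ) {
    this.ttlMs = ttlMs;
    this.lookup = lookup;
    this.now = now;
  }

  // Throws when the tenant is archived according to the (possibly stale) cached status.
  assertActive(tenantId: string): void {
    const t = this.now();
    const cached = this.entries.get(tenantId);
    let status: Status;
    if (cached && t - cached.fetchedAt < this.ttlMs) {
      status = cached.status; // may lag the source of truth by up to ttlMs
    } else {
      status = this.lookup(tenantId);
      this.entries.set(tenantId, { status, fetchedAt: t });
    }
    if (status === "archived") {
      throw new Error(`tenant ${tenantId} is archived`);
    }
  }
}
```

With a 10 s TTL, a write that arrives 5 s after the tenant was archived still passes on a replica that cached "active" earlier; once the entry expires, the next check re-reads the status and fails closed.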

6. Test harness

Three complementary tools verify the contract:

- Static CI gate: scripts/check-tenant-propagation.mjs runs in CI and errors on any 'default' literal or ?? 'default' coercion outside the allowlist. This catches “I forgot to thread the tenantId through” at review time.
- Integration tests: tests under apps/api/src/tests/ that hit the full Fastify pipeline exercise the enforced paths end-to-end (templates, assets, audit, schedules, rate limits). Example: tenant-rate-limits.test.ts exercises per-tenant rate-limit isolation.
- Cross-tenant leak red-line harness: cross-tenant-leak.test.ts (Postgres, auto-skip without DATABASE_URL) seeds data under tenant A and then, resolving every request as tenant B, asserts:
  - every read returns empty / 404 for A’s resource ids,
  - every write targeting A’s resource ids returns 403 / 404,
  - A’s state is unchanged after B’s attempts (no mutation side effects).

  Covered surfaces: templates, assets, audit events, schedules, editor-token tenant-claim enforcement (a tenant-A token cannot be redirected to tenant B via a forged X-PulpEngine-Tenant-Id header), and render-usage rollup isolation. The most load-bearing HTTP-layer assertions are mirrored in sqlserver-multi-tenant.test.ts so a backend-specific regression cannot slip past by only running the Postgres suite.
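
The static gate described above can be approximated with a line-by-line scan. This is a rough sketch: the two regular expressions, the `Finding` shape, and the `scanSource` function are assumptions, the allowlist is omitted, and unlike the real script this version flags any `'default'` string literal rather than only those used as a tenantId.

```typescript
interface Finding {
  line: number; // 1-based line number within the scanned source
  kind: "coercion" | "literal";
  text: string;
}

// Assumed patterns; the real script's rules are not reproduced here.
const COERCION = /\?\?\s*['"]default['"]/;
const LITERAL = /['"]default['"]/;

function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    // Check the more specific pattern first so each line is reported once.
    if (COERCION.test(text)) {
      findings.push({ line: index + 1, kind: "coercion", text: text.trim() });
    } else if (LITERAL.test(text)) {
      findings.push({ line: index + 1, kind: "literal", text: text.trim() });
    }
  });
  return findings;
}
```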

7. Operator checklist for enabling multi-tenant mode

- Use a database-backed storage mode (postgres or sqlserver). File mode does not support multi-tenant and is rejected at startup.
- Use S3 for asset binaries (ASSET_BINARY_STORE=s3). Filesystem mode is rejected.
- Provision API_KEYS_JSON with per-tenant credentials; only issue superAdmin: true keys to a small, audited set of operators.
- Configure OIDC with OIDC_DEFAULT_TENANT per IdP if you auto-provision users.
- Set TENANT_STATUS_CACHE_TTL_MS to match your archive-propagation SLA.
- Do not enable storage or identity plugins; they are rejected at activation.
- Run the static gate locally before shipping any patch that touches stores or routes:
```
node scripts/check-tenant-propagation.mjs
```
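
The startup rejections in the checklist above could be expressed as a small validation function, sketched here under assumptions. MULTI_TENANT_ENABLED and ASSET_BINARY_STORE are named in this document; STORAGE_MODE is a hypothetical variable name standing in for however the storage mode is actually configured, and `validateMultiTenantConfig` is not a real Pulp Engine function.

```typescript
// Returns configuration errors; an empty list means the combination is allowed.
function validateMultiTenantConfig(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  // Single-tenant mode: none of the multi-tenant constraints apply.
  if (env["MULTI_TENANT_ENABLED"] !== "true") return errors;

  // STORAGE_MODE is a hypothetical name used only in this sketch.
  const storage = env["STORAGE_MODE"];
  if (storage !== "postgres" && storage !== "sqlserver") {
    errors.push("multi-tenant mode requires a postgres or sqlserver storage mode");
  }
  if (env["ASSET_BINARY_STORE"] !== "s3") {
    errors.push("multi-tenant mode requires ASSET_BINARY_STORE=s3");
  }
  return errors;
}
```

Passing `process.env` to a check like this before the server starts listening would surface an unsupported combination as a startup failure rather than as a runtime isolation gap.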
