Pulp Engine — Data Residency & GDPR Deployment Guidance

This document is for operators deploying Pulp Engine into environments with data-residency or GDPR obligations. It inventories the personal data Pulp Engine stores, explains how residency is achieved (single-region deployment), and gives a playbook for DSAR / erasure / portability requests.

Pulp Engine is operator-managed: the operator controls the infrastructure, the database, and the asset store. Pulp Engine itself introduces no third-party data flows unless AI template generation is enabled (§ 6).

1. Personal data inventory

All personal data stored by Pulp Engine lives in the operator’s Postgres database and object store (or filesystem). There is no cloud component operated by the Pulp Engine project.

Data	Storage	Retention control
Actor identity — named editor users (`id`, optional `displayName`, optional `email`)	`EDITOR_USERS_JSON` env var, plus Postgres `EditorUser` row on first OIDC login. See apps/api/src/config.ts.	Operator-managed: remove from env + delete row.
OIDC subject claim (`sub`) and `lastOidcLogin` timestamp	Postgres `EditorUser.oidcSub` + `EditorUser.lastOidcLogin`.	Operator-managed: deleted when the user row is deleted.
Editor session tokens (HMAC-signed, stateless)	Not stored. The 5-part token `iat.expiry.tenantId.actor.sig` is fully self-contained and verified cryptographically. See apps/api/src/lib/editor-token.ts.	TTL via `EDITOR_TOKEN_TTL_MINUTES` (default 480, range 5–1440). Revoke all tokens cluster-wide by setting `EDITOR_TOKEN_ISSUED_AFTER` to the current time; tokens issued before that timestamp are rejected.
Audit events — `actor`, `resourceType`, `resourceId`, `timestamp`, `details`	Postgres `AuditEvent` table.	`AUDIT_RETENTION_DAYS` (default 90) + background scheduler. Ad-hoc purge via `DELETE /audit-events?before=<ISO>` or `DELETE /audit-events?actor=<id>`.
Render usage — `actor`, `tenantId`, `timestamp`, `template`, `renderMode`	Postgres `RenderUsage` table.	`RENDER_USAGE_RETENTION_DAYS` + background scheduler.
Template content	Postgres or filesystem (depending on `STORAGE_MODE`). Templates themselves are not personal data, but user-supplied `data` passed into `/render` may contain personal data. Render inputs are not persisted — they flow through the render pipeline and are discarded.	N/A (not stored).
Rendered PDFs / DOCX / XLSX / PPTX	Returned in the HTTP response; not persisted server-side. If your integration stores them afterwards, that is out of Pulp Engine’s scope.	Downstream system’s responsibility.
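
The token row above implies that validity is decided from the token's own fields. As a rough sketch of the time-based part of that check only (the real logic lives in apps/api/src/lib/editor-token.ts and may differ), the snippet below assumes that `iat` and `expiry` are Unix epoch seconds, that no field contains a dot, and that the revocation cut-off has already been converted to epoch seconds. The name `token_time_ok` is hypothetical, and signature verification is deliberately left out.

```shell
# Sketch only: time-window check for a token shaped iat.expiry.tenantId.actor.sig.
# Assumptions (not confirmed by the source): iat/expiry are epoch seconds and
# fields contain no dots. The HMAC signature is NOT verified here.
# usage: token_time_ok TOKEN NOW_EPOCH [ISSUED_AFTER_EPOCH]
token_time_ok() (
  token=$1
  now=$2
  issued_after=${3:-0}

  # Reject malformed tokens: exactly five dot-separated parts.
  parts=$(printf '%s' "$token" | awk -F. '{ print NF }')
  [ "$parts" -eq 5 ] || exit 1

  iat=$(printf '%s' "$token" | cut -d. -f1)
  expiry=$(printf '%s' "$token" | cut -d. -f2)

  # Both time fields must be non-empty and purely numeric.
  case "$iat" in ''|*[!0-9]*) exit 1 ;; esac
  case "$expiry" in ''|*[!0-9]*) exit 1 ;; esac

  [ "$now" -lt "$expiry" ] || exit 1          # expired
  [ "$iat" -ge "$issued_after" ] || exit 1    # issued before the revocation cut-off
  exit 0
)
```

Under these assumptions, moving the cut-off forward to the current time rejects every token minted earlier, which is why no server-side session store is needed for revocation.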

IP addresses

IP addresses are intentionally not logged in audit events. The AuditEvent schema (apps/api/src/prisma/schema.prisma) has no IP column. This is a deliberate data-minimization posture, not a gap. Operators who need IP logging for security or compliance reasons should capture it at the reverse-proxy or LB access-log layer — those logs are outside Pulp Engine’s data model and under the operator’s retention policy.

2. Data residency

Residency = storage location. Pulp Engine’s durable data lives in two places:

The operator’s Postgres instance (DATABASE_URL).
The operator’s object store (S3_BUCKET) or filesystem (ASSETS_DIR).

Both are operator-provisioned, so a single-region deployment gives single-region data residency. There is no automatic cross-region replication, no Pulp Engine-hosted state, and no telemetry that sends data to third parties.
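
If the object store is AWS S3, the bucket's region can be checked as part of a residency review. The sketch below is one way to do that, not a Pulp Engine feature; `normalize_bucket_region` and `bucket_in_region` are hypothetical helper names, and the commented `aws` call assumes the AWS CLI is installed and authorised.

```shell
# Map the LocationConstraint reported by S3 to a region name.
# S3 reports no constraint for us-east-1 (the CLI prints "None" in text
# output) and the legacy value "EU" for eu-west-1.
normalize_bucket_region() {
  case "$1" in
    ''|None|null) echo "us-east-1" ;;
    EU)           echo "eu-west-1" ;;
    *)            echo "$1" ;;
  esac
}

# Succeed only if the bucket's region equals the expected one.
# usage: bucket_in_region EXPECTED_REGION REPORTED_CONSTRAINT
bucket_in_region() {
  [ "$(normalize_bucket_region "$2")" = "$1" ]
}

# Example (commented out; needs AWS credentials):
#   loc=$(aws s3api get-bucket-location --bucket "$S3_BUCKET" \
#           --query LocationConstraint --output text)
#   bucket_in_region eu-central-1 "$loc" || echo "bucket is outside eu-central-1" >&2
```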

Multi-region deployments

Pulp Engine does not natively replicate across regions. For multi-region obligations, run separate deployments per region:

EU deployment          US deployment          APAC deployment
├─ EU Postgres         ├─ US Postgres         ├─ APAC Postgres
├─ EU S3 bucket        ├─ US S3 bucket        ├─ APAC S3 bucket
└─ EU API pods         └─ US API pods         └─ APAC API pods

Tenants are routed to a region at provisioning time. In multi-tenant mode (MULTI_TENANT_ENABLED=true), tenant-to-region mapping is operator-controlled via API_KEYS_JSON and, for OIDC logins, the OIDC_DEFAULT_TENANT setting; each region has its own tenant registry.

Cross-region replication (read replicas in another region, fail-over) is possible at the database layer but not a Pulp Engine feature; it is the operator’s responsibility to ensure any such replication respects the same residency obligations as the primary region.

3. Data subject request playbook

The procedures below assume a named-user deployment (EDITOR_USERS_JSON). Shared API keys (API_KEY_ADMIN, API_KEY_EDITOR) carry no per-person attribution, so activity performed with them cannot be scoped to an individual data subject.

3.1 Right of access (Article 15 — DSAR)

Identify all data attributable to the subject’s actor identifier.

```
# 1. Audit trail
curl -H "x-api-key: $API_KEY_ADMIN" \
  "https://api.example.com/audit-events?actor=$ACTOR&limit=1000"
# Paginate via offset until total is exhausted.

# 2. EditorUser row (requires direct DB access; there is no API)
psql "$DATABASE_URL" -c \
  "SELECT id, display_name, email, oidc_sub, last_oidc_login FROM editor_users WHERE id = '$ACTOR';"

# 3. Templates/versions authored by the subject (createdBy field)
psql "$DATABASE_URL" -c \
  "SELECT key, version, created_at, created_by FROM template_versions WHERE created_by = '$ACTOR';"

# 4. Assets uploaded by the subject (createdBy field)
psql "$DATABASE_URL" -c \
  "SELECT id, filename, created_at, created_by FROM assets WHERE created_by = '$ACTOR';"
```

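Collate the output and deliver it to the subject within your statutory deadline (under GDPR Article 12(3), one month from receipt, extendable by two further months for complex or numerous requests).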
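
The pagination comment in the audit-trail step can be sketched as follows. This assumes the endpoint accepts an `offset` query parameter alongside `limit` and reports a total count, as that comment implies; the exact response shape is not documented here, so the helper only computes the offsets and leaves response parsing to you. `page_offsets` and `TOTAL` are hypothetical names.

```shell
# Print the offsets needed to read TOTAL rows in pages of LIMIT rows.
# usage: page_offsets TOTAL LIMIT
page_offsets() (
  total=$1
  limit=$2
  [ "$limit" -gt 0 ] || exit 1   # a zero page size would loop forever
  offset=0
  while [ "$offset" -lt "$total" ]; do
    echo "$offset"
    offset=$((offset + limit))
  done
)

# Example use (commented out; needs a live API and assumes the first
# response told you TOTAL):
#   for off in $(page_offsets "$TOTAL" 1000); do
#     curl -sf -H "x-api-key: $API_KEY_ADMIN" \
#       "https://api.example.com/audit-events?actor=$ACTOR&limit=1000&offset=$off" \
#       > "audit-$off.json"
#   done
```

Writing one file per page keeps each response intact, so a failed request can be retried on its own.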

3.2 Right to erasure (Article 17)

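Revoke active sessions. Set EDITOR_TOKEN_ISSUED_AFTER to the current time (ISO 8601) on all API pods. Because editor tokens are stateless, this invalidates every existing editor token cluster-wide on the next request, not only the subject’s, so all editors must sign in again.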
Remove the identity. Delete the subject from EDITOR_USERS_JSON and remove the Postgres row:
```
psql "$DATABASE_URL" -c "DELETE FROM editor_users WHERE id = '$ACTOR';"
```

Erase the audit trail for that actor.

```
curl -X DELETE -H "x-api-key: $API_KEY_ADMIN" \
  "https://api.example.com/audit-events?actor=$ACTOR"
# Returns: { "deleted": <count> }
```

(New in this release — see apps/api/src/routes/audit-events/audit-events.routes.ts.)

Preserve or anonymize template/asset createdBy (operator decision). GDPR may permit retaining the createdBy attribution on business records where you can document a legitimate interest in version provenance; if erasure is required, null out the column via a direct DB update:
```
psql "$DATABASE_URL" -c \
  "UPDATE template_versions SET created_by = NULL WHERE created_by = '$ACTOR';
   UPDATE assets            SET created_by = NULL WHERE created_by = '$ACTOR';"
```
IdP deprovisioning. If using OIDC, remove the subject from the IdP as well — Pulp Engine does not auto-deprovision when the IdP does; it only auto-provisions on login.
Record the erasure. Your compliance process should document the request, the fields erased, and the date.
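
For the session-revocation step, a minimal sketch of generating the cut-off value. It assumes the variable accepts a UTC timestamp in the `YYYY-MM-DDTHH:MM:SSZ` form (section 4 only says "ISO 8601"). The `kubectl` line is one possible way to roll the value out on Kubernetes, and the deployment name `pulp-engine-api` is hypothetical.

```shell
# Print the current UTC time as an ISO 8601 timestamp, e.g. 2025-01-31T09:15:00Z.
revocation_cutoff() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Example rollout (commented out; the deployment name is hypothetical):
#   kubectl set env deployment/pulp-engine-api \
#     EDITOR_TOKEN_ISSUED_AFTER="$(revocation_cutoff)"
```

Generate the value once and apply the same string to every pod, so that all instances agree on the cut-off.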

3.3 Right to portability (Article 20)

Export via the backup CLI, optionally scoped to the subject’s tenant:

```
pulp-engine backup create --out ./subject-export \
  --api-url https://api.example.com --api-key "$API_KEY_ADMIN" \
  --notes "GDPR portability export for $ACTOR, ticket $TICKET"
```

The manifest contains the subject’s attributable records (templates + assets with matching createdBy). Audit-trail export is separate: dump via GET /audit-events?actor=$ACTOR (paginated).

3.4 Right to rectification (Article 16)

Update the EDITOR_USERS_JSON entry (displayName / email) and restart the API pods, or update the Postgres row directly. Historical audit events are not rewritten — they record what was true at the time of action; that is the intended behaviour of an audit trail.

4. Retention controls summary

Control	Default	Range	Effect
`AUDIT_RETENTION_DAYS`	90	0 or any positive integer	Background scheduler purges audit events older than this; 0 disables the purge.
`AUDIT_PURGE_INTERVAL_HOURS`	24	1+	How often the scheduler runs.
`RENDER_USAGE_RETENTION_DAYS`	90	any positive integer	Background scheduler purges render-usage rows.
`EDITOR_TOKEN_TTL_MINUTES`	480	5–1440	How long a minted editor token is accepted.
`EDITOR_TOKEN_ISSUED_AFTER`	unset	ISO 8601	Hard revoke: all tokens issued before this timestamp are rejected.
Object-store lifecycle rules	operator	any	Unreferenced asset binaries — configure bucket lifecycle if needed.

Set these to match your documented retention policy. AUDIT_RETENTION_DAYS=0 disables the purge scheduler, so audit events are kept indefinitely (not recommended).
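
To preview what a retention setting implies, or to build the `before` value for an ad-hoc purge, the cut-off can be computed as below. This is a sketch under assumptions: the scheduler's exact cut-off arithmetic is not documented here, so it simply subtracts whole 24-hour days. `retention_cutoff` is a hypothetical helper, and the epoch-to-ISO conversion relies on GNU `date -d @N`.

```shell
# Print the ISO 8601 cut-off for a retention period of DAYS days.
# Events older than the printed timestamp would be eligible for purging.
# usage: retention_cutoff DAYS [NOW_EPOCH]
retention_cutoff() (
  days=$1
  now=${2:-$(date +%s)}
  # 0 means "purging disabled" in this guide, so there is no cut-off.
  [ "$days" -gt 0 ] || exit 1
  cutoff=$((now - days * 86400))
  date -u -d "@$cutoff" +%Y-%m-%dT%H:%M:%SZ   # GNU date syntax
)

# Example (commented out; needs a live API):
#   curl -X DELETE -H "x-api-key: $API_KEY_ADMIN" \
#     "https://api.example.com/audit-events?before=$(retention_cutoff 90)"
```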

5. Tenant isolation

When MULTI_TENANT_ENABLED=true, every durable row is stamped with tenantId and every read/write enforces it. See tenant-isolation-guarantees.md for the contract and current gaps.

6. Subprocessors

The only subprocessors are those the operator chooses:

The Postgres provider (self-hosted, AWS RDS, GCP Cloud SQL, Azure Database, etc.).
The object-store provider (self-hosted MinIO, AWS S3, R2, GCS, etc.).
The IdP (Okta, Auth0, Entra, Keycloak, etc.) if OIDC is enabled.
Anthropic — only if AI template generation is enabled (POST /templates/generate). In that mode, the operator’s template-generation prompts are sent to the Anthropic API (claude-opus-4-7 or similar). If your data-residency contract does not allow sending prompts outside your region, leave AI template generation disabled (it is an opt-in feature).

Pulp Engine itself operates no hosted service and introduces no additional subprocessors.

7. Known caveats

Named-user mode is required for per-person GDPR workflows. If you use only shared API_KEY_* credentials, audit events have no actor attribution and individual DSAR / erasure cannot be scoped. Deploy with EDITOR_USERS_JSON (and OIDC, ideally) for compliance-grade attribution.
Backups contain personal data. Retention and residency obligations apply to backup artifacts as well. Store backups in the same region, apply the same retention policy, and erase backed-up copies when an erasure request is completed (or document a reasonable delay window, typically ≤ 30 days).
Audit-by-actor erasure reaches only the live database. WAL archives and DB snapshots still contain the erased rows until their own retention window expires, so factor that window into your erasure timeline.
No automatic IdP deprovisioning hook. Removing a user from the IdP does not remove them from Pulp Engine; the operator must delete the EditorUser row manually.
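
Because deprovisioning is manual, a periodic reconciliation can help spot rows that outlived their IdP account. The sketch below assumes two plain-text files with one OIDC `sub` value per line: one exported from the `oidc_sub` column used in section 3.1, the other exported from your IdP by whatever means it offers. `orphaned_subjects` and both file names are hypothetical.

```shell
# Print subjects present in Pulp Engine but absent from the IdP export.
# usage: orphaned_subjects PULP_SUBS_FILE IDP_SUBS_FILE
orphaned_subjects() (
  tmpdir=$(mktemp -d) || exit 1
  # comm requires sorted input; -u also drops duplicates.
  LC_ALL=C sort -u "$1" > "$tmpdir/pulp"
  LC_ALL=C sort -u "$2" > "$tmpdir/idp"
  # -23 keeps only lines unique to the first file.
  LC_ALL=C comm -23 "$tmpdir/pulp" "$tmpdir/idp"
  status=$?
  rm -rf "$tmpdir"
  exit "$status"
)

# Example export of the Pulp Engine side (commented out; needs DB access):
#   psql "$DATABASE_URL" -At -c \
#     "SELECT oidc_sub FROM editor_users WHERE oidc_sub IS NOT NULL;" > pulp-subs.txt
#   orphaned_subjects pulp-subs.txt idp-subs.txt
```

Each subject printed is a candidate for the manual EditorUser deletion described above; review the list before deleting anything.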
