Pulp Engine — Data Residency & GDPR Deployment Guidance
This document is for operators deploying Pulp Engine into environments with data-residency or GDPR obligations. It inventories the personal data Pulp Engine stores, explains how residency is achieved (single-region deployment), and gives a playbook for DSAR / erasure / portability requests.
Pulp Engine is operator-managed: the operator controls the infrastructure, the database, and the asset store. Pulp Engine itself introduces no third-party data flows unless AI template generation is enabled (§ 6).
1. Personal data inventory
All personal data stored by Pulp Engine lives in the operator’s Postgres database and object store. There is no cloud component operated by Pulp Engine.
| Data | Storage | Retention control |
|---|---|---|
| Actor identity: named editor users (id, optional displayName, optional email) | EDITOR_USERS_JSON env var, plus Postgres EditorUser row on first OIDC login. See apps/api/src/config.ts. | Operator-managed: remove from env + delete row. |
| OIDC subject claim (sub) and lastOidcLogin timestamp | Postgres EditorUser.oidcSub + EditorUser.lastOidcLogin. | Operator-managed: deleted when the user row is deleted. |
| Editor session tokens (HMAC-signed, stateless) | Not stored. The 5-part token iat.expiry.tenantId.actor.sig is fully self-contained and verified cryptographically. See apps/api/src/lib/editor-token.ts. | TTL via EDITOR_TOKEN_TTL_MINUTES (default 480, range 5–1440). Revoke all existing tokens cluster-wide by setting EDITOR_TOKEN_ISSUED_AFTER to the current time; tokens issued before that timestamp are rejected. |
| Audit events: actor, resourceType, resourceId, timestamp, details | Postgres AuditEvent table. | AUDIT_RETENTION_DAYS (default 90) + background scheduler. Ad-hoc purge via DELETE /audit-events?before=<ISO> or DELETE /audit-events?actor=<id>. |
| Render usage: actor, tenantId, timestamp, template, renderMode | Postgres RenderUsage table. | RENDER_USAGE_RETENTION_DAYS (default 90) + background scheduler. |
| Template content | Postgres or filesystem (depending on STORAGE_MODE). Templates themselves are not personal data, but user-supplied data passed into /render may contain personal data. Render inputs are not persisted — they flow through the render pipeline and are discarded. | N/A (not stored). |
| Rendered PDFs / DOCX / XLSX / PPTX | Returned in the HTTP response; not persisted server-side. If your integration stores them afterwards, that is out of Pulp Engine’s scope. | Downstream system’s responsibility. |
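The two token controls in the table (TTL and hard revoke) come down to two timestamp comparisons. The sketch below illustrates that logic only; it is not the code in apps/api/src/lib/editor-token.ts. It assumes that iat and expiry are Unix epoch seconds and that no field contains a dot, neither of which the source confirms, and it skips HMAC verification entirely. The function name token_is_current is hypothetical.

```shell
# Hypothetical helper: would this token pass the TTL and hard-revoke checks?
# Assumptions (not confirmed by the source): iat and expiry are Unix epoch
# seconds, and no field contains a "." character.
# The HMAC signature is NOT verified here.
token_is_current() {
  token=$1         # iat.expiry.tenantId.actor.sig
  now=$2           # current time, epoch seconds
  issued_after=$3  # revocation cutoff, epoch seconds (0 if unset)

  # Split on "." with globbing disabled, then require exactly five parts.
  old_ifs=$IFS
  set -f
  IFS=.
  set -- $token
  IFS=$old_ifs
  set +f
  [ "$#" -eq 5 ] || return 1
  iat=$1
  expiry=$2

  [ "$now" -lt "$expiry" ] 2>/dev/null || return 1        # TTL has elapsed
  [ "$iat" -ge "$issued_after" ] 2>/dev/null || return 1  # issued before cutoff
  return 0
}
```

Under these assumptions, raising the cutoff above a token's iat rejects it even though its TTL has not elapsed, which is why the hard revoke takes effect without any server-side session store.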
IP addresses
IP addresses are intentionally not logged in audit events. The AuditEvent schema (apps/api/src/prisma/schema.prisma) has no IP column. This is a deliberate data-minimization posture, not a gap. Operators who need IP logging for security or compliance reasons should capture it at the reverse-proxy or LB access-log layer — those logs are outside Pulp Engine’s data model and under the operator’s retention policy.
2. Data residency
Residency = storage location. Pulp Engine’s durable data lives in two places:
- The operator’s Postgres instance (DATABASE_URL).
- The operator’s object store (S3_BUCKET) or filesystem (ASSETS_DIR).
Both are operator-provisioned, so a single-region deployment gives single-region data residency. There is no automatic cross-region replication, no state hosted by Pulp Engine, and no telemetry that sends data to third parties.
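As an illustration, an EU-only deployment might set the two storage variables as follows. Every value here is a placeholder: the hostname, credentials, and bucket name are hypothetical. Residency is determined by where the database and bucket are actually provisioned, not by what they are called.

```shell
# Hypothetical EU-only values; replace with your own infrastructure.
# The names only hint at the region: verify the actual provisioning region
# with your database and object-store provider.
DATABASE_URL="postgres://pulp:CHANGE_ME@db.eu-central-1.example.internal:5432/pulp"
S3_BUCKET="pulp-assets-eu-central-1"
export DATABASE_URL S3_BUCKET
```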
Multi-region deployments
Pulp Engine does not natively replicate across regions. For multi-region obligations, run separate deployments per region:
EU deployment        US deployment        APAC deployment
├─ EU Postgres       ├─ US Postgres       ├─ APAC Postgres
├─ EU S3 bucket      ├─ US S3 bucket      ├─ APAC S3 bucket
└─ EU API pods       └─ US API pods       └─ APAC API pods
Tenants are routed to a region at provisioning time. In multi-tenant mode (MULTI_TENANT_ENABLED=true), the operator controls tenant-to-region mapping through API_KEYS_JSON and the OIDC_DEFAULT_TENANT setting for OIDC logins; each region has its own tenant registry.
Cross-region replication (read replicas in another region, fail-over) is possible at the database layer but not a Pulp Engine feature; it is the operator’s responsibility to ensure any such replication respects the same residency obligations as the primary region.
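Because each region is a separate deployment, operator tooling needs some way to find the right regional endpoint for a tenant. The sketch below shows one possible shape for that lookup. It is not a Pulp Engine feature: the function name, tenant IDs, and URLs are all hypothetical.

```shell
# Hypothetical operator-side lookup: tenant ID -> regional API base URL.
# Pulp Engine does not provide this; the mapping lives in operator tooling.
region_api_url() {
  case "$1" in
    acme-eu|globex-eu) echo "https://eu.api.example.com" ;;
    acme-us)           echo "https://us.api.example.com" ;;
    initech-apac)      echo "https://apac.api.example.com" ;;
    *) echo "unknown tenant: $1" >&2; return 1 ;;
  esac
}
```

Failing on an unknown tenant, rather than falling back to a default region, avoids sending a request to a deployment that should never see that tenant's data.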
3. GDPR request playbook
The procedures below assume a named-user deployment (EDITOR_USERS_JSON). Shared API keys (API_KEY_ADMIN, API_KEY_EDITOR) have no per-person attribution and do not fall under individual DSAR scope.
3.1 Right of access (Article 15 — DSAR)
Identify all data attributable to the subject’s actor identifier.
# 1. Audit trail
curl -H "x-api-key: $API_KEY_ADMIN" \
  "https://api.example.com/audit-events?actor=$ACTOR&limit=1000"
# Paginate via offset until total is exhausted.

# 2. EditorUser row (requires direct DB access; no API)
psql "$DATABASE_URL" -c \
  "SELECT id, display_name, email, oidc_sub, last_oidc_login FROM editor_users WHERE id = '$ACTOR';"

# 3. Templates/versions authored by the subject (createdBy field)
psql "$DATABASE_URL" -c \
  "SELECT key, version, created_at, created_by FROM template_versions WHERE created_by = '$ACTOR';"

# 4. Assets uploaded by the subject (createdBy field)
psql "$DATABASE_URL" -c \
  "SELECT id, filename, created_at, created_by FROM assets WHERE created_by = '$ACTOR';"
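Collate the output and deliver it to the subject within your statutory deadline (one month from receipt under GDPR Article 12(3), extendable by up to two further months for complex or numerous requests).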
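The "paginate via offset until total is exhausted" step for the audit trail can be sketched as a small loop. This is a minimal sketch under assumptions: API_URL is a hypothetical variable holding the API base URL, and the response is assumed to carry a numeric top-level "total" field on a single line, which is extracted with sed rather than a JSON parser. Adjust it to the real response shape before relying on it.

```shell
# fetch_page OFFSET LIMIT: print one JSON page of audit events for $ACTOR.
# API_URL is a hypothetical variable, e.g. https://api.example.com
fetch_page() {
  curl -fsS -G -H "x-api-key: $API_KEY_ADMIN" \
    --data-urlencode "actor=$ACTOR" \
    --data-urlencode "limit=$2" \
    --data-urlencode "offset=$1" \
    "$API_URL/audit-events"
}

# dump_audit_events [LIMIT]: print every page to stdout, advancing the
# offset by LIMIT until it reaches the reported total.
dump_audit_events() {
  limit=${1:-1000}
  offset=0
  while :; do
    page=$(fetch_page "$offset" "$limit") || return 1
    printf '%s\n' "$page"
    # Assumed response shape: a numeric "total" field somewhere in the page.
    total=$(printf '%s\n' "$page" |
      sed -n 's/.*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
      head -n 1)
    [ -n "$total" ] || return 1   # unexpected response shape
    offset=$((offset + limit))
    [ "$offset" -lt "$total" ] || break
  done
}
```

Running `dump_audit_events > audit-$ACTOR.json` would collect the pages into one file for the DSAR bundle. Passing the actor through --data-urlencode keeps identifiers that contain characters such as @ or + intact in the query string.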
3.2 Right to erasure (Article 17)
- Revoke active sessions. Set EDITOR_TOKEN_ISSUED_AFTER=<now> on all API pods. This invalidates every existing editor token cluster-wide on the next request.
- Remove the identity. Delete the subject from EDITOR_USERS_JSON and remove the Postgres row:
    psql "$DATABASE_URL" -c "DELETE FROM editor_users WHERE id = '$ACTOR';"
- Erase the audit trail for that actor. (New in this release; see apps/api/src/routes/audit-events/audit-events.routes.ts.)
    curl -X DELETE -H "x-api-key: $API_KEY_ADMIN" \
      "https://api.example.com/audit-events?actor=$ACTOR"
    # Returns: { "deleted": <count> }
- Preserve or anonymize template/asset createdBy (operator decision). GDPR allows retaining the createdBy attribution on business records where there is a legitimate interest in version provenance; if erasure is required, null out the column via a direct DB update:
    psql "$DATABASE_URL" -c \
      "UPDATE template_versions SET created_by = NULL WHERE created_by = '$ACTOR';
       UPDATE assets SET created_by = NULL WHERE created_by = '$ACTOR';"
- IdP deprovisioning. If using OIDC, remove the subject from the IdP as well. Pulp Engine does not auto-deprovision when the IdP does; it only auto-provisions on login.
- Record the erasure. Your compliance process should document the request, the fields erased, and the date.
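For the first step, the value of <now> can be generated with the standard date utility. The snippet below is a sketch: the retention table describes the variable as ISO 8601, and the exact form used here (UTC, second precision, trailing Z) is an assumption. How the value reaches the API pods depends on your deployment; the kubectl line is shown only as a comment, with a hypothetical deployment name.

```shell
# Current UTC time in ISO 8601 (second precision, trailing Z assumed acceptable).
issued_after=$(date -u +%Y-%m-%dT%H:%M:%SZ)
printf 'EDITOR_TOKEN_ISSUED_AFTER=%s\n' "$issued_after"

# Rollout is deployment-specific. On Kubernetes, for a hypothetical
# deployment named "pulp-api", one option would be:
#   kubectl set env deployment/pulp-api "EDITOR_TOKEN_ISSUED_AFTER=$issued_after"
```

Because the cutoff applies to every token, all editors have to sign in again after this step, not only the data subject.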
3.3 Right to portability (Article 20)
Export via the backup CLI, optionally scoped to the subject’s tenant:
pulp-engine backup create --out ./subject-export \
  --api-url https://api.example.com --api-key "$API_KEY_ADMIN" \
  --notes "GDPR portability export for $ACTOR, ticket $TICKET"
The manifest contains the subject’s attributable records (templates + assets with matching createdBy). Audit-trail export is separate: dump via GET /audit-events?actor=$ACTOR (paginated).
3.4 Right to rectification (Article 16)
Update the EDITOR_USERS_JSON entry (displayName / email) and restart the API pods, or update the Postgres row directly. Historical audit events are not rewritten — they record what was true at the time of action; that is the intended behaviour of an audit trail.
4. Retention controls summary
| Control | Default | Range | Effect |
|---|---|---|---|
| AUDIT_RETENTION_DAYS | 90 | any positive integer; 0 disables purging | Background scheduler purges audit events older than this. |
| AUDIT_PURGE_INTERVAL_HOURS | 24 | 1+ | How often the scheduler runs. |
| RENDER_USAGE_RETENTION_DAYS | 90 | any positive integer | Background scheduler purges render-usage rows older than this. |
| EDITOR_TOKEN_TTL_MINUTES | 480 | 5–1440 | How long a minted editor token is accepted. |
| EDITOR_TOKEN_ISSUED_AFTER | unset | ISO 8601 | Hard revoke: all tokens issued before this timestamp are rejected. |
| Object-store lifecycle rules | operator | any | Unreferenced asset binaries — configure bucket lifecycle if needed. |
Set these values to match your compliance posture. Setting AUDIT_RETENTION_DAYS=0 disables the purge scheduler (not recommended).
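When the scheduler is disabled, or a one-off purge is needed, the cutoff for DELETE /audit-events?before=<ISO> can be computed from a retention window in days. The helper below is a sketch; retention_cutoff is a hypothetical name, and the epoch-to-ISO conversion tries the GNU form of date first and falls back to the BSD/macOS form.

```shell
# retention_cutoff DAYS [NOW_EPOCH]
# Print the ISO 8601 UTC timestamp DAYS days before NOW_EPOCH (default: now).
retention_cutoff() {
  days=$1
  now=${2:-$(date -u +%s)}
  cutoff=$((now - days * 86400))
  # GNU and BusyBox date accept -d @EPOCH; BSD/macOS date accepts -r EPOCH.
  date -u -d "@$cutoff" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$cutoff" +%Y-%m-%dT%H:%M:%SZ
}

# Example: purge audit events older than 90 days.
#   curl -X DELETE -H "x-api-key: $API_KEY_ADMIN" \
#     "https://api.example.com/audit-events?before=$(retention_cutoff 90)"
```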
5. Tenant isolation
When MULTI_TENANT_ENABLED=true, every durable row is stamped with tenantId and every read/write enforces it. See tenant-isolation-guarantees.md for the contract and current gaps.
6. Subprocessors
The only subprocessors are those the operator chooses:
- The Postgres provider (self-hosted, AWS RDS, GCP Cloud SQL, Azure Database, etc.).
- The object-store provider (self-hosted MinIO, AWS S3, R2, GCS, etc.).
- The IdP (Okta, Auth0, Entra, Keycloak, etc.) if OIDC is enabled.
- Anthropic: only if AI template generation is enabled (POST /templates/generate). In that mode, the operator’s template-generation prompts are sent to the Anthropic API (claude-opus-4-7 or similar). If your data-residency contract does not allow sending prompts outside your region, leave AI template generation disabled (it is an opt-in feature).
Pulp Engine itself operates no hosted service and introduces no additional subprocessors.
7. Known caveats
- Named-user mode is required for per-person GDPR workflows. If you use only shared API_KEY_* credentials, audit events have no actor attribution and individual DSAR / erasure cannot be scoped. Deploy with EDITOR_USERS_JSON (and OIDC, ideally) for compliance-grade attribution.
- Backups contain personal data. Retention and residency obligations apply to backup artifacts as well. Store backups in the same region, apply the same retention policy, and erase backed-up copies when an erasure request is completed (or document a reasonable delay window, typically ≤ 30 days).
- Audit-by-actor erasure extends to the store layer only. If you have WAL archives or DB snapshots, those still contain the erased rows until the WAL retention window expires.
- No automatic IdP deprovisioning hook. Removing a user from the IdP does not remove them from Pulp Engine; the operator must delete the EditorUser row manually.