Pulp Engine — Backup & Restore Runbook
Canonical procedure for operators to back up a Pulp Engine deployment and restore from backup. Pairs with deployment-guide.md (topology + config) and ha-reference-architecture.md (multi-replica deployments).
This runbook assumes a Postgres + object-store topology (the production recommendation). For file-mode evaluation deployments, see § 6.
1. What to back up
Pulp Engine has exactly two durable stores. Everything else (request handlers, editor tokens, capability caches, in-memory schedulers) is derivable from these two.
| Layer | What’s in it | Where it lives |
|---|---|---|
| Postgres | Templates, versions, labels, assets metadata, audit events, schedules + executions + DLQ, tenant registry, render usage | DATABASE_URL — schema in apps/api/src/prisma/schema.prisma |
| Asset binaries | Image files uploaded via the asset library | ASSETS_DIR (filesystem) or the S3 bucket (ASSET_BINARY_STORE=s3) |
Not durable / do not back up:
- .env / secrets: treat these as configuration managed by your platform’s secret manager.
- Local pod disk (rendered PDFs are transient; the Chromium cache is rebuilt on start).
- In-memory state: the tenant status cache reconstructs automatically on restart. Async batch jobs are durable on Postgres deployments (as of v0.72.0): the job metadata lives in the batch_jobs Postgres table and completed result envelopes live in IJobResultBlobStore (filesystem at JOB_RESULT_BLOB_DIR or S3 at JOB_RESULT_BLOB_BUCKET). On restart, pending/processing rows older than STARTUP_ORPHAN_GRACE_MS are failed with code job_abandoned_at_startup; remaining active rows rehydrate into the hot cache. Completed jobs remain pollable until WEBHOOK_JOB_RETENTION_SECONDS elapses. The processing-timeout sweep still handles crashes that happen later in processJob(). File-mode and SQL Server deployments retain the pre-T3 in-memory-only behaviour (jobs lost on pod restart). The delivery dispatcher DLQ is Postgres-backed and survives restart.
2. Backup procedure
The authoritative backup path is pg_dump plus a sync of the asset store. The CLI (§ 4) adds a lightweight inventory/verification layer on top; it does not replace pg_dump.
2.1 Postgres
# Full, compressed, custom-format dump (restorable via pg_restore).
pg_dump \
--format=custom \
--no-owner \
--no-privileges \
--file=pulp-engine-$(date -u +%Y%m%dT%H%M%SZ).dump \
"$DATABASE_URL"
Recommended cadence:
- Production: continuous archiving (WAL-E / WAL-G / managed-Postgres PITR) + daily full dump for operator-visible artifacts.
- Evaluation / staging: daily dump is sufficient.
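A scheduled dump is only useful if the file it produces is readable. The following is a minimal sketch, not something shipped with Pulp Engine: the function name dump_and_check and the BACKUP_DIR variable are assumptions made for illustration. It runs the pg_dump invocation from § 2.1, then asks pg_restore --list to read the archive's table of contents, which fails on a truncated or unreadable custom-format dump.

```shell
# Hypothetical helper: dump_and_check and BACKUP_DIR are illustrative names.
dump_and_check() {
  local out
  out="${BACKUP_DIR:-.}/pulp-engine-$(date -u +%Y%m%dT%H%M%SZ).dump"

  # Same flags as the dump command in § 2.1.
  pg_dump --format=custom --no-owner --no-privileges \
    --file="$out" "$DATABASE_URL" || return 1

  # Reading the table of contents catches truncated or unreadable archives.
  pg_restore --list "$out" >/dev/null || return 1

  # Print the path so the caller can record or upload it.
  printf '%s\n' "$out"
}
```

A cron job or platform scheduler could call it as `dump_file=$(dump_and_check) || echo "backup failed" >&2`.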
2.2 Asset binaries
S3 mode (ASSET_BINARY_STORE=s3):
# Enable S3 bucket versioning once (recommended — gives you PITR for blobs):
aws s3api put-bucket-versioning --bucket "$S3_BUCKET" \
--versioning-configuration Status=Enabled
# Snapshot copy into a dated backup bucket/prefix:
aws s3 sync "s3://$S3_BUCKET" "s3://$BACKUP_BUCKET/assets-$(date -u +%Y%m%dT%H%M%SZ)/"
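Because the point-in-time story for blobs depends on versioning actually being on, a backup job may want to confirm it before syncing. This is a small sketch under assumptions: the function name bucket_versioning_enabled is hypothetical, and it relies on the AWS CLI's get-bucket-versioning call with a --query filter on the Status field.

```shell
# Hypothetical helper: succeeds only if versioning is enabled on the bucket.
bucket_versioning_enabled() {
  local state
  state=$(aws s3api get-bucket-versioning --bucket "$1" \
    --query Status --output text) || return 1
  # The CLI prints "None" when versioning was never configured,
  # and "Suspended" when it was turned off again.
  [ "$state" = "Enabled" ]
}
```

For example: `bucket_versioning_enabled "$S3_BUCKET" || echo "versioning is not enabled on $S3_BUCKET" >&2`.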
Filesystem mode (ASSET_BINARY_STORE=filesystem):
tar -czf assets-$(date -u +%Y%m%dT%H%M%SZ).tar.gz -C "$(dirname "$ASSETS_DIR")" "$(basename "$ASSETS_DIR")"
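A quick sanity check on the tarball can catch a truncated archive before it is needed. The sketch below is illustrative only: check_assets_archive is a hypothetical name, and the check assumes the asset directory contains only regular files and subdirectories, with no newlines in file names. It compares the number of files in the archive with the number on disk.

```shell
# Hypothetical helper: compares file counts between the archive and the source directory.
check_assets_archive() {
  local archive=$1 dir=$2 listing in_archive on_disk
  # tar -t fails if the gzip stream or the tar structure is damaged.
  listing=$(tar -tzf "$archive") || return 1
  # Directory entries end with "/"; count everything else.
  in_archive=$(printf '%s\n' "$listing" | grep -vc '/$' || true)
  on_disk=$(find "$dir" -type f | wc -l)
  [ "$in_archive" -eq "$on_disk" ]
}
```

Run it right after the tar command, while no uploads are in flight: `check_assets_archive assets-<timestamp>.tar.gz "$ASSETS_DIR"`. A mismatch during active writes is expected and not necessarily an error.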
2.3 Consistency
Pulp Engine writes the asset binary first and the metadata row second within the same request. A backup taken concurrently with active writes can capture:
- An asset binary with no metadata row (harmless — the binary becomes unreferenced; validated-publish will flag on first reference).
- A metadata row with no binary (render will fail fast with ASSET_BINARY_MISSING, the documented fail-closed behaviour from v0.35.0).
To eliminate the inconsistency window, prefer one of:
- Postgres point-in-time recovery + S3 bucket versioning (belt-and-suspenders — recover both stores to a matching wall-clock moment).
- A brief maintenance window: stop the API pods, run both backups, restart.
For most operators, continuous archiving + versioning is sufficient, and the fail-closed behaviour on render makes the inconsistency safe to tolerate.
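For operators who choose the maintenance-window option, the main risk is a failed backup leaving the API stopped. The sketch below shows one way to guarantee the restart. The names stop_api, start_api, and with_api_stopped are hypothetical; the default stop/start commands are the compose commands used in § 3 and should be replaced on other platforms.

```shell
# Defaults reuse the compose commands from § 3; override for other platforms.
stop_api()  { docker compose -f compose.postgres.yaml stop pulp-engine; }
start_api() { docker compose -f compose.postgres.yaml start pulp-engine; }

# Hypothetical helper: runs the given command with the API stopped and
# restarts the API afterwards, whether or not the command succeeded.
with_api_stopped() {
  local rc=0
  stop_api || return 1
  "$@" || rc=$?
  start_api || return 1
  # Propagate the backup command's own exit status.
  return "$rc"
}
```

Usage would look like `with_api_stopped run_both_backups`, where run_both_backups is your own function wrapping the § 2.1 and § 2.2 commands.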
2.4 Inventory + verification (optional)
After a backup run, use the CLI to capture a manifest that can be verified later:
pulp-engine backup create --out ./backup-$(date -u +%Y%m%dT%H%M%SZ) \
--api-url http://localhost:3000 --api-key "$API_KEY_ADMIN"
This does not dump the database and does not copy any binaries — it captures a metadata inventory (counts + per-template/per-asset identifying fields) that a later backup verify run can diff against the live API. See § 4.
3. Restore procedure
Restore order matters: stop the API, restore the object store first (so that metadata rows have backing binaries), then restore Postgres, run migrations, and start the API again.
3.1 Stop the API
docker compose -f compose.postgres.yaml stop pulp-engine
3.2 Restore asset binaries
S3 mode:
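# Add --delete to also remove objects uploaded after the backup was taken
# (the S3 equivalent of the rm -rf in filesystem mode below).
aws s3 sync "s3://$BACKUP_BUCKET/assets-<timestamp>/" "s3://$S3_BUCKET"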
Filesystem mode:
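# ${ASSETS_DIR:?} aborts if the variable is unset or empty, so rm -rf never runs against an empty path.
rm -rf "${ASSETS_DIR:?}"
mkdir -p "$ASSETS_DIR"
tar -xzf assets-<timestamp>.tar.gz -C "$(dirname "$ASSETS_DIR")"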
3.3 Restore Postgres
DATABASE_URL_MAINTENANCE below is the same host and credentials as
DATABASE_URL with the database path set to postgres (the built-in
maintenance database) — Postgres cannot drop the database you are connected
to. Example: postgresql://pulp-engine:****@db-host:5432/postgres.
The default database name from compose.postgres.yaml
is pulp-engine — a hyphenated identifier, which must be double-quoted
in SQL. Substitute your own name (and owner role) if you changed
POSTGRES_DB / POSTGRES_USER.
# Drop and recreate to avoid leftover rows from the current state.
# WITH (FORCE) terminates any straggler connections (Postgres 13+).
# Safer alternative in production: restore into a fresh database and flip the connection.
psql "$DATABASE_URL_MAINTENANCE" -c 'DROP DATABASE IF EXISTS "pulp-engine" WITH (FORCE);'
psql "$DATABASE_URL_MAINTENANCE" -c 'CREATE DATABASE "pulp-engine" OWNER "pulp-engine";'
pg_restore \
--dbname="$DATABASE_URL" \
--no-owner --no-privileges \
pulp-engine-<timestamp>.dump
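The DATABASE_URL_MAINTENANCE value described above can be derived from DATABASE_URL rather than maintained by hand. This is a minimal sketch under assumptions: maintenance_url is a hypothetical helper, and it expects a URL that includes a database path, with any special characters in the credentials percent-encoded.

```shell
# Hypothetical helper: swaps the database path of a Postgres URL for "postgres",
# keeping host, credentials, and any query string (for example ?sslmode=require).
maintenance_url() {
  local url=$1 query=""
  case $url in
    *\?*)
      query="?${url#*\?}"   # everything after the first "?"
      url="${url%%\?*}"     # everything before it
      ;;
  esac
  # ${url%/*} strips the final path segment, i.e. the database name.
  printf '%s/postgres%s\n' "${url%/*}" "$query"
}
```

For example, `DATABASE_URL_MAINTENANCE=$(maintenance_url "$DATABASE_URL")` before running the DROP / CREATE commands.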
3.4 Run migrations
Run prisma migrate deploy against the restored database to bring the schema forward to the current app version (no-op if already current):
docker compose -f compose.postgres.yaml run --rm migrate
3.5 Start the API
docker compose -f compose.postgres.yaml start pulp-engine
Schedules resume automatically: the dispatcher reads schedules.next_run_at from the restored rows.
3.6 Verify
# 1. Structural — manifest integrity check against the live API
pulp-engine backup verify --in ./backup-<timestamp> \
--api-url http://localhost:3000 --api-key "$API_KEY_ADMIN"
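# 2. Functional: render a known-good template, byte-compare against a reference PDF.
#    --fail makes curl exit non-zero on an HTTP error response instead of saving the error body as rendered.pdf.
curl --fail -X POST http://localhost:3000/render/pdf \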
-H "x-api-key: $API_KEY_RENDER" \
-H "content-type: application/json" \
-d '{"template":"known-good","data":{...}}' \
-o rendered.pdf
Important: backup verify confirms the manifest inventory still matches the live API (template/asset counts and identifying metadata — it computes no checksums). It does not prove that the restore succeeded — that is proven by the sample render in step 2 above.
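The byte-compare in step 2 can be scripted. The sketch below is illustrative: render_matches_reference is a hypothetical name, and the comparison assumes the render is deterministic for a fixed template and data, as the byte-compare step implies. It first checks for the `%PDF-` header that every PDF file starts with, so that a saved error response is reported separately from a genuine content difference.

```shell
# Hypothetical helper. Exit status: 0 = identical, 1 = differs, 2 = not a PDF.
render_matches_reference() {
  local rendered=$1 reference=$2
  # A JSON error body saved by curl would not start with the PDF magic bytes.
  [ "$(head -c 5 "$rendered")" = "%PDF-" ] || return 2
  # cmp -s is silent and exits 0 only when the files are byte-identical.
  cmp -s "$rendered" "$reference"
}
```

For example: `render_matches_reference rendered.pdf reference.pdf || echo "render check failed" >&2`, where reference.pdf is a copy you captured before the incident.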
3.7 Restore rehearsal (continuously exercised)
The § 3.3–3.4 Postgres path is rehearsed by
scripts/restore-rehearsal.sh: it seeds a
database, captures per-table row counts, runs the exact dump → drop →
recreate → pg_restore → prisma migrate deploy sequence from this runbook,
and fails if any table’s row count differs after restore. CI runs it on every
push to main against the Postgres service container, so a schema change that
breaks restorability is caught at merge time, not during an incident. You can
run it against any disposable database with DATABASE_URL set — never
against production (it drops and recreates the database).
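To make the pass/fail criterion concrete, here is a sketch of the comparison step only. It is not the contents of scripts/restore-rehearsal.sh: the function name row_counts_match and the `table|count` file format are assumptions chosen for illustration, and capturing the counts from Postgres is left out.

```shell
# Hypothetical helper, not taken from scripts/restore-rehearsal.sh.
# Each input file holds one "table_name|row_count" line per table.
row_counts_match() {
  local a b rc=0
  a=$(mktemp) && b=$(mktemp) || return 1
  # Sort first so the order in which tables were listed does not matter.
  sort "$1" > "$a"
  sort "$2" > "$b"
  # diff prints any table whose count changed and exits non-zero.
  diff "$a" "$b" || rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Called as `row_counts_match counts-before.txt counts-after.txt`, it prints the differing tables and returns non-zero when any count changed across the restore.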
4. CLI tooling — pulp-engine backup
Two subcommands are shipped today. A third, pulp-engine backup restore, is a tracked follow-up and intentionally out of scope for this release — see § 5.
pulp-engine backup create
Captures a metadata inventory of the live API into a backup directory. Writes exactly one file:
- manifest.json: manifest schema version, timestamp, API URL, counts, the template inventory (key, name, currentVersion), and the asset metadata inventory (id, filename, sizeBytes, mimeType).
It does not copy asset binaries and computes no checksums — binaries are backed up separately per § 2.2. Postgres is likewise not dumped by this command; operators run pg_dump separately and the path to the resulting file may be recorded in the manifest via --db-dump <path>.
pulp-engine backup create --out ./backup-20260413 \
--api-url http://localhost:3000 \
--api-key "$API_KEY_ADMIN" \
[--db-dump ./pulp-engine-20260413.dump]
pulp-engine backup verify
Reads manifest.json and diffs it against a live API:
- Templates compared by key (added / removed / changed name / currentVersion).
- Assets compared by id (added / removed / changed metadata).
- --tolerance <n> allows up to n drifted items before the run fails.
This is a metadata drift check — it reads no binaries and computes no checksums. Exit code 0 on match (or drift within tolerance), 2 on drift beyond tolerance, 1 on a missing/unreadable manifest.
pulp-engine backup verify --in ./backup-20260413 \
--api-url http://localhost:3000 \
--api-key "$API_KEY_ADMIN"
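In automation, the three documented exit codes usually need different handling: drift may be tolerable, while a missing manifest means the check never ran. The wrapper below is a sketch; report_verify_result is a hypothetical name and the messages are illustrative, but the code-to-meaning mapping follows the description above.

```shell
# Hypothetical wrapper: maps the documented exit codes to operator messages.
report_verify_result() {
  case $1 in
    0) echo "ok: manifest matches the live API (or drift is within tolerance)" ;;
    2) echo "drift: differences exceed the tolerance" ;;
    1) echo "error: manifest missing or unreadable" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage:
#   rc=0
#   pulp-engine backup verify --in ./backup-20260413 \
#     --api-url http://localhost:3000 --api-key "$API_KEY_ADMIN" || rc=$?
#   report_verify_result "$rc"
```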
5. What this release does NOT include
The following are intentionally out of scope and tracked for a future release:
- pulp-engine backup restore write-path command: replaying templates/versions/assets from a backup directory into a live API. Requires admin import/export endpoints with authentication, validation, tenant-scoping, and conflict-resolution semantics that haven’t been designed yet. Restore today is manual (§ 3).
- Automated backup scheduler inside the API: Pulp Engine does not manage its own backups; run pg_dump + asset sync from your platform’s standard backup tooling.
- Cross-region replication: single-region residency by design. See data-residency-gdpr.md for the multi-region pattern (separate deployments per region).
6. File-mode evaluation deployments
If you are running STORAGE_MODE=file (evaluation / single-instance), back up both data directories — TEMPLATES_DIR holds templates, versions, and audit events (.audit-events.jsonl); ASSETS_DIR holds the asset metadata index (.assets-index.json) and every uploaded asset binary. A backup of TEMPLATES_DIR alone silently loses all asset data.
STAMP=$(date -u +%Y%m%dT%H%M%SZ)
tar -czf pulp-engine-templates-$STAMP.tar.gz \
-C "$(dirname "$TEMPLATES_DIR")" "$(basename "$TEMPLATES_DIR")"
tar -czf pulp-engine-assets-$STAMP.tar.gz \
-C "$(dirname "$ASSETS_DIR")" "$(basename "$ASSETS_DIR")"
(When ASSET_BINARY_STORE=s3 is configured alongside file mode, asset binaries live in the bucket instead — back those up per § 2.2; ASSETS_DIR still holds the metadata index.)
Restore: stop the API, extract both tarballs over their directories, start the API. File mode is not HA-safe and is not recommended for production.
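The extraction step of the file-mode restore can be sketched as follows. restore_dir is a hypothetical helper, and it assumes each directory still has the same basename it had when the tarball was created. Because the tarball is extracted over the existing directory, files created after the backup are left in place.

```shell
# Hypothetical helper: extracts one § 6 tarball over the directory it was made from.
restore_dir() {
  local archive=$1 target=$2
  # The archive stores the directory under its basename, so extract into the parent.
  tar -xzf "$archive" -C "$(dirname "$target")"
}

# Usage, with the API stopped:
#   restore_dir pulp-engine-templates-<timestamp>.tar.gz "$TEMPLATES_DIR"
#   restore_dir pulp-engine-assets-<timestamp>.tar.gz "$ASSETS_DIR"
```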