Pulp Engine — Backup & Restore Runbook

Canonical procedure for operators to back up a Pulp Engine deployment and restore from backup. Pairs with deployment-guide.md (topology + config) and ha-reference-architecture.md (multi-replica deployments).

This runbook assumes a Postgres + object-store topology (the production recommendation). For file-mode evaluation deployments, see § 6.


1. What to back up

Pulp Engine has exactly two durable stores. Everything else (request handlers, editor tokens, capability caches, in-memory schedulers) is derivable from these two.

  • Postgres — templates, versions, labels, asset metadata, audit events, schedules + executions + DLQ, tenant registry, render usage. Lives at DATABASE_URL; schema in apps/api/src/prisma/schema.prisma.
  • Asset binaries — image files uploaded via the asset library. Lives at ASSETS_DIR (filesystem) or in the S3 bucket (ASSET_BINARY_STORE=s3).

Not durable / do not back up:

  • .env / secrets — treat these as configuration managed by your platform’s secret manager
  • Local pod disk (rendered PDFs are transient, Chromium cache is rebuilt on start)
  • In-memory state — the tenant status cache reconstructs automatically on restart.

A note on async batch jobs: they are durable on Postgres deployments (as of v0.72.0). Job metadata lives in the batch_jobs Postgres table, and completed result envelopes live in IJobResultBlobStore (filesystem at JOB_RESULT_BLOB_DIR or S3 at JOB_RESULT_BLOB_BUCKET). On restart, pending/processing rows older than STARTUP_ORPHAN_GRACE_MS are failed with code job_abandoned_at_startup; remaining active rows rehydrate into the hot cache. Completed jobs remain pollable until WEBHOOK_JOB_RETENTION_SECONDS elapses, and the processing-timeout sweep still handles crashes that happen later in processJob(). File-mode and SQL Server deployments retain the pre-T3 in-memory-only behaviour (jobs are lost on pod restart). The delivery dispatcher DLQ is Postgres-backed and survives restart.

2. Backup procedure

Authoritative backup path is pg_dump + a sync of the asset store. The CLI (§ 4) adds a lightweight inventory/verification layer on top — it does not replace pg_dump.

2.1 Postgres

# Full, compressed, custom-format dump (restorable via pg_restore).
pg_dump \
  --format=custom \
  --no-owner \
  --no-privileges \
  --file=pulp-engine-$(date -u +%Y%m%dT%H%M%SZ).dump \
  "$DATABASE_URL"

Recommended cadence:

  • Production: continuous archiving (WAL-E / WAL-G / managed-Postgres PITR) + daily full dump for operator-visible artifacts.
  • Evaluation / staging: daily dump is sufficient.
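As a concrete starting point, the daily dump can be driven from cron. A hypothetical crontab entry (config fragment): the backup path and the 14-day retention are illustrative, DATABASE_URL must be visible to cron, % is special inside crontab and must be escaped as \%, and each entry must be a single line.

```shell
0 2 * * * pg_dump --format=custom --no-owner --no-privileges --file=/var/backups/pulp-engine/pulp-engine-$(date -u +\%Y\%m\%dT\%H\%M\%SZ).dump "$DATABASE_URL" && find /var/backups/pulp-engine -name '*.dump' -mtime +14 -delete
```

Managed-Postgres PITR (when available) should remain the primary mechanism; the cron dump covers operator-visible artifacts.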

2.2 Asset binaries

S3 mode (ASSET_BINARY_STORE=s3):

# Enable S3 bucket versioning once (recommended — gives you PITR for blobs):
aws s3api put-bucket-versioning --bucket "$S3_BUCKET" \
  --versioning-configuration Status=Enabled

# Snapshot copy into a dated backup bucket/prefix:
aws s3 sync "s3://$S3_BUCKET" "s3://$BACKUP_BUCKET/assets-$(date -u +%Y%m%dT%H%M%SZ)/"

Filesystem mode (ASSET_BINARY_STORE=filesystem):

tar -czf assets-$(date -u +%Y%m%dT%H%M%SZ).tar.gz -C "$(dirname "$ASSETS_DIR")" "$(basename "$ASSETS_DIR")"
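The tar line above can be extended to record a checksum alongside the archive, so corruption is caught before a restore ever extracts it. A minimal sketch, assuming coreutils' sha256sum is available; ASSETS_DIR defaults to a throwaway demo directory here — point it at the real store:

```shell
set -eu
# Demo default — in production, ASSETS_DIR is the real asset directory.
ASSETS_DIR="${ASSETS_DIR:-./assets-demo}"
mkdir -p "$ASSETS_DIR"   # no-op when the directory already exists
ARCHIVE="assets-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
tar -czf "$ARCHIVE" -C "$(dirname "$ASSETS_DIR")" "$(basename "$ASSETS_DIR")"
# Record the checksum next to the archive...
sha256sum "$ARCHIVE" > "$ARCHIVE.sha256"
# ...and verify it (repeat this on the restore host before extracting).
sha256sum -c "$ARCHIVE.sha256"
```

Ship the .sha256 file to the backup target together with the archive.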

2.3 Consistency

Pulp Engine writes the asset binary first and the metadata row second within the same request. A backup taken concurrently with active writes can capture:

  • An asset binary with no metadata row (harmless — the binary becomes unreferenced; validated-publish will flag it on first reference).
  • A metadata row with no binary (render will fail fast with ASSET_BINARY_MISSING — the documented fail-closed behaviour from v0.35.0).

To eliminate the inconsistency window, prefer one of:

  • Postgres point-in-time recovery + S3 bucket versioning (belt-and-suspenders — recover both stores to a matching wall-clock moment).
  • A brief maintenance window: stop the API pods, run both backups, restart.

For most operators, continuous archiving + bucket versioning is sufficient, and the fail-closed behaviour on render makes the inconsistency safe to tolerate.

2.4 Inventory + verification (optional)

After a backup run, use the CLI to capture a manifest that can be verified later:

pulp-engine backup create --out ./backup-$(date -u +%Y%m%dT%H%M%SZ) \
  --api-url http://localhost:3000 --api-key $API_KEY_ADMIN

This does not dump the database — it captures counts, checksums, and metadata that a later backup verify run can match against. See § 4.


3. Restore procedure

Restore order matters: restore the object store first (so that metadata rows have backing binaries), then Postgres, then restart the API.

3.1 Stop the API

docker compose -f compose.postgres.yaml stop pulp-engine

3.2 Restore asset binaries

S3 mode:

# Add --delete if objects written after the backup should also be removed.
aws s3 sync "s3://$BACKUP_BUCKET/assets-<timestamp>/" "s3://$S3_BUCKET/"

Filesystem mode:

rm -rf "$ASSETS_DIR"
mkdir -p "$ASSETS_DIR"
tar -xzf assets-<timestamp>.tar.gz -C "$(dirname "$ASSETS_DIR")"
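If you prefer not to run rm -rf before knowing the archive is sound, the restore can be staged and swapped with a rename instead. A sketch under that assumption (the restore_assets helper is hypothetical, not part of the CLI), demonstrated on throwaway data:

```shell
set -eu
# restore_assets ARCHIVE ASSETS_DIR: extract into a staging dir first, then
# swap via rename, so a corrupt archive never leaves a half-deleted store.
restore_assets() {
  archive=$1; assets_dir=$2
  parent=$(dirname "$assets_dir"); name=$(basename "$assets_dir")
  stage=$(mktemp -d "$parent/restore.XXXXXX")
  tar -xzf "$archive" -C "$stage"
  if [ -d "$assets_dir" ]; then mv "$assets_dir" "$assets_dir.old"; fi
  mv "$stage/$name" "$assets_dir"
  rm -rf "$stage" "$assets_dir.old"
}

# Demo on throwaway data; in production, pass the real archive and ASSETS_DIR.
mkdir -p demo/assets && echo img > demo/assets/logo.png
tar -czf demo/backup.tar.gz -C demo assets
rm -rf demo/assets
restore_assets demo/backup.tar.gz demo/assets
cat demo/assets/logo.png   # prints "img"
```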

3.3 Restore Postgres

# Drop and recreate to avoid leftover rows from the current state.
# Safer alternative in production: restore into a fresh database and flip the connection.
# Hyphenated database names must be double-quoted in SQL.
psql "$DATABASE_URL_ADMIN" -c 'DROP DATABASE "pulp-engine";'
psql "$DATABASE_URL_ADMIN" -c 'CREATE DATABASE "pulp-engine" OWNER "pulp-engine";'

pg_restore \
  --dbname="$DATABASE_URL" \
  --no-owner --no-privileges \
  --clean --if-exists \
  pulp-engine-<timestamp>.dump

3.4 Run migrations

Run prisma migrate deploy against the restored database to bring the schema forward to the current app version (a no-op if already current):

docker compose -f compose.postgres.yaml run --rm migrate

3.5 Start the API

docker compose -f compose.postgres.yaml start pulp-engine

Schedules resume automatically: the dispatcher reads schedules.next_run_at from the restored rows.

3.6 Verify

# 1. Structural — manifest integrity check against the live API
pulp-engine backup verify --in ./backup-<timestamp> \
  --api-url http://localhost:3000 --api-key $API_KEY_ADMIN

# 2. Functional — render a known-good template, byte-compare against a reference PDF
curl -X POST http://localhost:3000/render \
  -H "x-api-key: $API_KEY_RENDER" \
  -H "content-type: application/json" \
  -d '{"templateKey":"known-good","data":{...}}' \
  -o rendered.pdf

Important: backup verify confirms the backup artifact is internally consistent (manifest counts + checksums still match the live API). It does not prove that the restore succeeded — that is proven by the sample render in step 2 above.
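The byte-compare in step 2 can be done with cmp(1). A minimal sketch, with demo files standing in for rendered.pdf and a checked-in reference (byte-comparison assumes renders are deterministic; embedded timestamps or document IDs in the PDF would break it):

```shell
set -eu
# Demo stand-ins; in practice compare the fresh render to the reference PDF.
printf 'demo-bytes' > reference.pdf
printf 'demo-bytes' > rendered.pdf
if cmp -s reference.pdf rendered.pdf; then
  echo "render matches reference"
else
  echo "render drifted from reference" >&2
  exit 1
fi
```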


4. CLI tooling — pulp-engine backup

Two subcommands are shipped today. A third, pulp-engine backup restore, is a tracked follow-up and intentionally out of scope for this release — see § 5.

pulp-engine backup create

Snapshots inventory and asset binaries into a backup directory. Writes:

  • manifest.json — schema version, API version, counts, checksums, timestamp
  • assets.tar.gz — asset binaries (filesystem mode) or an S3-inventory file (S3 mode)

Postgres is not dumped by this command. Run pg_dump separately; the path to the resulting dump file can be recorded in the manifest via --db-dump <path>.

pulp-engine backup create --out ./backup-20260413 \
  --api-url http://localhost:3000 \
  --api-key $API_KEY_ADMIN \
  [--db-dump ./pulp-engine-20260413.dump]

pulp-engine backup verify

Reads manifest.json and checks it against a live API:

  • Template/asset counts still match (within an optional --tolerance window).
  • Checksums of the backed-up asset binaries still match the current asset store.

Exit code 0 on match, non-zero on drift. Use as the last step of a backup run to confirm the artifact is internally consistent.

pulp-engine backup verify --in ./backup-20260413 \
  --api-url http://localhost:3000 \
  --api-key $API_KEY_ADMIN
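Because verify signals drift purely via its exit code, it slots into any backup pipeline as a final gate. A hypothetical wrapper sketch (run_check is not part of the CLI; true and false stand in for the real commands here):

```shell
set -eu
# run_check LABEL CMD...: run CMD, log the result, propagate failure.
run_check() {
  label=$1; shift
  if "$@"; then
    echo "$label: ok"
  else
    echo "$label: FAILED" >&2
    return 1
  fi
}

# In production, e.g.:
#   run_check backup-verify pulp-engine backup verify --in ./backup-<timestamp>
run_check demo-verify true
run_check demo-drift false || echo "drift detected, page the on-call"
```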

5. What this release does NOT include

The following are intentionally out of scope and tracked for a future release:

  • pulp-engine backup restore write-path command — replaying templates/versions/assets from a backup directory into a live API. Requires admin import/export endpoints with authentication, validation, tenant-scoping, and conflict-resolution semantics that haven’t been designed yet. Restore today is manual (§ 3).
  • Automated backup scheduler inside the API — Pulp Engine does not manage its own backups; run pg_dump + asset sync from your platform’s standard backup tooling.
  • Cross-region replication — single-region residency by design. See data-residency-gdpr.md for the multi-region pattern (separate deployments per region).

6. File-mode evaluation deployments

If you are running STORAGE_MODE=file (evaluation / single-instance), the backup is even simpler:

tar -czf pulp-engine-file-backup-$(date -u +%Y%m%dT%H%M%SZ).tar.gz \
  -C "$(dirname "$TEMPLATES_DIR")" "$(basename "$TEMPLATES_DIR")"

TEMPLATES_DIR contains templates, versions, audit events (.audit-events.jsonl), and asset metadata. Restore: stop the API, extract the tarball over the directory, start the API. File mode is not HA-safe and is not recommended for production.