Pulp Engine Document Rendering
Get started

Pulp Engine — Technical Specification

Engineering reference for the template model and rendering pipeline as implemented. Documents behaviour, not design intent.

File retains its mvp-technical-spec.md filename for link stability; the content is the current technical reference and is no longer scoped to the original MVP.


1. Template Structure

A template is a JSON file conforming to TemplateDefinition:

interface TemplateDefinition {
  key: string                          // stable slug, unique across all templates
  version: string                      // semver (e.g. "1.0.0")
  name: string
  description?: string
  inputSchema: Record<string, unknown> // JSON Schema Draft-07, validates ADAPTED data
  fieldMappings?: FieldMappingEntry[]  // raw payload → adapted data shape
  renderConfig?: RenderConfig
  document: DocumentNode               // root AST node
}

RenderConfig

interface LocaleConfig {
  currency?: string   // BCP 47 locale tag for Intl.NumberFormat. Default: "en-US"
  date?: string       // BCP 47 locale tag for Intl.DateTimeFormat. Default: "en-AU"
}

interface RenderConfig {
  paperSize?: 'A4' | 'A3' | 'Letter' | 'Legal' | 'Tabloid'  // default: A4
  orientation?: 'portrait' | 'landscape'                      // default: portrait
  margin?: { top: string; right: string; bottom: string; left: string }
  customCss?: string                   // CSS injected into <style>. Safe mode (default): @import and url() fetch
                                       // vectors are stripped. Pass allowUnsafeCustomCss: true to
                                       // HtmlRenderer.render() for verbatim injection — admin sources only.
  fontImports?: string[]               // Font @import URLs. Always validated: https: only, private/internal hosts blocked.
  locale?: LocaleConfig                // locale for currency/date helpers
  pageHeader?: string                  // HTML allowlist-sanitized at render time; script/event-handler/external-resource vectors stripped; residual CSS and Chromium risk not zero. Passed to Puppeteer headerTemplate after sanitization. Inline styles only; font-size: 0 by default. Span classes: pageNumber, totalPages, date, title, url. Capped at 10,000 chars. Visible in PDF output only.
  pageFooter?: string                  // Same sanitization model and constraints as pageHeader. Passed to Puppeteer footerTemplate after sanitization. Capped at 10,000 chars. Visible in PDF output only.
}

locale applies to all {{currency}} and {{date}} helper calls within the template, including table column formatters. The number helper uses toFixed() and is not locale-aware. Both fields are optional; defaults are preserved when locale is omitted.

Templates are stored in PostgreSQL as JSONB (template_versions.definition). Multiple versions are retained; the latest by created_at is used for rendering. The TemplateDefinition shape is enforced at runtime by Zod (TemplateDefinitionSchema in apps/api/src/validation/template-definition.schema.ts) at all persistence boundaries — POST create, PUT update, and version restore. Unknown structural fields are rejected (strict mode); key must match /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/ and version must be major.minor.patch with optional prerelease/build metadata suffixes.


2. AST Node Types

All nodes extend BaseNode:

interface BaseNode {
  id: string
  type: string
  style?: NodeStyle          // camelCase CSS properties → inline style=""
  paginationHints?: PaginationHints
  meta?: Record<string, unknown>
}

Structural nodes

Not part of the AnyContentNode union — not valid inside children arrays directly.

NodeKey fields
documentchildren: SectionNode[]
sectionlabel?, children: AnyContentNode[]
columnspan? (flex proportion), children: AnyContentNode[]

Content nodes

AnyContentNode union — valid anywhere a children array is accepted.

NodeKey fieldsHTML output
containerchildren: AnyContentNode[]<div>
columnscolumns: ColumnNode[], gap?<div style="display:flex">
headinglevel: 1–6, content: string (Handlebars)<h1><h6>
textcontent: string (Handlebars), inline?<p> or <span>
imagesrc: string (Handlebars), alt?, fit?, width?, height?<img>
tableSee Table Behaviour<table>
chartSee Chart Behaviour<div class="chart-wrapper"> + inline SVG
spacersize: string, horizontal?<div style="height:N"> or <span style="width:N">
dividerthickness?, color?, dashStyle?<hr>
pageBreak(no extra fields)<div style="break-after:page">
conditionalcondition, children, elseChildren?See Conditional and Repeater Behaviour
repeaterdataPath, children, emptyChildren?See Conditional and Repeater Behaviour
richTextcontent: RichTextBlock[]See RichText Node Behaviour
signatureFieldlabel, anchorText, tabType?, signerRole?, hidden?<span> (anchorText rendered; hidden mode: color:transparent; font-size:2pt)
templateReftemplateKey, dataPath?, version?, fieldMappings?Inline sections from referenced template; see Template Composition
tocmaxDepth? (1–6, default 3), ordered?, title?, showPageNumbers?, leaderStyle? ('dots'|'dashes'|'none')<div class="toc"> + <ul>/<ol> with anchor links to headings; with showPageNumbers also emits a leader + reserved page-number slot that is filled post-render by pdf-lib (see §14)
positionedtop, left, width, height, children: AnyContentNode[]<div style="position:absolute;..."> — physical-unit CSS values (mm, pt, in, px)
pivotTableSee Pivot Table Behaviour<table> — cross-tab grid with row/column dimension grouping and measure aggregation
barcodevalue: string (Handlebars), format (qr / code128 / code39 / ean13 / upc / datamatrix), width?, height?, ecLevel? (QR), displayValue? (1-D formats)<img> with inline SVG data URI, produced by @pulp-engine/barcode-renderer (bwip-js + qrcode)
customcustomType: string, props?, children?Plugin-registered node; rendered by the plugin’s HTML renderer (see plugin-api)

NodeStyle

Any camelCase CSS property is valid (e.g. fontWeight, backgroundColor, marginBottom). Compiled to kebab-case inline style="" attributes. No validation is applied to style values.


3. Field Mapping Behaviour

Field mappings transform the raw caller payload into the adapted data shape that the renderer and schema validator operate on.

interface FieldMappingEntry {
  targetPath: string    // dot-path written into adapted data, e.g. "customer.name"
  sourcePath?: string   // dot-path read from raw payload; defaults to targetPath if omitted
  defaultValue?: unknown
  expression?: string   // jexl expression evaluated against the raw payload; takes precedence over sourcePath when non-empty
}

Algorithm (DataAdapter.adapt):

  1. Deep-clone the raw payload (structuredClone) as the working object — unmapped keys remain accessible
  2. For each FieldMappingEntry:
    • If expression is a non-empty string, evaluate it as a jexl expression against the raw payload; use the result as the value
    • Otherwise, read value from raw payload at sourcePath (or targetPath if sourcePath is absent) using lodash-style dot-path resolution
    • If the resolved value is undefined and defaultValue is set, use defaultValue
    • If the final value is not undefined, write it to targetPath in the result
  3. Fields with no mapping pass through unchanged (the clone preserves them)

Important: defaultValue is only applied when the source resolves to undefined. A source value of null, false, 0, or "" is used as-is.

Expression evaluation: jexl expressions in expression are evaluated safely — missing identifiers resolve to undefined (not a throw). Invalid syntax surfaces as a 500 error from the API.

Scope: Field mappings apply at two levels:

  • Template-level (TemplateDefinition.fieldMappings): transforms the raw caller payload before rendering begins
  • TemplateRef-level (TemplateRefNode.fieldMappings): transforms the already dataPath-scoped data before it flows into the referenced sub-template’s children. The referenced template’s own top-level fieldMappings are NOT applied — only the referencing node’s fieldMappings.

4. Validation Behaviour

Schema validation runs after field mapping, against the adapted data shape. inputSchema describes the adapted shape, not the raw payload.

  • Validator: Ajv v8 with allErrors: true and ajv-formats (adds "date", "email", etc.)
  • On failure: HTTP 422 with body { error: 'Validation Failed', issues: ValidationError[] }
  • Each ValidationError has { path: string, message: string }

The POST /render pipeline order:

Raw payload → Field Mapping → Schema Validation → HTML Rendering → PDF Rendering

Validation errors are returned before any rendering occurs.


5. Formatting Helpers

Handlebars helpers are available inside any content string (text, heading, image src, table column headers/footers). All helpers are registered once and are idempotent.

{{currency value [currencyCode]}}

  • Formats a number as a currency string using Intl.NumberFormat
  • Locale: en-US (affects digit grouping)
  • currencyCode: ISO 4217 code, default 'USD'
  • Minimum fraction digits: 2
  • Example: {{currency loan.amount "AUD"}}A$50,000.00
  • Returns original value as string if not a valid number

{{number value [decimals]}}

  • Formats a number to a fixed number of decimal places
  • decimals: default 2
  • Example: {{number loan.rate 2}}5.75
  • Returns original value as string if not a valid number

{{date value [format]}}

  • Formats a date string or Date object using Intl.DateTimeFormat
  • Locale: en-AU
  • format options: 'short' | 'medium' (default) | 'long' | 'full'
    • short → dd/mm/yyyy style
    • medium → e.g. 14 Mar 2026
    • long → e.g. 14 March 2026
    • full → e.g. Saturday, 14 March 2026
  • Returns empty string if value is null or empty; returns raw string if invalid date

{{uppercase value}} / {{lowercase value}}

String case conversion. Value coerced to string before conversion.

{{sum array "field"}}

Sums a named numeric field across an array of objects. Designed for table footer expressions where a grand total needs to be computed inline.

  • array — resolves from the current data context (e.g. loan.items)
  • "field" — a simple (non-nested) field name on each item (e.g. "amount")
  • Numeric values are summed directly.
  • Numeric strings (e.g. "12.5") are coerced via parseFloat and included.
  • Non-numeric strings, null, undefined, and missing fields are ignored (treated as 0).
  • Non-array first argument or empty array returns 0.

Composable with other helpers via Handlebars subexpressions:

{{sum loan.items "amount"}}
{{currency (sum loan.items "amount") "AUD"}}

Table cell formatting

Table column definitions may specify format (helper name) and formatArgs (array of additional arguments). These are applied to cell values in the renderer, not via Handlebars template strings:

{ "key": "amount", "format": "currency", "formatArgs": ["AUD"] }

6. Conditional and Repeater Behaviour

Conditional (ConditionalNode)

Evaluates a structured condition against the current data context at render time.

interface StructuredCondition {
  path: string       // dot-path into adapted data
  operator: 'eq' | 'ne' | 'gt' | 'gte' | 'lt' | 'lte' | 'exists' | 'notExists'
  value?: unknown    // not required for exists/notExists
}

Operator semantics:

OperatorBehaviour
eqstrict equality (===)
nestrict inequality (!==)
gt / gte / lt / ltenumeric comparison only; returns false if either operand is not a number
existsresolved value is not undefined and not null
notExistsresolved value is undefined or null

Rendering:

  • Condition true → renders children
  • Condition false → renders elseChildren (if present), otherwise returns empty string (no wrapper, no placeholder)
  • Output is wrapped in a <div> with inline style only if the node has style or paginationHints; otherwise children are emitted unwrapped

conditionExpression (optional escape hatch): When conditionExpression is a non-empty, non-whitespace string, it is evaluated as a jexl expression against the current data context and takes precedence over condition. Missing identifiers evaluate to undefined (falsy). Invalid syntax throws ExpressionError which surfaces as a 500 from the API. When absent or blank, condition is evaluated as before.

Repeater (RepeaterNode)

Resolves an array from adapted data and renders children once per item.

interface RepeaterNode extends BaseNode {
  dataPath: string            // dot-path to the array
  children: AnyContentNode[]
  emptyChildren?: AnyContentNode[]
}

Behaviour:

  • dataPath resolves to an array → renders children N times with per-item scoped data
  • dataPath resolves to undefined (key absent) → returns empty string silently
  • dataPath resolves to a non-array, non-null value → returns empty string and emits console.warn
  • Array is empty → renders emptyChildren if present, otherwise returns empty string

Scoped data per iteration (merges in this order, later entries win):

  1. Parent adapted data
  2. Item properties (if item is a plain object; scalar items do not spread)
  3. Synthetic variables: @index (0-based integer), @first (boolean), @last (boolean)

7. Table Behaviour

Tables are first-class nodes rendered directly to <table> / <thead> / <tbody> / <tfoot>.

interface TableNode extends BaseNode {
  dataPath: string
  columns: TableColumnDefinition[]
  repeatHeader?: boolean        // default true — thead { display: table-header-group } for print
  showFooter?: boolean          // default false
  caption?: string
  emptyMessage?: string         // default 'No items.'
  tableStyle?: NodeStyle
  headerRowStyle?: NodeStyle
  bodyRowStyle?: NodeStyle
  alternateRowStyle?: NodeStyle  // applied to every odd-indexed row (1, 3, 5…)
  footerRowStyle?: NodeStyle
}

interface TableColumnDefinition {
  key: string             // dot-path within each row object
  header: string          // Handlebars string
  footer?: string         // Handlebars string, compiled against full adapted data (not row)
  width?: string
  align?: 'left' | 'center' | 'right'
  format?: string         // helper name: 'currency', 'date', 'number'
  formatArgs?: unknown[]  // additional args passed to the format helper
  style?: NodeStyle       // applied to <td> cells
  headerStyle?: NodeStyle // applied to <th> cells (merged with headerRowStyle)
  sparkline?: {           // when set, renders inline SVG instead of text
    sparkType: 'line' | 'bar'
    valuesPath: string    // dot-path within each row to a number array
    color?: string        // hex (#abc), rgb(), or hsl() — default: palette colour #0
    width?: number        // SVG width in pixels (20–400, default 80)
    height?: number       // SVG height in pixels (10–100, default 20)
  }
}

Body rows:

  • One <tr> per item in the resolved array
  • alternateRowStyle applied to rows at index 1, 3, 5… (0-based odd)
  • Cell value resolved via col.key dot-path within the row object, then formatted if col.format is set
  • null / undefined cell values render as empty string

Empty state:

  • If the array is empty, a single <tr><td colspan="N"> row is emitted with emptyMessage
  • Default emptyMessage: 'No items.'

Header:

  • Emitted as <thead> when repeatHeader !== false
  • thead { display: table-header-group } is included in the @media print CSS block, causing the header to repeat on every printed page

Footer:

  • Emitted as <tfoot> only when showFooter: true
  • Every footer cell emits col.align and col.style regardless of whether col.footer text is set (no bare unstyled <td>)
  • col.footer is a Handlebars string compiled against the full adapted data context, not the row — use this for aggregated values computed upstream

8. Chart Behaviour

Chart nodes render data arrays as inline SVG. No external charting library or JavaScript is required — the SVG is generated server-side and embedded directly in the HTML output.

interface ChartNode extends BaseNode {
  type: 'chart'
  chartType: 'bar' | 'line' | 'pie' | 'area' | 'donut' | 'horizontalBar' | 'stackedBar' | 'groupedBar' | 'scatter' | 'gauge' | 'funnel' | 'waterfall' | 'treemap' | 'heatmap' | 'combo'

  // Data binding
  dataPath: string        // dot/bracket path to an array in adapted data, e.g. "sales" or "report.items"
  categoryField: string   // field name on each item to use as category label (X-axis or pie slice label)
  valueField: string      // field name on each item to use as the numeric value

  // Display
  title?: string          // plain-text title rendered above the chart
  showLegend?: boolean    // show a legend; applied to pie/donut charts (stored on all types for forward compatibility)
  emptyMessage?: string   // shown when the data array is empty or all values are non-numeric. Default: "No chart data."

  // Sizing
  width?: string          // CSS width of the chart wrapper (default: "100%")
  height?: string         // CSS height of the chart wrapper (default: "300px")

  // Extended chart type fields
  xField?: string         // scatter: field name for X-axis numeric value
  seriesFields?: string[] // stackedBar/groupedBar/combo: ordered bar-series field names
  seriesLabels?: string[] // stackedBar/groupedBar/combo: human-readable bar-series labels
  innerRadiusRatio?: number // donut: inner radius as fraction of outer (0–1, default 0.55)
  gaugeMin?: number       // gauge: minimum scale value (default 0)
  gaugeMax?: number       // gauge: maximum scale value (default 100)

  // Data labels (bar / horizontalBar / stackedBar / groupedBar)
  showDataLabels?: boolean
  dataLabelPosition?: 'outside' | 'inside' | 'center' | 'auto'  // default 'auto'

  // Combo chart (bar + line overlay)
  lineSeriesFields?: string[]   // combo: line-series field names
  lineSeriesLabels?: string[]   // combo: line-series labels
  secondaryAxis?: {             // combo: when set, line series use a right-side y-axis
    label?: string
    min?: number
    max?: number
  }
}

Chart types

TypeDescription
barVertical bar chart. Bars adapt width to the number of categories. Category labels appear below each bar.
lineLine chart with data-point circles. Category labels appear below each point.
piePie chart. When showLegend: true the pie is offset left and a legend appears on the right showing category, colour swatch, and percentage. Percentage labels are rendered inside slices wider than ~23°.
areaArea chart. Filled polygon below the line with 20% opacity, plus solid stroke on top.
donutAnnular (ring) variant of pie. innerRadiusRatio controls hole size (default 0.55).
horizontalBarHorizontal bar chart. Category labels appear to the left of each bar.
stackedBarStacked vertical bar chart. Multiple seriesFields are stacked per category. Legend shown when >1 series.
groupedBarGrouped vertical bar chart. Multiple seriesFields are placed side-by-side per category. Legend shown when >1 series.
scatterScatter plot. Uses xField for X-axis and valueField for Y-axis. Optional categoryField for point labels.
gaugeHalf-circle gauge. Reads a single value (first array element or single object). gaugeMin/gaugeMax set the scale.
funnelFunnel chart. Stages stacked top-to-bottom with width proportional to valueField; category labels alongside each stage.
waterfallWaterfall chart. Successive positive/negative deltas on valueField; running total tracked across categories with colour-coded bars.
treemapTreemap. Rectangular areas sized by valueField and labelled from categoryField; squarified layout inside the viewBox.
heatmap2-D heatmap. Uses xField + categoryField as row/column dimensions and valueField as cell intensity on a sequential colour ramp.
comboGrouped bars (seriesFields) with one or more line overlays (lineSeriesFields). Optional secondaryAxis draws a right-side y-scale so bar/line series with mismatched magnitudes (e.g. volume + price) share one panel cleanly. Legend distinguishes bar (rect swatch) from line (stroke + dot) for monochrome print legibility. PDF target in v1 — docx/pptx renderers fall back to a bar-family rendering and drop the line overlay.

All chart types use a fixed 480×300 internal SVG viewBox (gauge uses 480×260) and a 6-colour Tableau-inspired palette.

Data labels (bar family)

bar, horizontalBar, stackedBar, groupedBar, and combo honour showDataLabels. With the default dataLabelPosition: 'auto', single-value bars place labels outside the bar end; stacked segments centre labels inside. Stacked segments shorter than ~12 px fall back to a right-side callout so no value is silently dropped; small-segment behaviour can be overridden by setting dataLabelPosition explicitly.

Data resolution

  1. dataPath is resolved against adapted data using the same dot/bracket resolver as table and repeater (supports items[0].sub syntax).
  2. For each item in the resolved array, categoryField and valueField are resolved.
  3. Items where valueField does not resolve to a number (or a numeric string) are silently skipped.
  4. If no usable data points remain, emptyMessage is rendered instead of the SVG.

HTML output

<div class="chart-wrapper" style="width:100%;height:300px">
  <p class="chart-title">Sales by Region</p>   <!-- only when title is set -->
  <svg viewBox="0 0 480 300" ...>...</svg>
</div>

NodeStyle and paginationHints on the chart node are merged into the wrapper style attribute alongside the width/height defaults.


9. RichText Node Behaviour

richText nodes store structured content as a JSON AST — no raw HTML is stored or emitted from arbitrary sources.

AST types

TypeDescription
RichTextNodeRoot content node. Has content: RichTextBlock[].
ParagraphBlock{ type: 'paragraph', children: RichTextInline[] }
OrderedListBlock{ type: 'orderedList', items: ListItemBlock[] }
UnorderedListBlock{ type: 'unorderedList', items: ListItemBlock[] }
ListItemBlock{ type: 'listItem', children: RichTextInline[], sublist?: OrderedListBlock | UnorderedListBlock }
TextLeaf{ type: 'text', text: string, bold?: true, italic?: true, underline?: true, color?: string }
LinkInline{ type: 'link', href: string, children: TextLeaf[] }
LineBreakInline{ type: 'lineBreak' }

Nested lists

Exactly one level of nesting is supported via ListItemBlock.sublist. Deeper nesting is not preserved in the AST — the editor caps nesting at one level on input.

Text colour

TextLeaf.color accepts CSS colour strings in these formats only:

FormatExamples
3-digit hex#rgb
6-digit hex#rrggbb
RGBrgb(0, 128, 255)
RGBArgba(0, 128, 255, 0.5)
HSLhsl(200, 100%, 50%)
HSLAhsla(200, 100%, 50%, 0.8)

Named colours (e.g. red, blue) are not supported. Invalid values are silently ignored by the renderer and flagged as an error by the editor validator.

HTML rendering

The renderer emits only whitelisted HTML tags: <div>, <p>, <ol>, <ul>, <li>, <strong>, <em>, <u>, <a>, <span>, <br />. No arbitrary HTML is passed through.

  • Handlebars expressions in TextLeaf.text use {{double-stash}} — all resolved values are HTML-escaped
  • Link href values are validated against a protocol allowlist (http, https, mailto, tel) before rendering; unsafe hrefs are rendered as plain text
  • Color values are validated with isSafeRichTextColor() before emitting <span style="color:...">

Current limitations

  • One level of list nesting only (sublist on ListItemBlock)
  • No raw HTML storage or arbitrary HTML passthrough
  • No markdown import or export
  • Editor link popover warns on unsafe protocols (javascript:, ftp:) and normalises bare domains to https://; the server renderer and validator enforce the protocol allowlist as a hard gate — unsafe hrefs are rendered as plain text
  • Handlebars expressions are preserved as plain text in Visual editing mode; use JSON mode for precise expression authoring

10. Pagination Behaviour

Pagination hints are compiled to both modern (break-*) and legacy (page-break-*) CSS properties for maximum Puppeteer/print compatibility.

interface PaginationHints {
  breakBefore?: 'auto' | 'always' | 'avoid' | 'page' | 'left' | 'right'
  breakAfter?: 'auto' | 'always' | 'avoid' | 'page' | 'left' | 'right'
  breakInside?: 'auto' | 'avoid'
  keepWithNext?: boolean    // compiles to break-after: avoid
}

Mapping:

HintCSS emitted
breakBefore: "always"break-before: always; page-break-before: always
breakAfter: "avoid"break-after: avoid; page-break-after: avoid
breakInside: "avoid"break-inside: avoid; page-break-inside: avoid
keepWithNext: truebreak-after: avoid; page-break-after: avoid

Pagination hints are applied as inline styles on the node’s wrapper element. These are CSS soft hints — Puppeteer honours them when there is sufficient content flow room; hard layout constraints take precedence.

The <style> block in every rendered document includes:

@media print {
  thead { display: table-header-group; }
  tfoot { display: table-footer-group; }
  tbody { display: table-row-group; }
}

11. Pivot Table Behaviour

The pivotTable node transforms a flat source array into a cross-tab (pivot) grid, grouping by row and column dimensions and aggregating numeric measures.

Node definition

interface PivotTableNode extends BaseNode {
  type: 'pivotTable'
  dataPath: string                // path to flat source array
  rowDimensions: PivotDimension[] // 1–4 dimensions; multi-level supported
  columnDimension: PivotDimension // single field → column headers
  measures: PivotMeasure[]        // 1+ numeric measures to aggregate
  showRowGrandTotal?: boolean     // default: true
  showColumnGrandTotal?: boolean  // default: true
  showRowSubtotals?: boolean      // default: false (no-op for single-dim pivots)
  subtotalLabel?: string          // default: "{value} Total" (use {value} placeholder)
  emptyCell?: string              // default: '' (displayed when intersection has no data)
  grandTotalLabel?: string        // default: 'Grand Total'
  caption?: string
  emptyMessage?: string
  tableStyle?: NodeStyle
  headerRowStyle?: NodeStyle
  bodyRowStyle?: NodeStyle
  alternateRowStyle?: NodeStyle
  totalRowStyle?: NodeStyle
}

interface PivotDimension {
  field: string       // dot-path within each row
  label?: string      // display label (defaults to field name)
  sort?: 'asc' | 'desc' | 'none'  // default: 'asc'
}

interface PivotMeasure {
  field: string                    // dot-path (numeric)
  label: string                    // sub-column header
  aggregation?: 'sum' | 'count' | 'avg' | 'min' | 'max'  // default: 'sum'
  format?: string                  // Handlebars helper name
  formatArgs?: unknown[]
}

Data transform

The pivot transform is a pure function in @pulp-engine/data-adapter. Algorithm:

  1. Single pass over the source array: extract composite row key, column key, and measure values per item.
  2. Accumulate into Map<rowKey, Map<colKey, Accumulator[]>> — each accumulator tracks running sum, count, min, max (all five aggregation types derivable from these four).
  3. Sort row and column keys (ascending by default, numeric sort when all values parse as numbers).
  4. Build output — for each (row, col) pair, resolve the configured aggregation. Missing intersections yield null.

HTML output

When measures.length > 1, a two-row <thead> is emitted:

  • Row 1: dimension labels (with rowspan="2") + column key headers (with colspan=measures.length)
  • Row 2: measure labels repeated per column group

When measures.length === 1, a single header row is emitted (column keys only).

Grand total row renders in <tfoot>. Cell formatting reuses the same Handlebars helper pipeline as the table node (currency, number, date, etc.).

Multi-level row dimensions and subtotals (v0.71.0+)

The pivot transform supports 1–4 row dimensions (Zod-bounded). When showRowSubtotals: true AND rowDimensions.length > 1, the transform emits one subtotal row per closed group at every depth except the leaf.

The algorithm uses per-depth accumulator slots that ancestor leaves merge into. As the sorted leaf-row scan crosses a group boundary, every depth from firstDiffer to dimCount-2 gets flushed (deepest first) and reset for the next sibling group. Subtotal rows carry isSubtotal: true, depth: d (the closed dimension index), and dimensionValues truncated to the closed prefix. When showColumnGrandTotal is on, subtotal rows also include their own rowTotal cells so the trailing grand-total column stays rectangular.

Aggregation correctness: subtotals derive from the same Accumulator struct as leaves (sum, count, min, max). The avg aggregation is computed at every level from the merged sum/count, NOT as average-of-averages — so a subtotal “Avg / Deal” reflects the true mean of the underlying values, not the mean of the children’s means.

Subtotal label: the subtotalLabel field templates the label cell, with {value} substituting the deepest closed dimension value. Default is "{value} Total". Empty string renders just the value; a literal without {value} renders verbatim.

Cap rationale: the 4-dimension limit bounds header rowspan complexity in HTML/DOCX and mergeCells count in XLSX. Lifting it later requires re-validating those header paths, not just the schema.

Rendering targets

TargetOutput
HTML/PDF<table> with <thead>, <tbody>, <tfoot>. Subtotal rows carry class pivot-subtotal pivot-subtotal-d{depth} for CSS hooks; default inline shading applied for PDF (no external CSS).
DOCXdocx.Table with header merging via columnSpan. Themed headerRowStyle / bodyRowStyle / alternateRowStyle / totalRowStyle honored end-to-end (translated to docx cell shading + run properties). Subtotals inherit the total row style as a base over a lighter SUBTOTAL_BG default.
PPTXNative pptxgenjs slide.addTable call. Themed row styles honored (translated to pptxgenjs cell options). Content-aware column widths (mimics CSS table-layout: auto) so wide grand-total values like $4,571,700.00 aren’t crowded into per-quarter-sized slices.
CSVFlattened grid matching HTML layout (RFC 4180). Subtotal label written in the depth-th column, blank cells for trailing dimensions.
XLSXExcelJS worksheet, numeric cells as numbers, header merging. Subtotal rows use mergeCells for the trailing label span, bold + SUBTOTAL_FILL shading.

12. Features, Limitations, and Non-Goals

Features added beyond the original MVP scope

The following were originally considered deferred but have since shipped:

FeatureNotes
Drag-and-drop template editor (apps/editor)Implemented and in internal pilot — see docs/editor-guide.md
Authentication / authorisationScoped API key model (API_KEY_ADMIN, API_KEY_RENDER, API_KEY_PREVIEW, API_KEY_EDITOR), editor session tokens (HMAC-signed), named-user mode (EDITOR_USERS_JSON). See docs/api-guide.md.
Template version pinningOptional version field in POST /render body. See docs/api-guide.md.
PDF streamingBoth POST /render and POST /render/preview/pdf stream via Puppeteer createPDFStream() — no Node.js PDF buffer. Chrome render-time memory still applies; in RENDER_MODE=container / socket it applies inside the ephemeral worker, insulated from the API process.
Image asset managementUpload, list, and delete image files via POST /assets/upload, GET /assets, DELETE /assets/:id. Files stored in ASSETS_DIR and served at /assets/*. Image src strings can reference assets by path (e.g. /assets/my-logo.png).
Chart / graph nodes15 chart types (bar, line, pie, area, donut, horizontalBar, stackedBar, groupedBar, scatter, gauge, funnel, waterfall, treemap, heatmap, combo) rendered as inline SVG. Optional showDataLabels on the bar family; combo supports a dual-axis line overlay. Sparkline formatter on table columns. See Chart Behaviour.
Barcode / QR node6 formats (QR, Code 128, Code 39, EAN-13, UPC, Data Matrix) via @pulp-engine/barcode-renderer. Handlebars-capable value. Rendered as inline SVG in all output formats.
Template CRUDPOST /templates (create), PUT /templates/:key (update), DELETE /templates/:key (soft delete), POST /templates/:key/versions/:version/restore (restore historical version).
dataPath bracket indexingitems[0].sub syntax supported in repeater, table, and chart dataPath fields.
richText node with structured contentParagraph, ordered/unordered list (one-level nested), bold, italic, underline, text colour, links (safe protocols), line breaks. Safe HTML renderer with whitelisted tags. Visual and JSON editing in apps/editor. See RichText Node Behaviour.
Template composition (templateRef)Inline rendering of referenced templates with dataPath scoping and fieldMappings for data transformation. Max nesting depth: 5. See Template Composition.
Table of contents (toc)Auto-generated anchor-linked TOC from document headings. See TOC Node Behaviour.
Absolute positioning (positioned)Container with physical-unit CSS positioning for form overlays. CSS unit whitelist enforced. See Positioned Node Behaviour.
Cross-tab / pivot table (pivotTable)Pre-render data transform: flat array → 2D grid grouped by row/column dimensions with aggregated measures (sum, count, avg, min, max). All render targets (HTML, DOCX, CSV, XLSX). See Pivot Table Behaviour.

Known limitations

See the consolidated Current limitations table in mvp-scope.md — that is the canonical list.

Known constraints

  • Puppeteer startup: first PDF request per process takes ~2–3 s (cold browser) Resolved — the browser is now warmed at startup via warmBrowser() (in-process mode) or the child-process dispatcher’s warmup() IPC (child-process mode). The ~2–3 s Chromium launch cost is paid during server boot, not on the first request. Subsequent requests continue to reuse the singleton browser instance. Container/socket modes are ephemeral and unaffected.
  • null treated as missing by exists/notExists: intentional — document templates rarely distinguish null from absent.
  • Numeric comparison operators (gt, gte, lt, lte): return false (not an error) if either operand is not a number.
  • Alternating row style: applies to rows at odd indices (1, 3, 5…), not to even indices. The first data row (index 0) always uses bodyRowStyle.
  • dataPath bracket indexing: items[0].sub syntax is supported for repeater, table, and chart nodes. Only non-negative integer indices are valid; negative indexes, floats, string/quoted keys, bare bracket-first paths ([0].a), and unclosed brackets throw a DataPathError at render time. Missing properties or out-of-range indexes return undefined without throwing.

13. Template Composition

The templateRef node embeds a referenced template’s document sections inline at render time.

interface TemplateRefNode extends BaseNode {
  type: 'templateRef'
  templateKey: string              // key of the template to resolve
  dataPath?: string                // dot-path to scope parent data
  version?: string                 // version pin (latest if omitted)
  fieldMappings?: FieldMappingEntry[]  // transform scoped data before sub-template rendering
  dataSource?: TemplateRefDataSource   // external data source (fetched server-side)
  dataMergeStrategy?: 'replace' | 'shallow' | 'deep'  // how fetched data merges with parent
}

Resolution: All template references are pre-resolved before the synchronous render walk via resolveAllRefs(). Max nesting depth is 5. Cycle detection prevents infinite recursion.

Data flow: parent data → dataPath scoping → data source fetch + merge → fieldMappings (DataAdapter.adapt) → sub-template children. The referenced template’s own top-level fieldMappings are not applied — only the referencing node’s mappings. The referenced template’s renderConfig (margins, headers, fonts) is also not applied — only its document sections are rendered inline.

Field mappings on templateRef enable parameterized composition: a master template can pass transformed data to a reusable sub-template (e.g. a “letterhead” fragment receiving recipientName from the parent’s customer.name).

Data Source Binding

When dataSource is set, the sub-report fetches its own data server-side during a pre-render resolution pass (after resolveAllRefs, before the synchronous render walk). Sibling data sources at the same depth are fetched in parallel.

interface TemplateRefDataSource {
  type: string                     // 'static' | 'url' | plugin-registered
  data?: Record<string, unknown>   // type='static': inline data payload
  url?: string                     // type='url': endpoint (supports {{handlebars}} interpolation)
  headers?: Record<string, string> // type='url': HTTP headers (values support interpolation)
  timeoutMs?: number               // fetch timeout (default 10s, range 1–30s)
  method?: 'GET' | 'POST'         // HTTP method (default GET)
  body?: string                    // POST body (supports interpolation)
  config?: Record<string, unknown> // arbitrary config for plugin types
}

Merge strategies (dataMergeStrategy, default 'replace'):

  • replace — fetched data replaces parent context entirely (sub-report is independent)
  • shallow{ ...parentScoped, ...fetched } (inherit shared context like company info)
  • deep — recursive merge, fetched wins at leaf level; arrays replaced wholesale

Error handling: Production renders fail with DataSourceError (HTTP 422, code data_source_error). Preview renders degrade gracefully — the sub-report falls back to parent-scoped data.

Security: URL data sources are validated against the same SSRF guard as webhook delivery (validateWebhookUrl). The UrlValidator is injected via dependency injection to maintain package boundaries.

Plugin extensibility: Plugins register custom data source types via ctx.registerDataSourceProvider(type, provider). Built-in types static and url cannot be overridden.

Caching: A per-render cache deduplicates identical URL fetches (keyed by post-interpolation URL + headers). Static data sources are not cached (clone is cheap).


14. TOC Node Behaviour

The toc node auto-generates a table of contents from all headings in the document.

interface TocNode extends BaseNode {
  type: 'toc'
  maxDepth?: 1 | 2 | 3 | 4 | 5 | 6  // default 3
  ordered?: boolean                  // <ol> when true, <ul> when false (default)
  title?: string                     // optional heading above the list
  showPageNumbers?: boolean          // default false — when true, reserves a right-aligned slot and post-processes the PDF to stamp real page numbers
  leaderStyle?: 'dots' | 'dashes' | 'none'  // default 'dots'; only meaningful when showPageNumbers is true
}

Heading collection: Before rendering begins, HtmlRenderer.buildBody() performs a lightweight AST pre-walk that collects { id, level, content } for every heading node in the document — including headings inside containers, columns, conditionals, repeaters, positioned nodes, and templateRefs. Heading content is interpolated via interpolatePlain() so Handlebars expressions resolve to plain text (no HTML entities in TOC link text).

Anchor links: All <h1><h6> tags in rendered output include id="heading-{nodeId}". The TOC emits <a href="#heading-{nodeId}"> links. In PDF output, Chromium resolves these as in-document jumps.

Filtering: Only headings with level <= maxDepth appear in the TOC. Indentation is applied via margin-left proportional to heading level.

Page numbers (PDF output): When showPageNumbers is true, the HTML renderer emits a wrap-safe leader (radial-gradient for dots, linear-gradient for dashes) followed by a reserved-width <span class="toc-page-slot"> per entry. The HTML/preview path leaves the slot blank — page numbers are a print-only artifact.

On the PDF path, TemplateRenderer walks the AST via hasTocWithPageNumbers and, when any reachable TOC node requests page numbers, sets stampTocPageNumbers: { destPrefix: 'heading-' } on the dispatch options. After Puppeteer produces the PDF, pdf-renderer invokes @pulp-engine/pdf-transform#stampTocPageNumbers which:

  1. Reads the PDF’s /Catalog/Dests (Chromium emits one entry per HTML id) to build a {destName → destination page} map.
  2. Walks each page’s /Annots, selects /Link annotations whose destination name starts with heading-, and uses pdf-lib’s drawText to stamp the resolved page number right-aligned inside each annotation’s /Rect.

CSS target-counter() is not used — Chromium’s Puppeteer page.pdf() does not resolve it against printed pages. The destPrefix filter prevents non-TOC <a href="#heading-…"> anchors from being stamped inadvertently.

Known limitation — wrapped titles: When a TOC entry title wraps to a second line, the leader sits between the last line and the page number rather than continuing per-line. Acceptable for v1; short titles avoid this.


15. Positioned Node Behaviour

The positioned node renders an absolutely-positioned container for form overlays and pre-printed stationery use cases.

interface PositionedNode extends BaseNode {
  type: 'positioned'
  top: string       // e.g. "10mm", "72pt", "1in", "50px"
  left: string
  width: string
  height: string
  children: AnyContentNode[]
}

CSS unit validation: Position and dimension values are validated against a whitelist regex: /^\d+(\.\d+)?(mm|pt|in|px|cm|em|rem|%)$/. This is enforced at both the Zod schema (API boundary) and the renderer (defense in depth). Invalid values fall back to 0px. This blocks CSS injection vectors (expression(), url(), calc(), semicolon injection).

Rendering: Emits <div style="position:absolute;top:...;left:...;width:...;height:...">children</div>. Children are rendered recursively.

Parent requirement: The positioned node must be placed inside a container with style.position: 'relative' set by the template author. The renderer does not auto-wrap the parent — this would require parent-context awareness, violating the stateless-per-node rendering model.