gnata-sqlite
Explanation

Query Planner

How jsonata_query decomposes expressions into streaming accumulators — matching native SQL performance

Query Planner

jsonata_query() includes a query planner that decomposes JSONata expressions into streaming accumulators at compile time. Result: 83ms vs 84ms native SQL on 100K rows.

AST classification

The planner walks the JSONata AST and classifies every node:

ClassificationExamplePer-row costMemory
CONSTANT"Q1 Report", 42NoneO(1)
ACCUMULATOR$sum(path), $count()Extract + accumulateO(1)
FILTERED ACCUMULATOR$sum($filter($, fn).path)Predicate + conditional accumulateO(1)
DERIVED$sum(x) - $count($)None (post-aggregate)O(1)
OPAQUE$sort(...), $reduce(...)Accumulate rowO(n)

Example decomposition:

{
  "revenue":   $sum($filter($, fn).amount),
  "refunds":   $sum($filter($, fn).amount),
  "net":       $sum(...) - $sum(...),
  "avg":       $round($average(amount), 2),
  "customers": $count($distinct(customer))
}

→ 5 streaming accumulators + post-aggregate arithmetic. net reuses the same accumulators as revenue and refunds (common subexpression elimination). Final object assembled once.

Key optimizations

Batch GJSON extraction (4.2x improvement)

Inspired by DuckDB's JSON shredding. All unique field paths collected at plan time. Per row, one gjson.GetManyBytes call extracts all values in a single scan.

  • Before: 5 accumulators x 1 GJSON scan = 5 scans/row
  • After: 1 scan/row, results distributed by index

Predicate sharing

Inspired by ClickHouse's -If combinator. Identical predicates deduplicated at plan time. Each unique predicate evaluates once per row. Multiple accumulators reference the boolean result by index.

Constant folding

Inspired by Spark Catalyst.

  • $sum(amount * 1.1)$sum(amount) * 1.1 (multiplicative constant to post-aggregate)
  • $sum(amount + 5)$sum(amount) + 5 * $count($) (algebraic identity for additive constants)

Streaming reference

PatternStreamsMemory
$sum(path), $count(), $max, $min, $averageyesO(1)
$sum($filter($, fn).path)yesO(1)
$count($distinct(path))yesO(unique)
$sum(path * constant)yes (constant folded)O(1)
{ key: agg, key: agg, ... }yes (parallel)O(1)
$round($average(x), 2)yes (finalize once)O(1)
$sum(x) - $count($)yes (post-aggregate)O(1)
"Q1 Report"yes (constant)O(1)
$sort(...)noO(n)
$reduce($, fn, init)noO(n)

Mixed expressions: only opaque subtrees pay O(n). Streaming accumulators run in constant memory.

Why it matches native SQL

Native SQL calls json_extract per aggregate per row. With 5 aggregates: 8+ json_extract calls/row.

jsonata_query does 1 gjson.GetManyBytes per row for all fields, evaluates each predicate once, updates all accumulators from pre-extracted values.

At 9+ aggregates, jsonata_query is faster (95ms vs 102ms) — SQL's per-aggregate extraction grows linearly while batch extraction stays nearly flat.

Performance progression

MethodTime (100K rows)Memory
jsonata_query v2 (current)83msO(unique)
SQL (single-scan FILTER/CASE)84msO(1)
SQL (5 subqueries)150msO(1)
jsonata_query v1349msO(unique)
jsonata_query (accumulate)439msO(100K)

On this page