Incremental Computation
Incremental computation is the foundational idea behind Ripple. This page explains what it is, why it makes Ripple fundamentally faster than recomputation-based systems, and how it compares to existing stream processing frameworks.
The Core Insight
Most data processing systems react to a change by recomputing everything from scratch. If you have 10,000 symbols and one price changes, a naive system recomputes all 10,000 VWAPs.
Incremental computation takes a different approach: recompute only what is affected by the change. If one price changes, recompute that symbol’s VWAP and any downstream aggregation – nothing else.
This is the difference between O(N) and O(R), where R is the number of nodes reachable from the changed input. In the typical case of a single trade arriving, R = 3 (leaf + map + fold), regardless of whether N is 1,000 or 100,000.
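To make the counts concrete, here is a back-of-envelope sketch in Python (Ripple itself is OCaml; the helper names here are illustrative, not Ripple's API). It assumes the graph shape described later on this page: one leaf and one map node per symbol, plus a single shared fold.

```python
# Count nodes touched per single price change, under the assumed graph shape:
# one leaf + one map per symbol, and one fold shared by all symbols.

N_SYMBOLS = 10_000

def nodes_touched_naive(n_symbols: int) -> int:
    # Full recompute: every leaf, every map, and the fold.
    return 2 * n_symbols + 1

def nodes_touched_incremental() -> int:
    # Only the changed leaf, its map node, and the fold that depends on it.
    return 3  # R = 3, independent of n_symbols

print(nodes_touched_naive(N_SYMBOLS))   # grows with N
print(nodes_touched_incremental())      # constant
```

The point is simply that R is fixed by the graph's shape around the changed input, not by the total number of symbols.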
Self-Adjusting Computation
Ripple is built on the theory of self-adjusting computation, originally developed by Umut Acar and later implemented in Jane Street’s Incremental library. The key properties:
- **Memoization of intermediate results.** Every node in the computation graph caches its last computed value. A node is recomputed only if at least one of its inputs has changed.
- **Change propagation, not re-execution.** When an input changes, the system identifies the minimal set of nodes that need updating and processes them in topological order.
- **Cutoff optimization.** Even when a node is recomputed, if its new value equals its old value (by the user-provided equality function), propagation stops. This prevents phantom updates from rippling through the graph.
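The three properties above can be sketched in a toy engine. This is a minimal Python sketch, not Ripple's OCaml implementation: all class and method names are hypothetical, the "heap" is a re-sorted list, and equality cutoff uses `!=` in place of a user-provided function.

```python
# Toy self-adjusting computation: cached values, height-ordered propagation,
# and an equality cutoff that stops propagation when a value is unchanged.

class Node:
    def __init__(self, fn, inputs=(), value=None):
        self.fn, self.inputs, self.value = fn, tuple(inputs), value
        self.height = 1 + max((i.height for i in self.inputs), default=0)
        self.dependents = []
        for i in self.inputs:
            i.dependents.append(self)

class Graph:
    def __init__(self):
        self.dirty = []       # stand-in for the dirty min-heap
        self.recomputed = 0   # instrumentation for this example

    def set_leaf(self, leaf, value):
        if leaf.value != value:          # cutoff at the leaf itself
            leaf.value = value
            self.dirty.extend(leaf.dependents)

    def stabilize(self):
        seen = set()
        self.dirty.sort(key=lambda n: n.height)   # height order
        while self.dirty:
            node = self.dirty.pop(0)
            if id(node) in seen:
                continue                 # memoized: already recomputed this pass
            seen.add(id(node))
            self.recomputed += 1
            new = node.fn(*(i.value for i in node.inputs))
            if new != node.value:        # cutoff: unchanged => stop propagating
                node.value = new
                self.dirty.extend(node.dependents)
                self.dirty.sort(key=lambda n: n.height)

# 1000 leaves -> 1000 map nodes -> 1 fold, mirroring the VWAP shape.
leaves = [Node(None, value=100.0) for _ in range(1000)]
doubled = [Node(lambda x: 2 * x, [l]) for l in leaves]
total = Node(lambda *xs: sum(xs), doubled)

g = Graph()
for l in leaves:                  # initial pass computes everything once
    g.dirty.extend(l.dependents)
g.stabilize()

g.recomputed = 0
g.set_leaf(leaves[0], 101.0)      # one change...
g.stabilize()
print(g.recomputed)               # ...only 2 recomputes (map + fold);
                                  # with the leaf write, that is the R = 3 case
```

The second stabilization recomputes two nodes regardless of how many leaves exist, which is the O(R) behavior the text describes.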
Comparison with Stream Processing Frameworks
Apache Flink
Flink processes events through a DAG of operators. Each operator receives an event, processes it, and emits results downstream. This is a push-based dataflow model.
| Aspect | Flink | Ripple |
|---|---|---|
| Model | Push-based dataflow | Self-adjusting computation |
| Recomputation | Per-event through full operator chain | Only affected subgraph |
| State | Per-operator keyed state (RocksDB) | Node values in array (in-memory) |
| Checkpointing | Chandy-Lamport barriers | Snapshot leaf values + input offsets |
| Language | Java/Scala | OCaml |
| Latency floor | Network + serialization overhead | Array index + function call |
Flink’s strength is horizontal scalability across commodity clusters. Ripple’s strength is per-node efficiency – sub-microsecond stabilization for the common case.
Kafka Streams
Kafka Streams is a client library (no separate cluster) that processes records from Kafka topics. It uses a topology of processors connected by internal topics.
| Aspect | Kafka Streams | Ripple |
|---|---|---|
| Deployment | Embedded in application | Dedicated worker processes |
| State | RocksDB state stores | In-memory incremental graph |
| Windowing | Time-based with wall-clock triggers | Event-time with watermarks |
| Exactly-once | Kafka transactions | Idempotent processing + dedup |
| Recomputation | Full reprocess on state restore | Checkpoint leaf values, recompute derived |
Materialize / Differential Dataflow
Materialize (built on Timely/Differential Dataflow) is the closest conceptual relative to Ripple. Both maintain materialized views that update incrementally.
| Aspect | Materialize | Ripple |
|---|---|---|
| Foundation | Differential Dataflow (Rust) | Incremental (OCaml) |
| Interface | SQL | OCaml module signatures |
| Change representation | (data, time, diff) triples | Delta type (Set/Patch/Remove) |
| Deployment | Standalone database | Embeddable library + workers |
Ripple deliberately targets the same niche but within the Jane Street OCaml ecosystem, where interop with Core, Async, bin_prot, and ppx_* is essential.
Why O(R) Matters
Consider a VWAP pipeline with 2,000 symbols (the maximum per worker):
Graph structure:

```
2000 leaf nodes      (one per symbol)
2000 map nodes       (vwap_price extraction)
   1 incr_fold node  (portfolio total)
─────────────────────────────────────
4001 total nodes
```
When a single trade arrives:
| Approach | Nodes touched | Time (measured) |
|---|---|---|
| Full recompute | 4,001 | ~27 µs |
| Incremental (Ripple) | 3 | ~250 ns |
The incremental approach is roughly 100x faster for the single-change case. This difference compounds: at 100K events/sec, the full-recompute approach spends 2.7 seconds per second on computation alone, while the incremental approach spends 25 milliseconds.
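The compounding claim is pure arithmetic, using the measured per-event times from the table above:

```python
# Compute budget per wall-clock second at 100K events/sec,
# using the measured ~27 µs (full recompute) and ~250 ns (incremental) figures.

EVENTS_PER_SEC = 100_000
FULL_RECOMPUTE_S = 27e-6    # ~27 µs per event
INCREMENTAL_S = 250e-9      # ~250 ns per event

full_budget = EVENTS_PER_SEC * FULL_RECOMPUTE_S   # ≈ 2.7 s of compute/sec
incr_budget = EVENTS_PER_SEC * INCREMENTAL_S      # ≈ 0.025 s (25 ms)/sec

print(full_budget, incr_budget)
```

A budget above 1 second of compute per second of wall clock means the full-recompute approach cannot keep up at this rate on a single core; the incremental approach uses 2.5% of one core.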
The Stabilization Cycle
Every change in Ripple follows the same pattern:
```
1. set_leaf(var, new_value)   -- marks leaf dirty, pushes to heap
2. stabilize(graph)           -- processes dirty heap in height order
3. watch(graph, node)         -- reads current value (O(1) array lookup)
```
Between steps 1 and 2, you can set multiple leaves. Stabilization processes all of them in a single pass, which is more efficient than stabilizing after each individual change.
```
set_leaf(price_AAPL, 150.0)   -- AAPL leaf -> dirty
set_leaf(price_GOOG, 2800.0)  -- GOOG leaf -> dirty
set_leaf(price_MSFT, 420.0)   -- MSFT leaf -> dirty
stabilize(g)                  -- processes 3 leaves + their dependents
                              -- in one pass, height-ordered
```
This batching property is exploited by the VWAP demo, which processes 1,000 trades before stabilizing.
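The batching win can be isolated in a few lines of Python (hypothetical names, not Ripple's API): marking many leaves dirty before one stabilization recomputes a shared downstream node once, rather than once per change.

```python
# Batching: N leaf updates before one stabilize() cost one fold recompute.

class Sum:
    """A single fold node over n leaves, recomputed only at stabilize()."""
    def __init__(self, n):
        self.leaves = [0.0] * n
        self.dirty = set()
        self.value = 0.0
        self.recomputes = 0

    def set_leaf(self, i, v):
        self.leaves[i] = v
        self.dirty.add(i)        # mark dirty; no recompute yet

    def stabilize(self):
        if self.dirty:           # all pending changes, one pass
            self.value = sum(self.leaves)
            self.recomputes += 1
            self.dirty.clear()

g = Sum(1000)
for i in range(1000):            # 1000 changes before stabilizing...
    g.set_leaf(i, float(i))
g.stabilize()
print(g.recomputes)              # ...cost 1 fold recompute, not 1000
```

Stabilizing after every individual `set_leaf` would instead recompute the fold 1,000 times, which is exactly the cost the VWAP demo's batch-of-1,000 avoids.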
Deterministic Replay
Because the incremental graph is a pure function from inputs to outputs (given a deterministic clock), Ripple supports deterministic replay: load a checkpoint, replay the input log from the checkpoint’s offset, and arrive at exactly the same state as before the crash.
This property requires that no module in Ripple calls Time_ns.now() or Random.int directly. All non-determinism flows through the EFFECT module interface (see Effect Injection).
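The replay guarantee can be illustrated with a small Python sketch (illustrative names only; Ripple's real mechanism is the OCaml EFFECT module interface). Because the clock is injected rather than read from the environment, two runs over the same log from the same starting state produce identical results.

```python
# Deterministic replay: all non-determinism (here, a clock) is injected,
# so folding the same input log twice yields bit-identical state.

class ReplayClock:
    """Deterministic clock: yields recorded timestamps in order."""
    def __init__(self, timestamps):
        self._ts = iter(timestamps)
    def __call__(self):
        return next(self._ts)

def run(events, clock):
    """Fold an event log into state, timestamping via the injected clock."""
    state = {"last_ts": 0, "total": 0.0}
    for price in events:
        state = {"last_ts": clock(), "total": state["total"] + price}
    return state

log = [150.0, 2800.0, 420.0]     # input log replayed from a checkpoint offset
recorded_ts = [1, 2, 3]          # timestamps captured on the original run

a = run(log, ReplayClock(recorded_ts))   # original run
b = run(log, ReplayClock(recorded_ts))   # replay after "crash"
print(a == b)                            # identical state both times
```

Had `run` called a wall-clock function directly, the `last_ts` fields would differ between runs and the replayed state would diverge from the pre-crash state.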