Ripple

The core insight

A different execution model.

This is not a wrapper around existing stream processors. When one trade arrives in a 10,000-symbol pipeline, traditional systems recompute entire windows. Ripple touches exactly the nodes that depend on the changed input — and nothing else. Not an optimization. A fundamentally different approach.

Traditional

Input record
  ↓
Operator → Operator → Operator → ...
  (full values at every step)
  (recompute everything)

One trade arrives:
  → Recompute entire window
  → Re-serialize full record
  → ~27,000 ns

Input delta
  ↓
Incr node → Incr node → Incr node
  (only changes propagate)
  (only affected subgraph fires)

One trade arrives:
  → Update 3 nodes (leaf + map + fold)
  → Delta = diff of old & new
  → ~250 ns

Performance

Measured. Not projected. Not marketing.

Every number comes from benchmarks running on the actual OCaml implementation. A pre-commit hook gates every commit against these targets. If performance regresses, the commit is blocked.

0 ns / stabilization

0 ns / serde roundtrip

0 ns / schema check

2.16M events / second

2.1 seconds / 6M recovery

0.1% heap growth / 1M events

Architecture

The data path.

Events enter as leaf values. A min-heap processes dirty nodes in topological order. The incremental fold updates O(1) per changed parent. Outputs produce idempotent deltas, serialized via bin_prot with CRC-32C.

Leaf

set_leaf(v)

→

Map

recompute

→

Incr Fold

O(1) update

→

Output

diff(old, new)

→

Delta

bin_prot + CRC

→

Remote

downstream

(* VWAP pipeline in 6 lines *)
let pipeline =
  Topology.create ~name:"vwap-60s"
  |> source ~name:"trades" ~topic:"raw-trades"
  |> partition ~name:"by-symbol" ~key_name:"symbol"
  |> window ~name:"1min" ~config:(Tumbling { size = 60.0 })
  |> fold ~name:"vwap" ~f_name:"Vwap.compute"
  |> sink ~name:"output" ~topic:"vwap-results"
    

Delta algebra

Idempotent by construction.

Deltas are composable, associative, and idempotent. Applying the same update twice produces the same result as applying it once. This gives you effectively-once semantics without distributed transactions.

Idempotent

apply(d, apply(d, v)) = apply(d, v)

Roundtrip

apply(diff(old, new), old) = Ok new

Annihilation

compose(d, Remove) = Remove

Right identity

compose(d, Set(v)) = Set(v)

Associative

compose(a, compose(b, c)) = compose(compose(a, b), c)

Compatibility

apply(compose(d1, d2), v) = apply(d2, apply(d1, v))

Design philosophy

Every decision has a reason.

↓

Heap-based dirty propagation

A min-heap keyed on (height, node_id) processes dirty nodes in topological order. O(R log R) where R is recomputed nodes — not O(N) total graph size. The node_id tiebreaker guarantees deterministic ordering for replay.

Incremental fold

When one parent changes in a 10,000-entry fold, we subtract the old value and add the new. O(1) per changed parent. The fold tracks which parents changed during dirty propagation, so it never scans the full array.

Effect injection

All non-determinism flows through an injectable EFFECT interface. Swap in a test clock and seeded PRNG for deterministic replay. Same inputs always produce the same outputs. This is the foundation of checkpoint/restore correctness.

≡

Absolute patches

Patches set fields to values, never increment them. “Set price to 150” not “add 5 to price.” This makes every delta idempotent: retries, duplicates, and out-of-order delivery are all safe.

CRC-32C on every message

A single corrupt byte in the payload length field misframes every subsequent message on the TCP stream. CRC-32C catches bit flips before deserialization. 4 bytes overhead, ~10ns computation. Non-negotiable.

OCaml

Jane Street’s ecosystem: Core, Async, bin_prot, sexplib. Type system strong enough to encode schema compatibility at compile time. Predictable performance with no GC pauses comparable to the JVM. ppx derivers for zero-boilerplate serialization.

Verification

Proven, not hoped.

Property-based tests verify algebraic laws across thousands of random inputs. Chaos tests crash the system at 100 random points and verify correct recovery every time. Load tests confirm no memory leaks over millions of events.

117

Inline expect tests

Every module, every code path.

6,500+

Property test iterations

11 algebraic properties verified with random inputs.

100/100

Chaos recovery

Random crash point → checkpoint → restore → correct output.

< 1 μs

p99 latency

Across 100K stabilization cycles. Target was 100ms.

0.1%

Heap growth

Over 1M events. No leaks. Fixed-size graph structure.

O(1)

Recomputation

Max 3 nodes recomputed regardless of graph size.

Why this exists

The tooling shouldn't be the bottleneck.

The incremental computation model is incredibly powerful — but it's been locked inside single-process libraries. If you wanted surgical recomputation, you gave up distribution. If you wanted distribution, you gave up precision.

Ripple closes that gap. A two-person desk running 500 symbols deserves the same incremental architecture as a team running 50,000. The difference should be cluster size, not a different framework.

This is not a wrapper around existing stream processors.
This is a different execution model.

Most systems recompute too much. This one doesn't.

Watch it work.