Memory Model
This page documents Ripple’s memory usage characteristics: per-node memory, the arena-style layout, GC behavior, heap growth bounds, and the node lifecycle.
Node Memory Model
Each node in the incremental graph occupies approximately 200 bytes:
Field Size Notes
─────────────────────────────────────────────────────────
id : int 8 bytes Node identifier
height : int 8 bytes Topological height
value : Obj.t 8 bytes Pointer to heap-allocated value
is_dirty : bool 8 bytes 1 byte + 7 padding (word-aligned)
is_output : bool 8 bytes 1 byte + 7 padding
old_value : Obj.t 8 bytes Previous value for delta generation
recompute : recompute_fn 40-80 B Variant with closure(s)
cutoff_fn : closure 8 bytes Pointer to equality function
dependents : int list 16+ B Cons cells (typically 1-3 elements)
sexp_of_value : option 8 bytes None or Some(closure)
value_of_sexp : option 8 bytes None or Some(closure)
─────────────────────────────────────────────────────────
Total (typical) ~200 bytes
The exact size varies by recompute variant:
Leaf: smallest (~160 bytes) – no parent referencesMap: ~180 bytes – one parent ID + closureMap2: ~200 bytes – two parent IDs + closureFold: ~200 bytes – parent array + closure + init valueIncr_fold: ~280 bytes – parent array + two closures + parent_values snapshot + parent_index hashtable + changed indices list
Arena-Style Layout
Nodes are stored in a contiguous array that grows by 2x when full:
Initial allocation: 1024 nodes
┌─────┬─────┬─────┬─────┬─────┬──────────────────────┐
│ n0 │ n1 │ n2 │ ... │ n200│ <unused capacity> │
└─────┴─────┴─────┴─────┴─────┴──────────────────────┘
^ ^ ^
0 node_count=201 len=1024
After growth (triggered when node_count reaches capacity):
┌─────┬─────┬─────┬──────────────────────────────────┐
│ n0 │ n1 │ ... │ <unused capacity> │
└─────┴─────┴─────┴──────────────────────────────────┘
^ ^
0 len=2048
Growth is done via Array.blit – the old array’s contents are copied to the new array. This happens only during graph construction, never during stabilization.
Memory Usage by Graph Size
| Symbols | Total Nodes | Node Array | Dirty Heap | Total |
|---|---|---|---|---|
| 100 | 301 | 60 KB | 10 KB | ~70 KB |
| 500 | 1,501 | 300 KB | 48 KB | ~350 KB |
| 1,000 | 3,001 | 600 KB | 96 KB | ~700 KB |
| 2,000 | 6,001 | 1.2 MB | 192 KB | ~1.4 MB |
| 10,000 | 30,001 | 6 MB | 960 KB | ~7 MB |
For the recommended maximum of 2,000 symbols per worker, the graph engine’s total memory footprint is approximately 1.4 MB. This easily fits in L2 cache on modern CPUs.
GC Behavior
Minor Heap Pressure
The stabilization loop’s hot path is designed to avoid minor heap allocation:
- Heap push/pop operates on pre-allocated arrays
- Node field updates are mutable in-place mutations
- Numeric recompute functions (float arithmetic) use unboxed doubles
Allocation occurs only in:
- User-provided functions that create new values (e.g., record construction)
output_changelist construction (one per changed output per stabilization)dependentslist cons when adding edges (graph construction only)
Major Heap Stability
A healthy Ripple worker exhibits stable major heap behavior:
Heap words over time (typical):
Words
1.5M ┤ ── stable plateau
│ ┌──────────────────────────────────
1.0M ┤ │
│ │ graph construction
0.5M ┤ │
│──┘
0.0M ┤
└──────────────────────────────────────
0s 5s 10s 15s 20s 25s 30s
The heap grows rapidly during graph construction (first 1-2 seconds), then stabilizes. During steady-state processing, the major heap should not grow significantly.
0.1% Heap Growth Bound
The architecture requires that major heap growth during steady-state processing is less than 0.1% per checkpoint interval (10 seconds). This is monitored by the ripple_system_heap_words gauge.
If heap growth exceeds this bound, it indicates one of:
- A memory leak in a user-provided recompute function
- Unbounded accumulation in a fold node
- Growing
dependentslists (should not happen after construction)
Monitoring GC
(* In the heartbeat loop *)
let gc = Gc.stat () in
Metrics.Gauge.set Metrics.System.heap_words
(Float.of_int gc.heap_words);
Key Gc.stat fields to watch:
| Field | Healthy Range | Alarm |
|---|---|---|
heap_words | Stable after construction | Continuous growth |
minor_collections | Proportional to event rate | >> event rate |
major_collections | < 1 per checkpoint interval | > 10 per interval |
compactions | 0 in steady state | Any compaction |
Node Lifecycle
Nodes in Ripple follow a four-phase lifecycle:
Created ──> Active ──> Stale ──> Collected
│ │ │ │
│ add_node │ normal │ remove │ GC reclaims
│ │ ops │ from │
│ │ │ graph │
Created
A node is created by add_node (called internally by add_leaf, map, fold, etc.). The node is placed in the next available slot in the node array. Height is computed from parent heights.
Active
The node participates in stabilization. Its value is recomputed when its inputs change. It may be pushed onto the dirty heap and popped during stabilization.
Stale
When a node is no longer needed (e.g., a window is garbage-collected), it is marked stale. In the current implementation, stale nodes are not removed from the array – they remain in place but are not referenced by any active computation path.
This is a deliberate trade-off: removing nodes from the middle of the array would require reindexing all dependent references, which is expensive. Instead, stale nodes are left in place and their slots are effectively leaked until the graph is reconstructed (which happens on recovery from checkpoint).
Collected
Stale nodes are reclaimed when:
- The worker shuts down (all memory freed)
- The graph is reconstructed from a checkpoint (new array, stale nodes not restored)
Memory Implications
For long-running workers without crashes, stale node accumulation is bounded by:
- Window GC: each expired window generates ~3 stale nodes (leaf + map + fold contribution)
- With 60-second tumbling windows: ~3 nodes become stale per window per symbol every 60 seconds
- At 2,000 symbols: ~6,000 stale nodes per minute = ~1.2 MB/minute
The checkpoint interval (10 seconds) limits how long stale nodes accumulate. On checkpoint restore, only active nodes are reconstructed.
For deployments where stale accumulation is a concern, the recommended mitigation is periodic checkpoint-and-restart: take a checkpoint, restart the worker, and restore from the checkpoint. This compacts the graph and eliminates all stale nodes.
Value Storage
Node values are stored as Obj.t (type-erased) internally, with typed accessors at the API boundary:
(* Internal storage: untyped *)
node.value <- Obj.repr (my_float_value : float)
(* API boundary: typed *)
let watch (type a) g (incr : a incr) : a =
let node = g.nodes.(incr.incr_node_id) in
(Obj.obj node.value : a)
This allows heterogeneous node values in a single array without boxing each value in a variant. The type safety is maintained by the 'a var and 'a incr phantom types – you cannot read a float node as an int because the type parameter prevents it at compile time.
Float values are stored as unboxed doubles by the OCaml runtime, meaning Obj.repr 42.0 does not allocate – it produces a tagged pointer directly. Integer values smaller than max_int / 2 are similarly unboxed.