Effect Injection

Ripple requires deterministic replay for crash recovery: given the same input sequence from a checkpoint, the system must produce exactly the same output. This means all non-determinism must be injectable through a single interface. This page explains the EFFECT module type, its two implementations, and why this design is mandatory.

The Problem

Consider a node that timestamps its output:

(* BAD: direct call to Time_ns.now *)
let f input =
  { data = compute input; timestamp = Time_ns.now () }

This function is non-deterministic. During replay after a crash, Time_ns.now() returns a different value than during the original computation. The replayed state diverges from the original, violating the determinism invariant.

The same problem applies to:

Random number generation (used for sampling, jitter)
Network I/O (reading from sockets)
File I/O (reading configuration)
System calls (getpid, hostname)
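The timestamp example above becomes replayable once the clock is passed in as a parameter. The following is a minimal sketch, not Ripple's actual code: int nanoseconds stand in for Time_ns.t and compute is a placeholder, so the block runs with only the standard library.

```ocaml
(* Sketch only: int nanoseconds stand in for Time_ns.t. *)
type output = { data : int; timestamp : int }

(* Placeholder for the real computation. *)
let compute input = input * 2

(* GOOD: the clock is a parameter, so the caller decides what "now" means. *)
let f ~now input = { data = compute input; timestamp = now () }

let () =
  (* A fixed clock makes two runs over the same input identical. *)
  let fixed_clock () = 1_000 in
  let first = f ~now:fixed_clock 21 in
  let replayed = f ~now:fixed_clock 21 in
  assert (first = replayed)
```

The same shape applies to the other sources listed above: the function receives the capability as an argument instead of reaching for it.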

The EFFECT Module Type

All non-determinism flows through a single module signature:

module type S = sig
  val now : unit -> Time_ns.t
  val random_int : int -> int
end

Every component that needs time or randomness takes now or random_int as a parameter rather than calling Time_ns.now or Random.int directly.

The graph engine is parameterized by now:

let create ~now =
  { nodes = Array.create ~len:1024 (Obj.magic ())
  ; node_count = 0
  ; dirty_heap = Dirty_heap.create ~capacity:1024
  ; ...
  ; now    (* injected, not called directly *)
  }

The tracing system is parameterized by random_int:

let create_root ~random_int =
  { trace_id = gen_trace_id ~random_int
  ; span_id = gen_span_id ~random_int
  ; ...
  }
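The bodies of gen_trace_id and gen_span_id are not shown here. As a hypothetical sketch of how such generators could draw all of their randomness from the injected random_int, assume hex-encoded IDs of 32 and 16 characters (the widths, the encoding, and the helpers gen_hex and seeded are assumptions for illustration):

```ocaml
(* Hypothetical: ID widths and hex encoding are assumptions, not Ripple's format. *)
let gen_hex ~random_int ~len =
  String.init len (fun _ -> "0123456789abcdef".[random_int 16])

let gen_trace_id ~random_int = gen_hex ~random_int ~len:32
let gen_span_id ~random_int = gen_hex ~random_int ~len:16

(* A seeded generator with the same shape as random_int in the signature. *)
let seeded seed =
  let st = Random.State.make [| seed |] in
  fun bound -> Random.State.int st bound

let () =
  let a = gen_trace_id ~random_int:(seeded 7) in
  let b = gen_trace_id ~random_int:(seeded 7) in
  (* Same seed, same ID: nothing else feeds into the generator. *)
  assert (String.length a = 32);
  assert (a = b)
```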

Two Implementations

Live: Production

module Live : S = struct
  let now = Time_ns.now
  let random_int = Random.int
end

Used by the worker binary and CLI. Provides real wall-clock time and pseudorandom numbers.

Test: Deterministic Simulation

module Test : sig
  include S
  val advance_time : Time_ns.Span.t -> unit
  val set_time : Time_ns.t -> unit
  val seed_random : int -> unit
end = struct
  let current_time = ref Time_ns.epoch
  let rng = ref (Random.State.make [| 42 |])

  let now () = !current_time
  let random_int bound = Random.State.int !rng bound

  let advance_time span =
    current_time := Time_ns.add !current_time span
  let set_time t = current_time := t
  let seed_random seed = rng := Random.State.make [| seed |]
end

Used by all tests, benchmarks, and the deterministic simulation harness. Time only advances when explicitly stepped. Random sequences are reproducible from a seed.
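Both properties can be checked directly. The sketch below uses a stand-in named Test_sketch that mirrors the Test module but represents time as int nanoseconds, so it runs without Core. The stand-in and the draw helper are illustrative assumptions.

```ocaml
(* Stand-in for the Test module: int nanoseconds replace Time_ns.t. *)
module Test_sketch = struct
  let current_time = ref 0
  let rng = ref (Random.State.make [| 42 |])
  let now () = !current_time
  let random_int bound = Random.State.int !rng bound
  let advance_time span_ns = current_time := !current_time + span_ns
  let seed_random seed = rng := Random.State.make [| seed |]
end

(* Draw n values from the injected generator. *)
let draw n = List.init n (fun _ -> Test_sketch.random_int 1000)

let () =
  (* Reseeding with the same value reproduces the same sequence. *)
  Test_sketch.seed_random 7;
  let first = draw 5 in
  Test_sketch.seed_random 7;
  let second = draw 5 in
  assert (first = second);
  (* Time stands still until it is stepped explicitly. *)
  let t0 = Test_sketch.now () in
  ignore (draw 5);
  assert (Test_sketch.now () = t0);
  Test_sketch.advance_time 1_000_000;
  assert (Test_sketch.now () = t0 + 1_000_000)
```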

Usage Patterns

In Tests

let%expect_test "stabilization timing is deterministic" =
  Test.set_time Time_ns.epoch;
  let g = Graph.create ~now:Test.now in
  (* ... build graph ... *)
  Test.advance_time (Time_ns.Span.of_ms 1.0);
  let _ = Graph.stabilize g in
  (* last_stabilization_ns is exactly 1_000_000 ns *)
  ()

In the Worker Binary

let run ~worker_id ~partition_id =
  let now = Live.now in  (* Live time, taken from the EFFECT interface *)
  let worker = Worker.create ~worker_id ~partition_id ~now in
  ...

In Deterministic Simulation

let simulate ~seed ~ticks =
  Test.seed_random seed;
  Test.set_time Time_ns.epoch;
  let g = Graph.create ~now:Test.now in
  for _ = 1 to ticks do
    Test.advance_time (Time_ns.Span.of_us 100.0);
    (* inject events, stabilize, check invariants *)
  done

Why This Design Is Mandatory

Deterministic Replay

The checkpoint-and-replay recovery protocol depends on determinism:

1. Load checkpoint (leaf values + input offsets)
2. Rebuild graph structure
3. Restore leaf values from checkpoint
4. Replay input log from checkpoint's offset
5. Result: same graph state as before crash

Step 5 only holds if every computation produces the same result given the same inputs. If any node calls Time_ns.now() directly, the replayed state diverges at that node and all its descendants.
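A toy model makes the divergence concrete. The sketch below is not Ripple's recovery code: the checkpoint record, the integer log, and the stepped_clock helper are assumptions chosen so the block runs with only the standard library. A clock that restarts from the same point on every run stands in for an injected clock, and a clock shared across runs stands in for wall-clock time that keeps moving after a crash.

```ocaml
(* Toy model: one integer leaf, an integer input log, int timestamps. *)
type checkpoint = { leaf : int; offset : int }

(* Replay log entries from the checkpoint's offset, stamping each output. *)
let replay ~now ~log cp =
  let rec go i leaf acc = function
    | [] -> (leaf, List.rev acc)
    | x :: rest ->
      if i < cp.offset then go (i + 1) leaf acc rest
      else
        let leaf = leaf + x in
        go (i + 1) leaf ((leaf, now ()) :: acc) rest
  in
  go 0 cp.leaf [] log

(* Each call to the returned clock advances it by 100. *)
let stepped_clock () =
  let t = ref 0 in
  fun () -> t := !t + 100; !t

let () =
  let log = [ 5; 7; 11; 13 ] in
  let cp = { leaf = 12; offset = 2 } in
  (* Injected clock: both runs see the same times, so the states match. *)
  let original = replay ~now:(stepped_clock ()) ~log cp in
  let recovered = replay ~now:(stepped_clock ()) ~log cp in
  assert (original = recovered);
  (* Shared clock: time has moved on by the second run, so they diverge. *)
  let wall = stepped_clock () in
  let before_crash = replay ~now:wall ~log cp in
  let after_crash = replay ~now:wall ~log cp in
  assert (before_crash <> after_crash)
```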

Simulation Testing

The deterministic simulation harness (inspired by TigerBeetle’s approach) runs millions of simulated operations with injected failures. Each simulation is parameterized by a seed. When a bug is found, the seed reproduces the exact failure sequence.

This is impossible if the system has any direct sources of non-determinism.
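As a rough illustration of seed-driven failure injection, the sketch below decides per tick whether a message is dropped, using only a generator derived from the seed. The event type, the 1-in-10 drop rate, and this shape of simulate are assumptions for illustration, not the harness's actual design.

```ocaml
(* Hypothetical event trace for a simulation run. *)
type event = Deliver of int | Drop of int

let simulate ~seed ~ticks =
  (* All randomness comes from a state built from the seed. *)
  let st = Random.State.make [| seed |] in
  let random_int bound = Random.State.int st bound in
  List.init ticks (fun tick ->
    (* Inject a failure on roughly one tick in ten. *)
    if random_int 10 = 0 then Drop tick else Deliver tick)

let () =
  (* Rerunning with the seed of a failing run reproduces its exact trace. *)
  let failing = simulate ~seed:1234 ~ticks:100 in
  let rerun = simulate ~seed:1234 ~ticks:100 in
  assert (failing = rerun)
```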

The Rule

From the codebase:

“No module in Ripple may call Time_ns.now(), Random.int, or perform direct I/O. All such operations go through the EFFECT interface. Violations break deterministic replay and are considered bugs.”

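This rule is enforced by code review, not by the type system: OCaml has no effect system that would reject a direct call to Time_ns.now. OCaml 5 effect handlers are a possible future direction for routing these operations through a single handler, but effects in OCaml 5 are not tracked in function types, so handlers alone would not turn a violation into a type error.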

Extending the Interface

When adding new sources of non-determinism, extend the S signature:

module type S = sig
  val now : unit -> Time_ns.t
  val random_int : int -> int
  (* Future additions: *)
  (* val hostname : unit -> string *)
  (* val getpid : unit -> int *)
end

Both Live and Test must be updated. The Test implementation must return deterministic values controllable by the test harness.
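As a sketch of what the Test side of such an extension might look like, the block below adds hostname to a copy of the signature and gives the test implementation a setter. The names S_ext, Test_ext, and set_hostname are hypothetical, and int nanoseconds again stand in for Time_ns.t so the block runs with only the standard library.

```ocaml
(* Hypothetical extended signature; int nanoseconds stand in for Time_ns.t. *)
module type S_ext = sig
  val now : unit -> int
  val random_int : int -> int
  val hostname : unit -> string
end

module Test_ext : sig
  include S_ext
  val set_hostname : string -> unit  (* hypothetical harness control *)
end = struct
  let current_time = ref 0
  let rng = Random.State.make [| 42 |]
  let host = ref "test-host"
  let now () = !current_time
  let random_int bound = Random.State.int rng bound
  (* Deterministic: returns whatever the harness last set. *)
  let hostname () = !host
  let set_hostname h = host := h
end

let () =
  assert (Test_ext.hostname () = "test-host");
  Test_ext.set_hostname "node-a";
  assert (Test_ext.hostname () = "node-a")
```

A Live counterpart would presumably wrap a real system call such as Unix.gethostname. It is omitted here to keep the sketch free of external libraries.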
