Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Delta Algebra

Deltas are the fundamental unit of change in Ripple. Rather than transmitting complete values on every update, Ripple computes and transmits the minimal difference between the old and new value. This page covers the delta type, its algebraic laws, and why those laws matter for effectively-once processing.

The Delta Type

type patch =
  | Field_set of { field_name : string; value : Sexp.t }
  | Field_remove of { field_name : string }
  | Map_set of { key : Sexp.t; value : Sexp.t }
  | Map_remove of { key : Sexp.t }

type 'a t =
  | Set of 'a           (* Replace the entire value *)
  | Patch of patch list  (* Update specific fields *)
  | Remove               (* Delete the keyed entry *)

Set

Replaces the entire value. Used for initial sync, small records (below the patch threshold), or when most fields have changed.

Patch

A list of field-level updates. Each patch operation is absolute (not relative): Field_set { field_name = "price"; value = Atom "150.0" } sets the price to 150.0 regardless of what the previous price was. This is what makes patches idempotent.

Remove

Deletes the keyed entry. Used when a symbol is removed from the partition or a window is garbage-collected.

Patch.t: Absolute Field-Level Updates

A critical design decision: patches are absolute, not relative. Field_set { field_name = "price"; value = 150.0 } means “set price to 150.0”, not “add 10.0 to price”. This distinction is essential:

  • Absolute patches are idempotent: applying the same patch twice produces the same result.
  • Relative patches are not: adding 10.0 twice gives a different result than adding it once.

For effectively-once semantics (at-least-once delivery + idempotent processing), idempotency of patches is non-negotiable.

The Six Algebraic Laws

The compose function combines two sequential deltas into one. It satisfies six algebraic laws:

Law 1: Associativity

compose(a, compose(b, c)) = compose(compose(a, b), c)

Composition order does not matter when grouping. This allows the transport layer to batch and merge deltas without worrying about evaluation order.

Law 2: Right Identity (Set)

compose(d, Set(v)) = Set(v)

Any delta followed by a Set is equivalent to just the Set. The Set completely replaces whatever came before. This is because Set is an absolute operation – it establishes a new baseline regardless of history.

Law 3: Right Annihilation (Remove)

compose(d, Remove) = Remove

Any delta followed by Remove is equivalent to just Remove. If the value is going to be deleted, it does not matter what preceded it.

Law 4: Patch Merge (Last Writer Wins)

compose(Patch(p1), Patch(p2)) = Patch(merge(p1, p2))

When composing two patch lists, fields in p2 override fields in p1 with the same name. Fields present only in p1 or only in p2 are preserved.

(* Example: composing two patches *)
let p1 = Patch [Field_set { field_name = "price"; value = Atom "100" }]
let p2 = Patch [Field_set { field_name = "price"; value = Atom "200" }
               ;Field_set { field_name = "size";  value = Atom "50"  }]

compose p1 p2
(* = Patch [Field_set "price" "200"; Field_set "size" "50"] *)
(* p2's price wins; p2's size is added *)

Law 5: Compatibility

apply(compose(d1, d2), v) = apply(d2, apply(d1, v))

Composing two deltas and applying the result is equivalent to applying them sequentially. This is the law that makes delta compression safe: the transport layer can compose multiple deltas into one without changing the outcome.

Law 6: Idempotency of Apply

apply(d, apply(d, v)) = apply(d, v)

Applying a delta twice produces the same result as applying it once. This is the foundation of effectively-once semantics. If a delta is delivered twice (at-least-once delivery), the second application is a no-op.

Why Idempotency Matters

Ripple provides effectively-once semantics: at-least-once delivery combined with idempotent processing. The guarantee chain:

1. Kafka provides at-least-once delivery (consumer may re-read after crash)
2. Deltas are idempotent (apply(d, apply(d, v)) = apply(d, v))
3. Remote nodes track sequence numbers (duplicate detection)
4. Result: each event's effect is applied exactly once

    Effectively-once = at-least-once delivery
                     + idempotent processing
                     + exactly-once state

Without idempotent deltas, re-delivery after a crash would corrupt the computation. With idempotent deltas, re-delivery is harmless.

The Diff Algorithm

The diff function computes the minimal delta between two values:

let diff ~sexp_of ~equal ~patch_threshold ~old ~new_ =
  if equal old new_ then
    Patch []  (* No change *)
  else
    let old_sexp = sexp_of old in
    let new_sexp = sexp_of new_ in
    match old_sexp, new_sexp with
    | List old_fields, List new_fields ->
      if List.length new_fields <= patch_threshold then
        Set new_  (* Small record: Set is cheaper than Patch *)
      else
        (* Compare field by field, emit only changed fields *)
        Patch (changed_fields old_fields new_fields)
    | _ ->
      Set new_  (* Non-record: full replacement *)

The patch_threshold parameter controls when to use Patch vs Set. For a 3-field record, the overhead of field-level diffing exceeds the savings – just send the whole value. For a 20-field record where 1 field changed, Patch sends ~90% less data.

Benchmark B-03 measured an 8.9x byte reduction using Patch vs Set on a 20-field record with 2 fields changed. This directly translates to less network bandwidth and lower serialization cost on the wire.

The Roundtrip Property

The most important property for correctness:

apply(diff(old, new), old) = Ok new

For any two values, computing the diff and applying it to the old value must produce the new value. This is tested as an inline expect test in delta.ml and should be verified by property-based tests for any new type added to the system.

Application to the Wire Protocol

On the wire, deltas are serialized as s-expressions for debugging or as bin_prot for performance:

Delta message (wire):
  source_output: "vwap"
  key: "AAPL"
  delta_bytes: <bin_prot encoded Delta.t>
  sequence_no: 42
  watermark_ns: 1709000000000000000

The receiver applies the delta to its local remote node, which triggers incremental recomputation in the downstream graph. The sequence number ensures idempotent delivery – if sequence 42 has already been applied, the delta is silently discarded.