Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Running the VWAP Demo

The VWAP (Volume-Weighted Average Price) demo is a complete end-to-end pipeline that reads trade events, partitions by symbol, computes incremental VWAP with tumbling windows, and writes results to stdout. It is the proof that Ripple’s core claim holds: a typed, incremental, distributed pipeline defined in ~50 lines of OCaml that processes 100K+ events/sec.

Quick Start

# Generate and process 100K synthetic trades
make demo

# Or directly:
dune exec ripple-vwap-demo -- --synthetic 100000

Input Modes

The demo binary accepts three input modes:

Synthetic Mode (default)

Generates N trade events across 100 symbols with deterministic prices and sizes:

ripple-vwap-demo --synthetic 100000

Each synthetic event is spaced 1ms apart in event time. Symbols are named SYM0000 through SYM0099. Prices range from 100.0 to 110.0 with 0.1 increments.

File Mode

Reads trade events from a CSV file:

ripple-vwap-demo --file trades.csv

Stdin Mode

Reads trade events from standard input, one per line:

echo "AAPL,150.0,100,1000000000,XNAS" | ripple-vwap-demo --stdin

Or pipe from another process:

cat trades.csv | ripple-vwap-demo --stdin

CSV Format

Input records are comma-separated with 5 fields, no header:

symbol,price,size,timestamp_ns,venue
FieldTypeDescription
symbolstringTicker symbol (e.g., AAPL, SYM0042)
pricefloatTrade price
sizeintNumber of shares
timestamp_nsint64Nanoseconds since epoch (event time)
venuestringExchange venue (e.g., XNAS, XNYS)

Example input:

AAPL,150.00,100,1709000000000000000,XNAS
AAPL,150.25,200,1709000001000000000,XNAS
GOOG,175.50,50,1709000001500000000,XNYS
AAPL,150.10,150,1709000002000000000,XNAS

Lines starting with # are treated as comments and skipped. Empty lines are also skipped.

Output Format

Output is CSV written to stdout with 4 fields:

symbol,vwap,total_volume,trade_count

Example output:

AAPL,150.1222,450,3
GOOG,175.5000,50,1

Diagnostic information is written to stderr, so you can separate data from metadata:

ripple-vwap-demo --synthetic 10000 > output.csv 2> stats.txt

Pipeline Architecture

The demo constructs the following computation graph:

  [leaf: SYM0000 state]  [leaf: SYM0001 state]  ...  [leaf: SYM0099 state]
          |                       |                          |
          v                       v                          v
  [map: vwap_price]       [map: vwap_price]       ...  [map: vwap_price]
          |                       |                          |
          +----------+------------+--------------------------+
                     |
                     v
            [incr_fold: portfolio_total]   <-- O(1) per update
                     |
                     v
                 [output]

Each symbol has a leaf node holding a vwap_state record:

type vwap_state =
  { total_value : float    (* price * size cumulative *)
  ; total_volume : int     (* shares cumulative *)
  ; trade_count : int
  ; last_price : float
  }

When a trade arrives for symbol X:

  1. The leaf for X is updated with the new cumulative state
  2. stabilize fires, recomputing only X’s map node and the incr_fold
  3. The portfolio total updates in O(1) via remove(old_vwap) + add(new_vwap)

This means processing a single trade touches exactly 3 nodes regardless of how many symbols exist in the graph.

Processing Flow

Events are processed in batches of 1,000:

for each batch of 1000 events:
    for each event in batch:
        look up the symbol's leaf
        update vwap_state with new trade
        set_leaf with updated state
        advance watermark
    stabilize graph (once per batch)
    emit changed symbol VWAPs to stdout

Batching amortizes the stabilization cost. With 1,000-event batches and 100 symbols, a typical stabilization touches ~100 map nodes + 1 fold node.

Expected Output Statistics

When running --synthetic 100000:

--- Ripple VWAP Demo Results ---
Events processed:  100000
Symbols:           100
Graph nodes:       301
Stabilizations:    100
Elapsed:           0.XXXs
Throughput:        XXXXX events/sec
Watermark:         100999000000 ns
Portfolio total:   XXXX.XX
Output records:    XXXXX
  • Graph nodes = 301: 100 leaves + 100 map nodes + 1 incr_fold node = 201 compute nodes. The remaining 100 are the var_to_incr adapter nodes.
  • Stabilizations = 100: 100,000 events / 1,000 batch size = 100 stabilization cycles.
  • Throughput: typically 200K-500K events/sec on modern hardware, well above the 100K target.

Running with Real Market Data

To feed real trade data, convert your market data feed to the CSV format above. If you have TAQ data or FIX logs:

# Convert FIX log to Ripple CSV format
your-fix-converter --output trades.csv

# Process
ripple-vwap-demo --file trades.csv > vwap_results.csv

The only requirement is that timestamp_ns values be monotonically non-decreasing per symbol for correct watermark tracking. Out-of-order events are processed but classified as late by the watermark tracker.