Docker & Kubernetes
This page covers the deployment infrastructure for Ripple: the Docker image, the Docker Compose development stack, and the Kubernetes production manifests.
Dockerfile
Ripple uses a multi-stage build for minimal production images:
```dockerfile
# Stage 1: Build
FROM ocaml/opam:ubuntu-22.04-ocaml-5.3 AS builder

RUN sudo apt-get update && sudo apt-get install -y \
        librdkafka-dev \
        libssl-dev \
        pkg-config \
    && sudo rm -rf /var/lib/apt/lists/*

WORKDIR /home/opam/ripple

# Copy dependency metadata first so the dependency layer caches independently
COPY --chown=opam:opam ripple.opam dune-project ./
RUN opam install . --deps-only --yes

COPY --chown=opam:opam . .
RUN eval $(opam env) && dune build bin/worker/main.exe

# Stage 2: Runtime
FROM ubuntu:22.04 AS runtime

RUN apt-get update && apt-get install -y \
        librdkafka1 \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder \
    /home/opam/ripple/_build/default/bin/worker/main.exe \
    /usr/local/bin/ripple-worker

# 9100: health, 9101: RPC, 9102: metrics
# (Dockerfile comments must start the line; an inline "#" after EXPOSE is a parse error)
EXPOSE 9100 9101 9102

ENTRYPOINT ["/usr/local/bin/ripple-worker"]
```
Because `ripple.opam` and `dune-project` are copied before the rest of the source tree, the `opam install . --deps-only` layer is cached independently: changing a dependency rebuilds that layer, while source-only changes reuse it and skip straight to the `dune build`.
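A `.dockerignore` keeps that cache effective by excluding files that would otherwise invalidate the `COPY . .` layer on every build. The entries below are illustrative, not necessarily the repository's actual file:

```
# Illustrative .dockerignore -- keeps local artifacts out of the build context
_build/
.git/
*.install
docs/
```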
Build the image:
```sh
docker build -f infra/docker/Dockerfile.worker -t ripple/worker:latest .
```
Docker Compose (Development)
The development stack runs Redpanda (Kafka-compatible) and MinIO (S3-compatible) alongside Ripple workers:
```yaml
services:
  # Redpanda: Kafka-compatible broker, no ZooKeeper
  redpanda:
    image: docker.redpanda.com/redpandadata/redpanda:v24.1.1
    command:
      - redpanda
      - start
      - --smp 1
      - --memory 512M
      - --overprovisioned
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      - --advertise-kafka-addr internal://redpanda:9092,external://localhost:19092
    ports:
      - "19092:19092"   # Kafka API
      - "18082:18082"   # HTTP Proxy
    healthcheck:
      test: ["CMD", "rpk", "cluster", "health"]
      interval: 5s

  # MinIO: S3-compatible checkpoint storage
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: ripple
      MINIO_ROOT_PASSWORD: ripplepass
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # Web console
    volumes:
      - minio-data:/data
    # Required: create-bucket's "condition: service_healthy" needs a healthcheck here
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s

  # Init job: create Kafka topics
  create-infra:
    image: docker.redpanda.com/redpandadata/redpanda:v24.1.1
    depends_on:
      redpanda: { condition: service_healthy }
    entrypoint: >
      bash -c "
      rpk topic create trades --brokers redpanda:9092 --partitions 8 &&
      rpk topic create vwap-output --brokers redpanda:9092 --partitions 8
      "

  # Init job: create S3 bucket
  create-bucket:
    image: minio/mc:latest
    depends_on:
      minio: { condition: service_healthy }
    entrypoint: >
      bash -c "
      mc alias set local http://minio:9000 ripple ripplepass &&
      mc mb local/ripple-checkpoints --ignore-existing
      "

volumes:
  minio-data:
```
Usage
```sh
cd infra/compose

# Start infrastructure
docker compose up -d

# Wait for health checks
docker compose ps

# Run integration test
./run-integration-test.sh

# Teardown
docker compose down -v
```
Accessing Services
| Service | URL | Purpose |
|---|---|---|
| Kafka API | localhost:19092 | Produce/consume trades |
| MinIO Console | http://localhost:9001 | Browse checkpoint bucket |
| MinIO S3 API | http://localhost:9000 | S3-compatible endpoint |
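As a quick smoke test, the Kafka API can be exercised with `rpk` from inside the Redpanda container. The JSON payload here is illustrative only; the actual trade encoding depends on the Ripple producers:

```sh
# Produce one message to the trades topic (container-internal listener)
echo '{"symbol":"AAPL","price":187.5,"qty":100}' |
  docker compose exec -T redpanda rpk topic produce trades --brokers localhost:9092

# Read it back
docker compose exec -T redpanda rpk topic consume trades --brokers localhost:9092 --num 1
```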
Kubernetes (Production)
Namespace
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ripple
  labels:
    app: ripple
```
ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ripple-config
  namespace: ripple
data:
  checkpoint_bucket: "s3://ripple-checkpoints/prod"
  kafka_brokers: "kafka-0.kafka.svc:9092,kafka-1.kafka.svc:9092"
  ripple.sexp: |
    ((cluster
      ((name prod)
       (defaults
        ((num_partitions 128)
         (max_keys_per_partition 2000)
         (checkpoint_interval_sec 10)
         (heartbeat_interval_sec 5)
         (failure_detection_timeout_sec 30))))))
```
Worker StatefulSet
Workers use a StatefulSet (not Deployment) because they need:
- Stable network identity for partition assignment
- Stable storage for local checkpoint cache
- Graceful scaling that preserves per-pod identity
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ripple-worker
  namespace: ripple
spec:
  serviceName: ripple-worker
  replicas: 10
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: ripple
      component: worker
  template:
    metadata:
      labels:
        app: ripple
        component: worker
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: worker
          image: ripple/worker:latest
          ports:
            - { name: health, containerPort: 9100 }
            - { name: rpc, containerPort: 9101 }
            - { name: metrics, containerPort: 9102 }
          env:
            - name: RIPPLE_WORKER_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          resources:
            requests: { cpu: "1", memory: "512Mi" }
            limits: { cpu: "2", memory: "1Gi" }
          livenessProbe:
            httpGet: { path: /health, port: health }
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /ready, port: health }
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: checkpoint-cache
              mountPath: /var/lib/ripple/checkpoints
  volumeClaimTemplates:
    - metadata:
        name: checkpoint-cache
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```
Key configuration:
- `podManagementPolicy: Parallel` – workers start simultaneously; no sequential ordering is needed
- `terminationGracePeriodSeconds: 30` – allows time for drain + checkpoint on shutdown
- Worker ID is derived from the pod name (`ripple-worker-0`, `ripple-worker-1`, etc.)
- Local checkpoint cache on a PVC for fast recovery without an S3 round-trip
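If shutdown needs an explicit trigger before the grace period starts counting, a `preStop` hook can be added to the worker container. The `/drain` endpoint below is hypothetical (and assumes `curl` is present in the runtime image); the worker may equally well drain on SIGTERM alone:

```yaml
# Hypothetical preStop hook; assumes a drain endpoint on the health port
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "curl -sf -X POST http://localhost:9100/drain || true; sleep 5"]
```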
Coordinator Deployment
The coordinator is stateless, so it uses a Deployment (not StatefulSet):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ripple-coordinator
  namespace: ripple
spec:
  replicas: 2   # HA -- active/standby
  selector:
    matchLabels:
      app: ripple
      component: coordinator
  template:
    metadata:
      labels:
        app: ripple
        component: coordinator
    spec:
      containers:
        - name: coordinator
          image: ripple/coordinator:latest
          ports:
            - { name: grpc, containerPort: 9200 }
            - { name: health, containerPort: 9201 }
            - { name: metrics, containerPort: 9202 }
          resources:
            requests: { cpu: "500m", memory: "256Mi" }
            limits: { cpu: "1", memory: "512Mi" }
```
Headless Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ripple-worker
  namespace: ripple
spec:
  clusterIP: None   # Headless for StatefulSet DNS
  selector:
    app: ripple
    component: worker
  ports:
    - { name: rpc, port: 9101 }
    - { name: metrics, port: 9102 }
```
The headless service gives each worker a stable DNS name: `ripple-worker-0.ripple-worker.ripple.svc.cluster.local`.
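To verify the per-pod records from inside the cluster, a throwaway pod works (any image with `nslookup`, e.g. `busybox`):

```sh
kubectl run -n ripple dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup ripple-worker-0.ripple-worker.ripple.svc.cluster.local
```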
Scaling
Horizontal Scaling
```sh
# Scale workers
kubectl scale statefulset ripple-worker --replicas=20 -n ripple
```
The coordinator detects new workers via heartbeat registration and rebalances partitions automatically via the consistent hash ring.
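The ring logic has roughly the following shape. This is an illustrative OCaml sketch, not the coordinator's actual implementation; the function names and the virtual-node count are invented:

```ocaml
(* Illustrative consistent-hash ring; NOT the coordinator's real code. *)
let ring_size = 1_000_000
let position s = Hashtbl.hash s mod ring_size

(* Each worker claims several virtual points so load stays even when
   workers join or leave. *)
let build_ring workers =
  List.concat_map
    (fun w ->
      List.init 16 (fun i -> (position (Printf.sprintf "%s#%d" w i), w)))
    workers
  |> List.sort compare

(* A partition is owned by the first worker clockwise from its position. *)
let owner ring partition =
  let p = position (string_of_int partition) in
  match List.find_opt (fun (pos, _) -> pos >= p) ring with
  | Some (_, w) -> w
  | None -> snd (List.hd ring)  (* wrap around past the top of the ring *)
```

With virtual nodes, adding one worker only reassigns the partitions that land between its points and their predecessors, so most partitions stay put during a rebalance.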
Resource Guidelines
| Component | CPU Request | Memory Request | Rationale |
|---|---|---|---|
| Worker | 1 core | 512 Mi | Graph engine is CPU-bound, ~800 KB graph working set |
| Coordinator | 500m | 256 Mi | Lightweight, mostly heartbeat tracking |
Workers are CPU-bound (stabilization loop). Memory usage is predictable: ~200 bytes/node * 4,001 nodes = ~800 KB for the graph, plus input buffers and GC overhead.