Parallax
"Depth perception for your attack surface."
Parallax is an open-source, Rust-native typed property graph engine built for cyber asset intelligence. It is the infrastructure layer that answers one question:
Given everything we know about our assets, what is connected to what — and what does that imply?
What It Is
Parallax is to security asset data what SQLite is to relational data: embeddable, zero-external-dependency, and correct enough to trust.
It solves the problem that every security team faces: ~400,000 assets spread across ~70 tools, each tool seeing its own slice with no visibility into the relationships between them. A graph model — entities as nodes, relationships as directed edges — is the right abstraction. Parallax makes that graph infrastructure open, embeddable, and fast.
What It Is Not
- Not a scanner. It consumes telemetry; it does not generate it.
- Not a SIEM. It does not process event streams or alert on logs.
- Not a UI product. It is a library and a server; UIs are built on top.
- Not Neo4j. It is a domain-specific graph engine, not a general-purpose graph database.
The Stack
parallax-cli ← Command-line interface
parallax-server ← REST HTTP server (Axum)
parallax-connect ← Integration SDK (Connector trait + scheduler)
parallax-ingest ← Sync protocol (diff + atomic commit)
parallax-policy ← Policy evaluation engine
parallax-query ← PQL parser, planner, executor
parallax-graph ← Graph traversal, path-finding, blast radius
parallax-store ← Storage engine (WAL + MemTable + Segments + MVCC)
parallax-core ← Shared types (Entity, Relationship, Value, Timestamp)
Dependencies flow strictly downward. No cycles. The compiler enforces this.
Version
This documentation covers Parallax v0.1 — the first working vertical slice:
- Ingest via REST API or connector SDK
- Durable storage with WAL + MemTable + Segments
- Graph traversal, shortest path, blast radius, coverage gap analysis
- PQL query language (FIND / WITH / THAT / RETURN / LIMIT)
- Policy evaluation with posture scoring
- REST API with authentication and Prometheus metrics
- CLI for query, stats, and serve
Quick Start
# Build
cargo build --release --package parallax-cli
# Start the server (no auth key = open mode)
./target/release/parallax serve --data-dir /var/lib/parallax
# Ingest some entities
curl -X POST http://localhost:7700/v1/ingest/sync \
-H 'Content-Type: application/json' \
-d '{
"connector_id": "my-connector",
"sync_id": "sync-001",
"entities": [
{"entity_type": "host", "entity_key": "web-01", "entity_class": "Host",
"display_name": "Web Server 01", "properties": {"state": "running"}},
{"entity_type": "service", "entity_key": "nginx", "entity_class": "Service",
"display_name": "Nginx"}
],
"relationships": [
{"from_type": "host", "from_key": "web-01", "verb": "RUNS",
"to_type": "service", "to_key": "nginx"}
]
}'
# Query with PQL
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND host WITH state = '\''running'\''"}'
Reading This Book
Start with Design Principles to understand the "why" behind every decision. Then read Data Model — everything else depends on it.
After that, read in whatever order matches your interest:
- Building connectors → Connector SDK
- Querying the graph → PQL Reference
- Storage internals → Storage Engine
- REST API usage → API Reference
Architecture Overview
Parallax is organized as a Cargo workspace of nine crates arranged in a strictly acyclic dependency chain. Each crate has a single responsibility, a minimal public API, and explicit dependencies.
The Nine Crates
| Crate | Responsibility |
|---|---|
| parallax-core | Shared types and error definitions. Minimal external deps: serde, blake3, compact_str, thiserror. |
| parallax-store | Durable storage: WAL, MemTable, immutable Segments, MVCC snapshots. |
| parallax-graph | Graph reasoning: traversal, pattern matching, shortest path, blast radius. |
| parallax-query | PQL parse → plan → execute pipeline. |
| parallax-policy | Policy rule evaluation and posture scoring. |
| parallax-ingest | Source-scoped sync protocol: diff, validate, atomic commit. |
| parallax-connect | Integration SDK: Connector trait, step scheduler, entity/relationship builders. |
| parallax-server | REST HTTP server with authentication, request-ID middleware, Prometheus metrics. |
| parallax-cli | Command-line binary: serve, query, stats, version. |
Dependency Graph
parallax-core
│
├──► parallax-store
│ │
│ ├──► parallax-graph
│ │ │
│ │ ├──► parallax-query
│ │ └──► parallax-policy
│ │
│ └──► parallax-ingest
│ │
│ └──► parallax-connect
│
parallax-server (depends on: core, store, graph, query, policy, ingest, connect)
│
└──► parallax-cli
The rule is absolute: No crate may depend on a crate above it in this graph. Cargo enforces this at build time; cyclic crate dependencies fail to resolve.
Data Flow
A typical request through the full stack:
External Source
│
▼
[REST POST /v1/ingest/sync] ← parallax-server validates auth, parses JSON
│
▼
[validate_sync_batch()] ← parallax-ingest checks referential integrity
│ and class/verb constraints
▼
[SyncEngine::commit_sync()] ← parallax-ingest diffs against current snapshot,
│ builds WriteBatch
▼
[StorageEngine::write(batch)] ← parallax-store appends WAL entry, updates
│ MemTable, publishes new snapshot
▼
[Snapshot published] ← All subsequent reads see new state
...later...
[REST POST /v1/query] ← parallax-server
│
▼
[parse(pql)] ← parallax-query lexer + recursive descent parser
│
▼
[plan(ast, &stats)] ← parallax-query planner chooses index strategy
│
▼
[execute(plan, &snapshot)] ← parallax-query executor calls parallax-graph
│
▼
[GraphReader::find() / traverse()] ← parallax-graph reads from MVCC snapshot
│
▼
[JSON response]
Key Architectural Decisions
Single-Writer, Multi-Reader
All mutations flow through one serialized write path. This eliminates write-write conflicts, deadlocks, and lock ordering bugs. Readers operate on MVCC snapshots with zero coordination — a reader never blocks a writer.
Owned Storage Engine
No external storage dependency (no Neo4j, no RocksDB, no sled). Parallax owns its storage format, which means:
- Embeddable in a CLI without a separate daemon
- Deterministic latency without JVM GC pauses
- Full control over the read/write path
Deterministic Entity Identity
Entity IDs are 128-bit blake3 hashes of (account_id, entity_type, entity_key).
The same logical entity always gets the same ID, across all time and all
connectors. This enables idempotent re-ingestion and conflict-free merging.
Hybrid Logical Clocks
Even in single-node mode, Parallax uses HLC timestamps rather than wall clocks. Wall clocks can go backwards (NTP adjustments); two events in the same millisecond need deterministic ordering; and HLC extends naturally when clustering is added.
Design Principles
Parallax's architecture is shaped by seven principles drawn from the systems thinkers whose work directly influenced each layer.
Principle 1: Interfaces Are Forever
Lampson: "Implementation can change; interfaces cannot."
The entity/relationship schema, the query language (PQL), and the integration SDK are public contracts. They are versioned, documented, and maintained with the assumption that downstream consumers depend on them.
Internal storage format, memory layout, concurrency strategy — these are implementation secrets that can change every release without notice.
What this means in practice:
- EntityId, Entity, Relationship, Value, and PropertyMap in parallax-core are stable API. Breaking changes require a major version bump.
- The WAL on-disk format (PXWA magic + postcard framing) can change between releases as long as the storage engine handles migration transparently.
- PQL syntax is a public contract. Adding new keywords is allowed; removing or changing semantics of existing keywords requires a deprecation cycle.
Principle 2: Ownership Is Architecture
Matsakis: "Good abstractions hide complexity."
Every datum in Parallax has exactly one owner at any point in time:
- The graph store owns entities and relationships.
- A transaction borrows the store mutably for writes; a snapshot borrows it immutably for reads.
- A snapshot is a frozen, immutable view that readers hold cheaply via Arc::clone.
- A connector owns its ingestion state. The engine never reaches into it.
If you cannot draw the ownership graph of a component on a whiteboard, the component is too complex.
In code: GraphReader<'snap> ties every returned reference to the
snapshot's lifetime via Rust's borrow checker. No use-after-free. No stale
reads. No cloning on the read path.
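That guarantee can be sketched with a toy reader. The names are simplified and this is not the actual parallax-graph API; it only illustrates how a lifetime parameter ties returned references to the snapshot:

```rust
use std::collections::BTreeMap;

// A frozen snapshot owning its data (stand-in for the real Snapshot type).
struct Snapshot {
    entities: BTreeMap<u64, String>,
}

// A reader that borrows the snapshot: every reference it hands out
// carries the snapshot's lifetime, so it cannot outlive it.
struct GraphReader<'snap> {
    snap: &'snap Snapshot,
}

impl<'snap> GraphReader<'snap> {
    fn new(snap: &'snap Snapshot) -> Self {
        GraphReader { snap }
    }

    // The returned &str borrows from the snapshot itself, not the reader.
    fn display_name(&self, id: u64) -> Option<&'snap str> {
        self.snap.entities.get(&id).map(|s| s.as_str())
    }
}

fn main() {
    let mut entities = BTreeMap::new();
    entities.insert(1, "web-01".to_string());
    let snap = Snapshot { entities };

    let reader = GraphReader::new(&snap);
    let name = reader.display_name(1);
    assert_eq!(name, Some("web-01"));
    // Dropping `snap` while `name` is live would be a compile error:
    // drop(snap); // ← the borrow checker rejects this
}
```

No copies are made on the read path: the reader returns borrowed data, and the compiler guarantees the snapshot outlives every such borrow.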
Principle 3: Single Writer, Many Readers
Bos: "Less sharing means fewer bugs."
All mutations flow through a single serialized write path. This eliminates write-write conflicts, deadlocks, and lock ordering bugs.
Readers operate on MVCC snapshots with zero coordination. A reader never blocks a writer. A writer never waits for a reader.
The numbers: A single writer on modern NVMe processes 500K+ key-value mutations per second with WAL batching. The largest enterprise asset graphs ingest at ~10K entities/sec. We have 50× headroom.
Principle 4: Separate Normal and Worst Case
Lampson: "Optimize for the common case; handle edge cases separately."
Normal case: Ingest a batch of 50–500 entity upserts from a connector. Execute a graph query over 10K–100K entities. Evaluate a policy rule. This path is optimized for latency and throughput.
Worst case: Full re-sync of 2M assets from a new connector. WAL recovery after crash. Compaction of stale segment files. These paths are optimized for correctness and progress, not speed.
The normal and worst case paths may share zero code. That is acceptable.
Principle 5: Correctness Is Non-Negotiable
Lamport: "If you can't specify it precisely, you don't understand it."
Every state machine — transaction lifecycle, sync protocol, compaction — has written invariants before implementation. See the Invariant Reference for the full list.
Safety properties (must never happen):
- A committed write is never lost.
- A read snapshot never observes a partial write.
- An entity ID is never reused for a different logical entity.
- A relationship never references a non-existent entity in a committed state.
Liveness properties (must eventually happen):
- A submitted write batch is eventually committed or rejected.
- A compaction cycle eventually reclaims space from deleted versions.
- A connector sync eventually converges to the source-of-truth state.
Principle 6: Make Illegal States Unrepresentable
Turon: "The best API is one where the obvious thing to do is the right thing."
If an entity cannot exist without a _type and _class, the Rust type
system enforces that — not a runtime validator.
If a relationship requires two valid entity references, the write path enforces referential integrity at commit time (INV-03, INV-04).
If a query cursor cannot outlive its snapshot, the borrow checker enforces it.
In practice:
- EntityClass::new(s) returns Result — unknown classes are rejected at the API boundary, not silently accepted.
- EntityId::derive() takes (account_id, entity_type, entity_key) and produces a deterministic ID. There is no EntityId::random().
- WriteBatch is an opaque builder — you cannot construct a batch that references a non-existent entity. Validation runs at ingest time.
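A minimal sketch of the first pattern, assuming a toy EntityClass with three known values (the real closed set has ~40 and its API may differ):

```rust
// "Parse, don't validate": construction is the only way to get an
// EntityClass, and construction can fail, so an invalid class value
// is unrepresentable past the API boundary.
#[derive(Debug, PartialEq)]
struct EntityClass(&'static str);

impl EntityClass {
    // Toy subset of the closed class set.
    const KNOWN: [&'static str; 3] = ["Host", "User", "Service"];

    fn new(s: &str) -> Result<Self, String> {
        Self::KNOWN
            .iter()
            .copied()
            .find(|k| *k == s)
            .map(EntityClass)
            .ok_or_else(|| format!("unknown entity class: {s}"))
    }
}

fn main() {
    assert!(EntityClass::new("Host").is_ok());
    assert!(EntityClass::new("Toaster").is_err()); // rejected at the boundary
}
```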
Principle 7: Observability as a Requirement
Not optional. Not "later." Now.
Every subsystem exposes:
- Metrics: operation counts, latencies (p50/p99/max), queue depths — served at GET /metrics in Prometheus text format.
- Structured logs: every write batch commit, every sync cycle, every error — via tracing with configurable log levels.
- Request IDs: every HTTP request gets an X-Request-Id header (UUID v4) that is propagated through the stack for distributed tracing.
The engine must be debuggable in production without recompilation.
Crate Dependency Map
Workspace Layout
parallax/
├── Cargo.toml # Workspace root
├── crates/
│ ├── parallax-core/ # Shared types, zero external deps
│ ├── parallax-store/ # Storage engine (WAL + MemTable + Segments)
│ ├── parallax-graph/ # Graph traversal and reasoning
│ ├── parallax-query/ # PQL parse + plan + execute
│ ├── parallax-policy/ # Policy rule evaluation
│ ├── parallax-ingest/ # Sync protocol (diff + commit)
│ ├── parallax-connect/ # Integration SDK
│ ├── parallax-server/ # REST HTTP server
│ └── parallax-cli/ # CLI binary
└── specs/ # Architectural specifications
Dependency Graph
parallax-core (no workspace deps; blake3, serde, compact_str, thiserror)
│
├──► parallax-store (core; arc-swap, crc32c, lz4_flex, postcard, tracing)
│ │
│ ├──► parallax-graph (core, store)
│ │ │
│ │ ├──► parallax-query (core, graph)
│ │ └──► parallax-policy (core, graph)
│ │
│ └──► parallax-ingest (core, store)
│ │
│ └──► parallax-connect (core, ingest)
│
parallax-server (core, store, graph, query, policy, ingest, connect;
│ axum, tower, tower-http, tokio, serde_json, uuid, compact_str)
│
parallax-cli (server; clap, anyhow, tokio)
External Dependencies by Crate
parallax-core
| Crate | Version | Purpose |
|---|---|---|
| serde | 1 | Serialization derive macros |
| compact_str | 0.8 | Small-string optimization (entity types/classes are short) |
| blake3 | 1 | Deterministic ID hashing (SIMD-accelerated) |
| thiserror | 1 | Error derive macros |
parallax-store
| Crate | Version | Purpose |
|---|---|---|
| arc-swap | 1 | Lock-free atomic Arc swap for snapshot publishing |
| crc32c | 0.6 | WAL entry checksums |
| lz4_flex | 0.11 | Block compression for segment files |
| postcard | 1 | Compact binary serialization for WAL + Segments |
| tracing | 0.1 | Structured logging |
| tempfile | 3 | (dev) Temporary directories for tests |
parallax-graph
| Crate | Purpose |
|---|---|
| tracing | Structured logging |
parallax-query
| Crate | Purpose |
|---|---|
| tracing | Structured logging |
parallax-policy
| Crate | Purpose |
|---|---|
| serde | Policy rule serialization |
| tracing | Structured logging |
parallax-ingest
| Crate | Purpose |
|---|---|
| tracing | Structured logging |
| tempfile | (dev) Temporary directories for tests |
parallax-connect
| Crate | Purpose |
|---|---|
| async-trait | Async trait support for the Connector trait |
| serde | Builder serialization |
| tokio | Async runtime |
| tracing | Structured logging |
| compact_str | Small-string optimization |
| thiserror | Error types |
parallax-server
| Crate | Purpose |
|---|---|
| axum | HTTP/REST server framework |
| tower | Middleware stack |
| tower-http | HTTP middleware (trace, request-id, sensitive-headers) |
| tokio | Async runtime |
| serde_json | JSON serialization for REST responses |
| uuid | UUID v4 for request IDs |
| compact_str | Small-string optimization |
| tracing | Structured logging |
| thiserror | Error types |
parallax-cli
| Crate | Purpose |
|---|---|
| clap | Command-line argument parsing |
| anyhow | Error handling in binary context |
| tokio | Async runtime |
| serde_json | JSON output formatting |
What We Explicitly Do Not Use
| Crate | Why Not |
|---|---|
| rocksdb | We own storage. No C++ FFI dependency. |
| diesel / sqlx | No SQL database. |
| neo4j-* | The whole point is to not depend on Neo4j. |
| sled | Correctness concerns in early versions. |
| tonic / prost | gRPC deferred to v0.2; REST-only for v0.1. |
Adding a New External Dependency
Before adding a new crate:
- Is there a std equivalent? Prefer std.
- Does an existing approved crate already cover this? Reuse it.
- Is the crate well-maintained, free of unsound unsafe code, and Apache/MIT licensed?
- Does adding it violate the acyclic dependency constraint?
Add a justification comment in Cargo.toml for non-obvious dependencies.
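A sketch of what such a justification comment might look like (the comment wording is illustrative; crate names and versions are taken from the tables above):

```toml
[dependencies]
# arc-swap: lock-free atomic snapshot publishing; std has no equivalent
# (a bare AtomicPtr cannot manage the Arc refcount safely).
arc-swap = "1"
# crc32c: hardware-accelerated checksums for WAL entries.
crc32c = "0.6"
```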
Data Model
Parallax models the world as a typed property graph:
- Entities are nodes. Each has a type, a class, and a bag of flat properties.
- Relationships are directed edges. Each has a verb (class), connects two entities, and may carry properties.
- Properties are flat key-value pairs. No nesting. If you need structure, model it as another entity and a relationship.
Entity Identity
Design Goals
An entity ID must be:
- Stable — the same logical entity always gets the same ID.
- Deterministic — given the same inputs, we always produce the same ID.
- Collision-free — two different entities never share an ID.
- Source-independent — the ID doesn't change if we re-ingest from a different connector.
Implementation
```rust
/// EntityId is a 128-bit blake3 hash of (account_id, entity_type, entity_key).
/// It is never randomly generated.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub [u8; 16]);

impl EntityId {
    pub fn derive(account_id: &str, entity_type: &str, entity_key: &str) -> Self {
        let mut hasher = blake3::Hasher::new();
        hasher.update(account_id.as_bytes());
        hasher.update(b":");
        hasher.update(entity_type.as_bytes());
        hasher.update(b":");
        hasher.update(entity_key.as_bytes());
        let hash = hasher.finalize();
        let mut id = [0u8; 16];
        id.copy_from_slice(&hash.as_bytes()[..16]);
        EntityId(id)
    }
}
```
Why blake3: Fast (SIMD-accelerated), cryptographically strong, deterministic, no seed needed. 128-bit truncation gives collision probability of ~1 in 10^18 at 10 billion entities.
Why not UUID v4: UUIDs are random and not deterministic from source data. We need idempotent re-ingestion — inserting the same entity twice must produce the same ID, not a duplicate.
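The idempotency claim can be demonstrated with a std-only stand-in for EntityId::derive. Here std's SipHash (64-bit) replaces blake3 (128-bit) purely so the sketch has no external dependency; the determinism property is the same:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy stand-in for EntityId::derive: hash (account_id, entity_type,
// entity_key) with a fixed-key hasher so the same inputs always
// produce the same ID.
fn derive(account_id: &str, entity_type: &str, entity_key: &str) -> u64 {
    let mut h = DefaultHasher::new(); // fixed keys → deterministic
    h.write(account_id.as_bytes());
    h.write(b":");
    h.write(entity_type.as_bytes());
    h.write(b":");
    h.write(entity_key.as_bytes());
    h.finish()
}

fn main() {
    // Re-ingesting the same logical entity yields the same ID …
    assert_eq!(
        derive("acct-1", "host", "web-01"),
        derive("acct-1", "host", "web-01")
    );
    // … and a different key yields a different ID.
    assert_ne!(
        derive("acct-1", "host", "web-01"),
        derive("acct-1", "host", "web-02")
    );
}
```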
Entity Structure
```rust
pub struct Entity {
    /// Deterministic, content-addressed identity.
    pub id: EntityId,
    /// Specific type: "aws_ec2_instance", "okta_user", "github_repo".
    /// Open set — connectors define new types freely.
    pub _type: EntityType,
    /// Broad classification: "Host", "User", "DataStore".
    /// Closed set of ~40 values. Enables cross-type queries.
    pub _class: EntityClass,
    /// Human-readable display name. Not unique, not stable.
    pub display_name: CompactString,
    /// Which connector + sync cycle produced this version.
    pub source: SourceTag,
    /// When this entity was first observed by Parallax.
    pub created_at: Timestamp,
    /// When this entity was last modified.
    pub updated_at: Timestamp,
    /// Soft-delete flag. Set by sync diff when entity disappears from source.
    pub _deleted: bool,
    /// Flat key-value property bag.
    pub properties: PropertyMap,
}
```
Type vs. Class: The Two-Level Hierarchy
Class (broad, ~40 total) Type (specific, unbounded)
───────────────────────── ───────────────────────────────
Host aws_ec2_instance, azure_vm, host
User okta_user, aws_iam_user, user
DataStore aws_s3_bucket, aws_dynamodb_table
CodeRepo github_repo, gitlab_project
Firewall aws_security_group, gcp_firewall_rule
Service service, nginx, microservice
Class enables generic queries across cloud providers:
FIND Host WITH active = true
Type enables specific queries:
FIND aws_ec2_instance WITH instanceType = 'm5.xlarge'
Class is a closed set (defined by Parallax, ~40 values). Type is an open set (defined by connectors, unlimited). See the full class list in Known Entity Classes.
Relationship Structure
```rust
pub struct Relationship {
    /// Deterministic identity derived from (from_id, class, to_id).
    pub id: RelationshipId,
    /// The verb: HAS, RUNS, CONTAINS, ALLOWS, etc.
    pub _class: RelationshipClass,
    /// Source entity. Must exist at commit time.
    pub from_id: EntityId,
    /// Target entity. Must exist at commit time.
    pub to_id: EntityId,
    /// Optional properties on the relationship edge.
    pub properties: PropertyMap,
    pub source: SourceTag,
    pub _deleted: bool,
}
```
Referential integrity (INV-03): The from_id and to_id of every committed
relationship must reference entities that exist in the graph. The ingest layer
enforces this at commit time — dangling relationships are rejected, not silently
stored.
Property Values
Properties are flat key-value pairs with a small set of value types:
```rust
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(CompactString),
    Timestamp(Timestamp),
    StringArray(Vec<CompactString>),
}

pub type PropertyMap = BTreeMap<CompactString, Value>;
```
Why no nested objects: Flat properties are indexable and queryable, and they diff cleanly (key-by-key comparison). JSON path expressions inside a graph query would create a second query language inside the first. If your source data has nested objects, either flatten them into properties, or model the structure as entities and relationships.
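A sketch of the "flatten them into properties" option. The dotted-key convention and the helper below are assumptions of this example, not part of the Parallax spec, and the types are simplified versions of those above:

```rust
use std::collections::BTreeMap;

// Simplified nested input, as a connector might receive it from a source.
enum Nested {
    Str(String),
    Int(i64),
    Object(BTreeMap<String, Nested>),
}

// Simplified flat Value type (subset of the data-model enum above).
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    String(String),
}

// Hypothetical helper: flatten nested objects into dotted keys
// such as "disk.size_gb".
fn flatten(prefix: &str, input: &Nested, out: &mut BTreeMap<String, Value>) {
    match input {
        Nested::Str(s) => {
            out.insert(prefix.to_string(), Value::String(s.clone()));
        }
        Nested::Int(i) => {
            out.insert(prefix.to_string(), Value::Int(*i));
        }
        Nested::Object(map) => {
            for (k, v) in map {
                let key = if prefix.is_empty() {
                    k.clone()
                } else {
                    format!("{prefix}.{k}")
                };
                flatten(&key, v, out);
            }
        }
    }
}

fn main() {
    let mut disk = BTreeMap::new();
    disk.insert("size_gb".to_string(), Nested::Int(100));
    let mut root = BTreeMap::new();
    root.insert("disk".to_string(), Nested::Object(disk));

    let mut props = BTreeMap::new();
    flatten("", &Nested::Object(root), &mut props);
    assert_eq!(props.get("disk.size_gb"), Some(&Value::Int(100)));
}
```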
Source Tracking
Every entity and relationship knows which connector produced it:
```rust
pub struct SourceTag {
    /// Connector identifier: "connector-aws", "my-scanner", etc.
    pub connector_id: CompactString,
    /// Unique ID for this specific sync execution.
    pub sync_id: CompactString,
    /// When this sync started (HLC timestamp).
    pub sync_timestamp: Timestamp,
}
```
This enables:
- Differential sync: Delete entities from connector X that weren't seen in the latest sync — without touching entities from connector Y.
- Provenance: Know which connector is the authority for each entity.
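The differential-sync delete rule can be sketched as a set difference over one connector's keys. The function name and types below are illustrative, not the parallax-ingest API:

```rust
use std::collections::HashSet;

// Entities previously seen from a connector but absent from its latest
// sync are candidates for soft-deletion; entities owned by other
// connectors are never touched, because the diff is source-scoped.
fn stale_keys<'a>(
    previously_seen: &'a HashSet<&'a str>, // this connector's last sync
    current_sync: &HashSet<&str>,          // the batch just received
) -> Vec<&'a str> {
    previously_seen
        .iter()
        .filter(|k| !current_sync.contains(**k))
        .copied()
        .collect()
}

fn main() {
    let previous: HashSet<&str> = ["web-01", "web-02", "db-01"].into();
    let current: HashSet<&str> = ["web-01", "db-01"].into();
    let mut stale = stale_keys(&previous, &current);
    stale.sort();
    assert_eq!(stale, vec!["web-02"]); // only web-02 is soft-deleted
}
```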
Hybrid Logical Clocks
```rust
pub struct Timestamp {
    /// Milliseconds since Unix epoch (wall clock component).
    pub wall_ms: u64,
    /// Logical counter — breaks ties when wall_ms is equal.
    pub logical: u32,
    /// Node identifier (0 for single-node; reserved for clustering).
    pub node_id: u16,
}
```
HLC is used instead of wall-clock SystemTime because:
- Wall clocks can go backwards (NTP adjustments).
- Two events in the same millisecond need deterministic ordering.
- HLC extends naturally when clustering is added.
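The three points above reduce to a simple update rule for local events. A sketch of an HLC tick using the Timestamp fields shown here (the real engine's API may differ):

```rust
// Field order matters: deriving Ord gives lexicographic comparison over
// (wall_ms, logical, node_id), which is exactly HLC ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Timestamp {
    wall_ms: u64,
    logical: u32,
    node_id: u16,
}

fn tick(last: Timestamp, now_wall_ms: u64) -> Timestamp {
    if now_wall_ms > last.wall_ms {
        // Wall clock moved forward: adopt it and reset the logical counter.
        Timestamp { wall_ms: now_wall_ms, logical: 0, node_id: last.node_id }
    } else {
        // Wall clock stalled or went backwards (NTP): keep the old wall
        // component and bump the logical counter, preserving monotonicity.
        Timestamp { wall_ms: last.wall_ms, logical: last.logical + 1, node_id: last.node_id }
    }
}

fn main() {
    let t0 = Timestamp { wall_ms: 100, logical: 0, node_id: 0 };
    let t1 = tick(t0, 100); // same millisecond → logical tie-break
    let t2 = tick(t1, 99);  // clock went backwards → still monotonic
    assert!(t1 > t0 && t2 > t1);
}
```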
Invariants
INV-01: Every entity has a non-empty _type, _class, and entity_key.
INV-02: EntityId is deterministic: same (account, type, key) → same id.
INV-03: Every relationship's from_id and to_id reference existing entities.
INV-04: No two entities in the same account share (type, key).
INV-05: No two relationships share (from_id, class, to_id) unless
explicitly keyed with derive_with_key.
INV-06: Timestamps are monotonically increasing per node.
INV-07: Property types are stable within an entity type across versions.
Concurrency Model
Parallax uses a single-writer, multi-reader model with MVCC snapshots. This is the most important architectural decision in the codebase.
The Single-Writer Invariant
All mutations to the graph flow through a single serialized write path. At
any point in time, at most one WriteBatch is being committed.
What this eliminates:
- Write-write conflicts
- Deadlocks
- Lock ordering bugs
- Non-deterministic mutation ordering
What this does not prevent:
- Multiple concurrent readers (unlimited, lock-free)
- Multiple concurrent connector syncs (they queue at the ingest layer)
MVCC Snapshots
┌─────────────────────┐
│ Writer Path │
│ (single, serial) │
│ │
WriteBatch ──────►│ 1. Validate │
WriteBatch ──────►│ 2. WAL append+fsync │
WriteBatch ──────►│ 3. Apply to MemTable│
│ 4. Update indices │
│ 5. Publish snapshot │
└──────────┬───────────┘
│ ArcSwap::store (atomic)
┌──────────▼───────────┐
│ SnapshotManager │
│ │
│ current: Arc<Snap> │
└──┬─────┬─────┬───────┘
│ │ │ Arc::clone (one atomic increment)
┌──▼─┐┌──▼─┐┌──▼──┐
│ R1 ││ R2 ││ R3 │ Reader threads (unlimited)
└────┘└────┘└─────┘
A snapshot is an immutable view of the graph at a point in time. Readers
acquire a snapshot with Arc::clone — one atomic increment, no lock. They
hold it as long as needed without blocking writes.
When the writer commits a new batch, it atomically publishes a new snapshot
via arc-swap. Existing readers keep their old snapshot until they drop it.
The old snapshot's memory is freed when all readers release their Arc.
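A std-only sketch of this publish/acquire protocol. RwLock<Arc<…>> stands in for the lock-free ArcSwap the real engine uses; the observable behavior (pinned readers keep their view, new readers see the latest) is the same:

```rust
use std::sync::{Arc, RwLock};

// A frozen view of the graph. Immutable after publish.
struct Snapshot {
    version: u64,
}

// Stand-in for ArcSwap<Snapshot>: arc-swap performs the same pointer
// swap without any lock, which is what parallax-store actually does.
struct SnapshotManager {
    current: RwLock<Arc<Snapshot>>,
}

impl SnapshotManager {
    // Readers clone the Arc and hold the view as long as they like.
    fn acquire(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.read().unwrap())
    }

    // The writer publishes a new snapshot; old ones are freed when the
    // last reader drops its Arc.
    fn publish(&self, snap: Snapshot) {
        *self.current.write().unwrap() = Arc::new(snap);
    }
}

fn main() {
    let mgr = SnapshotManager {
        current: RwLock::new(Arc::new(Snapshot { version: 1 })),
    };
    let old = mgr.acquire();              // reader pins version 1
    mgr.publish(Snapshot { version: 2 }); // writer publishes version 2
    assert_eq!(old.version, 1);           // pinned reader is unaffected
    assert_eq!(mgr.acquire().version, 2); // new readers see the latest
}
```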
Shared State Inventory
| Shared State | Accessed By | Synchronization |
|---|---|---|
| current_snapshot | Writer (store), Readers (load) | arc-swap — lock-free atomic pointer |
| WAL file | Writer only | Single owner, no sync needed |
| MemTable | Writer (mut), Snapshot (immutable ref) | Writer publishes new snapshot; never mutated after publish |
| Segment inventory | Writer (during compaction), Readers (via snapshot) | Snapshots hold Arc<Vec<SegmentRef>>; writer builds new Vec |
| Metrics counters | Writer + Readers | Relaxed atomics (counters only) |
Key insight: The only truly shared mutable state is the snapshot pointer. Everything else is either single-owner (WAL, MemTable before publish) or immutable-after-publish (snapshots, segments).
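The metrics row above relies on relaxed atomics. A generic std illustration of why Relaxed suffices for counters (this is not Parallax's metrics code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Counters only need atomicity of the increment itself, not ordering
    // relative to other memory operations, so Relaxed is sufficient.
    let reads = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let reads = Arc::clone(&reads);
            thread::spawn(move || {
                for _ in 0..1000 {
                    reads.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // No increments are lost even without any stronger ordering.
    assert_eq!(reads.load(Ordering::Relaxed), 4000);
}
```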
The SyncEngine Lock Pattern
The REST server shares the storage engine across concurrent HTTP handlers via
Arc<Mutex<StorageEngine>>:
```rust
pub struct AppState {
    pub engine: Arc<Mutex<StorageEngine>>,
    pub sync: SyncEngine, // wraps the same Arc<Mutex<StorageEngine>>
    ...
}
```
The ingest path follows this protocol to minimize lock-hold time:
```rust
// 1. Lock briefly to take a snapshot for diff computation.
let (existing_entities, existing_rels) = {
    let engine = self.store.lock().expect("engine lock");
    let snap = engine.snapshot();
    validate_sync_batch(&entities, &relationships, &snap)?;
    let ents = snap.entities_by_source(connector_id)...collect();
    let rels = snap.relationships_by_source(connector_id)...collect();
    (ents, rels) // snap dropped here, lock released
};

// 2. Compute diff without holding any lock (pure CPU work).
let mut batch = WriteBatch::new();
// ... diff logic ...

// 3. Lock briefly again to commit the batch.
if !batch.is_empty() {
    let mut engine = self.store.lock().expect("engine lock");
    engine.write(batch)?;
}
```
This pattern ensures the lock is held for microseconds, not milliseconds. Multiple connectors can prepare their diffs concurrently; they only contend at the write step.
Async Compatibility
GraphReader<'snap> borrows its snapshot, which is obtained while holding the engine lock. Neither the lock guard nor the borrow should be held across an await point: a std MutexGuard held across an await makes the handler's future non-Send, which the async runtime rejects at compile time. The correct pattern for async handlers:
```rust
async fn query_handler(state: State<AppState>) -> Json<Value> {
    // Acquire engine lock, take snapshot, compute result, drop snapshot.
    // All synchronous — no await while holding the borrow.
    let result = {
        let engine = state.engine.lock().unwrap();
        let snap = engine.snapshot();
        let graph = GraphReader::new(&snap);
        graph.find("host").collect::<Vec<_>>()
            .into_iter().map(entity_to_json).collect::<Vec<_>>()
        // snap drops here, lock released
    };
    Json(json!({ "entities": result }))
}
```
Performance Characteristics
With this model on modern NVMe hardware:
| Operation | Throughput | Notes |
|---|---|---|
| Snapshot acquisition | ~1ns | Arc::clone = one atomic increment |
| Entity lookup (MemTable) | ≤1μs p99 | BTreeMap lookup |
| Entity lookup (Segment) | ≤100μs p99 | Linear scan; will improve with index in v0.2 |
| WAL write throughput | ≥500K ops/sec | With batching |
| Write lock contention | <5μs typical | Lock held only for MemTable update |
Invariant Reference
Parallax formalizes correctness as numbered invariants across all specs. Every code change should be checked against the invariants it might affect. Invariants are referenced in commit messages using their codes.
Data Model Invariants (INV-01..08)
From specs/01-data-model.md:
| Code | Invariant |
|---|---|
| INV-01 | Every entity has a non-empty _type, _class, and entity key. |
| INV-02 | EntityId is deterministic: same (account_id, type, key) always produces the same ID. |
| INV-03 | Every relationship's from_id and to_id reference entities that exist in the committed graph. |
| INV-04 | No two entities in the same account share (type, key). |
| INV-05 | No two relationships share (from_id, class, to_id) unless explicitly keyed with derive_with_key. |
| INV-06 | Timestamps are monotonically increasing per node. |
| INV-07 | Property types are stable within an entity type across versions (no type changes). |
| INV-08 | Property values are flat — no nested objects or arrays-of-objects. |
Storage Engine Invariants (INV-S01..S08)
From specs/02-storage-engine.md:
| Code | Invariant |
|---|---|
| INV-S01 | A committed write is never lost, even after a crash. |
| INV-S02 | A read snapshot never observes a partial write. |
| INV-S03 | Snapshots are monotonically increasing — a newer snapshot always supersedes an older one. |
| INV-S04 | WAL entries are append-only and immutable after write. |
| INV-S05 | A WAL entry with a CRC mismatch is treated as corrupt; recovery stops at the last valid entry. |
| INV-S06 | MemTable flush is atomic: either all entries move to a segment or none do. |
| INV-S07 | Segment files are immutable after creation. Compaction produces new segments, never modifies old ones. |
| INV-S08 | An entity with _deleted = true must never appear in query results. |
Graph Engine Invariants (INV-G01..G06)
From specs/03-graph-engine.md:
| Code | Invariant |
|---|---|
| INV-G01 | GraphReader<'snap> references cannot outlive their snapshot (enforced by the borrow checker). |
| INV-G02 | Traversal never follows edges to deleted entities. |
| INV-G03 | BFS traversal visits each entity at most once (no infinite loops in cyclic graphs). |
| INV-G04 | Shortest path returns None when no path exists; it never returns an incorrect path. |
| INV-G05 | Blast radius computation is bounded by max_depth (default: 4). |
| INV-G06 | Coverage gap analysis only returns entities of target_type that have no qualifying neighbor. |
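INV-G03 and INV-G05 together describe a bounded BFS. A std-only sketch, with entity IDs simplified to u64 and adjacency as a plain map (not the parallax-graph API):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// The visited set prevents revisits in cyclic graphs (INV-G03);
// max_depth bounds expansion of the frontier (INV-G05).
fn blast_radius(adj: &HashMap<u64, Vec<u64>>, start: u64, max_depth: u32) -> HashSet<u64> {
    let mut visited = HashSet::from([start]);
    let mut frontier = VecDeque::from([(start, 0u32)]);
    while let Some((node, depth)) = frontier.pop_front() {
        if depth == max_depth {
            continue; // depth bound reached; do not expand further
        }
        for &next in adj.get(&node).into_iter().flatten() {
            if visited.insert(next) {
                frontier.push_back((next, depth + 1));
            }
        }
    }
    visited
}

fn main() {
    // 1 → 2 → 3 → 1 forms a cycle; 3 → 4 lies beyond depth 2.
    let adj = HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![1, 4])]);
    let reachable = blast_radius(&adj, 1, 2);
    assert!(reachable.contains(&3)); // within 2 hops
    assert!(!reachable.contains(&4)); // beyond the depth bound
}
```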
Query Language Invariants (INV-Q01..Q06)
From specs/04-query-language.md:
| Code | Invariant |
|---|---|
| INV-Q01 | PQL parsing is deterministic: the same query string always produces the same AST. |
| INV-Q02 | A query never returns results from entities that do not satisfy all specified filters. |
| INV-Q03 | LIMIT n applied to a query returns at most n results. |
| INV-Q04 | A query that times out returns an error, not a partial result. |
| INV-Q05 | FIND SHORTEST PATH FROM A TO B returns the minimum-hop path or None; never a longer path. |
| INV-Q06 | FIND BLAST RADIUS FROM X DEPTH n returns only entities reachable within n hops. |
Connector SDK Invariants (INV-C01..C06)
From specs/05-integration-sdk.md:
| Code | Invariant |
|---|---|
| INV-C01 | A sync commit is atomic: either all entities/relationships land or none do. |
| INV-C02 | Entities from connector A are never deleted by a sync from connector B. |
| INV-C03 | Entity IDs are deterministic from (account_id, type, key) — same as INV-02. |
| INV-C04 | A relationship in a sync batch whose from_id or to_id does not exist (in batch or graph) is rejected. |
| INV-C05 | Step dependencies form a DAG — circular step dependencies are rejected at connector load time. |
| INV-C06 | A failed step does not prevent independent steps from running. |
API Surface Invariants (INV-A01..A06)
From specs/06-api-surface.md:
| Code | Invariant |
|---|---|
| INV-A01 | All write endpoints require authentication when an API key is configured. |
| INV-A02 | API key comparison uses constant-time equality to prevent timing attacks. |
| INV-A03 | Every committed write is visible to subsequent reads on the same server instance. |
| INV-A04 | Query responses are paginated; no single response exceeds the configured max_results limit. |
| INV-A05 | Every request has a X-Request-Id header (generated or propagated) for tracing. |
| INV-A06 | The /v1/health endpoint is exempt from authentication. |
Policy Engine Invariants (INV-P01..P06)
From specs/08-policy-engine.md:
| Code | Invariant |
|---|---|
| INV-P01 | A policy rule with an invalid PQL query is rejected at load time, not at evaluation time. |
| INV-P02 | Policy evaluation never modifies the graph. It is read-only. |
| INV-P03 | A rule that errors during evaluation is recorded as an error, not as a pass or fail. |
| INV-P04 | Posture score is computed from all loaded rules, including errored ones (they count as failures). |
| INV-P05 | Framework mapping (CIS, NIST, PCI-DSS) is metadata on rules; it does not affect evaluation logic. |
| INV-P06 | Policy rules are validated against the PQL parser before being accepted into the rule set. |
How to Use This Reference
When modifying code, ask: Which invariants does this change touch?
Reference affected invariants in your commit message:
fix(ingest): reject dangling relationships at commit time (INV-C04, INV-03)
feat(store): implement WAL CRC32C verification (INV-S05)
refactor(graph): bound traversal depth to prevent unbounded BFS (INV-G03)
When adding new behavior, ask: Does this require a new invariant? If yes, add it to the appropriate spec file and to this reference page.
Storage Engine Overview
The parallax-store crate is the durable foundation of the entire system.
It provides:
- Durability — entities and relationships are written to a WAL before being applied to memory.
- Point lookups — retrieve an entity by ID in ≤1μs from MemTable.
- MVCC snapshots — immutable, frozen views of the graph that readers hold without blocking writes.
- Compaction — (v0.2) background reclamation of space from deleted versions.
The storage engine does not understand graph semantics. It stores keyed
records. The graph engine (parallax-graph) builds traversal and adjacency
on top of the storage snapshot interface.
Architecture
StorageEngine
├── WriteAheadLog (WAL)
│ └── wal-00000000.pxw, wal-00000001.pxw, ...
│
├── MemTable (in-memory)
│ ├── entities: BTreeMap<EntityId, Entity>
│ ├── relationships: BTreeMap<RelationshipId, Relationship>
│ ├── type_index: HashMap<EntityType, Vec<EntityId>>
│ ├── class_index: HashMap<EntityClass, Vec<EntityId>>
│ ├── source_index: HashMap<ConnectorId, Vec<EntityId>>
│ └── adjacency: HashMap<EntityId, (Vec<RelId>, Vec<RelId>)>
│
├── Segments (on-disk, immutable)
│ └── seg-00000000.pxs, seg-00000001.pxs, ...
│
└── SnapshotManager
└── current: ArcSwap<Snapshot>
└── Snapshot { memtable_ref, segments: Arc<Vec<SegmentRef>> }
Write Path
Every mutation follows this sequence:
1. Build a `WriteBatch` (set of upsert/delete operations)
2. Serialize and append to WAL with CRC32C checksum
3. `fsync()` — durability point; a crash here loses nothing already committed
4. Apply batch to MemTable
5. Publish new snapshot via `ArcSwap::store`
```rust
let mut engine = StorageEngine::open(StoreConfig::new("/var/lib/parallax"))?;

let mut batch = WriteBatch::new();
batch.upsert_entity(entity);
batch.upsert_relationship(rel);

engine.write(batch)?; // Steps 1-5 above
```
Read Path
Reads always go through a Snapshot:
```rust
let snap = engine.snapshot(); // Arc::clone — O(1), no lock

// Entity lookup: MemTable first, then segment scan
if let Some(entity) = snap.get_entity(entity_id) {
    println!("{}", entity.display_name);
}
// snap dropped here; frees the Arc
```
MemTable Flush
When the MemTable exceeds memtable_flush_size (default: 64MB), it is flushed
to an immutable segment file:
- Serialize all entities and relationships to a `.pxs` segment file
- The new snapshot points to the fresh (empty) MemTable + the new segment
- The old MemTable data is freed
The adjacency index is preserved through flushes.
Crash Recovery
On StorageEngine::open(), if WAL segments exist:
- Replay WAL entries in order, verifying CRC32C on each
- Stop at the first corrupt entry (INV-S05)
- Apply all valid entries to rebuild the MemTable
- Publish the recovered snapshot
Recovery is the only code path that rebuilds MemTable from WAL. Normal operation never re-reads the WAL.
Storage Engine API
```rust
// Open or create a storage engine
let engine = StorageEngine::open(StoreConfig::new(data_dir))?;

// Write a batch (atomic, durable)
engine.write(batch)?;

// Get an MVCC snapshot (O(1))
let snap = engine.snapshot();

// Access entity counts (for stats/metrics)
let metrics = engine.metrics().snapshot();
```
See StorageEngine API for the full interface.
Write-Ahead Log (WAL)
The WAL is Parallax's durability guarantee. Every mutation is written to the
WAL and fsync()'d before being applied to the MemTable. If the process
crashes, the WAL replays to reconstruct the MemTable.
On-Disk Format
WAL data lives in {data_dir}/wal/. Each segment file is named
wal-{index:08}.pxw (e.g., wal-00000000.pxw).
Each entry in a WAL file has this layout (little-endian):
┌──────────┬──────────┬──────────┬────────────┬──────────┐
│ magic(4) │ len(4) │ seq(8) │ payload(N) │ crc32(4) │
└──────────┴──────────┴──────────┴────────────┴──────────┘
| Field | Size | Description |
|---|---|---|
| `magic` | 4 bytes | `0x50585741` — ASCII "PXWA" (Parallax WAL) |
| `len` | 4 bytes | Total entry length including all fields |
| `seq` | 8 bytes | Monotonic sequence number |
| `payload` | N bytes | Serialized `WriteBatch` (postcard format) |
| `crc32c` | 4 bytes | CRC32C over the `seq` bytes followed by the `payload` bytes |
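The framing above can be exercised in isolation. The sketch below is not the `parallax-store` implementation: the names `encode_entry`/`decode_entry` are illustrative, and the CRC32C is a minimal bitwise version (Castagnoli polynomial `0x82F63B78`) rather than a hardware-accelerated one.

```rust
/// Minimal bitwise CRC-32C (Castagnoli), reflected, init/xorout = !0.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

const MAGIC: [u8; 4] = *b"PXWA";

/// Frame one entry: magic(4) | len(4) | seq(8) | payload | crc32c(4).
fn encode_entry(seq: u64, payload: &[u8]) -> Vec<u8> {
    let len = (4 + 4 + 8 + payload.len() + 4) as u32;
    let mut buf = Vec::with_capacity(len as usize);
    buf.extend_from_slice(&MAGIC);
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(&seq.to_le_bytes());
    buf.extend_from_slice(payload);
    // Checksum covers seq + payload, per the table above
    let mut checked = seq.to_le_bytes().to_vec();
    checked.extend_from_slice(payload);
    buf.extend_from_slice(&crc32c(&checked).to_le_bytes());
    buf
}

/// Decode and verify one entry; None if the magic or checksum is bad.
fn decode_entry(buf: &[u8]) -> Option<(u64, Vec<u8>)> {
    if buf.len() < 20 || buf[..4] != MAGIC { return None; }
    let len = u32::from_le_bytes(buf[4..8].try_into().ok()?) as usize;
    if buf.len() < len { return None; }
    let seq = u64::from_le_bytes(buf[8..16].try_into().ok()?);
    let payload = buf[16..len - 4].to_vec();
    let stored = u32::from_le_bytes(buf[len - 4..len].try_into().ok()?);
    let mut checked = seq.to_le_bytes().to_vec();
    checked.extend_from_slice(&payload);
    (crc32c(&checked) == stored).then_some((seq, payload))
}

fn main() {
    let entry = encode_entry(7, b"batch-bytes");
    assert_eq!(decode_entry(&entry), Some((7, b"batch-bytes".to_vec())));

    // Flipping one payload byte must fail verification (INV-S05)
    let mut corrupt = entry.clone();
    corrupt[18] ^= 0xFF;
    assert_eq!(decode_entry(&corrupt), None);
}
```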
Write Protocol
```rust
pub fn append(&mut self, batch: &WriteBatch) -> Result<u64, WalError> {
    let seq = self.next_seq;
    self.next_seq += 1;

    // Serialize the batch with postcard (compact binary format)
    let payload = postcard::to_allocvec(batch)?;

    // Compute checksum over seq + payload
    let crc = crc32c_combine(seq.to_le_bytes(), &payload);

    // Total entry length: magic(4) + len(4) + seq(8) + payload + crc(4)
    let entry_len = (4 + 4 + 8 + payload.len() + 4) as u64;

    // Rotate to a new segment if the active file exceeds MAX_SEGMENT_SIZE
    if self.active_size + entry_len > MAX_SEGMENT_SIZE {
        self.rotate()?;
    }

    // Write all fields
    self.active.write_all(&WAL_MAGIC)?;
    self.active.write_all(&(entry_len as u32).to_le_bytes())?;
    self.active.write_all(&seq.to_le_bytes())?;
    self.active.write_all(&payload)?;
    self.active.write_all(&crc.to_le_bytes())?;

    // fsync — this is the durability commit point
    self.active.sync_data()?;

    Ok(seq)
}
```
After sync_data() returns, the entry survives any crash.
Crash Recovery
On startup, WriteAheadLog::recover() replays all WAL segments:
```rust
pub fn recover(&mut self) -> Result<Vec<WriteBatch>, WalError> {
    let mut batches = Vec::new();

    for segment_file in self.sorted_segment_files()? {
        let entries = self.read_segment(&segment_file)?;
        for entry in entries {
            // Verify magic
            if entry.magic != WAL_MAGIC {
                return Err(WalError::Corruption(...));
            }
            // Verify checksum (INV-S05)
            if !verify_crc32c(entry.seq, &entry.payload, entry.crc32c) {
                // Stop recovery entirely at the first corrupt entry —
                // later entries (and later segments) are discarded
                tracing::warn!("WAL corruption detected at seq {}", entry.seq);
                return Ok(batches);
            }
            let batch = postcard::from_bytes(&entry.payload)?;
            batches.push(batch);
        }
    }
    Ok(batches)
}
```
INV-S05: A corrupt WAL entry stops recovery at that point. All entries
before it are applied; the corrupt entry and everything after are discarded.
This may result in losing the most recent write if the process crashed during
fsync(). This is correct — the write was not acknowledged to the caller yet.
Segment Rotation
When the active WAL segment exceeds MAX_SEGMENT_SIZE (default: 64MB):
1. `sync_data()` the active file
2. Close the active file handle
3. Open a new file: `wal-{next_index:08}.pxw`
4. Set the new file as active
Old WAL segments are retained until after a successful MemTable flush to
a .pxs segment. At that point, the WAL segments whose entries are covered
by the flushed segment are safe to delete.
Group Commit (v0.2)
For high-throughput ingestion, individual fsync() calls are expensive
(~100μs on NVMe). Group commit batches multiple WriteBatches into a
single fsync():
100 batches → 1 fsync    (group commit: ~100× fewer fsyncs)
vs.
100 batches → 100 fsyncs (v0.1: one fsync per batch)
Group commit is planned for v0.2. The current v0.1 implementation does one
fsync() per WriteBatch, which is correct and suitable for all but the
highest-throughput ingestion scenarios.
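The idea can be sketched without any Parallax types: buffer several serialized batches, then make them all durable with a single `sync_data()`. This is an illustrative sketch of the technique, not the planned v0.2 implementation; `GroupCommitLog` and its methods are hypothetical names.

```rust
use std::fs::{File, OpenOptions};
use std::io::Write;

struct GroupCommitLog {
    file: File,
    pending: Vec<u8>, // serialized entries awaiting the shared fsync
}

impl GroupCommitLog {
    fn append(&mut self, entry: &[u8]) {
        self.pending.extend_from_slice(entry); // buffered: no fsync yet
    }

    /// One write + one fsync makes every pending entry durable at once.
    fn commit(&mut self) -> std::io::Result<()> {
        self.file.write_all(&self.pending)?;
        self.file.sync_data()?; // single durability point for the whole group
        self.pending.clear();
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("group-commit-demo.log");
    let file = OpenOptions::new().create(true).write(true).truncate(true).open(&path)?;
    let mut log = GroupCommitLog { file, pending: Vec::new() };

    for i in 0..100u32 {
        log.append(&i.to_le_bytes()); // 100 batches...
    }
    log.commit()?; // ...one fsync

    assert_eq!(std::fs::metadata(&path)?.len(), 400);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Callers would still only be acknowledged after the shared `commit()` returns, which preserves the "no acknowledged write is ever lost" property.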
Inspecting WAL Files
WAL files are binary and not human-readable directly. Use the CLI to inspect the current state of the engine rather than reading WAL files:
parallax stats --data-dir /var/lib/parallax
A future parallax wal dump command is planned for v0.2 debugging.
MemTable
The MemTable is the in-memory write buffer. Every write is applied here immediately after the WAL fsync. Reads check the MemTable first, then fall back to segment files.
Structure
```rust
pub struct MemTable {
    // Primary storage
    entities: BTreeMap<EntityId, Entity>,
    relationships: BTreeMap<RelationshipId, Relationship>,

    // Secondary indices (maintained in sync with primary)
    type_index: HashMap<EntityType, Vec<EntityId>>,
    class_index: HashMap<EntityClass, Vec<EntityId>>,
    source_index: HashMap<CompactString, Vec<EntityId>>, // connector_id → entities

    adjacency: HashMap<EntityId, (Vec<RelationshipId>,   // outgoing
                                  Vec<RelationshipId>)>, // incoming
}
```
Operations
Upsert
```rust
pub fn upsert_entity(&mut self, entity: Entity) {
    let id = entity.id;
    let entity_type = entity._type.clone();
    let entity_class = entity._class.clone();
    let connector_id = entity.source.connector_id.clone();

    // Update primary store
    self.entities.insert(id, entity);

    // Update all secondary indices
    self.type_index.entry(entity_type).or_default().push(id);
    self.class_index.entry(entity_class).or_default().push(id);
    self.source_index.entry(connector_id).or_default().push(id);

    // adjacency is updated by upsert_relationship
}
```
Tombstone (Soft Delete)
When the sync protocol determines that an entity was removed from a source:
```rust
pub fn delete_entity(&mut self, id: EntityId) {
    if let Some(entity) = self.entities.get_mut(&id) {
        entity._deleted = true; // soft delete — remains for snapshot visibility
    }
}
```
Soft-deleted entities are invisible to queries (INV-S08) but remain in memory until the next compaction cycle removes them.
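The tombstone contract can be shown with a few lines of standalone code. The types here (`Entity`, `Table`) are minimal stand-ins for illustration, not parallax-core types.

```rust
use std::collections::BTreeMap;

#[derive(Clone)]
struct Entity { name: &'static str, deleted: bool }

struct Table { entities: BTreeMap<u64, Entity> }

impl Table {
    fn delete(&mut self, id: u64) {
        if let Some(e) = self.entities.get_mut(&id) {
            e.deleted = true; // tombstone: data stays, visibility changes
        }
    }

    /// Queries never see tombstoned entities (mirrors INV-S08).
    fn get(&self, id: u64) -> Option<&Entity> {
        self.entities.get(&id).filter(|e| !e.deleted)
    }
}

fn main() {
    let mut t = Table { entities: BTreeMap::new() };
    t.entities.insert(1, Entity { name: "host-a", deleted: false });
    assert!(t.get(1).is_some());

    t.delete(1);
    assert!(t.get(1).is_none());          // invisible to reads
    assert!(t.entities.contains_key(&1)); // but still resident until compaction
}
```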
Adjacency Index
The adjacency index enables O(1) neighbor lookups:
```rust
pub fn upsert_relationship(&mut self, rel: Relationship) {
    let rel_id = rel.id;
    let from_id = rel.from_id;
    let to_id = rel.to_id;

    self.relationships.insert(rel_id, rel);

    // Both endpoints track this relationship
    self.adjacency.entry(from_id).or_default().0.push(rel_id); // outgoing
    self.adjacency.entry(to_id).or_default().1.push(rel_id);   // incoming
}
```
This is the index that makes graph traversal fast. Without it, every hop would require a full scan of all relationships.
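The cost difference is easy to demonstrate with a bare version of the same structure. This sketch uses plain `u64` ids in place of the real `EntityId`/`RelationshipId` newtypes; `add_edge` is an illustrative helper.

```rust
use std::collections::HashMap;

// Entity id → (outgoing edge ids, incoming edge ids)
type Adjacency = HashMap<u64, (Vec<u64>, Vec<u64>)>;

fn add_edge(adj: &mut Adjacency, rel_id: u64, from: u64, to: u64) {
    adj.entry(from).or_default().0.push(rel_id); // outgoing at the source
    adj.entry(to).or_default().1.push(rel_id);   // incoming at the target
}

fn main() {
    let mut adj = Adjacency::new();
    add_edge(&mut adj, 100, 1, 2); // 1 ─100→ 2
    add_edge(&mut adj, 101, 1, 3); // 1 ─101→ 3
    add_edge(&mut adj, 102, 3, 1); // 3 ─102→ 1

    // One hash lookup per hop — no scan over all relationships.
    let (out, inc) = &adj[&1];
    assert_eq!(out, &vec![100, 101]);
    assert_eq!(inc, &vec![102]);
}
```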
Flush to Segment
When memtable.approx_bytes() > config.memtable_flush_size (default: 64MB),
StorageEngine::maybe_flush() runs:
```rust
fn maybe_flush(&mut self) -> Result<(), StoreError> {
    if self.memtable.approx_bytes() <= self.config.memtable_flush_size {
        return Ok(());
    }

    // Write current MemTable contents to a new .pxs segment
    let segment_path = self.next_segment_path();
    Segment::write(&segment_path, &self.memtable)?;

    // Drain the MemTable: clears entity/rel data, preserves adjacency index
    let new_segment = SegmentRef::open(segment_path)?;
    self.memtable.drain_to_flush();
    self.segments.push(new_segment);

    // Publish new snapshot pointing to empty MemTable + new segment
    self.publish_snapshot();
    Ok(())
}
```
The drain_to_flush() operation is carefully designed:
- Entity and relationship data moves to the segment file
- The adjacency index is preserved (rebuilt from segments during recovery)
- Secondary indices are cleared (rebuilt from segment scans as needed)
Memory Accounting
```rust
pub fn approx_bytes(&self) -> usize {
    // Rough estimate: sum of entity and relationship struct sizes
    self.entities.values().map(|e| std::mem::size_of_val(e)).sum::<usize>()
        + self.relationships.values().map(|r| std::mem::size_of_val(r)).sum::<usize>()
}
```
This is an approximation — it counts stack sizes of the structs but not
heap-allocated strings. For memory budgeting, assume 2-4× the struct size
per entity due to CompactString heap allocations for long strings.
Query Methods
The MemTable exposes index-accelerated query methods used by Snapshot:
```rust
// O(1) lookup
pub fn get_entity(&self, id: EntityId) -> Option<&Entity>;
pub fn get_relationship(&self, id: RelationshipId) -> Option<&Relationship>;

// Index-accelerated scans
pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity>;
pub fn entities_by_class(&self, c: &EntityClass) -> Vec<&Entity>;
pub fn entities_by_source(&self, connector_id: &str) -> Vec<&Entity>;
pub fn all_entities(&self) -> impl Iterator<Item = &Entity>;

// Adjacency (O(1) for the lookup, O(degree) for iteration)
pub fn outgoing_relationships(&self, id: EntityId) -> Vec<&Relationship>;
pub fn incoming_relationships(&self, id: EntityId) -> Vec<&Relationship>;
```
Segments
Segment files are immutable, on-disk snapshots of MemTable contents. When
the MemTable grows beyond memtable_flush_size, it is flushed to a new
segment file and the in-memory data is freed.
On-Disk Format
Segment files live in {data_dir}/segments/ and are named
seg-{index:08}.pxs (e.g., seg-00000000.pxs).
┌──────────────┬────────────────────────────────────────┐
│ Header (5) │ Payload (postcard) │
├──────────────┼────────────────────────────────────────┤
│ magic (4) │ SegmentData { │
│ version (1) │ entities: Vec<Entity>, │
│ │ relationships: Vec<Relationship>, │
│ │ } │
└──────────────┴────────────────────────────────────────┘
| Field | Size | Value |
|---|---|---|
| `magic` | 4 bytes | `0x50585347` — ASCII "PXSG" (Parallax SeGment) |
| `version` | 1 byte | `1` (current format version) |
| `payload` | N bytes | `postcard::to_allocvec(SegmentData { entities, relationships })` |
INV-S07: Segment files are immutable after creation. Compaction produces new segment files — it never modifies existing ones.
Writing a Segment
```rust
pub fn write(path: &Path, memtable: &MemTable) -> Result<(), StoreError> {
    let entities: Vec<Entity> = memtable.all_entities()
        .filter(|e| !e._deleted)
        .cloned()
        .collect();
    let relationships: Vec<Relationship> = memtable.all_relationships()
        .filter(|r| !r._deleted)
        .cloned()
        .collect();

    let data = SegmentData { entities, relationships };
    let payload = postcard::to_allocvec(&data)
        .map_err(|e| StoreError::Corruption(e.to_string()))?;

    let mut file = File::create(path)?;
    file.write_all(&SEGMENT_MAGIC)?;
    file.write_all(&[SEGMENT_VERSION])?;
    file.write_all(&payload)?;
    file.sync_all()?;
    Ok(())
}
```
Soft-deleted entities and relationships are excluded from the segment. This is the mechanism by which deletes are physically reclaimed.
Reading a Segment
```rust
pub fn open(path: &Path) -> Result<SegmentRef, StoreError> {
    let bytes = std::fs::read(path)?;

    // Verify magic
    if bytes.len() < 5 || &bytes[..4] != SEGMENT_MAGIC {
        return Err(StoreError::Corruption("bad segment magic".into()));
    }

    // Check version
    let version = bytes[4];
    if version != SEGMENT_VERSION {
        return Err(StoreError::Corruption(
            format!("unknown segment version {version}")
        ));
    }

    // Deserialize
    let data: SegmentData = postcard::from_bytes(&bytes[5..])?;
    Ok(SegmentRef { path: path.to_owned(), data })
}
```
Snapshot Integration
A Snapshot holds an Arc<Vec<SegmentRef>> in addition to a MemTable
reference. When reading, the snapshot:
- Checks the MemTable first (most recent data)
- Scans segments in reverse order (newest to oldest)
- Returns the first match found
```rust
impl Snapshot {
    pub fn get_entity(&self, id: EntityId) -> Option<&Entity> {
        // MemTable first
        if let Some(e) = self.memtable.get_entity(id) {
            return if e._deleted { None } else { Some(e) };
        }
        // Then segments (newest first)
        for segment in self.segments.iter().rev() {
            if let Some(e) = segment.get_entity(id) {
                return if e._deleted { None } else { Some(e) };
            }
        }
        None
    }
}
```
Compaction (v0.2)
In v0.1, segments are never compacted. They accumulate until you delete the data directory. Background segment compaction is planned for v0.2:
- Merge multiple small segments into fewer large segments
- Remove soft-deleted entities and stale versions
- Build a sparse index on segment boundaries for faster lookups
The current linear scan approach is correct for graphs up to ~1M entities. At larger scale, the segment index becomes important for performance.
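The merge step of compaction can be sketched generically: later segments win on key collisions, and tombstones are physically dropped. This is a sketch of the planned behavior, not the v0.2 implementation; the `(value, deleted)` pair stands in for a serialized record and its tombstone flag.

```rust
use std::collections::BTreeMap;

type SegmentMap = BTreeMap<u64, (String, bool)>; // id → (record, deleted)

/// Merge segments ordered oldest → newest into one compacted segment.
fn compact(segments: &[SegmentMap]) -> SegmentMap {
    let mut merged = SegmentMap::new();
    for seg in segments {
        // Newer segments overwrite earlier versions of the same key.
        for (id, record) in seg {
            merged.insert(*id, record.clone());
        }
    }
    // Physically reclaim soft-deleted records.
    merged.retain(|_, (_, deleted)| !*deleted);
    merged
}

fn main() {
    let old: SegmentMap = [(1, ("v1".into(), false)), (2, ("v1".into(), false))].into();
    let new: SegmentMap = [(1, ("v2".into(), false)), (2, ("v1".into(), true))].into();

    let merged = compact(&[old, new]);
    assert_eq!(merged.get(&1).map(|(v, _)| v.as_str()), Some("v2")); // newest wins
    assert!(!merged.contains_key(&2)); // tombstone reclaimed
}
```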
MVCC Snapshots
Snapshots are the read interface to the storage engine. They provide a frozen, immutable view of the graph at a point in time. Readers acquire a snapshot once and hold it for the duration of a read operation — they never block writers, and writers never invalidate their data.
What a Snapshot Is
```rust
pub struct Snapshot {
    /// Immutable reference to MemTable at the time of snapshot creation.
    /// The MemTable is never mutated after the snapshot is published.
    memtable: Arc<MemTable>,

    /// Ordered list of segment references (oldest to newest).
    segments: Arc<Vec<SegmentRef>>,
}
```
The Snapshot is wrapped in Arc<Snapshot> so multiple readers can share
the same snapshot cheaply. Acquiring a snapshot is Arc::clone — one atomic
increment on the reference count.
Snapshot Manager
The SnapshotManager maintains the current published snapshot using
arc-swap for lock-free atomic updates:
```rust
pub struct SnapshotManager {
    current: ArcSwap<Snapshot>,
}

impl SnapshotManager {
    /// Called by readers — O(1), no lock.
    pub fn load(&self) -> Arc<Snapshot> {
        self.current.load_full()
    }

    /// Called by the writer after every commit — O(1) atomic swap.
    pub fn publish(&self, snapshot: Snapshot) {
        self.current.store(Arc::new(snapshot));
    }
}
```
arc-swap guarantees that:
- A reader loading the snapshot always gets a consistent, complete view.
- There is no moment when the snapshot pointer is null or partially updated.
- No reader needs to hold a lock to read the snapshot.
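The publish/load semantics can be demonstrated with std types alone. The real engine uses the `arc-swap` crate for lock-free swaps; the `RwLock<Arc<T>>` stand-in below has the same visibility behavior (readers see old or new, never half) with coarser locking, and `Manager` is an illustrative name.

```rust
use std::sync::{Arc, RwLock};

struct Manager<T> {
    current: RwLock<Arc<T>>,
}

impl<T> Manager<T> {
    fn load(&self) -> Arc<T> {
        // Readers get a complete, immutable view; the clone is one
        // refcount increment.
        Arc::clone(&self.current.read().unwrap())
    }

    fn publish(&self, next: T) {
        // Swap the pointer; readers holding the old Arc keep their view.
        *self.current.write().unwrap() = Arc::new(next);
    }
}

fn main() {
    let mgr = Manager { current: RwLock::new(Arc::new(vec![1, 2])) };

    let held = mgr.load();      // reader acquires the old snapshot
    mgr.publish(vec![1, 2, 3]); // writer publishes a new one

    assert_eq!(*held, vec![1, 2]);          // held view unchanged (isolation)
    assert_eq!(*mgr.load(), vec![1, 2, 3]); // new readers see the new view
}
```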
Snapshot Lifetime
```rust
// Acquiring: O(1), no lock, no allocation
let snap = engine.snapshot(); // = manager.load()

// Using: reads go through the snapshot — guaranteed consistent view
let entity = snap.get_entity(id);
let hosts = snap.entities_by_class(&EntityClass::new_unchecked("Host"));

// Releasing: O(1) when Arc reference count drops to zero
drop(snap);
```
When a snapshot is dropped, its reference count decrements. If this was the last reference, the Arc frees the MemTable and segment list it held. Old MemTable data can then be freed.
Snapshot Query Methods
```rust
impl Snapshot {
    // Point lookups (O(1) MemTable, O(n) segment scan)
    pub fn get_entity(&self, id: EntityId) -> Option<&Entity>;
    pub fn get_relationship(&self, id: RelationshipId) -> Option<&Relationship>;

    // Index-accelerated scans
    pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity>;
    pub fn entities_by_class(&self, c: &EntityClass) -> Vec<&Entity>;
    pub fn entities_by_source(&self, connector_id: &str) -> Vec<&Entity>;
    pub fn relationships_by_source(&self, connector_id: &str) -> Vec<&Relationship>;
    pub fn all_entities(&self) -> impl Iterator<Item = &Entity> + '_;

    // Adjacency
    pub fn outgoing(&self, id: EntityId) -> Vec<&Relationship>;
    pub fn incoming(&self, id: EntityId) -> Vec<&Relationship>;

    // Stats
    pub fn entity_count(&self) -> usize;
    pub fn relationship_count(&self) -> usize;
}
```
Consistency Guarantees
Read-your-writes: Within the same StorageEngine instance, a read
snapshot acquired after a write() call will always see the written data.
Snapshot isolation: A snapshot acquired at time T will never see writes committed after T, even if those writes happen on the same thread.
No dirty reads: A snapshot only contains data from committed
WriteBatches — data written to the WAL but not yet applied to the MemTable
is not visible.
Using Snapshots in async Code
Snapshot contains Arc references, making it Send + Sync. However,
GraphReader<'snap> borrows the snapshot and cannot cross await points.
The recommended pattern:
```rust
async fn my_handler(engine: Arc<Mutex<StorageEngine>>) -> Vec<Entity> {
    // Block: acquire lock, snapshot, compute, release
    let results = {
        let engine = engine.lock().unwrap();
        let snap = engine.snapshot();
        // All computation here — no await
        snap.entities_by_class(&EntityClass::new_unchecked("Host"))
            .into_iter()
            .filter(|e| !e._deleted)
            .cloned()
            .collect::<Vec<_>>()
        // snap dropped, lock released
    };

    // Now you can await freely with owned Vec<Entity>
    process_results(results).await
}
```
Alternatively, clone the Arc<Snapshot> and pass it to a spawn_blocking task
for CPU-intensive graph operations that would otherwise block the async runtime.
StorageEngine API
StorageEngine is the top-level coordinator that ties together the WAL,
MemTable, Segments, and SnapshotManager.
Opening an Engine
```rust
use parallax_store::{StorageEngine, StoreConfig};

// Open an existing engine or create a new one
let engine = StorageEngine::open(StoreConfig::new("/var/lib/parallax"))?;
```
StoreConfig accepts:
```rust
pub struct StoreConfig {
    /// Root directory for all engine data.
    pub data_dir: PathBuf,

    /// Flush MemTable to segment when size exceeds this (default: 64MB).
    pub memtable_flush_size: usize,

    /// Maximum WAL segment size before rotation (default: 64MB).
    pub wal_segment_max_size: u64,
}

impl StoreConfig {
    /// Create a config with default settings.
    pub fn new(data_dir: impl Into<PathBuf>) -> Self { ... }
}
```
Writing Data
All writes go through WriteBatch:
```rust
use parallax_store::WriteBatch;

let mut batch = WriteBatch::new();

// Upsert an entity (insert or update)
batch.upsert_entity(entity);

// Upsert a relationship
batch.upsert_relationship(relationship);

// Soft-delete an entity
batch.delete_entity(entity_id);

// Soft-delete a relationship
batch.delete_relationship(rel_id);

// Commit atomically (WAL + MemTable + snapshot publish)
engine.write(batch)?;
```
WriteBatch is opaque — you cannot inspect it after creation. The write is
atomic: either all operations succeed or none are applied.
WriteOp Internals
Under the hood, WriteBatch is a Vec<WriteOp>:
```rust
pub enum WriteOp {
    UpsertEntity(Entity),
    UpsertRelationship(Relationship),
    DeleteEntity(EntityId),
    DeleteRelationship(RelationshipId),
}

pub struct WriteBatch {
    pub(crate) ops: Vec<WriteOp>,
}

impl WriteBatch {
    pub fn is_empty(&self) -> bool { self.ops.is_empty() }
    pub fn len(&self) -> usize { self.ops.len() }
}
```
Reading Data
All reads go through a Snapshot:
```rust
// Acquire a snapshot (O(1), no lock)
let snap = engine.snapshot();

// Point lookups
let entity = snap.get_entity(entity_id);
let rel = snap.get_relationship(rel_id);

// Type/class scans (uses MemTable indices)
let hosts = snap.entities_by_type(&EntityType::new_unchecked("host"));
let services = snap.entities_by_class(&EntityClass::new_unchecked("Service"));

// Source-scoped scans (for sync diff)
let from_aws = snap.entities_by_source("connector-aws");

// Counts
let total = snap.entity_count();
```
Metrics
```rust
// Get a snapshot of current engine metrics
let metrics = engine.metrics().snapshot();

println!("Total entities: {}", metrics.entity_count);
println!("Total relationships: {}", metrics.relationship_count);
println!("Writes: {}", metrics.writes_total);
println!("Reads: {}", metrics.reads_total);
```
Metrics are maintained as atomic counters and can be read from any thread without acquiring the engine lock.
Engine in a Shared Context
In server mode, the engine is shared via Arc<Mutex<StorageEngine>>:
```rust
use std::sync::{Arc, Mutex};
use parallax_store::StorageEngine;

let engine = StorageEngine::open(config)?;
let shared = Arc::new(Mutex::new(engine));

// In a handler:
let snap = {
    let engine = shared.lock().unwrap();
    engine.snapshot()
    // Lock released here — snapshot is Arc-owned, not borrowed from engine
};
// Use snap freely without holding the lock
```
Error Types
```rust
pub enum StoreError {
    /// I/O error from the OS (file not found, permission denied, etc.)
    Io(io::Error),

    /// Data corruption detected (bad magic, CRC mismatch, etc.)
    Corruption(String),

    /// Serialization error (postcard format error)
    Serialization(String),
}
```
StoreError implements std::error::Error and Display. Use ? to
propagate it up the call stack.
Thread Safety
StorageEngine is Send but not Sync. Wrap it in Arc<Mutex<StorageEngine>>
for shared access. The lock should be held as briefly as possible:
```rust
// Good: lock, snapshot, release, use
let snap = engine.lock().unwrap().snapshot();
let entity = snap.get_entity(id);

// Bad: hold lock for the duration of graph computation
let engine = engine.lock().unwrap();
let entity = engine.snapshot().get_entity(id); // lock held too long
```
Graph Engine Overview
parallax-graph is the reasoning layer between raw storage and the query
language. It takes an MVCC snapshot and exposes graph-aware operations:
traversal, pattern matching, shortest path, blast radius, and coverage gap.
The GraphReader
All graph operations start with a GraphReader<'snap>, which borrows an
immutable snapshot and provides zero-copy access to graph data:
```rust
use parallax_graph::GraphReader;

let snap = engine.snapshot();
let graph = GraphReader::new(&snap);

// Entity finder
let hosts: Vec<&Entity> = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();

// Traversal
let neighbors = graph
    .traverse(start_id)
    .direction(Direction::Outgoing)
    .max_depth(3)
    .collect();

// Shortest path
let path = graph
    .shortest_path(from_id, to_id)
    .find();

// Blast radius
let blast = graph
    .blast_radius(target_id)
    .add_attack_edge("RUNS", Direction::Outgoing)
    .analyze();

// Coverage gap
let uncovered = graph
    .coverage_gap("PROTECTS")
    .target_type("host")
    .neighbor_type("edr_agent")
    .find();
```
Lifetime Discipline
GraphReader<'snap> ties every returned reference to the snapshot's
lifetime via Rust's borrow checker. You cannot accidentally hold a reference
to an entity after the snapshot is dropped.
```rust
let entity: &Entity = {
    let snap = engine.snapshot();
    let graph = GraphReader::new(&snap);
    graph.get_entity(id).unwrap()
    // ERROR: snap dropped here, but entity borrows from it
};
```
This compile-time guarantee eliminates an entire class of use-after-free and stale-read bugs.
Operations Summary
| Operation | Builder | Description |
|---|---|---|
| Entity finder | GraphReader::find() | Filter entities by type, class, properties |
| All entities | GraphReader::find_all() | Return all non-deleted entities |
| By class | GraphReader::find_by_class() | Filter by entity class |
| Traversal | GraphReader::traverse() | BFS/DFS from a starting entity |
| Shortest path | GraphReader::shortest_path() | Minimum-hop path between two entities |
| Blast radius | GraphReader::blast_radius() | Attack impact analysis from a target |
| Coverage gap | GraphReader::coverage_gap() | Find entities with no qualifying neighbor |
| Direct lookup | GraphReader::get_entity() | O(1) entity lookup by ID |
Performance
The graph engine is designed for interactive query latency:
| Operation | Target p99 |
|---|---|
| Entity lookup by ID (MemTable) | ≤1μs |
| Single-hop traversal (degree ≤100) | ≤500μs |
| Multi-hop traversal (depth 3, degree 5) | ≤5ms |
| Shortest path (1K-node graph) | ≤10ms |
| Blast radius (depth 4) | ≤10ms |
These targets assume data in MemTable. Segment reads add ~100μs per entity lookup due to linear scan; segment indexing is planned for v0.2.
GraphReader
GraphReader<'snap> is the primary entry point for all graph operations.
It wraps an MVCC snapshot and provides typed, zero-copy access to graph data.
Creating a GraphReader
```rust
use parallax_graph::GraphReader;
use parallax_store::StorageEngine;

let engine = StorageEngine::open(config)?;
let snap = engine.snapshot(); // O(1), no lock after this
let graph = GraphReader::new(&snap);

// graph can now be used freely; snap must not be dropped while graph is in scope
```
Full API Reference
```rust
impl<'snap> GraphReader<'snap> {
    pub fn new(snapshot: &'snap Snapshot) -> Self;

    // ── Entity finding ────────────────────────────────────────────────

    /// Find entities by type. Uses type index.
    pub fn find(&self, entity_type: &str) -> EntityFinder<'snap>;

    /// Find entities by class. Uses class index.
    pub fn find_by_class(&self, class: &str) -> EntityFinder<'snap>;

    /// Find all entities. Full scan.
    pub fn find_all(&self) -> EntityFinder<'snap>;

    /// O(1) point lookup by EntityId.
    pub fn get_entity(&self, id: EntityId) -> Option<&'snap Entity>;

    /// O(1) point lookup by RelationshipId.
    pub fn get_relationship(&self, id: RelationshipId) -> Option<&'snap Relationship>;

    // ── Traversal ─────────────────────────────────────────────────────

    /// Start a traversal from a specific entity.
    pub fn traverse(&self, start: EntityId) -> TraversalBuilder<'snap>;

    // ── Path finding ──────────────────────────────────────────────────

    /// Find the shortest path between two entities.
    pub fn shortest_path(&self, from: EntityId, to: EntityId) -> ShortestPathBuilder<'snap>;

    // ── Blast radius ──────────────────────────────────────────────────

    /// Compute the blast radius from a target entity.
    pub fn blast_radius(&self, origin: EntityId) -> BlastRadiusBuilder<'snap>;

    // ── Coverage analysis ─────────────────────────────────────────────

    /// Find entities missing a qualifying neighbor via the given verb.
    pub fn coverage_gap(&self, verb: &str) -> CoverageGapBuilder<'snap>;
}
```
Lifetime Annotation
The 'snap lifetime on GraphReader<'snap> means:
- Every `&'snap Entity` reference returned by the reader borrows from the snapshot
- The snapshot must outlive the reader and all references derived from it
- The Rust borrow checker enforces this at compile time — no runtime cost
This design guarantees:
- Zero-copy reads: Entity data is never cloned from the snapshot
- No use-after-free: You cannot hold a reference to freed snapshot data
- No stale reads: You always read from the snapshot you created the reader with
Collecting vs Cloning
Since GraphReader returns references with snapshot lifetimes, you must
either use the data within the snapshot's scope or clone it:
```rust
// Use within scope — zero allocation
let snap = engine.snapshot();
let graph = GraphReader::new(&snap);

let count = graph.find("host").count(); // no allocation
let first_host = graph.find("host").collect().first().map(|e| e.display_name.as_str());

// Clone if you need owned data past the snapshot
let hosts: Vec<Entity> = graph
    .find("host")
    .collect()
    .into_iter()
    .cloned() // Clone here when crossing async boundaries
    .collect();

drop(snap); // now safe to drop
```
In Server Handlers
```rust
async fn list_hosts(State(state): State<AppState>) -> Json<Value> {
    let hosts = {
        let engine = state.engine.lock().unwrap();
        let snap = engine.snapshot();
        let graph = GraphReader::new(&snap);

        // Collect into owned Vec before dropping snap + lock
        graph.find("host")
            .collect()
            .into_iter()
            .cloned()
            .collect::<Vec<Entity>>()
    }; // engine lock + snap released here

    Json(json!({
        "entities": hosts.iter().map(entity_to_json).collect::<Vec<_>>()
    }))
}
```
Entity Finder
The entity finder is a fluent builder for filtering entities from a snapshot. It uses secondary indices (type, class) when available and falls back to a full scan for property-level filters.
Basic Usage
```rust
let graph = GraphReader::new(&snap);

// Find all entities of a type
let hosts: Vec<&Entity> = graph.find("host").collect();

// Find all entities of a class
let all_hosts: Vec<&Entity> = graph
    .find_by_class("Host")
    .collect();

// Find with a property filter
let running: Vec<&Entity> = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();
```
EntityFinder Methods
```rust
impl<'snap> EntityFinder<'snap> {
    /// Filter to entities of the specified class.
    /// Uses class index — no full scan.
    pub fn class(self, c: &str) -> Self;

    /// Filter to entities where the named property equals the value.
    /// Uses full scan (no property index in v0.1).
    pub fn with_property(self, key: &str, value: Value) -> Self;

    /// Limit the number of results.
    pub fn limit(self, n: usize) -> Self;

    /// Collect all matching entities.
    pub fn collect(self) -> Vec<&'snap Entity>;

    /// Count matching entities without materializing them.
    pub fn count(self) -> usize;
}
```
GraphReader Finder Variants
```rust
impl<'snap> GraphReader<'snap> {
    /// Find entities by type ("host", "aws_ec2_instance").
    /// Uses the type index.
    pub fn find(&self, entity_type: &str) -> EntityFinder<'snap>;

    /// Find entities by class ("Host", "User", "DataStore").
    /// Uses the class index.
    pub fn find_by_class(&self, class: &str) -> EntityFinder<'snap>;

    /// Find all entities (no type filter). Full scan.
    pub fn find_all(&self) -> EntityFinder<'snap>;

    /// Direct lookup by EntityId. O(1).
    pub fn get_entity(&self, id: EntityId) -> Option<&'snap Entity>;
}
```
Examples
Find running hosts
```rust
let running_hosts = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();
```
Count all services
```rust
let count = graph.find_by_class("Service").count();
```
Find with limit
```rust
// First 100 hosts for pagination
let page = graph.find("host").limit(100).collect();
```
Index Strategy
The planner chooses the access strategy based on the query:
| Filter | Access Method | Cost |
|---|---|---|
| Type only | Type index lookup | O(n_type) |
| Class only | Class index lookup | O(n_class) |
| Type + property | Type index + filter scan | O(n_type) |
| Class + property | Class index + filter scan | O(n_class) |
| Property only | Full scan | O(n_total) |
Property-level secondary indexes are planned for v0.2. For v0.1, property filters always require a linear scan over the type/class result set.
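To make the scan behavior concrete, here is a toy, self-contained sketch of the v0.1 strategy: a type index narrows candidates, then property filters run as a linear scan over that subset only. `ToyEntity`, `ToyIndex`, and `sample_index` are invented for illustration; they are not Parallax types.

```rust
use std::collections::HashMap;

// Toy model of the v0.1 access strategy. `ToyEntity` and `ToyIndex`
// are illustrative stand-ins, not Parallax types.
#[derive(Clone, Debug, PartialEq)]
struct ToyEntity {
    id: u64,
    props: HashMap<String, String>,
}

struct ToyIndex {
    by_type: HashMap<String, Vec<u64>>, // type index: "host" → entity IDs
    entities: HashMap<u64, ToyEntity>,
}

impl ToyIndex {
    // Type index lookup is O(n_type); the property filter then scans
    // only that subset, never the whole store.
    fn find_with_property(&self, t: &str, key: &str, value: &str) -> Vec<&ToyEntity> {
        self.by_type
            .get(t)
            .map(|ids| {
                ids.iter()
                    .filter_map(|id| self.entities.get(id))
                    .filter(|e| e.props.get(key).map(String::as_str) == Some(value))
                    .collect()
            })
            .unwrap_or_default()
    }
}

fn sample_index() -> ToyIndex {
    let mut entities = HashMap::new();
    let mut by_type: HashMap<String, Vec<u64>> = HashMap::new();
    for (id, t, state) in [(1, "host", "running"), (2, "host", "stopped"), (3, "service", "running")] {
        let props = HashMap::from([("state".to_string(), state.to_string())]);
        entities.insert(id, ToyEntity { id, props });
        by_type.entry(t.to_string()).or_default().push(id);
    }
    ToyIndex { by_type, entities }
}

fn main() {
    let index = sample_index();
    let running_hosts = index.find_with_property("host", "state", "running");
    println!("{} running host(s)", running_hosts.len());
}
```

Planned property-level indexes in v0.2 would replace the `filter` step with another index lookup; the builder API would not need to change.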
Deleted Entity Handling
The finder automatically excludes soft-deleted entities (_deleted = true).
You never see deleted entities in query results (INV-S08).
Graph Traversal
Traversal walks the graph from a starting entity, following edges in a specified direction and applying filters at each step.
Basic Traversal
```rust
let graph = GraphReader::new(&snap);

// Traverse all outgoing edges from start_id, up to depth 3
let results: Vec<TraversalResult> = graph
    .traverse(start_id)
    .direction(Direction::Outgoing)
    .max_depth(3)
    .collect();
```
TraversalBuilder Options
```rust
pub struct TraversalBuilder<'snap> { /* ... */ }

impl<'snap> TraversalBuilder<'snap> {
    /// BFS (default) or DFS traversal order.
    pub fn bfs(self) -> Self;
    pub fn dfs(self) -> Self;

    /// Direction of edges to follow:
    /// Direction::Outgoing, Direction::Incoming, Direction::Both.
    pub fn direction(self, dir: Direction) -> Self;

    /// Only follow edges with these relationship classes (verbs).
    pub fn edge_classes(self, classes: &[&str]) -> Self;

    /// Only visit entities with this type.
    pub fn filter_node_type(self, t: &str) -> Self;

    /// Only visit entities with this class.
    pub fn filter_node_class(self, c: &str) -> Self;

    /// Only visit entities matching this property filter.
    pub fn filter_node_property(self, key: &str, value: Value) -> Self;

    /// Stop after visiting this many hops from the start.
    pub fn max_depth(self, depth: usize) -> Self;

    /// Collect all results.
    pub fn collect(self) -> Vec<TraversalResult<'snap>>;
}
```
TraversalResult
Each visited entity produces a TraversalResult:
```rust
pub struct TraversalResult<'snap> {
    /// The entity reached at this hop.
    pub entity: &'snap Entity,

    /// The relationship that connected the previous hop to this entity.
    pub via: Option<&'snap Relationship>,

    /// How many hops from the start entity.
    pub depth: usize,

    /// The full path from start to this entity.
    pub path: GraphPath,
}
```
GraphPath
```rust
pub struct GraphPath {
    /// Ordered sequence of (entity_id, relationship_id) pairs.
    pub segments: Vec<PathSegment>,
}

pub struct PathSegment {
    pub entity_id: EntityId,
    pub via_relationship: Option<RelationshipId>,
}
```
Paths are returned for every traversal result. For large traversals, omit
path tracking by using a custom iterator instead of collect().
Cycle Handling
BFS traversal maintains a visited set and never visits an entity twice.
This prevents infinite loops in cyclic graphs (INV-G03):
```rust
// This terminates even in a graph with cycles:
let results = graph
    .traverse(a_id)
    .direction(Direction::Outgoing)
    .max_depth(100)
    .collect();
```
DFS also maintains the visited set, so it is also cycle-safe (but produces different ordering than BFS).
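The visited-set mechanism is simple enough to show in a few lines. The sketch below is a generic BFS over a plain adjacency map, not Parallax's store or API; the `visited` set is what guarantees termination on the 1 → 2 → 3 → 1 cycle.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Minimal sketch of cycle-safe BFS with a visited set, in the spirit of
// INV-G03. The graph is a plain adjacency map, not Parallax's store.
fn bfs_reachable(adj: &HashMap<u32, Vec<u32>>, start: u32, max_depth: usize) -> Vec<u32> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    let mut order = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        order.push(node);
        if depth == max_depth {
            continue; // depth limit reached; do not expand further
        }
        for &next in adj.get(&node).into_iter().flatten() {
            // The visited set is what guarantees termination on cycles:
            // a node is enqueued at most once, ever.
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

fn main() {
    // A 3-cycle: 1 → 2 → 3 → 1. Without the visited set this would loop forever.
    let adj = HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![1])]);
    println!("visited {:?}", bfs_reachable(&adj, 1, 100)); // each node exactly once
}
```

Swapping the `VecDeque` for a stack turns the same skeleton into the cycle-safe DFS variant.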
Direction
```rust
pub enum Direction {
    /// Follow outgoing edges: entity → neighbor
    Outgoing,
    /// Follow incoming edges: entity ← neighbor
    Incoming,
    /// Follow both directions
    Both,
}
```
Examples
Find all services reachable from a host
```rust
let services = graph
    .traverse(host_id)
    .direction(Direction::Outgoing)
    .filter_node_class("Service")
    .max_depth(3)
    .collect();
```
Find all entities that depend on a database
```rust
let dependents = graph
    .traverse(db_id)
    .direction(Direction::Incoming)
    .edge_classes(&["USES", "CONNECTS", "READS"])
    .max_depth(5)
    .collect();
```
BFS vs DFS
Use BFS (default) when you want results ordered by proximity — closer entities first. This is best for blast radius and impact analysis.
Use DFS when you want to follow a chain as deep as possible before backtracking. This is better for path exploration.
```rust
// BFS: finds nearest entities first
let bfs = graph.traverse(start).bfs().max_depth(4).collect();

// DFS: explores deep paths first
let dfs = graph.traverse(start).dfs().max_depth(4).collect();
```
Shortest Path
Shortest path finds the minimum-hop chain of relationships between two specific entities. This answers questions like:
- "What is the access path between this user and this S3 bucket?"
- "Is there any connection between these two systems?"
- "What is the shortest privilege escalation route from this role to that secret?"
Basic Usage
```rust
let graph = GraphReader::new(&snap);

let path = graph
    .shortest_path(user_id, s3_bucket_id)
    .find();

match path {
    Some(path) => {
        println!("Found path with {} hops", path.segments.len());
        for segment in &path.segments {
            println!("  Entity: {:?}", segment.entity_id);
        }
    }
    None => println!("No path exists"),
}
```
ShortestPathBuilder
```rust
impl<'snap> ShortestPathBuilder<'snap> {
    /// Limit the search to this many hops (default: unlimited).
    pub fn max_depth(self, depth: usize) -> Self;

    /// Only follow edges with these verbs.
    pub fn edge_classes(self, classes: &[&str]) -> Self;

    /// Direction of edges to follow (default: Both).
    pub fn direction(self, dir: Direction) -> Self;

    /// Execute the search. Returns None if no path exists.
    pub fn find(self) -> Option<GraphPath>;
}
```
Algorithm
Parallax implements bidirectional BFS — searching from both the source and target simultaneously. This dramatically reduces the search space for sparse graphs:
Unidirectional BFS: O(b^d) nodes visited (b = branching factor, d = depth)
Bidirectional BFS: O(b^(d/2)) nodes visited
For a graph with branching factor 10 and path length 6:
- Unidirectional: 10^6 = 1,000,000 nodes
- Bidirectional: roughly 2 × 10^3 = 2,000 nodes (about 500× fewer)
The two frontiers meet in the middle to form the path.
INV-Q05: Shortest path always returns the minimum-hop path or None.
It never returns a longer path than the shortest one.
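For intuition, here is a self-contained, hop-count-only sketch of the bidirectional idea over an undirected adjacency map. It is not the engine's implementation: the real search also records edges so the meeting frontiers can be stitched into a `GraphPath`, and a production version must finish scanning the current level to guarantee the minimum. Treat it as a sketch of the search shape.

```rust
use std::collections::HashMap;

// Expand one frontier by a single level; report a meeting distance if any
// generated neighbor is already known to the other side.
fn expand(
    adj: &HashMap<u32, Vec<u32>>,
    frontier: &[u32],
    dist: &mut HashMap<u32, usize>,
    other: &HashMap<u32, usize>,
) -> (Vec<u32>, Option<usize>) {
    let mut next = Vec::new();
    for &node in frontier {
        let d = dist[&node];
        for &n in adj.get(&node).into_iter().flatten() {
            if let Some(&d_other) = other.get(&n) {
                return (next, Some(d + 1 + d_other)); // frontiers met
            }
            if !dist.contains_key(&n) {
                dist.insert(n, d + 1);
                next.push(n);
            }
        }
    }
    (next, None)
}

// Hop-count-only bidirectional search sketch (undirected adjacency).
fn bidir_hops(adj: &HashMap<u32, Vec<u32>>, src: u32, dst: u32) -> Option<usize> {
    if src == dst {
        return Some(0);
    }
    let (mut dist_a, mut dist_b) = (HashMap::from([(src, 0usize)]), HashMap::from([(dst, 0usize)]));
    let (mut fa, mut fb) = (vec![src], vec![dst]);
    while !fa.is_empty() && !fb.is_empty() {
        // Expand the smaller frontier each round.
        let met = if fa.len() <= fb.len() {
            let (next, met) = expand(adj, &fa, &mut dist_a, &dist_b);
            fa = next;
            met
        } else {
            let (next, met) = expand(adj, &fb, &mut dist_b, &dist_a);
            fb = next;
            met
        };
        if let Some(hops) = met {
            return Some(hops);
        }
    }
    None // one side exhausted: no path
}

fn main() {
    // Path graph 1–2–3–4–5 as an undirected adjacency map.
    let adj = HashMap::from([
        (1u32, vec![2]), (2, vec![1, 3]), (3, vec![2, 4]), (4, vec![3, 5]), (5, vec![4]),
    ]);
    println!("hops 1→5: {:?}", bidir_hops(&adj, 1, 5));
}
```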
Handling Cycles
The BFS searches maintain visited sets, so cyclic graphs are handled correctly. The search terminates when:
- The two frontiers meet (path found)
- Both frontiers are exhausted (no path)
- max_depth is reached (INV-Q06)
Examples
Privilege escalation path
```rust
let path = graph
    .shortest_path(attacker_id, secret_id)
    .edge_classes(&["ASSIGNED", "ALLOWS", "HAS", "USES"])
    .direction(Direction::Both)
    .max_depth(6)
    .find();
```
Reachability check
```rust
// Is there any connection between these two networks?
let connected = graph
    .shortest_path(network_a_id, network_b_id)
    .edge_classes(&["CONNECTS", "CONTAINS"])
    .find()
    .is_some();
```
PQL Equivalent
-- PQL
FIND SHORTEST PATH FROM user WITH email = 'alice@corp.com'
TO aws_s3_bucket WITH _key = 'arn:aws:s3:::secrets'
-- Rust equivalent (after resolving entity IDs)
graph.shortest_path(alice_id, bucket_id).find()
Blast Radius
Blast radius analysis answers: "If this entity is compromised, what else is at risk?"
Given a starting entity (e.g., a compromised host or credential), blast radius follows attacker-relevant relationships to identify all entities within attack reach.
Basic Usage
```rust
let graph = GraphReader::new(&snap);

let result = graph
    .blast_radius(compromised_host_id)
    .add_attack_edge("RUNS", Direction::Outgoing)     // host → services
    .add_attack_edge("CONNECTS", Direction::Outgoing) // host → other hosts
    .max_depth(4)
    .analyze();

println!("Impacted entities: {}", result.impacted.len());
println!("Critical paths: {}", result.critical_paths.len());
println!("High-value targets: {}", result.high_value_targets.len());
```
BlastRadiusBuilder
```rust
impl<'snap> BlastRadiusBuilder<'snap> {
    /// Add an attack path edge type with direction.
    /// Call multiple times for multiple attack vectors.
    pub fn add_attack_edge(self, verb: &str, direction: Direction) -> Self;

    /// Maximum hops from the origin entity (default: 4).
    pub fn max_depth(self, depth: usize) -> Self;

    /// Run the analysis and return results.
    pub fn analyze(self) -> BlastRadiusResult<'snap>;
}
```
BlastRadiusResult
```rust
pub struct BlastRadiusResult<'snap> {
    /// The origin entity — the starting point of the attack.
    pub origin: &'snap Entity,

    /// All entities reachable via the specified attack edges.
    pub impacted: Vec<&'snap Entity>,

    /// Paths to entities classified as high-value targets.
    pub critical_paths: Vec<GraphPath>,

    /// High-value targets in the blast radius.
    /// These are entities whose class is in the high-value set.
    pub high_value_targets: Vec<&'snap Entity>,
}
```
High-Value Target Classes
The following entity classes are automatically identified as high-value targets when found in the blast radius:
DataStore, Secret, Key, Database, Credential, Certificate, Identity, Account
These classes represent data and access assets that attackers specifically target.
Default Attack Edges
If no attack edges are specified, the blast radius builder uses a default set of attacker-relevant relationship verbs:
RUNS, CONNECTS, TRUSTS, CONTAINS, HAS, USES, EXPLOITS
These cover the most common lateral movement patterns.
Example: Compromised Credential
```rust
// If an attacker has this credential, what can they access?
let result = graph
    .blast_radius(credential_id)
    .add_attack_edge("ALLOWS", Direction::Outgoing)   // credential → policies
    .add_attack_edge("ASSIGNED", Direction::Incoming) // who was assigned this
    .add_attack_edge("HAS", Direction::Incoming)      // who owns this
    .max_depth(5)
    .analyze();

// Check if any datastores are in the blast radius
let exposed_datastores: Vec<_> = result.high_value_targets.iter()
    .filter(|e| e._class.as_str() == "DataStore")
    .collect();

if !exposed_datastores.is_empty() {
    println!("CRITICAL: {} datastores exposed", exposed_datastores.len());
}
```
Example: PQL Equivalent
The blast radius builder corresponds to PQL's FIND BLAST RADIUS FROM syntax:
-- PQL
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4
-- Rust equivalent
graph.blast_radius(host_id).max_depth(4).analyze()
Performance
Blast radius uses BFS internally and visits each entity at most once. For a graph with 100K entities and average degree 10, expect:
- Depth 2: ~100 entities visited, <1ms
- Depth 4: ~10,000 entities visited, ~10ms
- Depth 6: ~100,000 entities visited, ~100ms
Set max_depth conservatively — attacker lateral movement rarely exceeds
4-6 hops in practice.
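The mechanics reduce to a depth-limited BFS plus a class check. The sketch below models that on a toy adjacency map with invented node names and classes; the high-value class list mirrors the one above, but nothing else here is Parallax API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Toy sketch of blast-radius-style analysis: BFS up to max_depth over
// attack edges, flagging nodes whose class is in a high-value set.
// Returns (impacted, high_value_targets), both excluding the origin.
fn blast(
    adj: &HashMap<&'static str, Vec<&'static str>>,
    classes: &HashMap<&'static str, &'static str>,
    origin: &'static str,
    max_depth: usize,
) -> (Vec<&'static str>, Vec<&'static str>) {
    let high_value: HashSet<&str> = [
        "DataStore", "Secret", "Key", "Database",
        "Credential", "Certificate", "Identity", "Account",
    ]
    .into_iter()
    .collect();
    let mut visited = HashSet::from([origin]);
    let mut queue = VecDeque::from([(origin, 0usize)]);
    let mut impacted = Vec::new();
    let mut hvt = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        if node != origin {
            impacted.push(node);
            if high_value.contains(classes.get(node).copied().unwrap_or("")) {
                hvt.push(node); // class is in the high-value set
            }
        }
        if depth < max_depth {
            for &n in adj.get(node).into_iter().flatten() {
                if visited.insert(n) {
                    queue.push_back((n, depth + 1));
                }
            }
        }
    }
    (impacted, hvt)
}

fn main() {
    // web-01 RUNS nginx; nginx USES prod-db (a Database, i.e., high-value).
    let adj = HashMap::from([("web-01", vec!["nginx"]), ("nginx", vec!["prod-db"])]);
    let classes = HashMap::from([("nginx", "Service"), ("prod-db", "Database")]);
    let (impacted, hvt) = blast(&adj, &classes, "web-01", 4);
    println!("impacted: {impacted:?}, high-value: {hvt:?}");
}
```

Because each entity is visited at most once, the cost figures above follow directly: the work is proportional to the number of entities within the depth limit.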
Coverage Gap
Coverage gap analysis finds entities that are missing a qualifying neighbor. This answers questions like:
- "Which hosts have no EDR agent protecting them?"
- "Which services have no scanner scanning them?"
- "Which databases have no backup agent connected to them?"
Basic Usage
```rust
let graph = GraphReader::new(&snap);

// Find all hosts with no EDR agent protecting them
let unprotected = graph
    .coverage_gap("PROTECTS")
    .target_type("host")
    .neighbor_type("edr_agent")
    .find();

println!("{} hosts have no EDR coverage", unprotected.len());
```
CoverageGapBuilder
```rust
impl<'snap> CoverageGapBuilder<'snap> {
    // The relationship verb that represents coverage — "PROTECTS",
    // "SCANS", "MANAGES" — is the first argument to coverage_gap().

    /// The type of entity to check for coverage gaps.
    pub fn target_type(self, t: &str) -> Self;

    /// The class of entity to check for coverage gaps.
    pub fn target_class(self, c: &str) -> Self;

    /// The type of entity that provides coverage (the "covering" entity).
    pub fn neighbor_type(self, t: &str) -> Self;

    /// The class of the covering entity.
    pub fn neighbor_class(self, c: &str) -> Self;

    /// Direction of the coverage edge (default: Incoming — neighbor → target).
    pub fn direction(self, dir: Direction) -> Self;

    /// Execute and return all entities with no qualifying neighbor.
    pub fn find(self) -> Vec<&'snap Entity>;
}
```
INV-G06
INV-G06: Coverage gap only returns entities of target_type that have
no qualifying neighbor via the specified verb. Entities that have at least
one qualifying neighbor are excluded.
Examples
Scanner coverage
```rust
// Which hosts have never been scanned?
let unscanned = graph
    .coverage_gap("SCANS")
    .target_type("host")
    .neighbor_class("Scanner")
    .direction(Direction::Incoming) // scanner → host
    .find();
```
EDR protection
```rust
// Which containers have no security agent?
let unprotected = graph
    .coverage_gap("PROTECTS")
    .target_class("Container")
    .neighbor_class("Agent")
    .find();
```
Backup coverage
```rust
// Which databases have no backup relationship?
let unbackedup = graph
    .coverage_gap("HAS")
    .target_class("Database")
    .neighbor_type("backup_job")
    .find();
```
PQL Equivalent
Coverage gap corresponds to PQL's negated traversal:
-- PQL: find hosts with no EDR protection
FIND host THAT !PROTECTS edr_agent
-- Rust equivalent
graph.coverage_gap("PROTECTS").target_type("host").neighbor_type("edr_agent").find()
Performance
Coverage gap requires:
- Fetch all entities of `target_type` (index scan)
- For each entity, check if any `verb` edge exists to a qualifying neighbor (adjacency lookup)
The adjacency index makes step 2 O(degree) per entity, not O(n_relationships). For 10,000 hosts with average degree 5, expect:
- 10,000 adjacency lookups × O(5) = 50,000 operations
- Typically completes in <50ms
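The core check is cheap enough to sketch directly. The toy below stores edges as (verb, source, target) triples and precomputes the covered set once, so each target costs one set lookup; this is illustrative only, since Parallax answers the question with its adjacency index rather than a triple scan.

```rust
use std::collections::HashSet;

// Sketch of the coverage-gap check over a toy (verb, source, target)
// triple store. Build the set of covered targets once, then each target
// entity costs a single membership test.
fn coverage_gap<'a>(
    targets: &[&'a str],
    edges: &[(&str, &str, &str)], // (verb, source, target)
    verb: &str,
) -> Vec<&'a str> {
    let covered: HashSet<&str> = edges
        .iter()
        .filter(|(v, _, _)| *v == verb)
        .map(|(_, _, t)| *t)
        .collect();
    targets
        .iter()
        .copied()
        .filter(|t| !covered.contains(*t)) // keep targets with no qualifying neighbor
        .collect()
}

fn main() {
    let hosts = ["web-01", "web-02", "db-01"];
    let edges = [("PROTECTS", "edr-1", "web-01"), ("RUNS", "web-01", "nginx")];
    let gaps = coverage_gap(&hosts, &edges, "PROTECTS");
    println!("unprotected: {gaps:?}"); // ["web-02", "db-01"]
}
```

Note how the `RUNS` edge does not count as coverage: only edges with the requested verb qualify, matching INV-G06.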
PQL — Parallax Query Language
PQL is the read-only query language for Parallax. It is designed for security practitioners who are not graph database experts: readable in plain English, learnable in 10 minutes, and predictable in performance.
Design Goals
| Goal | How |
|---|---|
| Readable by non-engineers | English-like: FIND, THAT, WITH, ALLOWS |
| Learnable in 10 minutes | Core syntax is 5 clauses; no joins, no subqueries in v0.1 |
| Predictable performance | Every query maps to a known graph operation |
| Machine-parseable | Clean grammar → easy for AI to generate PQL from natural language |
Non-Goals
PQL is read-only. All writes go through the ingest API. There is no INSERT, UPDATE, DELETE, or MERGE in PQL.
PQL is not a general-purpose graph query language. No arbitrary pattern matching with anonymous nodes, no recursive CTEs, no graph algorithms in the language itself.
Core Syntax
Every PQL query is one of three forms:
-- 1. Entity query (most common)
FIND <entity_filter>
[WITH <property_filters>]
[THAT <traversal_chain>]
[RETURN <projection>]
[LIMIT <n>]
-- 2. Shortest path query
FIND SHORTEST PATH
FROM <entity_filter> [WITH <property_filters>]
TO <entity_filter> [WITH <property_filters>]
[DEPTH <n>]
-- 3. Blast radius query
FIND BLAST RADIUS
FROM <entity_filter> [WITH <property_filters>]
[DEPTH <n>]
The Five Clauses
FIND
Specifies which entities to start with. The argument is one of:
- An entity type: specific (e.g., `host`, `aws_ec2_instance`)
- An entity class: broad (e.g., `Host`, `User`, `DataStore`)
- `*`: any entity
FIND host
FIND Host
FIND aws_ec2_instance
FIND *
WITH
Filters entities by property values. Multiple conditions are combined with AND.
FIND host WITH state = 'running'
FIND host WITH state = 'running' AND region = 'us-east-1'
FIND user WITH active = true AND email LIKE '@corp.com'
THAT
Traverses relationships. Can be chained for multi-hop queries. Supports
negation with ! to find coverage gaps.
FIND host THAT RUNS service
FIND user THAT ASSIGNED role THAT ALLOWS s3_bucket
FIND host THAT !PROTECTS edr_agent -- hosts with no EDR
RETURN
Specifies output format. Defaults to full entity objects.
FIND host RETURN COUNT -- count only
FIND host RETURN display_name, state -- specific properties
LIMIT
Limits the number of results returned.
FIND host LIMIT 100
FIND host WITH state = 'running' LIMIT 10
Quick Reference
-- All running hosts
FIND host WITH state = 'running'
-- All services on running hosts
FIND host WITH state = 'running' THAT RUNS service
-- Hosts with no EDR
FIND host THAT !PROTECTS edr_agent
-- Count of all hosts
FIND host RETURN COUNT
-- Shortest path from user to secret
FIND SHORTEST PATH FROM user WITH email = 'alice@corp.com'
TO secret WITH name = 'prod-db-password'
-- Blast radius from compromised host
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4
See Syntax Reference for the complete grammar, and Examples for real-world query patterns.
PQL Syntax Reference
Formal Grammar (EBNF)
query
= find_query
| path_query
| blast_query
;
find_query
= "FIND" entity_filter
("WITH" property_expr)?
("THAT" traversal_chain)?
("GROUP" "BY" identifier)?
("RETURN" return_clause)?
("LIMIT" integer)?
;
path_query
= "FIND" "SHORTEST" "PATH"
"FROM" entity_filter ("WITH" property_expr)?
"TO" entity_filter ("WITH" property_expr)?
("DEPTH" integer)?
;
blast_query
= "FIND" "BLAST" "RADIUS"
"FROM" entity_filter ("WITH" property_expr)?
("DEPTH" integer)?
;
entity_filter
= identifier (* type: "host", "aws_ec2_instance" *)
| "*" (* any entity *)
;
traversal_chain
= traversal_step ("THAT" traversal_step)*
;
traversal_step
= negation? verb entity_filter ("WITH" property_expr)?
;
negation = "!" ;
verb
= "HAS" | "IS" | "ASSIGNED" | "ALLOWS" | "USES"
| "CONTAINS" | "MANAGES" | "CONNECTS" | "PROTECTS"
| "EXPLOITS" | "TRUSTS" | "SCANS" | "RUNS" | "READS" | "WRITES"
;
property_expr
= property_or ("AND" property_or)*
;
property_or
= property_cond ("OR" property_cond)*
;
property_cond
= identifier comparison_op value
| identifier "IN" "(" value_list ")"
| identifier "LIKE" string_literal
| identifier "EXISTS"
| "NOT" property_cond
;
comparison_op = "=" | "!=" | "<" | "<=" | ">" | ">=" ;
return_clause
= "COUNT"
| identifier ("," identifier)*
;
group_by_clause
= "GROUP" "BY" identifier
;
value
= string_literal (* 'single quoted' *)
| integer (* 42 *)
| float (* 3.14 *)
| "true" | "false"
| "null"
;
value_list
= value ("," value)*
;
identifier
= [a-zA-Z_][a-zA-Z0-9_]*
;
string_literal
= "'" [^']* "'" (* single quotes ONLY — no double quotes *)
;
Important Syntax Notes
String Literals Use Single Quotes
PQL uses single-quoted string literals. Double quotes are not supported.
-- Correct
FIND host WITH state = 'running'
-- Wrong (parser error)
FIND host WITH state = "running"
Keywords Are Case-Sensitive
PQL keywords (FIND, WITH, THAT, AND, RETURN, LIMIT, etc.) must
be uppercase. Identifiers (entity types, property names) are case-sensitive
as well.
-- Correct
FIND host WITH state = 'running'
-- Wrong
find Host where State = 'running'
Note: Entity classes are PascalCase (Host, User, DataStore). Entity
types are snake_case (host, aws_ec2_instance).
Negation in Traversal
The ! negation before a verb finds entities that do not have a matching
neighbor. It cannot be chained.
-- Valid: hosts with no EDR agent
FIND host THAT !PROTECTS edr_agent
-- Invalid: cannot chain after negation
FIND host THAT !PROTECTS edr_agent THAT HAS service -- syntax error
RETURN Clause
Without RETURN, full entity objects are returned. With RETURN COUNT,
only the count is returned (no entity data). With RETURN field1, field2,
only the specified property fields are included.
-- Returns full entity objects
FIND host
-- Returns only the count
FIND host RETURN COUNT
-- Returns entities with only display_name and state properties
FIND host RETURN display_name, state
DEPTH in Path Queries
DEPTH limits the maximum number of hops to explore. Without it, the search
is unbounded (but bounded by graph diameter in practice).
FIND SHORTEST PATH FROM user TO secret DEPTH 6
FIND BLAST RADIUS FROM host DEPTH 4
Tokenization Rules
| Token | Rule |
|---|---|
| Keywords | FIND, WITH, THAT, RETURN, LIMIT, AND, OR, NOT, IN, LIKE, EXISTS, GROUP, BY, SHORTEST, PATH, FROM, TO, BLAST, RADIUS, DEPTH, COUNT |
| Verbs | HAS, IS, ASSIGNED, ALLOWS, USES, CONTAINS, MANAGES, CONNECTS, PROTECTS, EXPLOITS, TRUSTS, SCANS, RUNS, READS, WRITES |
| Identifiers | [a-zA-Z_][a-zA-Z0-9_]* |
| String | '[^']*' |
| Integer | [0-9]+ |
| Float | [0-9]+\.[0-9]+ |
| Boolean | true, false |
| Null | null |
| Operators | =, !=, <, <=, >, >= |
| Negation | ! |
| Punctuation | (, ), , |
| Wildcard | * |
| Whitespace | Ignored |
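As a feel for how small these rules are, here is a miniature tokenizer covering a subset of them (words, single-quoted strings, numbers, the comparison operators). It is a sketch, not the Parallax lexer; for instance, it does not report token positions and does not reject unterminated strings.

```rust
// Miniature tokenizer for a subset of the PQL token rules above.
// Keywords, verbs, and identifiers all lex as Word; telling them apart
// is the parser's job.
#[derive(Debug, PartialEq)]
enum Tok {
    Word(String), // keyword, verb, or identifier
    Str(String),  // 'single quoted'
    Num(String),  // integer or float
    Op(String),   // = != < <= > >= ! ( ) , *
}

fn lex(input: &str) -> Result<Vec<Tok>, String> {
    let mut toks = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // whitespace is ignored
        } else if c.is_ascii_alphabetic() || c == '_' {
            let mut w = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_alphanumeric() || c == '_' { w.push(c); chars.next(); } else { break; }
            }
            toks.push(Tok::Word(w));
        } else if c == '\'' {
            chars.next(); // opening quote
            let s: String = chars.by_ref().take_while(|&c| c != '\'').collect();
            toks.push(Tok::Str(s));
        } else if c.is_ascii_digit() {
            let mut n = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_digit() || c == '.' { n.push(c); chars.next(); } else { break; }
            }
            toks.push(Tok::Num(n));
        } else if "=<>!(),*".contains(c) {
            chars.next();
            let mut op = c.to_string();
            // Two-char operators: != <= >=
            if let Some(&'=') = chars.peek() {
                if matches!(c, '!' | '<' | '>') { op.push('='); chars.next(); }
            }
            toks.push(Tok::Op(op));
        } else {
            return Err(format!("unexpected character '{c}'"));
        }
    }
    Ok(toks)
}

fn main() {
    let toks = lex("FIND host WITH state = 'running'").unwrap();
    println!("{toks:?}");
}
```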
Property Filters
The WITH clause filters entities by their properties. All conditions in a
WITH clause are combined with AND.
Comparison Operators
-- Equality
FIND host WITH state = 'running'
-- Inequality
FIND host WITH state != 'terminated'
-- Numeric comparisons
FIND host WITH cpu_count > 4
FIND host WITH memory_gb >= 32
FIND host WITH score < 7.5
FIND host WITH version <= 3
String Matching
-- Exact match (case-sensitive)
FIND user WITH email = 'alice@corp.com'
-- Pattern match with LIKE (% = any sequence, _ = any single char)
FIND user WITH email LIKE '%@corp.com'
FIND host WITH hostname LIKE 'web-%'
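The LIKE semantics above (`%` for any sequence, `_` for one character, case-sensitive) can be captured in a short recursive matcher. This is an illustrative implementation of those semantics, not Parallax's actual matcher.

```rust
// LIKE-style matching: '%' matches any sequence (including empty),
// '_' matches exactly one character. Case-sensitive, like PQL strings.
fn like(pattern: &str, s: &str) -> bool {
    match (pattern.chars().next(), s.chars().next()) {
        // Both exhausted: match.
        (None, None) => true,
        // '%' either matches nothing, or absorbs one char of s and retries.
        (Some('%'), _) => {
            like(&pattern[1..], s)
                || (!s.is_empty()
                    && like(pattern, &s[s.chars().next().unwrap().len_utf8()..]))
        }
        // '_' consumes exactly one character.
        (Some('_'), Some(c)) => like(&pattern[1..], &s[c.len_utf8()..]),
        // Literal characters must match exactly (case-sensitive).
        (Some(p), Some(c)) if p == c => like(&pattern[p.len_utf8()..], &s[c.len_utf8()..]),
        _ => false,
    }
}

fn main() {
    println!("{}", like("%@corp.com", "alice@corp.com")); // true
}
```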
Boolean
FIND host WITH active = true
FIND aws_s3_bucket WITH public = false
FIND user WITH mfa_enabled = false
Null Checks
-- Field has no value
FIND container WITH cpu_limit = null
-- Field has any value (EXISTS)
FIND host WITH owner EXISTS
-- Field has no value (NOT EXISTS equivalent)
FIND host WITH NOT owner EXISTS
IN List
-- State is one of the listed values
FIND host WITH state IN ('running', 'pending', 'starting')
-- Region is one of the listed values
FIND host WITH region IN ('us-east-1', 'us-west-2', 'eu-west-1')
NOT
-- Negate any condition
FIND host WITH NOT state = 'terminated'
FIND user WITH NOT mfa_enabled = true
FIND host WITH NOT region IN ('us-east-1', 'us-west-2')
Multiple Conditions (AND)
Multiple conditions in a single WITH clause are combined with AND:
FIND host WITH state = 'running' AND region = 'us-east-1'
FIND user WITH active = true AND mfa_enabled = false
FIND host WITH env = 'production' AND cpu_count >= 8 AND state = 'running'
OR Conditions
Use OR to match any of several values for the same property:
-- Hosts running linux or windows
FIND host WITH os = 'linux' OR os = 'windows'
-- Three alternatives
FIND host WITH region = 'us-east-1' OR region = 'eu-west-1' OR region = 'ap-southeast-1'
Combine OR with AND. Note the precedence: in PQL, OR binds tighter than AND (the opposite of SQL):
-- (os = linux OR os = windows) AND env = prod
FIND host WITH os = 'linux' OR os = 'windows' AND env = 'prod'
GROUP BY
Aggregate query results by a property field:
-- Count hosts by OS
FIND host GROUP BY os
-- Count hosts by region, filtered to running state
FIND host WITH state = 'running' GROUP BY region
GROUP BY returns QueryResult::Grouped — a list of (value, count) pairs,
one per distinct value of the field. Entities missing the field land in their
own group (Value::Null).
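The grouping semantics are easy to model: one pass over the result set, counting per distinct value, with a null-like bucket for entities missing the field. The sketch below uses toy row maps rather than Parallax entities.

```rust
use std::collections::HashMap;

// Sketch of GROUP BY semantics: count rows per distinct value of a field.
// Rows missing the field land in their own "null" group, mirroring the
// Value::Null behavior described above. Toy types, not Parallax's.
fn group_by(rows: &[HashMap<&str, &str>], field: &str) -> HashMap<String, u64> {
    let mut groups: HashMap<String, u64> = HashMap::new();
    for row in rows {
        let key = row.get(field).copied().unwrap_or("null").to_string();
        *groups.entry(key).or_insert(0) += 1;
    }
    groups
}

fn main() {
    let rows = vec![
        HashMap::from([("os", "linux")]),
        HashMap::from([("os", "linux")]),
        HashMap::from([("os", "windows")]),
        HashMap::new(), // no "os" property → "null" group
    ];
    println!("{:?}", group_by(&rows, "os"));
}
```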
Traversal Filters
WITH can also appear after a traversal verb to filter the target entity:
-- Running hosts that run a specific service
FIND host WITH state = 'running' THAT RUNS service WITH name = 'nginx'
-- Users assigned to admin roles (filter both ends)
FIND user WITH active = true THAT ASSIGNED role WITH admin = true
Value Type Reference
| Value Type | Example | Notes |
|---|---|---|
| String | 'running' | Single quotes. No double quotes. |
| Integer | 42, 100 | No quotes. |
| Float | 3.14, 0.5 | Decimal point required. |
| Boolean | true, false | Lowercase. |
| Null | null | Lowercase. |
Case Sensitivity
- Property names are case-sensitive: `state` ≠ `State`
- String values are case-sensitive: `'Running'` ≠ `'running'`
- Keywords (`WITH`, `AND`, `NOT`, `IN`, `LIKE`, `EXISTS`) are uppercase
Special Property Names
These properties are always present on every entity:
| Property | Description |
|---|---|
| `display_name` | Human-readable name |
| `_type` | Entity type string (e.g., "host") |
| `_class` | Entity class string (e.g., "Host") |
| `_key` | Source-system key |
| `_deleted` | Soft-delete flag (always false in query results) |
Traversal Queries
The THAT clause traverses relationships from the entities found by FIND.
Basic Traversal
-- Find hosts that run services
FIND host THAT RUNS service
-- Find users that have been assigned roles
FIND user THAT ASSIGNED role
-- Find security groups that allow internet traffic
FIND security_group THAT ALLOWS internet
Multi-Hop Traversal
Chain multiple THAT clauses for multi-hop queries:
-- user → role → policy → resource (3 hops)
FIND user THAT ASSIGNED role THAT ALLOWS policy THAT USES aws_s3_bucket
Each THAT clause adds one hop. The entity filter after each verb narrows
which entities at that hop qualify.
Filtering Traversal Targets
Add WITH after a traversal verb to filter the entities at that hop:
-- Users assigned to admin roles specifically
FIND user THAT ASSIGNED role WITH admin = true
-- Hosts running services that are publicly exposed
FIND host THAT RUNS service WITH public = true
-- Multi-hop with filters at each step
FIND user WITH active = true
THAT ASSIGNED role WITH admin = true
THAT ALLOWS aws_s3_bucket WITH public = true
Negated Traversal (Coverage Gaps)
Prefix a verb with ! to find entities that do not have a qualifying
neighbor via that relationship:
-- Hosts with no EDR agent protecting them
FIND host THAT !PROTECTS edr_agent
-- Services with no firewall
FIND service THAT !PROTECTS firewall
-- Users who have never been assigned any role
FIND user THAT !ASSIGNED role
Note: Negation (!) cannot appear in a chained traversal. It must be
the final THAT step.
-- Invalid: cannot chain after negation
FIND host THAT !PROTECTS edr_agent THAT HAS service -- syntax error
Available Verbs
PQL supports 15 relationship verbs:
| Verb | Semantic Meaning |
|---|---|
| `HAS` | Ownership or containment (account HAS bucket) |
| `IS` | Identity or equivalence (user IS person) |
| `ASSIGNED` | Role or permission assignment (user ASSIGNED role) |
| `ALLOWS` | Network or access permission (policy ALLOWS resource) |
| `USES` | Active dependency (service USES database) |
| `CONTAINS` | Logical grouping (vpc CONTAINS subnet) |
| `MANAGES` | Administrative control (team MANAGES repo) |
| `CONNECTS` | Network connectivity (vpc CONNECTS vpc) |
| `PROTECTS` | Security coverage (edr PROTECTS host) |
| `EXPLOITS` | Vulnerability relationship (cve EXPLOITS package) |
| `TRUSTS` | Trust relationship (account TRUSTS account) |
| `SCANS` | Scanner coverage (scanner SCANS host) |
| `RUNS` | Process or service execution (host RUNS service) |
| `READS` | Data access, read (app READS database) |
| `WRITES` | Data access, write (app WRITES database) |
Traversal Direction
PQL traversal follows edges in both directions by default when using the entity-type filter form. To control direction, use the Rust API directly or rely on the verb semantics:
- `FIND host THAT RUNS service` follows RUNS edges outgoing from host
- `FIND service THAT RUNS host` follows RUNS edges incoming to service (i.e., which hosts run this service)
The query executor determines direction based on the verb and entity order.
How Traversal Maps to Graph Operations
| PQL | Graph Operation |
|---|---|
| `FIND A THAT V B` | Find all A; for each, traverse V edges to B |
| `FIND A THAT !V B` | Find all A that have no V edges to B |
| `FIND A THAT V B THAT W C` | Find A→B→C chains |
| `FIND SHORTEST PATH FROM A TO B` | Bidirectional BFS from A and B |
| `FIND BLAST RADIUS FROM A DEPTH n` | BFS from A up to n hops |
Path Queries
Parallax supports two special query forms for path-based analysis.
Shortest Path
FIND SHORTEST PATH finds the minimum-hop chain of relationships between
two specific entities.
Syntax
FIND SHORTEST PATH
FROM <entity_filter> [WITH <property_filters>]
TO <entity_filter> [WITH <property_filters>]
[DEPTH <max_hops>]
Examples
-- Is there any connection between a user and a secret?
FIND SHORTEST PATH
FROM user WITH email = 'alice@corp.com'
TO secret WITH name = 'prod-db-password'
-- Privilege escalation path from a guest account to admin
FIND SHORTEST PATH
FROM user WITH role = 'guest'
TO role WITH admin = true
DEPTH 6
-- Network path between two VPCs
FIND SHORTEST PATH
FROM aws_vpc WITH _key = 'vpc-prod'
TO aws_vpc WITH _key = 'vpc-dev'
DEPTH 10
Response
If a path exists, the response includes:
- The full sequence of entities in the path
- The relationships connecting them
- The total number of hops
If no path exists within DEPTH hops (or at all), the response has an
empty path with count: 0.
Performance
Shortest path uses bidirectional BFS — exploring from both endpoints simultaneously. This is significantly faster than unidirectional BFS for deep graphs:
| Graph Size | Path Length | Typical Latency |
|---|---|---|
| 10K entities | 4 hops | <5ms |
| 100K entities | 6 hops | <50ms |
| 1M entities | 8 hops | <500ms |
Blast Radius
FIND BLAST RADIUS computes the set of entities reachable from a starting
point via attacker-relevant relationships.
Syntax
FIND BLAST RADIUS
FROM <entity_filter> [WITH <property_filters>]
[DEPTH <max_hops>]
Examples
-- What is at risk if this host is compromised?
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4
-- What can an attacker reach from this credential?
FIND BLAST RADIUS FROM credential WITH name = 'prod-api-key' DEPTH 5
-- Impact of a vulnerable package in production
FIND BLAST RADIUS FROM package WITH cve = 'CVE-2024-1234' DEPTH 3
Response
The blast radius response includes:
- `impacted`: all entities reachable within the depth limit
- `high_value_targets`: entities of high-value classes (DataStore, Secret, etc.)
- `critical_paths`: specific paths to high-value targets
- `count`: total number of impacted entities
High-Value Target Classes
The following entity classes are always flagged as high-value targets in blast radius results:
DataStore, Secret, Key, Database, Credential, Certificate, Identity, Account
Depth Guidance
| Depth | Coverage | Use Case |
|---|---|---|
| 2 | Immediate neighbors | Quick triage |
| 4 | Typical blast radius | Most analyses |
| 6 | Extended blast radius | Comprehensive analysis |
| 8+ | Near-full graph | Use sparingly — can be slow |
Default depth is 4 if not specified.
Via REST API
# Shortest path
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND SHORTEST PATH FROM user WITH email = '\''alice@corp.com'\'' TO secret WITH name = '\''prod-db-password'\''"}'
# Blast radius
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND BLAST RADIUS FROM host WITH _key = '\''web-01'\'' DEPTH 4"}'
Query Execution
PQL queries go through a three-stage pipeline: parse → plan → execute.
Stage 1: Parse
The PQL lexer tokenizes the input string, then the recursive descent parser produces an AST.
```rust
use parallax_query::parse;

let ast = parse("FIND host WITH state = 'running'")?;
```
The parser is hand-written (no parser generator) for several reasons:
- Precise error messages: "Expected '=' after property name 'state', got '<'"
- Zero additional dependencies
- Full control over error recovery
INV-Q01: The same query string always produces the same AST (deterministic).
Parse Errors
Parse errors include the position of the unexpected token:
Error: unexpected token '=' at position 18
FIND host WITH state = = 'running'
^
expected: comparison operator (=, !=, <, <=, >, >=)
Stage 2: Plan
The planner transforms the AST into a QueryPlan — a concrete execution
strategy. It consults IndexStats to choose the most efficient access path.
```rust
use parallax_query::{parse, plan};

let ast = parse("FIND host WITH state = 'running'")?;
let query_plan = plan(&ast, &index_stats)?;
```
Index Stats
IndexStats tracks entity counts by type and class:
```rust
pub struct IndexStats {
    pub type_counts: HashMap<String, u64>,  // "host" → 1234
    pub class_counts: HashMap<String, u64>, // "Host" → 5678
    pub entity_count: u64,
    pub relationship_count: u64,
}
```
Access Strategy Selection
| Query Pattern | Chosen Strategy | Reason |
|---|---|---|
| FIND host | TypeIndexScan | Type index available |
| FIND Host | ClassIndexScan | Class index available |
| FIND * | FullScan | No narrowing possible |
| FIND host WITH state='running' | TypeIndexScan + filter | Type index reduces candidates |
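The fallback chain in the table above can be sketched in a few lines. This is an illustrative sketch only — the names `Access` and `choose_access` are invented here and are not the actual parallax-query types:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Access {
    TypeIndexScan(String),
    ClassIndexScan(String),
    FullScan,
}

// Pick the narrowest access path: a known type index beats a class
// index, which beats a full scan. (Hypothetical planner helper.)
fn choose_access(
    target: &str,
    type_counts: &HashMap<String, u64>,
    class_counts: &HashMap<String, u64>,
) -> Access {
    if target == "*" {
        Access::FullScan
    } else if type_counts.contains_key(target) {
        Access::TypeIndexScan(target.to_string())
    } else if class_counts.contains_key(target) {
        Access::ClassIndexScan(target.to_string())
    } else {
        // Unknown name: fall back to a full scan plus filtering.
        Access::FullScan
    }
}

fn main() {
    let types = HashMap::from([("host".to_string(), 1234u64)]);
    let classes = HashMap::from([("Host".to_string(), 5678u64)]);
    println!("{:?}", choose_access("host", &types, &classes));
    println!("{:?}", choose_access("Host", &types, &classes));
    println!("{:?}", choose_access("*", &types, &classes));
}
```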
Stage 3: Execute
The executor runs the query plan against an MVCC snapshot:
```rust
use parallax_query::execute;

let snap = engine.snapshot();
let result = execute(&query_plan, &snap, &limits)?;
```
QueryLimits
```rust
pub struct QueryLimits {
    pub max_results: usize,         // default: 10_000
    pub timeout: Duration,          // default: 30s
    pub max_traversal_depth: usize, // default: 10
}
```
INV-Q03: max_results is a hard cap — the query returns at most this
many results.
INV-Q04: If the query takes longer than timeout, it returns an error,
not a partial result.
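These two invariants can be sketched as an executor loop that checks elapsed time before accepting each row and truncates at the cap. Illustrative only — `run_with_limits` is not a real Parallax API:

```rust
use std::time::{Duration, Instant};

struct QueryLimits {
    max_results: usize,
    timeout: Duration,
}

// INV-Q03: truncate at max_results (hard cap).
// INV-Q04: a timed-out query yields Err, never a partial result.
fn run_with_limits<I: Iterator<Item = u64>>(
    rows: I,
    limits: &QueryLimits,
    started: Instant,
) -> Result<Vec<u64>, String> {
    let mut out = Vec::new();
    for row in rows {
        if started.elapsed() > limits.timeout {
            return Err("query timed out".to_string()); // discard, don't return partial
        }
        out.push(row);
        if out.len() >= limits.max_results {
            break; // hard cap reached
        }
    }
    Ok(out)
}

fn main() {
    let limits = QueryLimits {
        max_results: 3,
        timeout: Duration::from_secs(30),
    };
    let result = run_with_limits(0u64..10, &limits, Instant::now());
    println!("{:?}", result); // Ok([0, 1, 2])
}
```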
QueryResult
```rust
pub enum QueryResult {
    Entities(Vec<Entity>),
    Traversals(Vec<TraversalResult>),
    Paths(Vec<GraphPath>),
    Scalar(u64), // For RETURN COUNT
}
```
Execution Plan Examples
FIND host WITH state = 'running'
QueryPlan::Find {
access: TypeIndexScan { entity_type: "host" },
filters: [PropertyFilter { key: "state", op: Eq, value: "running" }],
return_: ReturnAll,
limit: None,
}
Execution:
- Load type index entry for "host" → [EntityId1, EntityId2, ...]
- For each ID, fetch entity from snapshot
- Apply property filter state = 'running'
- Return matching entities
FIND host THAT RUNS service
QueryPlan::Traversal {
start: Find { access: TypeIndexScan { "host" }, filters: [] },
steps: [TraversalStep { verb: "RUNS", target_type: "service", negated: false }],
return_: ReturnAll,
}
Execution:
- Load all hosts from type index
- For each host, follow outgoing RUNS edges via adjacency index
- Filter targets to entity_type == "service"
- Return matching host + service pairs
FIND host RETURN COUNT
QueryPlan::Find {
access: TypeIndexScan { "host" },
filters: [],
return_: ReturnCount,
limit: None,
}
Execution:
- Load type index entry for "host"
- Count entries without fetching entity data
- Return scalar count
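The index-only counting above can be sketched with an ordinary map standing in for the type index (`count_by_type` is a hypothetical helper, not the real executor):

```rust
use std::collections::HashMap;

// A toy type index: entity type -> entity IDs. RETURN COUNT can be
// answered from the index alone, without fetching any entity bodies.
fn count_by_type(type_index: &HashMap<String, Vec<u64>>, entity_type: &str) -> u64 {
    type_index.get(entity_type).map_or(0, |ids| ids.len() as u64)
}

fn main() {
    let mut index = HashMap::new();
    index.insert("host".to_string(), vec![1, 2, 3]);
    println!("{}", count_by_type(&index, "host")); // 3
    println!("{}", count_by_type(&index, "user")); // 0
}
```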
Performance Characteristics
| Query Type | Typical Latency (10K entities) |
|---|---|
| Type scan, no filter | <100μs |
| Type scan + property filter | <1ms |
| Single-hop traversal | <500μs |
| 3-hop traversal | <5ms |
| Shortest path | <10ms |
| Blast radius (depth 4) | <10ms |
| RETURN COUNT | <50μs (index only) |
PQL Examples
Real-world query patterns organized by use case.
Asset Discovery
-- All hosts (any type: physical, VM, container)
FIND Host
-- All running cloud instances
FIND host WITH state = 'running'
-- All hosts in a specific region
FIND host WITH region = 'us-east-1'
-- All public-facing S3 buckets
FIND aws_s3_bucket WITH public = true
-- Count all entities by class
FIND Host RETURN COUNT
FIND User RETURN COUNT
FIND DataStore RETURN COUNT
Vulnerability Analysis
-- Find all vulnerable packages
FIND package WITH has_vulnerability = true
-- Find hosts running vulnerable software
FIND host THAT RUNS package WITH has_vulnerability = true
-- Packages that exploit a specific CVE class
FIND package THAT EXPLOITS vulnerability WITH severity = 'critical'
Access and Privilege
-- All users with admin roles
FIND user THAT ASSIGNED role WITH name = 'Administrator'
-- All policies that allow S3 access
FIND policy THAT ALLOWS aws_s3_bucket
-- Users who can access a specific bucket (indirect — via roles and policies)
FIND user THAT ASSIGNED role THAT ALLOWS aws_s3_bucket WITH public = false
-- Service accounts with broad permissions
FIND service_account THAT ASSIGNED role WITH permissions = 'all'
Network Connectivity
-- All services exposed to the internet
FIND service THAT CONNECTS internet
-- Hosts that allow inbound traffic from any IP
FIND firewall WITH allow_all = true THAT PROTECTS host
-- Internal services with no firewall
FIND service THAT !PROTECTS firewall
Security Coverage Gaps
-- Hosts with no EDR agent
FIND host THAT !PROTECTS edr_agent
-- Hosts never scanned
FIND host THAT !SCANS scanner
-- Services with no firewall protection
FIND service THAT !PROTECTS firewall
-- Databases with no backup
FIND Database THAT !HAS backup_job
Container and Cloud Native
-- All pods running as root
FIND pod WITH run_as_root = true
-- Containers with no resource limits
FIND container WITH cpu_limit = null
-- Clusters with nodes that have outdated Kubernetes versions
FIND cluster THAT CONTAINS node WITH k8s_version < '1.28.0'
-- Namespaces with privileged pods
FIND namespace THAT CONTAINS pod WITH privileged = true
Path and Reachability
-- Shortest path from a user to a secret
FIND SHORTEST PATH
FROM user WITH email = 'alice@corp.com'
TO secret WITH name = 'prod-db-password'
-- Shortest path between two networks
FIND SHORTEST PATH
FROM aws_vpc WITH _key = 'vpc-prod'
TO aws_vpc WITH _key = 'vpc-dev'
DEPTH 10
-- Is there any connection between these systems?
FIND SHORTEST PATH
FROM host WITH hostname = 'web-01'
TO DataStore WITH name = 'customer-data'
Blast Radius
-- If web-01 is compromised, what's at risk?
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4
-- Blast radius from a compromised credential
FIND BLAST RADIUS FROM credential WITH name = 'prod-api-key' DEPTH 5
-- Blast radius from a vulnerable package
FIND BLAST RADIUS FROM package WITH cve = 'CVE-2024-1234' DEPTH 3
Multi-hop Traversal
-- Complete access chain: user → role → policy → resource
FIND user
THAT ASSIGNED role
THAT ALLOWS policy
THAT USES aws_s3_bucket
-- Application stack: user → app → service → database
FIND user
THAT USES application
THAT CONNECTS service
THAT USES database WITH environment = 'production'
-- Supply chain: package → application → host
FIND package WITH has_vulnerability = true
THAT USES application
THAT RUNS host WITH environment = 'production'
Compliance Queries
-- CIS: All admin users (should be minimal)
FIND user THAT ASSIGNED role WITH admin = true RETURN COUNT
-- PCI-DSS: Services that touch cardholder data
FIND service THAT USES database WITH contains_card_data = true
-- NIST: Hosts without encryption at rest
FIND host WITH encryption_at_rest = false
-- All external-facing services without WAF
FIND service WITH external = true THAT !PROTECTS waf
Connector SDK Overview
The parallax-connect crate is the extension surface of Parallax — the part
third-party developers touch most. It defines the Connector trait and
provides all the infrastructure for collecting and publishing data.
What a Connector Does
A connector bridges an external data source (AWS, Okta, GitHub, a scanner, etc.) and the Parallax graph. It:
- Authenticates with the external API
- Collects entities and relationships (in parallel steps)
- Emits them via the SDK
- The SDK handles diffing, validation, and atomic commit
The connector author never writes to the storage engine directly. They
implement the Connector trait and call ctx.emit_entity() /
ctx.emit_relationship() — the framework handles the rest.
The Lifecycle
┌──────────┐ ┌───────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐
│ Configure │──►│ Discover │──►│ Collect │──►│ Publish │──►│ Commit │
│(auth, │ │(validate │ │(fetch │ │ (SDK diff │ │ (atomic │
│ settings) │ │ creds) │ │ assets) │ │ + queue) │ │ write) │
└──────────┘ └───────────┘ └──────────┘ └───────────┘ └──────────┘
Steps 1–4 are in the connector. Step 5 (commit) is handled by the scheduler
or the caller. The separation is deliberate: parallax-connect has no
dependency on parallax-store, keeping the dependency graph acyclic.
Quick Example
```rust
use parallax_connect::prelude::*;

pub struct MyConnector;

#[async_trait]
impl Connector for MyConnector {
    fn name(&self) -> &str {
        "my-connector"
    }

    fn steps(&self) -> Vec<StepDefinition> {
        vec![
            step("hosts", "Collect hosts").build(),
            step("services", "Collect services")
                .depends_on(&["hosts"])
                .build(),
        ]
    }

    async fn execute_step(
        &self,
        step_id: &str,
        ctx: &mut StepContext,
    ) -> Result<(), ConnectorError> {
        match step_id {
            "hosts" => {
                ctx.emit_entity(
                    entity("host", "web-01")
                        .class("Host")
                        .display_name("Web Server 01")
                        .property("state", "running"),
                )?;
                Ok(())
            }
            "services" => {
                ctx.emit_entity(
                    entity("service", "nginx")
                        .class("Service")
                        .display_name("Nginx"),
                )?;
                ctx.emit_relationship(
                    relationship("host", "web-01", "RUNS", "service", "nginx"),
                )?;
                Ok(())
            }
            _ => Err(ConnectorError::UnknownStep(step_id.to_string())),
        }
    }
}
```
Running a Connector
```rust
use parallax_connect::run_connector;
use parallax_ingest::commit_sync_exclusive;

let output = run_connector(&MyConnector, "my-account", "sync-001", None).await?;

let result = commit_sync_exclusive(
    &mut engine,
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;

println!("Created: {}", result.stats.entities_created);
println!("Updated: {}", result.stats.entities_updated);
println!("Deleted: {}", result.stats.entities_deleted);
```
Or with SyncEngine for shared engine access:
```rust
use parallax_connect::run_connector;
use parallax_ingest::SyncEngine;

let output = run_connector(&connector, "my-account", "sync-002", Some(&event_tx)).await?;

sync_engine.commit_sync(
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;
```
Key Principles
- Idempotent: Running the same connector twice with the same source data produces no changes (entities_unchanged = n, created = 0, deleted = 0).
- Source-scoped: Connector A's data is never deleted by connector B's sync.
- Atomic: Either the entire sync batch lands or none of it does.
- Fault-tolerant: A failed step does not prevent independent steps from running.
See Writing a Connector for the full guide.
Writing a Connector
A complete guide to implementing the Connector trait.
1. Create a Crate
Connectors are separate crates depending on parallax-connect:
# Cargo.toml
[package]
name = "connector-myservice"
version = "0.1.0"
edition = "2021"
[dependencies]
parallax-connect = { path = "../parallax-connect" }
async-trait = "0.1"
tokio = { version = "1", features = ["full"] }
# ... your API client crate
2. Define the Struct
```rust
use parallax_connect::prelude::*;

pub struct MyServiceConnector {
    api_base_url: String,
    api_token: String,
}

impl MyServiceConnector {
    pub fn new(api_base_url: impl Into<String>, api_token: impl Into<String>) -> Self {
        MyServiceConnector {
            api_base_url: api_base_url.into(),
            api_token: api_token.into(),
        }
    }
}
```
3. Define Steps
Steps are the units of collection. Each step is independent or depends on
prior steps. Define them in the steps() method:
```rust
#[async_trait]
impl Connector for MyServiceConnector {
    fn name(&self) -> &str {
        "my-service"
    }

    fn steps(&self) -> Vec<StepDefinition> {
        vec![
            step("users", "Collect users").build(),
            step("hosts", "Collect hosts").build(),
            step("services", "Collect services")
                .depends_on(&["hosts"]) // runs after "hosts"
                .build(),
            step("relationships", "Collect relationships")
                .depends_on(&["users", "hosts", "services"]) // runs last
                .build(),
        ]
    }

    // ...
}
```
INV-C05: Step dependencies must form a DAG. Cycles are rejected at connector load time.
INV-C06: A failed step does not prevent independent steps from running. Steps that don't depend on the failed step still execute.
4. Implement Steps
```rust
#[async_trait]
impl Connector for MyServiceConnector {
    // ... name() and steps() ...

    async fn execute_step(
        &self,
        step_id: &str,
        ctx: &mut StepContext,
    ) -> Result<(), ConnectorError> {
        match step_id {
            "users" => self.collect_users(ctx).await,
            "hosts" => self.collect_hosts(ctx).await,
            "services" => self.collect_services(ctx).await,
            "relationships" => self.collect_relationships(ctx).await,
            _ => Err(ConnectorError::UnknownStep(step_id.to_string())),
        }
    }
}
```
5. Emit Entities
```rust
impl MyServiceConnector {
    async fn collect_users(&self, ctx: &mut StepContext) -> Result<(), ConnectorError> {
        let users = self.fetch_users_from_api().await?;
        for user in users {
            ctx.emit_entity(
                entity("user", &user.id)   // (type, key)
                    .class("User")         // entity class
                    .display_name(&user.name)
                    .property("email", user.email.as_str())
                    .property("active", user.active)
                    .property("mfa_enabled", user.mfa_enabled),
            )?;
        }
        Ok(())
    }
}
```
ctx.emit_entity() returns Result<(), ConnectorError>. Emit errors are
non-fatal by default — they log a warning but don't stop the step.
To make them fatal, propagate with ?.
6. Emit Relationships
```rust
async fn collect_relationships(
    &self,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    let assignments = self.fetch_role_assignments().await?;
    for assignment in assignments {
        ctx.emit_relationship(
            relationship(
                "user",              // from_type
                &assignment.user_id, // from_key
                "ASSIGNED",          // verb
                "role",              // to_type
                &assignment.role_id, // to_key
            )
            .property("assigned_at", assignment.timestamp.to_string().as_str()),
        )?;
    }
    Ok(())
}
```
INV-C04: Referential integrity is enforced at commit time. A relationship
whose from_key or to_key doesn't exist in the batch or the current graph
will be rejected. Emit entities before the relationships that reference them.
7. Access Prior Step Data
Steps can read entities emitted by their dependencies:
```rust
async fn collect_services(
    &self,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    // Read hosts emitted by the "hosts" step
    let host_ids: Vec<String> = ctx
        .prior
        .entities_by_type("host")
        .iter()
        .map(|e| e.entity_key.to_string())
        .collect();

    // Use host IDs to fetch services from the API
    for host_id in host_ids {
        let services = self.fetch_services_for_host(&host_id).await?;
        for service in services {
            ctx.emit_entity(
                entity("service", &service.id)
                    .class("Service")
                    .display_name(&service.name),
            )?;
        }
    }
    Ok(())
}
```
8. Error Handling
```rust
use parallax_connect::ConnectorError;

// For unrecognized step IDs (always include this)
Err(ConnectorError::UnknownStep(step_id.to_string()))

// For API errors
Err(ConnectorError::Custom(format!("API error: {}", response.status())))

// For configuration errors
Err(ConnectorError::Configuration("API token is empty".to_string()))
```
Connector errors are logged and reported in SyncEvent::StepFailed.
They do not abort the entire sync — independent steps still run.
9. Run the Connector
```rust
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let connector = MyServiceConnector::new(
        "https://api.myservice.com",
        std::env::var("MYSERVICE_TOKEN")?,
    );

    let mut engine = StorageEngine::open(StoreConfig::new("./data"))?;

    let output = parallax_connect::run_connector(
        &connector,
        "my-account-id", // account_id
        "sync-001",      // sync_id (unique per run)
        None,            // event_tx (optional observability channel)
    )
    .await?;

    let result = parallax_ingest::commit_sync_exclusive(
        &mut engine,
        &output.connector_id,
        &output.sync_id,
        output.entities,
        output.relationships,
    )?;

    println!("Sync complete:");
    println!("  Created: {}", result.stats.entities_created);
    println!("  Updated: {}", result.stats.entities_updated);
    println!("  Deleted: {}", result.stats.entities_deleted);
    println!("  Relationships created: {}", result.stats.relationships_created);
    Ok(())
}
```
Step Definitions
Steps are the units of collection within a connector. They enable parallel execution of independent collection tasks and sequential execution of dependent tasks.
Defining Steps
```rust
fn steps(&self) -> Vec<StepDefinition> {
    vec![
        // Independent steps (no dependencies — can run in parallel in future)
        step("users", "Collect IAM users").build(),
        step("roles", "Collect IAM roles").build(),
        step("hosts", "Collect EC2 instances").build(),

        // Dependent steps (run after their dependencies complete)
        step("policies", "Collect IAM policies")
            .depends_on(&["roles"])
            .build(),
        step("services", "Collect services on hosts")
            .depends_on(&["hosts"])
            .build(),

        // Step that depends on multiple prior steps
        step("relationships", "Wire all relationships")
            .depends_on(&["users", "roles", "policies", "hosts", "services"])
            .build(),
    ]
}
```
StepDefinition Builder
```rust
// Start a step definition
step(id: &str, description: &str)

// Methods
.depends_on(step_ids: &[&str]) -> Self // declare dependencies
.build() -> StepDefinition             // finalize
```
Execution Order — Parallel Waves
The scheduler groups steps into topological waves. Steps within the same
wave have no inter-dependencies and execute concurrently via
tokio::task::JoinSet. Waves execute sequentially so downstream steps always
see completed upstream data.
Given: users, roles, hosts, policies(→roles), services(→hosts), relationships(→all)
Wave 0 (parallel): users, roles, hosts ← no dependencies
Wave 1 (parallel): policies, services ← depend only on wave 0
Wave 2 (sequential): relationships ← depends on everything
The wave grouping is computed by assigning each step a level:
level = max(level of dependencies) + 1, with roots at level 0.
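That level assignment can be sketched as a small fixpoint loop. This is an illustrative stand-in for the scheduler, not the parallax-connect implementation, and it assumes a valid DAG (a cycle would make the loop below spin forever, which is why INV-C05 rejects cycles up front):

```rust
use std::collections::HashMap;

// Compute wave levels for a dependency DAG:
// level = max(level of dependencies) + 1, roots at level 0.
fn wave_levels(deps: &[(&str, Vec<&str>)]) -> HashMap<String, usize> {
    let mut levels: HashMap<String, usize> = HashMap::new();
    // Repeatedly level steps whose dependencies are all leveled.
    while levels.len() < deps.len() {
        for (id, ds) in deps {
            if levels.contains_key(*id) {
                continue;
            }
            if ds.iter().all(|d| levels.contains_key(*d)) {
                let lvl = ds.iter().map(|d| levels[*d] + 1).max().unwrap_or(0);
                levels.insert(id.to_string(), lvl);
            }
        }
    }
    levels
}

fn main() {
    // The example from the text: three roots, two mid steps, one sink.
    let steps = vec![
        ("users", vec![]),
        ("roles", vec![]),
        ("hosts", vec![]),
        ("policies", vec!["roles"]),
        ("services", vec!["hosts"]),
        ("relationships", vec!["users", "roles", "policies", "hosts", "services"]),
    ];
    println!("{:?}", wave_levels(&steps));
}
```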
INV-C05: No Cycles
Step dependencies must form a DAG. Circular dependencies are detected at connector load time and return an error:
```rust
// This will fail validation:
step("a", "Step A").depends_on(&["b"]).build(),
step("b", "Step B").depends_on(&["a"]).build(),
// Error: "cycle detected in step dependencies: a -> b -> a"
```
INV-C06: Fault Isolation
A failed step does not prevent sibling steps in the same wave from running. The scheduler logs the error and the wave completes with whatever steps succeeded:
Wave 0:
Step "hosts" completed: 100 entities
Step "users" FAILED: API timeout after 30s ← sibling steps still run
Step "roles" completed: 25 entities ← unaffected by "users" failure
Wave 1:
Step "policies" completed: 50 entities ← runs despite "users" failure
Steps that depend on a failed step are not skipped automatically — they run
but see an incomplete prior_entities set (the failed step contributed
nothing to it).
Prior Step Data
Downstream steps can read entities emitted by their dependencies via
ctx.prior:
```rust
async fn collect_services(
    &self,
    step_id: &str,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    // Access entities from prior steps
    for host in ctx.prior.entities_by_type("host") {
        let services = self
            .api_client
            .get_services_for_host(&host.entity_key)
            .await?;
        // emit services...
    }
    Ok(())
}
```
ctx.prior is a snapshot of all entities emitted by all successfully
completed prior steps (not just direct dependencies — all prior steps).
Step Naming Conventions
| Convention | Example |
|---|---|
| Lowercase kebab-case | "iam-users", "ec2-instances" |
| Noun or noun-phrase | "users", "role-assignments" |
| No spaces | "security-groups" not "security groups" |
Step IDs appear in log output and SyncEvent messages, so choose
descriptive names.
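A validator for these conventions might look like the following sketch (`is_valid_step_id` is hypothetical, not part of the SDK):

```rust
// Lowercase kebab-case, no spaces: ASCII lowercase letters, digits,
// and interior hyphens only. (Illustrative convention checker.)
fn is_valid_step_id(id: &str) -> bool {
    !id.is_empty()
        && !id.starts_with('-')
        && !id.ends_with('-')
        && id
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}

fn main() {
    println!("{}", is_valid_step_id("iam-users"));       // true
    println!("{}", is_valid_step_id("security groups")); // false (space)
    println!("{}", is_valid_step_id("Users"));           // false (uppercase)
}
```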
Entity & Relationship Builders
The builder API is the primary way to construct entities and relationships in connector code. It uses a fluent interface designed to be readable and hard to misuse.
Entity Builder
entity(type, key) — Start a Builder
```rust
use parallax_connect::builder::entity;

let builder = entity("host", "web-01");
```
type is the entity type (open set, snake_case, e.g., "host", "aws_ec2_instance").
key is the source-system's unique identifier for this entity.
Builder Methods
```rust
entity("host", "web-01")
    // Required: entity class (closed set of ~40 values)
    .class("Host")
    // Optional: human-readable name
    .display_name("Web Server 01")
    // Add a single property (value can be any Value type)
    .property("state", "running")           // String
    .property("cpu_count", 4i64)            // Integer
    .property("memory_gb", 32.0f64)         // Float
    .property("active", true)               // Boolean
    .property("terminated_at", Value::Null) // Null
    // Add multiple properties at once
    .properties([
        ("region", Value::from("us-east-1")),
        ("az", Value::from("us-east-1a")),
    ])
```
.property() Value Types
The .property() method accepts anything that implements Into<Value>:
| Rust Type | PQL Value Type | Example |
|---|---|---|
| &str, String | String | .property("state", "running") |
| i64, i32, usize | Int | .property("port", 443i64) |
| f64, f32 | Float | .property("score", 9.8f64) |
| bool | Bool | .property("active", true) |
| Value::Null | Null | .property("deleted_at", Value::Null) |
Emitting the Entity
```rust
ctx.emit_entity(
    entity("host", "web-01")
        .class("Host")
        .display_name("Web Server 01")
        .property("state", "running"),
)?;
```
emit_entity() returns Result<(), ConnectorError>. The error is returned
if the entity builder is invalid (e.g., missing class).
Relationship Builder
relationship(from_type, from_key, verb, to_type, to_key) — Start a Builder
```rust
use parallax_connect::builder::relationship;

let builder = relationship("host", "web-01", "RUNS", "service", "nginx");
```
Builder Methods
```rust
relationship("host", "web-01", "RUNS", "service", "nginx")
    // Add properties to the edge (optional)
    .property("since", "2024-01-15")
    .property("port", 8080i64)
```
Emitting the Relationship
```rust
ctx.emit_relationship(
    relationship("host", "web-01", "RUNS", "service", "nginx")
        .property("port", 443i64),
)?;
```
Important: Both endpoints of the relationship must exist in the current batch or in the committed graph. Emitting a relationship to a non-existent entity is not an error at emit time, but it will be rejected at commit time (INV-C04).
Full Example
```rust
async fn collect_hosts_and_services(ctx: &mut StepContext) -> Result<(), ConnectorError> {
    // Emit host
    ctx.emit_entity(
        entity("host", "web-01")
            .class("Host")
            .display_name("Web Server 01")
            .property("state", "running")
            .property("region", "us-east-1")
            .property("cpu_count", 8i64)
            .property("memory_gb", 32.0f64),
    )?;

    // Emit service
    ctx.emit_entity(
        entity("service", "nginx-web-01")
            .class("Service")
            .display_name("Nginx on web-01")
            .property("port", 443i64)
            .property("protocol", "https"),
    )?;

    // Emit relationship (host RUNS service)
    ctx.emit_relationship(
        relationship("host", "web-01", "RUNS", "service", "nginx-web-01")
            .property("since", "2024-01-15"),
    )?;

    Ok(())
}
```
StepContext Metrics
After emitting, you can inspect how many entities/relationships were emitted:
```rust
ctx.emit_entity(...)?;
ctx.emit_entity(...)?;
ctx.emit_relationship(...)?;

// Metrics for this step
println!("Entities emitted: {}", ctx.metrics.entities_emitted);
println!("Relationships emitted: {}", ctx.metrics.relationships_emitted);
```
Sync Protocol
The sync protocol handles the transition from collected data to committed graph state. It diffs the new batch against the existing data and commits the delta atomically.
The Diff Algorithm
For each connector sync, the engine computes a diff between:
- Emitted: entities and relationships from the current connector run
- Existing: entities and relationships already in the graph from the same connector
Emitted entities: {A, B, C}
Existing entities: {A, B, D} ← D was in the last sync but not emitted this time
Delta:
Upsert A (unchanged if properties match)
Upsert B (unchanged if properties match)
Upsert C (new)
Delete D (not seen in this sync)
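The diff above can be sketched with plain maps, using a string blob to stand in for entity properties. `Delta` and `diff` are illustrative names, not the parallax-ingest types:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Delta {
    Create,
    Update,
    Unchanged,
    Delete,
}

// Everything emitted is upserted (create / update / unchanged);
// anything existing but not emitted this run is deleted.
fn diff(
    emitted: &HashMap<&str, &str>,
    existing: &HashMap<&str, &str>,
) -> HashMap<String, Delta> {
    let mut out = HashMap::new();
    for (key, props) in emitted {
        let delta = match existing.get(key) {
            None => Delta::Create,
            Some(old) if old != props => Delta::Update,
            Some(_) => Delta::Unchanged,
        };
        out.insert(key.to_string(), delta);
    }
    for key in existing.keys() {
        if !emitted.contains_key(key) {
            out.insert(key.to_string(), Delta::Delete);
        }
    }
    out
}

fn main() {
    // Mirrors the example: A unchanged, B updated, C new, D deleted.
    let emitted = HashMap::from([("A", "v1"), ("B", "v2"), ("C", "v1")]);
    let existing = HashMap::from([("A", "v1"), ("B", "v1"), ("D", "v1")]);
    println!("{:?}", diff(&emitted, &existing));
}
```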
Source Scope (INV-C02)
The diff is scoped to the connector's source. Connector B's sync never deletes entities created by connector A:
Graph state:
Host web-01 (source: aws-connector)
Host web-02 (source: aws-connector)
User alice (source: okta-connector)
Okta sync: emits [alice-v2]
Result:
Host web-01 (source: aws-connector) — UNCHANGED
Host web-02 (source: aws-connector) — UNCHANGED
User alice (source: okta-connector) — UPDATED
Atomic Commit (INV-C01)
The entire delta (creates + updates + deletes) is committed as a single
WriteBatch. Either all changes land or none do:
```rust
// SyncEngine::commit_sync internals:
let mut batch = WriteBatch::new();

for entity in &entities {
    match existing.find(|e| e.id == entity.id) {
        None => batch.upsert_entity(entity.clone()), // create
        Some(ex) if ex.properties != entity.properties => {
            batch.upsert_entity(entity.clone()) // update
        }
        Some(_) => {} // unchanged
    }
}

for existing in &existing_entities {
    if !seen_ids.contains(&existing.id) {
        batch.delete_entity(existing.id); // delete
    }
}

// Atomic commit
engine.write(batch)?;
```
Referential Integrity (INV-C04)
Before committing, the ingest layer validates that every relationship's endpoints exist. The check considers:
- Entities in the current batch (being committed now)
- Entities already in the graph from any connector
```rust
// validate_sync_batch checks each relationship:
let batch_ids: HashSet<EntityId> = batch_entities.iter().map(|e| e.id).collect();
let snapshot_ids: HashSet<EntityId> = snapshot_entities.iter().map(|e| e.id).collect();
let available: HashSet<EntityId> = batch_ids.union(&snapshot_ids).copied().collect();

for rel in &relationships {
    if !available.contains(&rel.from_id) {
        return Err(SyncError::DanglingRelationship { ... });
    }
    if !available.contains(&rel.to_id) {
        return Err(SyncError::DanglingRelationship { ... });
    }
}
```
A sync batch with a dangling relationship returns SyncError::DanglingRelationship
and no data is committed.
SyncResult
```rust
pub struct SyncResult {
    pub sync_id: String,
    pub stats: SyncStats,
}

pub struct SyncStats {
    pub entities_created: u64,
    pub entities_updated: u64,
    pub entities_unchanged: u64,
    pub entities_deleted: u64,
    pub relationships_created: u64,
    pub relationships_updated: u64,
    pub relationships_unchanged: u64,
    pub relationships_deleted: u64,
}
```
Two Commit Modes
commit_sync_exclusive — Exclusive Engine Access
Used when you hold &mut StorageEngine. Best for single-threaded usage
or CLI tools:
```rust
let result = commit_sync_exclusive(
    &mut engine,
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;
```
SyncEngine::commit_sync — Shared Engine Access
Used in server mode where multiple connectors share one engine via
Arc<Mutex<StorageEngine>>:
```rust
let sync_engine = SyncEngine::new(Arc::clone(&engine));

let result = sync_engine.commit_sync(
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;
```
SyncEngine::commit_sync holds the engine lock only during the brief write
step, not during diff computation. Multiple connectors can diff concurrently;
they only serialize at the write step.
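That locking discipline can be sketched with a plain `Arc<Mutex<...>>`, where a `Vec` stands in for the storage engine. Illustrative only — the real `SyncEngine` has more machinery, but the shape is the same: expensive work outside the lock, a short critical section for the write:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Diff outside the lock; hold the lock only for the atomic write.
fn commit(engine: &Arc<Mutex<Vec<String>>>, connector: &str, emitted: Vec<String>) {
    // 1. "Diff" computation runs with no lock held (cheap stand-in).
    let delta: Vec<String> = emitted
        .into_iter()
        .map(|e| format!("{connector}:{e}"))
        .collect();

    // 2. Acquire the lock only for the brief write step.
    let mut guard = engine.lock().unwrap();
    guard.extend(delta);
}

fn main() {
    let engine = Arc::new(Mutex::new(Vec::new()));
    // Two "connectors" committing concurrently; they serialize only
    // at the write step.
    let handles: Vec<_> = ["aws", "okta"]
        .into_iter()
        .map(|c| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || commit(&engine, c, vec!["e1".into(), "e2".into()]))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{}", engine.lock().unwrap().len()); // 4
}
```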
Idempotency
Running the same sync twice with identical data is safe and efficient:
First run: entities_created = 5, entities_deleted = 0
Second run: entities_unchanged = 5, entities_created = 0, entities_deleted = 0
The diff detects that nothing changed and the WriteBatch is empty,
so no WAL write occurs.
Connector Observability
The connector SDK provides structured events and logging for monitoring sync executions.
SyncEvent Stream
Pass a tokio::sync::mpsc::Sender<SyncEvent> to run_connector to receive
real-time events during the sync:
```rust
use tokio::sync::mpsc;
use parallax_connect::{run_connector, SyncEvent};

let (tx, mut rx) = mpsc::channel(100);

// Spawn event consumer
tokio::spawn(async move {
    while let Some(event) = rx.recv().await {
        match event {
            SyncEvent::Started { connector_id, sync_id } => {
                tracing::info!(%connector_id, %sync_id, "Sync started");
            }
            SyncEvent::StepStarted { step_id } => {
                tracing::info!(%step_id, "Step started");
            }
            SyncEvent::StepCompleted { step_id, entities, relationships, duration } => {
                tracing::info!(
                    %step_id,
                    entities_emitted = entities,
                    relationships_emitted = relationships,
                    duration_ms = duration.as_millis(),
                    "Step completed"
                );
            }
            SyncEvent::StepFailed { step_id, error } => {
                tracing::warn!(%step_id, %error, "Step failed");
            }
            SyncEvent::Completed { connector_id, sync_id } => {
                tracing::info!(%connector_id, %sync_id, "Sync completed");
            }
        }
    }
});

let output = run_connector(&connector, "account", "sync-001", Some(&tx)).await?;
```
SyncEvent Variants
```rust
pub enum SyncEvent {
    Started {
        connector_id: String,
        sync_id: String,
    },
    StepStarted {
        step_id: String,
    },
    StepCompleted {
        step_id: String,
        entities: usize,
        relationships: usize,
        duration: Duration,
    },
    StepFailed {
        step_id: String,
        error: ConnectorError,
    },
    Completed {
        connector_id: String,
        sync_id: String,
    },
}
```
Structured Logging
The scheduler uses tracing for structured logs. Enable them with any
tracing subscriber:
```rust
tracing_subscriber::fmt::init();
```
Relevant log fields:
- connector_id — identifies the connector
- sync_id — identifies this specific run
- step_id — identifies the current step
- entities — count of entities emitted in a step
- relationships — count of relationships emitted
Metrics Integration
After a sync, the SyncStats from commit_sync / commit_sync_exclusive
provides counters suitable for Prometheus export:
```rust
let stats = result.stats;

// Export to your metrics system
metrics::counter!("parallax_entities_created_total")
    .increment(stats.entities_created);
metrics::counter!("parallax_entities_deleted_total")
    .increment(stats.entities_deleted);
metrics::counter!("parallax_relationships_created_total")
    .increment(stats.relationships_created);
```
Or use the built-in Prometheus endpoint when running parallax-server:
GET /metrics returns engine-wide counters.
Tracing Context Propagation
When running connectors inside an existing trace context, the connector steps inherit the current span:
```rust
let span = tracing::info_span!("sync_run", connector = "my-service");

let output = async move {
    run_connector(&connector, "account", "sync-001", None).await
}
.instrument(span)
.await?;
```
All log output from within run_connector will be nested under this span.
REST API Overview
parallax-server exposes a REST HTTP API over Axum. All responses are JSON.
Base URL
http://localhost:7700 (default)
Configure with --host and --port flags or the PARALLAX_HOST /
PARALLAX_PORT environment variables.
Versioning
All endpoints are prefixed with /v1/. The version is part of the URL path,
not a header.
Endpoint Summary
| Method | Path | Description |
|---|---|---|
| GET | /v1/health | Health check (auth-exempt) |
| GET | /v1/stats | Entity and relationship counts |
| POST | /v1/query | Execute a PQL query |
| GET | /v1/entities/:id | Fetch an entity by ID |
| GET | /v1/relationships/:id | Fetch a relationship by ID |
| POST | /v1/ingest/sync | Connector sync batch |
| POST | /v1/ingest/write | Direct write batch |
| GET | /v1/connectors | List registered connectors |
| POST | /v1/connectors/:id/sync | Trigger a connector sync |
| GET | /metrics | Prometheus metrics exposition |
Content Type
All request bodies must be Content-Type: application/json.
All responses have Content-Type: application/json (except /metrics).
Request IDs
Every response includes an X-Request-Id header (INV-A05).
The server either generates a UUID v4 or propagates the value from the
incoming request's X-Request-Id header. Use this for log correlation.
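The propagate-or-generate rule can be sketched as follows. The fallback here is a timestamp-based placeholder for illustration only, not the UUID v4 the server actually generates:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Reuse the client's X-Request-Id when present; otherwise mint one.
// (The "req-<nanos>" format is a stand-in, not the server's format.)
fn request_id(incoming: Option<&str>) -> String {
    match incoming {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => {
            let nanos = SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .unwrap()
                .as_nanos();
            format!("req-{nanos:x}")
        }
    }
}

fn main() {
    println!("{}", request_id(Some("client-abc-123"))); // propagated
    println!("{}", request_id(None));                   // generated
}
```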
curl -v http://localhost:7700/v1/health
# < X-Request-Id: 550e8400-e29b-41d4-a716-446655440000
Common Response Codes
| Status | Meaning |
|---|---|
| 200 OK | Success |
| 400 Bad Request | Invalid request body or parameters |
| 401 Unauthorized | Missing or invalid API key |
| 404 Not Found | Entity/relationship not found |
| 500 Internal Server Error | Storage or processing error |
Starting the Server
# No auth (development)
parallax serve --data-dir ./data
# With API key
PARALLAX_API_KEY=my-secret-key parallax serve --data-dir ./data
# Custom host and port
parallax serve --host 0.0.0.0 --port 8080 --data-dir /var/lib/parallax
Authentication
Parallax uses API key authentication. When no API key is configured, the server runs in open mode (all requests accepted).
Configuration
Set the PARALLAX_API_KEY environment variable before starting the server:
export PARALLAX_API_KEY="your-secret-api-key"
parallax serve --data-dir ./data
If PARALLAX_API_KEY is not set or is empty, the server starts in open mode
and accepts all requests without authentication.
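That rule reduces to a small decision function. `AuthMode` and `auth_mode` are illustrative names, not the server's actual config types:

```rust
#[derive(Debug, PartialEq)]
enum AuthMode {
    Open,
    Protected(String),
}

// Unset or empty PARALLAX_API_KEY means open mode; anything else
// enables API-key auth. (Sketch of the rule described above.)
fn auth_mode(api_key: Option<String>) -> AuthMode {
    match api_key {
        Some(k) if !k.is_empty() => AuthMode::Protected(k),
        _ => AuthMode::Open,
    }
}

fn main() {
    let mode = auth_mode(std::env::var("PARALLAX_API_KEY").ok());
    println!("{mode:?}");
}
```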
Making Authenticated Requests
Include the API key as a Bearer token in the Authorization header:
curl -H "Authorization: Bearer your-secret-api-key" \
http://localhost:7700/v1/stats
Or with an explicit header:
curl -H "Authorization: Bearer your-secret-api-key" \
-X POST http://localhost:7700/v1/query \
-H "Content-Type: application/json" \
-d '{"pql": "FIND host"}'
Exemptions
The /v1/health endpoint is always exempt from authentication (INV-A06).
This allows load balancers and health check systems to verify server liveness
without credentials.
# Always works, even with auth enabled
curl http://localhost:7700/v1/health
The /metrics Prometheus endpoint is not exempt — protect it from
unauthenticated access in production.
Security Properties
Constant-time comparison (INV-A02): API key verification uses constant-time string comparison to prevent timing attacks. An attacker cannot determine the correct key by measuring response time differences:
fn ct_eq(a: &str, b: &str) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.bytes().zip(b.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
Keys are never logged: The API key is stored in memory as Arc<String>.
It is never written to log output, error messages, or response bodies.
401 Response
When authentication fails:
{
"error": "Unauthorized",
"message": "missing or invalid API key"
}
HTTP status: 401 Unauthorized
Best Practices
- Use a random, high-entropy key: openssl rand -hex 32
- Rotate keys regularly in production
- Use TLS in production (TLS termination planned for v0.2)
- Restrict network access to the server — treat the API key as a secondary defense, not the primary
Open Mode vs. Protected Mode
| Mode | Config | Use Case |
|---|---|---|
| Open | PARALLAX_API_KEY not set | Local development, CLI usage, testing |
| Protected | PARALLAX_API_KEY=<key> | Production, shared deployments |
In open mode, all endpoints are accessible without credentials. This is suitable for local development and single-user CLI usage where network access is already restricted.
Query Endpoints
POST /v1/query
Execute a PQL query and return results.
Request Body
{
"pql": "FIND host WITH state = 'running'"
}
| Field | Type | Required | Description |
|---|---|---|---|
| pql | string | Yes | The PQL query to execute |
Response: Entity Query (200 OK)
{
"count": 2,
"entities": [
{
"id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
"entity_type": "host",
"entity_class": "Host",
"display_name": "Web Server 01",
"properties": {
"state": "running",
"region": "us-east-1"
},
"source": {
"connector_id": "my-connector",
"sync_id": "sync-001"
}
},
{
"id": "b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7",
"entity_type": "host",
"entity_class": "Host",
"display_name": "Database Server 01",
"properties": {
"state": "running",
"region": "us-west-2"
}
}
]
}
Response: Count Query (200 OK)
For FIND ... RETURN COUNT:
{
"count": 42
}
Response: Traversal Query (200 OK)
For FIND A THAT VERB B:
{
"count": 1,
"entities": [
{
"id": "...",
"entity_type": "host",
"entity_class": "Host",
"display_name": "Web Server 01"
}
]
}
Response: Path Query (200 OK)
For FIND SHORTEST PATH FROM A TO B:
{
"count": 1,
"path": [
{"entity_id": "user-alice-id"},
{"entity_id": "role-admin-id", "via_relationship": "rel-assigned-id"},
{"entity_id": "secret-prod-id", "via_relationship": "rel-allows-id"}
]
}
If no path exists:
{
"count": 0,
"path": null
}
Error: Parse Error (400)
{
"error": "ParseError",
"message": "unexpected token '=' at position 18: expected comparison operator"
}
Error: Execution Error (500)
{
"error": "ExecutionError",
"message": "query timed out after 30000ms"
}
Examples
# Find all running hosts
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND host WITH state = '\''running'\''"}'
# Count all services
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND Service RETURN COUNT"}'
# Find hosts with no EDR
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND host THAT !PROTECTS edr_agent"}'
# Blast radius
curl -X POST http://localhost:7700/v1/query \
-H 'Content-Type: application/json' \
-d '{"pql": "FIND BLAST RADIUS FROM host WITH _key = '\''web-01'\'' DEPTH 4"}'
Ingest Endpoints
POST /v1/ingest/sync
Submit a connector sync batch. This is the primary ingest endpoint.
The server runs the sync protocol: validates referential integrity, diffs against the current graph state for this connector, and commits the delta atomically (INV-C01).
Request Body
{
"connector_id": "my-connector",
"sync_id": "sync-2024-01-15-001",
"entities": [
{
"entity_type": "host",
"entity_key": "web-01",
"entity_class": "Host",
"display_name": "Web Server 01",
"properties": {
"state": "running",
"region": "us-east-1",
"cpu_count": 8
}
},
{
"entity_type": "service",
"entity_key": "nginx-web-01",
"entity_class": "Service",
"display_name": "Nginx on web-01"
}
],
"relationships": [
{
"from_type": "host",
"from_key": "web-01",
"verb": "RUNS",
"to_type": "service",
"to_key": "nginx-web-01"
}
]
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| connector_id | string | Yes | Identifies the connector. Used for source-scoped diff. |
| sync_id | string | Yes | Unique ID for this sync run. Included in source tracking. |
| entities | array | Yes | Entities to upsert. Can be empty. |
| relationships | array | Yes | Relationships to upsert. Can be empty. |
Entity Fields
| Field | Type | Required | Description |
|---|---|---|---|
| entity_type | string | Yes | Type identifier (snake_case, e.g., "host") |
| entity_key | string | Yes | Source-system unique key |
| entity_class | string | Yes | Class from the known classes list |
| display_name | string | No | Human-readable name |
| properties | object | No | Flat key-value property bag |
Relationship Fields
| Field | Type | Required | Description |
|---|---|---|---|
| from_type | string | Yes | Source entity type |
| from_key | string | Yes | Source entity key |
| verb | string | Yes | Relationship verb from the known verbs list |
| to_type | string | Yes | Target entity type |
| to_key | string | Yes | Target entity key |
| properties | object | No | Flat key-value property bag on the edge |
Response (200 OK)
{
"sync_id": "sync-2024-01-15-001",
"entities_created": 2,
"entities_updated": 0,
"entities_unchanged": 0,
"entities_deleted": 0,
"relationships_created": 1,
"relationships_updated": 0,
"relationships_unchanged": 0,
"relationships_deleted": 0
}
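The created/updated/unchanged/deleted counters come from a source-scoped diff of the incoming batch against the connector's previous state. A minimal sketch of that categorization, assuming entities keyed by "type:key" with a String standing in for the full property payload (Delta and diff are illustrative names, not the parallax-ingest API):

```rust
use std::collections::HashMap;

struct Delta {
    created: u32,
    updated: u32,
    unchanged: u32,
    deleted: u32,
}

fn diff(prev: &HashMap<String, String>, batch: &HashMap<String, String>) -> Delta {
    let mut d = Delta { created: 0, updated: 0, unchanged: 0, deleted: 0 };
    for (key, new_val) in batch {
        match prev.get(key) {
            None => d.created += 1,                       // not seen before
            Some(old) if old == new_val => d.unchanged += 1,
            Some(_) => d.updated += 1,                    // properties changed
        }
    }
    // Entities the connector reported last sync but omitted this sync are deleted.
    d.deleted = prev.keys().filter(|k| !batch.contains_key(*k)).count() as u32;
    d
}

fn main() {
    let prev = HashMap::from([
        ("host:web-01".to_string(), "running".to_string()),
        ("host:web-02".to_string(), "running".to_string()),
    ]);
    let batch = HashMap::from([
        ("host:web-01".to_string(), "stopped".to_string()), // updated
        ("service:nginx".to_string(), "up".to_string()),    // created
    ]);
    let d = diff(&prev, &batch);
    assert_eq!((d.created, d.updated, d.unchanged, d.deleted), (1, 1, 0, 1));
    println!("ok");
}
```

Because the diff is scoped to one connector_id, two connectors reporting overlapping entities never delete each other's data.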
Error: Dangling Relationship (500)
If a relationship references an entity that doesn't exist in the batch or the current graph:
{
"error": "DanglingRelationship",
"message": "relationship from host:web-01 RUNS service:ghost — to_id not found in batch or graph"
}
Example
curl -X POST http://localhost:7700/v1/ingest/sync \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer your-key' \
-d '{
"connector_id": "my-connector",
"sync_id": "sync-001",
"entities": [
{"entity_type": "host", "entity_key": "web-01", "entity_class": "Host",
"display_name": "Web Server 01", "properties": {"state": "running"}}
],
"relationships": []
}'
POST /v1/ingest/write
Direct write batch — bypass the sync protocol. Use this when you want to write entities without source-scoped diffing.
Unlike /v1/ingest/sync, this endpoint does not:
- Diff against previous state from the same connector
- Delete entities not present in the batch
It simply upserts all provided entities and relationships.
Request Body
{
"write_id": "write-001",
"entities": [
{
"entity_type": "host",
"entity_key": "h1",
"entity_class": "Host",
"display_name": "Server 1"
}
],
"relationships": []
}
Response (200 OK)
{
"write_id": "write-001",
"entities_written": 1,
"relationships_written": 0
}
When to Use Write vs. Sync
| Use Case | Endpoint |
|---|---|
| Connector that owns its entities | /v1/ingest/sync |
| One-time bulk import | /v1/ingest/write |
| Incremental updates (no deletions) | /v1/ingest/write |
| Full re-sync with deletion of departed entities | /v1/ingest/sync |
Entity & Relationship Endpoints
GET /v1/entities/:id
Fetch a single entity by its EntityId.
URL Parameters
| Parameter | Description |
|---|---|
| :id | Hex-encoded EntityId (32 hex characters = 16 bytes) |
Computing an EntityId
Entity IDs are deterministic from (account_id, entity_type, entity_key).
You can compute them client-side using the same blake3 derivation:
use parallax_core::entity::EntityId;

// account_id is "default" when using the REST API without multi-tenancy
let id = EntityId::derive("default", "host", "web-01");
let hex = format!("{id}"); // 32 hex characters
Or derive the hex string from the list response (/v1/query).
Response (200 OK)
{
"id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
"type": "host",
"class": "Host",
"display_name": "Web Server 01",
"properties": {
"state": "running",
"region": "us-east-1",
"cpu_count": 8
},
"source": {
"connector_id": "my-connector",
"sync_id": "sync-001"
},
"created_at": "2024-01-15T10:00:00Z",
"updated_at": "2024-01-15T10:05:00Z"
}
Response (404 Not Found)
{
"error": "NotFound",
"message": "entity a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 not found"
}
Example
# Entity IDs are blake3-derived — take the hex ID from a query response
curl http://localhost:7700/v1/entities/a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
GET /v1/relationships/:id
Fetch a single relationship by its RelationshipId.
Response (200 OK)
{
"id": "d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9",
"class": "RUNS",
"from_id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
"to_id": "b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7",
"properties": {
"port": 443
},
"source": {
"connector_id": "my-connector",
"sync_id": "sync-001"
}
}
Response (404 Not Found)
{
"error": "NotFound",
"message": "relationship d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9 not found"
}
Admin Endpoints
GET /v1/health
Health check endpoint. Always returns 200 OK if the server is running. This endpoint is exempt from authentication (INV-A06).
Response (200 OK)
{
"status": "ok",
"version": "0.1.0"
}
Use this for load balancer health checks and readiness probes.
GET /v1/stats
Engine statistics including entity and relationship counts.
Response (200 OK)
{
"total_entities": 12543,
"total_relationships": 45210,
"version": "0.1.0",
"uptime_seconds": 3600,
"type_counts": {
"host": 1200,
"user": 5400,
"service": 2100,
"aws_s3_bucket": 450,
"aws_iam_role": 380
},
"class_counts": {
"Host": 1200,
"User": 5400,
"Service": 2100,
"DataStore": 450
}
}
Example
curl http://localhost:7700/v1/stats
GET /v1/connectors
List registered connectors.
In v0.1, this returns the list of connector IDs that have submitted at least one sync batch to this server instance.
Response (200 OK)
{
"connectors": [
{
"id": "aws-connector",
"last_sync_id": "sync-2024-01-15-003",
"entity_count": 8421
},
{
"id": "okta-connector",
"last_sync_id": "sync-2024-01-15-001",
"entity_count": 4122
}
]
}
POST /v1/connectors/:id/sync
Trigger a sync for a registered connector.
In v0.1, this is a stub endpoint that returns 501 Not Implemented. Full server-side connector scheduling is planned for v0.2.
Response (501 Not Implemented)
{
"error": "NotImplemented",
"message": "server-side connector scheduling is planned for v0.2"
}
For now, run connectors out-of-process and use POST /v1/ingest/sync to
submit the results.
GET /v1/policies
List the currently loaded policy rules.
Response (200 OK)
{
"count": 2,
"rules": [
{
"id": "edr-coverage-001",
"name": "EDR coverage gap",
"severity": "high",
"description": "Hosts without EDR protection.",
"query": "FIND host THAT !PROTECTS edr_agent",
"enabled": true,
"frameworks": [
{ "framework": "CIS-Controls-v8", "control": "10.1" }
]
}
]
}
POST /v1/policies
Replace the loaded rule set. PQL in each enabled rule is validated at load time (INV-P06) — invalid PQL returns 400.
Request Body
{
"rules": [
{
"id": "edr-coverage-001",
"name": "EDR coverage gap",
"severity": "high",
"description": "Hosts without EDR protection.",
"query": "FIND host THAT !PROTECTS edr_agent",
"frameworks": [{ "framework": "CIS-Controls-v8", "control": "10.1" }],
"schedule": "manual",
"remediation": "Deploy EDR agent.",
"enabled": true
}
]
}
Response (200 OK)
{ "loaded": 1 }
Response (400 Bad Request) — invalid PQL
{ "error": "policy validation failed: Rule 'edr-coverage-001' contains invalid PQL: ..." }
POST /v1/policies/evaluate
Evaluate all enabled rules against the current graph snapshot.
Response (200 OK)
{
"total": 2,
"pass": 1,
"fail": 1,
"results": [
{
"rule_id": "edr-coverage-001",
"status": "Fail",
"violation_count": 3,
"error": null
},
{
"rule_id": "mfa-all-users",
"status": "Pass",
"violation_count": 0,
"error": null
}
]
}
Status values: "Pass", "Fail", "Error", "Skipped".
GET /v1/policies/posture
Compute compliance posture for a framework.
Query Parameters
| Parameter | Default | Description |
|---|---|---|
| framework | CIS-Controls-v8 | Framework name to report on |
Response (200 OK)
{
"framework": "CIS-Controls-v8",
"overall_score": 0.75,
"controls": [
{
"control_id": "10.1",
"status": "Fail",
"rule_count": 1,
"pass_count": 0,
"fail_count": 1
},
{
"control_id": "6.5",
"status": "Pass",
"rule_count": 1,
"pass_count": 1,
"fail_count": 0
}
]
}
overall_score is the fraction of controls with status Pass (0.0–1.0).
Controls with no mapped rules are not included.
Prometheus Metrics
GET /metrics
Returns engine metrics in Prometheus text exposition format. Use this endpoint to integrate with Prometheus, Grafana, or any compatible metrics system.
Response (200 OK)
# HELP parallax_entities_total Total number of entities in the graph
# TYPE parallax_entities_total gauge
parallax_entities_total 12543
# HELP parallax_relationships_total Total number of relationships in the graph
# TYPE parallax_relationships_total gauge
parallax_relationships_total 45210
# HELP parallax_writes_total Total number of write batches committed
# TYPE parallax_writes_total counter
parallax_writes_total 1024
# HELP parallax_reads_total Total number of snapshot reads
# TYPE parallax_reads_total counter
parallax_reads_total 98432
# HELP parallax_uptime_seconds Server uptime in seconds
# TYPE parallax_uptime_seconds gauge
parallax_uptime_seconds 3601
Content Type
Content-Type: text/plain; version=0.0.4; charset=utf-8
Prometheus Scrape Configuration
# prometheus.yml
scrape_configs:
- job_name: 'parallax'
static_configs:
- targets: ['localhost:7700']
metrics_path: '/metrics'
# Add auth if PARALLAX_API_KEY is set:
authorization:
credentials: 'your-api-key'
Available Metrics
| Metric | Type | Description |
|---|---|---|
| parallax_entities_total | Gauge | Current entity count (includes soft-deleted) |
| parallax_relationships_total | Gauge | Current relationship count |
| parallax_writes_total | Counter | Total committed write batches |
| parallax_reads_total | Counter | Total snapshot reads |
| parallax_uptime_seconds | Gauge | Server uptime in seconds |
Future Metrics (v0.2)
Planned additions:
- parallax_query_duration_seconds — histogram of query latencies
- parallax_wal_bytes_total — WAL bytes written
- parallax_segment_count — number of on-disk segments
- parallax_sync_duration_seconds — histogram of sync durations per connector
- parallax_sync_errors_total — count of sync errors by connector
Example
curl http://localhost:7700/metrics
Or with authentication:
curl -H 'Authorization: Bearer your-key' http://localhost:7700/metrics
Error Responses
All API errors follow a consistent JSON structure.
Error Response Format
{
"error": "ErrorType",
"message": "Human-readable description of what went wrong"
}
The X-Request-Id header is always present in error responses, making it
easy to correlate errors with server logs.
Error Types
400 Bad Request
| Error | Cause |
|---|---|
| ParseError | PQL query has a syntax error |
| InvalidRequest | Request body is missing required fields or has invalid types |
| InvalidEntityClass | Entity class is not in the known classes list |
| InvalidRelationshipVerb | Relationship verb is not in the known verbs list |
{
"error": "ParseError",
"message": "unexpected token 'WHERE' at position 12: use 'WITH' instead"
}
401 Unauthorized
{
"error": "Unauthorized",
"message": "missing or invalid API key"
}
404 Not Found
{
"error": "NotFound",
"message": "entity a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 not found"
}
500 Internal Server Error
| Error | Cause |
|---|---|
| StoreError | Storage engine error (I/O failure, corruption) |
| DanglingRelationship | Ingest batch contains a relationship referencing a non-existent entity |
| ExecutionError | Query execution failed (timeout, resource limit) |
| SyncError | Sync commit failed |
{
"error": "DanglingRelationship",
"message": "relationship from host:web-01 RUNS service:ghost — entity not found in batch or graph"
}
{
"error": "StoreError",
"message": "I/O error writing to WAL: No space left on device"
}
Common Mistakes
"WHERE" instead of "WITH"
PQL uses WITH for property filters, not WHERE:
# Wrong
FIND host WHERE state = 'running'
# Correct
FIND host WITH state = 'running'
Double quotes in string literals
PQL uses single quotes, not double quotes:
# Wrong
FIND host WITH state = "running"
# Correct
FIND host WITH state = 'running'
Dangling relationship in ingest
The from_key and to_key in relationships must reference entities that
exist in the same batch or the current graph:
{
"entities": [
{"entity_type": "host", "entity_key": "web-01", "entity_class": "Host"}
],
"relationships": [
{
"from_type": "host", "from_key": "web-01",
"verb": "RUNS",
"to_type": "service", "to_key": "ghost-service"
}
]
}
This returns a DanglingRelationship error because service:ghost-service
doesn't exist. Either include the service entity in the batch or remove the
relationship.
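The server-side check behind this error can be sketched as a set lookup: every relationship endpoint must resolve either in the incoming batch or in the existing graph. Endpoints are written "type:key" here, and find_dangling is an illustrative name, not the real API:

```rust
use std::collections::HashSet;

// Return every relationship endpoint that resolves in neither the batch
// nor the current graph.
fn find_dangling<'a>(
    batch: &HashSet<&'a str>,
    graph: &HashSet<&'a str>,
    edges: &[(&'a str, &'a str)], // (from "type:key", to "type:key")
) -> Vec<&'a str> {
    edges
        .iter()
        .flat_map(|&(from, to)| [from, to])
        .filter(|k| !batch.contains(k) && !graph.contains(k))
        .collect()
}

fn main() {
    let batch = HashSet::from(["host:web-01"]);
    let graph = HashSet::new();
    let dangling =
        find_dangling(&batch, &graph, &[("host:web-01", "service:ghost-service")]);
    // Reproduces the example above: the service endpoint is unresolvable.
    assert_eq!(dangling, vec!["service:ghost-service"]);
    println!("ok");
}
```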
Logging
Server-side errors are logged with the request ID and full error context:
ERROR parallax_server::routes: query execution failed
request_id = "550e8400-e29b-41d4-a716-446655440000"
error = "ParseError: unexpected token '=' at position 18"
pql = "FIND host WITH state = = 'running'"
Use the X-Request-Id from the response to search server logs.
Policy Engine Overview
parallax-policy evaluates security policy rules against the live graph.
Policies are PQL-powered rules that identify compliance violations, security
gaps, and posture issues.
What It Does
- Load rules: Accept PolicyRule definitions with PQL queries
- Validate queries: Reject rules whose PQL is invalid at load time (INV-P06)
- Evaluate: Run all rules against the current graph snapshot
- Posture scoring: Compute per-control status and an overall security posture score
What It Is Not
- Not a mutation engine. Policy evaluation is read-only (INV-P02).
- Not a real-time alerting system. It evaluates on-demand or on schedule.
- Not a SIEM. It doesn't process event streams.
Core Types
pub struct PolicyRule {
    /// Unique rule identifier
    pub id: String,
    /// Human-readable description
    pub title: String,
    /// PQL query that finds violating entities
    /// (empty result = compliant; any result = violation)
    pub query: String,
    /// Severity of the violation
    pub severity: Severity,
    /// Framework controls this rule maps to
    pub framework_mapping: Vec<FrameworkMapping>,
    /// Evaluation schedule (future: cron expression)
    pub schedule: Option<String>,
}

pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
    Info,
}

pub struct FrameworkMapping {
    pub framework: String,  // "CIS", "NIST", "PCI-DSS", "SOC2"
    pub control_id: String, // "CIS-1.1", "NIST-AC-2", etc.
}
Quick Example
use parallax_policy::{FrameworkMapping, PolicyEvaluator, PolicyRule, Severity};

let rules = vec![PolicyRule {
    id: "no-unprotected-hosts".to_string(),
    title: "All hosts must have EDR protection".to_string(),
    query: "FIND host THAT !PROTECTS edr_agent".to_string(),
    severity: Severity::High,
    framework_mapping: vec![FrameworkMapping {
        framework: "CIS".to_string(),
        control_id: "CIS-7.1".to_string(),
    }],
    schedule: None,
}];

let evaluator = PolicyEvaluator::new(rules)?;
let snap = engine.snapshot();
let results = evaluator.evaluate_all(&snap)?;

for result in &results {
    println!(
        "{}: {} violations ({})",
        result.rule_id,
        result.violations.len(),
        result.severity
    );
}

let posture = evaluator.compute_posture(&snap)?;
println!("Overall posture score: {:.1}%", posture.overall_score * 100.0);
See Policy Rules, Evaluation, and Posture Scoring for complete documentation.
Policy Rules
A policy rule is a named PQL query that finds security violations. An empty result means compliant; any results mean violation.
PolicyRule Structure
pub struct PolicyRule {
    /// Unique identifier (e.g. `edr-coverage-001`)
    pub id: String,
    /// Human-readable name
    pub name: String,
    pub severity: Severity,
    /// Long-form description
    pub description: String,
    /// The PQL query that finds violations (0 results = PASS, >0 = FAIL)
    pub query: String,
    /// Compliance framework mappings
    pub frameworks: Vec<FrameworkMapping>,
    /// Evaluation schedule
    pub schedule: Schedule,
    /// Remediation guidance (markdown)
    pub remediation: String,
    pub enabled: bool,
}
Severity Levels
pub enum Severity {
    Critical, // Immediate action required
    High,     // Resolve within 24 hours
    Medium,   // Resolve within 7 days
    Low,      // Track and resolve at next sprint
    Info,     // Informational, no SLA
}
Severities are serialised as lowercase strings in YAML/JSON:
"critical", "high", "medium", "low", "info".
Schedule
pub enum Schedule {
    Manual,              // Only on explicit request
    Every(Duration),     // Periodic: "every:5m", "every:2h", "every:1d"
    OnSync(Vec<String>), // After named connectors sync: "on_sync:aws,gcp"
}
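The schedule string forms ("manual", "every:5m", "on_sync:aws,gcp") are simple to parse. A hedged sketch, assuming second/minute/hour/day suffixes — parse_schedule and this local Schedule mirror are illustrative, not the parallax-policy API:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Schedule {
    Manual,
    Every(Duration),
    OnSync(Vec<String>),
}

// Parse "manual", "every:<n><unit>", or "on_sync:<a>,<b>,..." (hypothetical helper).
fn parse_schedule(s: &str) -> Option<Schedule> {
    if s == "manual" {
        return Some(Schedule::Manual);
    }
    if let Some(spec) = s.strip_prefix("every:") {
        if spec.len() < 2 {
            return None; // need at least one digit and a unit
        }
        // Split "5m" into the numeric part and the one-character unit suffix.
        let (num, unit) = spec.split_at(spec.len() - 1);
        let n: u64 = num.parse().ok()?;
        let secs = match unit {
            "s" => n,
            "m" => n * 60,
            "h" => n * 3600,
            "d" => n * 86_400,
            _ => return None,
        };
        return Some(Schedule::Every(Duration::from_secs(secs)));
    }
    if let Some(list) = s.strip_prefix("on_sync:") {
        return Some(Schedule::OnSync(
            list.split(',').map(|c| c.trim().to_string()).collect(),
        ));
    }
    None
}

fn main() {
    assert_eq!(parse_schedule("manual"), Some(Schedule::Manual));
    assert_eq!(
        parse_schedule("every:5m"),
        Some(Schedule::Every(Duration::from_secs(300)))
    );
    assert_eq!(
        parse_schedule("on_sync:aws,gcp"),
        Some(Schedule::OnSync(vec!["aws".into(), "gcp".into()]))
    );
    println!("ok");
}
```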
Framework Mapping
pub struct FrameworkMapping {
    pub framework: String, // e.g. "CIS-Controls-v8", "NIST-CSF", "SOC2"
    pub control: String,   // e.g. "10.1", "DE.CM-4"
}
YAML Rule Files
Rules are defined in YAML and loaded with load_rules_from_yaml(path):
rules:
- id: edr-coverage-001
name: EDR coverage gap
severity: high
description: Hosts without EDR protection.
query: "FIND host THAT !PROTECTS edr_agent"
frameworks:
- framework: CIS-Controls-v8
control: "10.1"
schedule: "manual" # or "every:5m", "on_sync:aws,gcp"
remediation: "Deploy EDR agent to all hosts."
enabled: true
- id: mfa-enforcement-001
name: MFA not enforced
severity: critical
description: Active users without MFA.
query: "FIND user WITH mfa_enabled = false AND active = true"
frameworks:
- framework: NIST-CSF
control: "PR.AC-7"
- framework: CIS-Controls-v8
control: "6.5"
schedule: "every:1h"
remediation: "Enable MFA for all active user accounts."
enabled: true
use std::path::Path;
use parallax_policy::load_rules_from_yaml;

let rules = load_rules_from_yaml(Path::new("rules/security.yaml"))?;
INV-P06: Validation at Load Time
Policy rules with invalid PQL are rejected when loaded into the evaluator, not at evaluation time:
let result = PolicyEvaluator::load(
    vec![PolicyRule::new("bad", "Invalid PQL", "FIND host WHERE state = 'running'")],
    &stats,
);
// Err(PolicyError::InvalidQuery { rule_id: "bad", parse_error: "..." })
This prevents silent failures where a broken rule never catches violations.
Query Design Patterns
| Pattern | Query |
|---|---|
| Find X that lack Y | FIND X THAT !VERB Y |
| Find X with bad property | FIND X WITH property = 'bad_value' |
| Find X in bad state OR another bad state | FIND X WITH state = 'a' OR state = 'b' |
| Find X connected to dangerous Y | FIND X THAT VERB Y WITH dangerous = true |
| Count of X (threshold check) | FIND X RETURN COUNT |
| Multi-hop risk path | FIND X THAT V1 Y THAT V2 Z WITH prop = val |
Example Rules
EDR Coverage
- id: edr-coverage-001
name: EDR coverage gap
severity: high
query: "FIND host THAT !PROTECTS edr_agent"
frameworks:
- framework: CIS-Controls-v8
control: "10.1"
schedule: "on_sync:aws"
remediation: "Deploy EDR agent."
enabled: true
MFA Enforcement
- id: mfa-all-users
name: MFA not enforced
severity: critical
query: "FIND user WITH mfa_enabled = false AND active = true"
frameworks:
- framework: NIST-CSF
control: "PR.AC-7"
schedule: "every:1h"
remediation: "Enable MFA for all active users."
enabled: true
Public Cloud Storage
- id: no-public-buckets
name: Public S3 buckets
severity: critical
query: "FIND aws_s3_bucket WITH public = true"
frameworks:
- framework: CIS-Controls-v8
control: "3.3"
schedule: "on_sync:aws"
remediation: "Disable public access on all S3 buckets."
enabled: true
Policy Evaluation
Loading an Evaluator
use std::path::Path;
use parallax_policy::{PolicyEvaluator, load_rules_from_yaml};
use parallax_query::IndexStats;

let rules = load_rules_from_yaml(Path::new("rules/security.yaml"))?;
let stats = IndexStats::new(type_counts, class_counts, total, rel_total);
let evaluator = PolicyEvaluator::load(rules, &stats)?;
// Err if any rule contains invalid PQL (INV-P06)
Running an Evaluation
Sequential
let snap = engine.snapshot();
let graph = GraphReader::new(&snap);
let results = evaluator.evaluate_all(&graph, QueryLimits::default());
Parallel
// Each rule runs on its own OS thread; results collected in definition order.
let results = evaluator.par_evaluate_all(&graph, QueryLimits::default());
Both methods return identical results in the same order. par_evaluate_all
is preferred when evaluating many rules — it uses std::thread::scope so
non-'static lifetimes (including GraphReader<'snap>) work correctly.
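The scoped-thread pattern described above can be sketched in miniature: borrowed (non-'static) data can cross thread boundaries because std::thread::scope guarantees every spawned thread joins before the borrow ends. run_all and the toy word-count "evaluator" below are illustrative stand-ins for par_evaluate_all, not the real API:

```rust
// Run one evaluation per rule on its own thread; return results in
// definition order by joining handles in spawn order.
fn run_all(rules: &[String], eval: impl Fn(&str) -> usize + Sync) -> Vec<usize> {
    let eval = &eval; // share the evaluator by reference across threads
    std::thread::scope(|s| {
        let handles: Vec<_> = rules
            .iter()
            .map(|rule| s.spawn(move || eval(rule)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // `rules` is a stack-local Vec — a non-'static borrow that plain
    // thread::spawn would reject.
    let rules = vec![
        "FIND host".to_string(),
        "FIND user WITH active = true".to_string(),
    ];
    let results = run_all(&rules, |q| q.split_whitespace().count());
    assert_eq!(results, vec![2, 6]);
    println!("ok");
}
```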
RuleResult
pub struct RuleResult {
    pub rule_id: String,
    pub status: RuleStatus,
    pub violations: Vec<Violation>,
    pub error: Option<String>, // set on RuleStatus::Error
    pub evaluated_at: Timestamp,
    pub duration: Duration,
}

pub enum RuleStatus {
    Pass,    // query returned 0 results
    Fail,    // query returned ≥1 results
    Error,   // rule evaluation errored (INV-P03: others still run)
    Skipped, // rule.enabled = false
}

pub struct Violation {
    pub entity_id: EntityId,
    pub entity_type: EntityType,
    pub display_name: CompactString,
    pub details: String,
}
INV-P01: Snapshot Atomicity
All rules in one evaluate_all / par_evaluate_all call read the same
snapshot. Rules see a consistent point-in-time view of the graph even if
new data is being ingested concurrently.
INV-P02: Read-Only
Policy evaluation never modifies the graph.
INV-P03: Error Isolation
A rule that errors during evaluation is recorded with RuleStatus::Error and
does not abort evaluation of other rules:
for result in &results {
    match result.status {
        RuleStatus::Pass => println!("✓ {} — PASS", result.rule_id),
        RuleStatus::Fail => println!(
            "✗ {} — {} violations",
            result.rule_id,
            result.violations.len()
        ),
        RuleStatus::Error => println!(
            "! {} — {}",
            result.rule_id,
            result.error.as_deref().unwrap_or("?")
        ),
        RuleStatus::Skipped => println!("- {} — skipped", result.rule_id),
    }
}
Performance
Each rule runs one PQL query. For N rules, evaluate_all makes N sequential
graph reads; par_evaluate_all runs all N concurrently on separate threads.
Typical throughput (100 rules, 100K entities):
- Sequential: ~1–5 seconds (depends on rule complexity and graph structure)
- Parallel: ~100–500ms (bounded by the slowest rule)
Posture Scoring
Posture scoring aggregates rule evaluation results into a per-framework compliance score and an overall posture score.
Computing Posture
let posture = evaluator.compute_posture(&snap)?;
println!("Overall: {:.1}%", posture.overall_score * 100.0);

for (framework, score) in &posture.framework_scores {
    println!("{}: {:.1}%", framework, score.score * 100.0);
}
FrameworkPosture
pub struct FrameworkPosture {
    /// Overall score: 0.0 (no rules passing) to 1.0 (all rules passing)
    pub overall_score: f64,
    /// Per-framework scores
    pub framework_scores: HashMap<String, ControlStatus>,
    /// Breakdown by severity
    pub by_severity: HashMap<Severity, SeverityBreakdown>,
    /// All rule results (pass, fail, error)
    pub results: Vec<RuleEvaluationResult>,
}

pub struct ControlStatus {
    pub framework: String,
    pub score: f64, // 0.0 to 1.0
    pub passing_controls: u32,
    pub failing_controls: u32,
    pub error_controls: u32,
}

pub struct SeverityBreakdown {
    pub passing: u32,
    pub failing: u32,
    pub errored: u32,
}
Scoring Algorithm
Overall score:
score = passing_rules / total_rules
Where:
- passing_rules = rules with EvaluationStatus::Pass
- total_rules = all rules (pass + fail + error)
INV-P04: Errored rules count as failures for posture scoring. A rule that can't evaluate is treated as if it found violations.
Per-framework score:
For each framework (CIS, NIST, PCI-DSS, etc.), compute the ratio of passing controls to total controls mapped to that framework:
framework_score = controls_passing / controls_mapped_to_framework
If a rule maps to multiple frameworks, it contributes to each framework's score independently.
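The two formulas above can be sketched directly. Status and ScoredRule are illustrative stand-ins for the real result types; per INV-P04, an errored rule counts against the score exactly like a failure, and each (rule, framework) mapping is treated as one control sample:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Status {
    Pass,
    Fail,
    Error,
}

struct ScoredRule {
    status: Status,
    frameworks: Vec<(&'static str, &'static str)>, // (framework, control)
}

// overall = passing_rules / total_rules; Fail and Error both count as not passing.
fn overall_score(rules: &[ScoredRule]) -> f64 {
    if rules.is_empty() {
        return 0.0;
    }
    let passing = rules.iter().filter(|r| r.status == Status::Pass).count();
    passing as f64 / rules.len() as f64
}

// Per framework: passing mappings / total mappings. A rule mapped to several
// frameworks contributes to each independently.
fn framework_scores(rules: &[ScoredRule]) -> HashMap<&'static str, f64> {
    let mut tally: HashMap<&'static str, (u32, u32)> = HashMap::new();
    for rule in rules {
        for (fw, _control) in &rule.frameworks {
            let e = tally.entry(fw).or_insert((0, 0));
            e.1 += 1;
            if rule.status == Status::Pass {
                e.0 += 1;
            }
        }
    }
    tally
        .into_iter()
        .map(|(fw, (pass, total))| (fw, f64::from(pass) / f64::from(total)))
        .collect()
}

fn main() {
    let rules = [
        ScoredRule { status: Status::Pass, frameworks: vec![("CIS", "10.1")] },
        ScoredRule { status: Status::Fail, frameworks: vec![("CIS", "3.3")] },
        ScoredRule { status: Status::Error, frameworks: vec![("NIST", "PR.AC-7")] },
    ];
    assert!((overall_score(&rules) - 1.0 / 3.0).abs() < 1e-9);
    let fw = framework_scores(&rules);
    assert_eq!(fw["CIS"], 0.5);  // one of two CIS mappings passing
    assert_eq!(fw["NIST"], 0.0); // errored rule scores as failing
    println!("ok");
}
```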
Example Report
Security Posture Report
=======================
Overall Score: 73.3% (11/15 rules passing)
By Severity:
Critical: 2 passing, 2 failing (50.0%)
High: 4 passing, 1 failing (80.0%)
Medium: 5 passing, 0 failing (100.0%)
Low: 0 passing, 1 failing (0.0%)
Info: 0 passing, 0 failing (N/A)
By Framework:
CIS: 8/10 controls passing (80.0%)
NIST: 6/8 controls passing (75.0%)
PCI-DSS: 4/7 controls passing (57.1%)
Failing Rules:
✗ [CRITICAL] All users must have MFA enabled
→ 47 users without MFA
✗ [CRITICAL] No S3 buckets should be publicly accessible
→ 3 public buckets: logs-bucket, public-assets, backup-2023
✗ [HIGH] All hosts must have EDR protection
→ 12 unprotected hosts
✗ [LOW] Minimize admin role assignments
→ 23 admin assignments (threshold: 10)
Framework Context
| Framework | Focus Area | Common Controls |
|---|---|---|
| CIS | Technical security benchmarks | EDR coverage, patch levels, network segmentation |
| NIST | Broad cybersecurity framework | Access control, identity, protect/detect/respond |
| PCI-DSS | Payment card data security | Network isolation, encryption, access logging |
| SOC 2 | Service organization controls | Availability, confidentiality, integrity |
| HIPAA | Healthcare data protection | PHI access, audit trails, encryption |
The framework mapping in PolicyRule is metadata — it doesn't change how
the rule is evaluated, only how it's categorized in the posture report.
Known Entity Classes
Entity classes are a closed set defined by Parallax. An entity submitted with an unknown class is rejected at ingest time.
The class is the broad category that enables cross-type queries:
FIND Host matches EC2 instances, Azure VMs, containers, and any other
entity whose class is Host.
Full Class List (41 classes)
| Class | Description | Example Types |
|---|---|---|
| Host | Compute hosts — servers, VMs, containers | aws_ec2_instance, azure_vm, host |
| User | Human or service user accounts | okta_user, aws_iam_user, user |
| DataStore | Storage systems | aws_s3_bucket, database, datastore |
| CodeRepo | Source code repositories | github_repo, gitlab_project |
| Firewall | Network access control | aws_security_group, firewall |
| AccessPolicy | Authorization policies | aws_iam_policy, access_policy |
| NetworkSegment | Network segments/subnets | aws_vpc, aws_subnet, network |
| Service | Running services or processes | service, microservice |
| Certificate | TLS/SSL certificates | certificate, tls_cert |
| Secret | Secrets and tokens | secret, aws_secret, vault_secret |
| Credential | Credentials and API keys | credential, api_key |
| Key | Encryption keys | aws_kms_key, key |
| Container | Container instances | docker_container, container |
| Pod | Kubernetes pods | k8s_pod, pod |
| Cluster | Kubernetes clusters | k8s_cluster, eks_cluster |
| Namespace | Kubernetes namespaces | k8s_namespace, namespace |
| Function | Serverless functions | aws_lambda, function |
| Queue | Message queues | aws_sqs_queue, queue |
| Topic | Message topics | aws_sns_topic, topic |
| Database | Database instances | aws_rds_instance, postgres_db |
| Application | Applications or services | application, web_app |
| Package | Software packages | npm_package, python_package |
| Vulnerability | Security vulnerabilities | cve, vulnerability |
| Identity | Identity providers/identities | identity, saml_identity |
| Process | Running processes | process, daemon |
| File | Files and filesystems | file, s3_object |
| Registry | Container/package registries | ecr_repo, docker_registry |
| Policy | Generic policies | policy, network_policy |
| Account | Cloud or service accounts | aws_account, gcp_project |
| Organization | Organizations or tenants | organization, company |
| Team | Teams or groups | team, department |
| Role | Roles or job functions | aws_iam_role, okta_group |
| Group | Groups of entities | group, ad_group |
| Device | Physical or virtual devices | device, workstation |
| Endpoint | Network endpoints | endpoint, api_endpoint |
| Scanner | Security scanners | scanner, qualys_scanner |
| Agent | Security agents | agent, edr_agent |
| Sensor | Telemetry sensors | sensor, network_tap |
| Ticket | Tickets and issues | jira_issue, ticket |
| Event | Security events | event, alert |
| Generic | Catch-all for unlisted types | generic |
Using Classes in PQL
-- All hosts (regardless of type: EC2, Azure VM, container, etc.)
FIND Host
-- All datastores not accessible publicly
FIND DataStore WITH public = false
-- All users without MFA
FIND User WITH mfa_enabled = false
-- Hosts with no EDR agent protecting them
FIND Host THAT !PROTECTS Agent
Requesting a New Class
The class list is curated and intentionally kept small (~40 values). Before requesting a new class, consider:
- Can it be modeled with an existing class? (e.g., use `Generic` for truly novel entity types)
- Is it used by multiple connectors, or just one? (connector-specific types should use an entity type, not a class)
- Would it enable useful cross-type queries that aren't possible today?
Open an issue on GitHub to propose new classes. New classes require a spec update and a minor version bump.
In Code
```rust
use parallax_core::entity::KNOWN_CLASSES;

// Validate a class string
if KNOWN_CLASSES.contains(&"Host") {
    let class = EntityClass::new("Host").unwrap();
}

// Get a &[&str] of all known classes
println!("Known classes: {:?}", KNOWN_CLASSES);
```
Known Relationship Verbs
Relationship verbs are a closed set of 15 values. A relationship submitted with an unknown verb is flagged at ingest time (a warning in v0.1; a hard error planned for v0.2).
The verb is the semantic label on a directed edge. A closed, curated set ensures
that queries like FIND * THAT ALLOWS internet have consistent semantics
across all connectors.
Full Verb List
| Verb | Direction | Semantic Meaning | Example |
|---|---|---|---|
| `HAS` | A → B | Ownership or containment | `aws_account HAS aws_s3_bucket` |
| `IS` | A ↔ B | Identity or equivalence | `okta_user IS person` |
| `ASSIGNED` | A → B | Role or permission assignment | `user ASSIGNED role` |
| `ALLOWS` | A → B | Grants network or access permission | `security_group ALLOWS internet` |
| `USES` | A → B | Active dependency | `service USES database` |
| `CONTAINS` | A → B | Logical grouping (strong containment) | `aws_vpc CONTAINS aws_subnet` |
| `MANAGES` | A → B | Administrative control | `team MANAGES github_repo` |
| `CONNECTS` | A ↔ B | Network-level connectivity | `aws_vpc CONNECTS aws_vpc` |
| `PROTECTS` | A → B | Security control coverage | `edr_agent PROTECTS host` |
| `EXPLOITS` | A → B | Vulnerability exploitation | `cve EXPLOITS software_package` |
| `TRUSTS` | A → B | Trust relationship | `aws_account TRUSTS aws_account` |
| `SCANS` | A → B | Scanner coverage | `qualys_scanner SCANS host` |
| `RUNS` | A → B | Process or service execution | `host RUNS service` |
| `READS` | A → B | Data access (read) | `application READS database` |
| `WRITES` | A → B | Data access (write) | `application WRITES database` |
Using Verbs in PQL
-- Direct verb queries
FIND host THAT RUNS service
FIND user THAT ASSIGNED role
FIND security_group THAT ALLOWS internet
-- Negated (coverage gap)
FIND host THAT !PROTECTS edr_agent
FIND service THAT !SCANS scanner
-- Multi-hop
FIND user THAT ASSIGNED role THAT ALLOWS aws_s3_bucket
FIND cve THAT EXPLOITS package THAT USES service THAT RUNS host
Verb Semantics in Blast Radius
For blast radius analysis, these verbs are considered attack-relevant by default:
RUNS, CONNECTS, TRUSTS, CONTAINS, HAS, USES, EXPLOITS
These cover the most common lateral movement patterns:
- `RUNS`: compromise a host → compromise its services
- `CONNECTS`: network path between hosts
- `TRUSTS`: cross-account / cross-system trust
- `CONTAINS`: moving from outer to inner containers
- `HAS`: ownership chain traversal
- `USES`: dependency exploitation
- `EXPLOITS`: CVE to affected system
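The verb-filtered traversal can be sketched as a BFS that only follows attack-relevant edges. This is an illustrative stand-alone sketch, not the `parallax-graph` implementation; the edge representation and function name are hypothetical:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Illustrative BFS blast radius: follow only attack-relevant verbs
/// outward from a compromised origin, up to `max_depth` hops.
fn blast_radius(
    edges: &HashMap<&str, Vec<(&str, &str)>>, // source -> [(verb, target)]
    origin: &str,
    max_depth: usize,
) -> HashSet<String> {
    let relevant: HashSet<&str> =
        ["RUNS", "CONNECTS", "TRUSTS", "CONTAINS", "HAS", "USES", "EXPLOITS"]
            .into_iter()
            .collect();
    let mut visited = HashSet::new();
    let mut queue = VecDeque::from([(origin.to_string(), 0usize)]);
    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue;
        }
        for &(verb, target) in edges.get(node.as_str()).into_iter().flatten() {
            // Skip non-attack-relevant edges; dedupe via the visited set.
            if relevant.contains(verb) && visited.insert(target.to_string()) {
                queue.push_back((target.to_string(), depth + 1));
            }
        }
    }
    visited
}

fn main() {
    let mut edges: HashMap<&str, Vec<(&str, &str)>> = HashMap::new();
    edges.insert("web-01", vec![("RUNS", "api"), ("ALLOWS", "internet")]);
    edges.insert("api", vec![("USES", "db")]);
    // ALLOWS is not attack-relevant, so `internet` is not in the radius.
    println!("{:?}", blast_radius(&edges, "web-01", 3));
}
```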
Verb Selection Guide
| Situation | Recommended Verb |
|---|---|
| Cloud resource ownership | HAS |
| IAM/RBAC assignment | ASSIGNED |
| Network access rules | ALLOWS |
| Service-to-database | USES or READS/WRITES |
| Host-to-service | RUNS |
| VPC peering | CONNECTS |
| Scanner-to-target | SCANS |
| EDR-to-host | PROTECTS |
| CVE-to-package | EXPLOITS |
| Organizational grouping | CONTAINS |
| Logical equivalence | IS |
In Code
```rust
use parallax_core::relationship::KNOWN_VERBS;

// Validate a verb string
if KNOWN_VERBS.contains(&"RUNS") {
    let verb = RelationshipClass::new("RUNS").unwrap();
}

// Get all known verbs
println!("Known verbs: {:?}", KNOWN_VERBS);
```
Proposing a New Verb
New verbs require a spec change and community discussion. The bar is high: a new verb must:
- Be semantically distinct from all existing verbs
- Be used by at least 3 different connector types
- Enable new query patterns not possible with existing verbs
Open an issue on GitHub to propose new verbs.
Value Types
Parallax properties use a flat, fixed set of value types.
The Value Enum
```rust
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(CompactString),
    Timestamp(Timestamp),
    StringArray(Vec<CompactString>),
}
```
Type Details
Null
Represents the absence of a value. Use Null for optional properties
that are not set.
```
// Rust
entity.properties.insert("terminated_at".into(), Value::Null);

// REST API JSON
"properties": { "terminated_at": null }

// PQL filter
FIND host WITH terminated_at = null
```
Bool
```
// Rust
Value::from(true)
Value::Bool(false)

// REST API JSON
"properties": { "active": true, "mfa_enabled": false }

// PQL filter
FIND user WITH active = true
FIND user WITH mfa_enabled = false
```
Int (i64)
Integer values in the range [-2^63, 2^63 - 1].
```
// Rust
Value::from(443i64)
Value::Int(8080)

// REST API JSON — must be a JSON integer
"properties": { "port": 443, "cpu_count": 8 }

// PQL filter
FIND host WITH cpu_count > 4
FIND service WITH port = 443
```
Float (f64)
Double-precision floating point.
```
// Rust
Value::from(9.8f64)
Value::Float(0.5)

// REST API JSON
"properties": { "score": 9.8, "utilization": 0.75 }

// PQL filter
FIND host WITH cpu_utilization > 0.8
```
String
Short-to-medium strings, backed by CompactString (stack-allocated for
strings ≤24 bytes; heap-allocated for longer strings).
```
// Rust
Value::from("running")
Value::String(CompactString::new("us-east-1"))

// REST API JSON
"properties": { "state": "running", "region": "us-east-1" }

// PQL filter (single quotes only)
FIND host WITH state = 'running'
FIND user WITH email LIKE '%@corp.com'
```
Timestamp
Hybrid Logical Clock timestamp. Primarily used for audit fields like
created_at, updated_at, sync_timestamp.
```
// Rust
Value::Timestamp(Timestamp::now())

// REST API JSON — ISO 8601 string
"properties": { "last_seen": "2024-01-15T10:30:00Z" }
```
Not directly filterable in PQL v0.1. Use string properties for time-based filtering in the current version.
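The document doesn't show the internals of Parallax's HLC, but the core idea of a hybrid logical clock can be sketched as a (wall-clock, logical-counter) pair that never moves backward. All names below are illustrative, not the `parallax-core` types:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Minimal hybrid logical clock sketch (not Parallax's implementation):
/// a (physical millis, logical counter) pair that stays strictly
/// monotonic even if the wall clock stalls or steps backward.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Hlc {
    wall_ms: u64,
    logical: u32,
}

struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    fn new() -> Self {
        Self { last: Hlc { wall_ms: 0, logical: 0 } }
    }

    fn now(&mut self) -> Hlc {
        let wall_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as u64;
        self.last = if wall_ms > self.last.wall_ms {
            Hlc { wall_ms, logical: 0 }
        } else {
            // Wall clock hasn't advanced: bump the logical counter instead.
            Hlc { wall_ms: self.last.wall_ms, logical: self.last.logical + 1 }
        };
        self.last
    }
}

fn main() {
    let mut clock = HlcClock::new();
    let (a, b) = (clock.now(), clock.now());
    assert!(a < b); // strictly monotonic, even within the same millisecond
}
```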
StringArray
A flat array of strings. Common for tags, labels, and group memberships. No nesting within arrays (no arrays of objects).
```
// Rust
Value::StringArray(vec!["web".into(), "production".into(), "us-east-1".into()])

// REST API JSON
"properties": { "tags": ["web", "production", "us-east-1"] }
```
StringArray is not filterable in PQL v0.1. Filter via scalar properties.
Type Stability (INV-07/08)
INV-07: Property types must be stable within an entity type. If port
is an Int for aws_security_group_rule, it must always be Int for
that type. A connector that sends it as a String gets a warning in v0.1
and a hard error in v0.2.
INV-08: Properties are flat — no nested objects or arrays-of-objects.
JSON Type Mapping
| JSON Type | Parallax Value |
|---|---|
| `null` | `Value::Null` |
| `true` / `false` | `Value::Bool` |
| Integer (no decimal) | `Value::Int` |
| Number (with decimal) | `Value::Float` |
| String | `Value::String` |
| Array of strings | `Value::StringArray` |
| Object | Rejected — no nested objects |
| Array of objects | Rejected — no nested arrays |
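The mapping rules (including the rejection of nesting) can be sketched with a hand-rolled JSON model. This is illustrative only — the real ingest path presumably uses a JSON parser, and the `Value` enum here is a local stand-in for the one defined above:

```rust
// Hand-rolled JSON model, just enough to illustrate the mapping rules.
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Array(Vec<Json>),
    Object, // contents irrelevant: objects are rejected outright
}

// Local stand-in for the document's Value enum (CompactString omitted).
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    StringArray(Vec<String>),
}

fn to_value(json: Json) -> Result<Value, String> {
    match json {
        Json::Null => Ok(Value::Null),
        Json::Bool(b) => Ok(Value::Bool(b)),
        Json::Int(i) => Ok(Value::Int(i)),
        Json::Float(f) => Ok(Value::Float(f)),
        Json::Str(s) => Ok(Value::String(s)),
        Json::Object => Err("nested objects are rejected".into()),
        // Arrays must be flat arrays of strings (INV-08).
        Json::Array(items) => items
            .into_iter()
            .map(|item| match item {
                Json::Str(s) => Ok(s),
                _ => Err("only flat string arrays are allowed".to_string()),
            })
            .collect::<Result<Vec<_>, String>>()
            .map(Value::StringArray),
    }
}

fn main() {
    assert_eq!(to_value(Json::Int(443)), Ok(Value::Int(443)));
    assert!(to_value(Json::Object).is_err());
}
```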
Configuration
Server Configuration
Configuration via environment variables (recommended) or CLI flags.
| Env Var | CLI Flag | Default | Description |
|---|---|---|---|
| `PARALLAX_HOST` | `--host` | `127.0.0.1` | Bind address |
| `PARALLAX_PORT` | `--port` | `7700` | HTTP port |
| `PARALLAX_DATA_DIR` | `--data-dir` | `./parallax-data` | Data directory |
| `PARALLAX_API_KEY` | — | (empty) | API key (empty = open mode) |
| `RUST_LOG` | — | `info` | Log level (`trace`/`debug`/`info`/`warn`/`error`) |
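A common precedence for this kind of configuration is CLI flag over environment variable over built-in default. A hypothetical sketch of that resolution (the helper name is made up; this is not the parallax-server code):

```rust
use std::env;

/// Hypothetical config resolution: CLI flag wins over the environment
/// variable, which wins over the built-in default.
fn resolve(cli: Option<&str>, env_var: &str, default: &str) -> String {
    cli.map(|s| s.to_string())
        .or_else(|| env::var(env_var).ok())
        .unwrap_or_else(|| default.to_string())
}

fn main() {
    // With neither a flag nor PARALLAX_PORT set, the default applies.
    let port = resolve(None, "PARALLAX_PORT", "7700");
    println!("binding on port {port}");
}
```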
Storage Engine Configuration
StoreConfig controls engine behavior. Set it in code when embedding; configuration-file support is planned for a future release.
```rust
pub struct StoreConfig {
    /// Root directory for WAL, segments, and index files.
    pub data_dir: PathBuf,

    /// Flush MemTable to segment when in-memory size exceeds this threshold.
    /// Default: 64MB. Larger = fewer flushes, more memory usage.
    pub memtable_flush_size: usize,

    /// Maximum WAL segment file size before rotation.
    /// Default: 64MB. After rotation, a new wal-{n}.pxw file starts.
    pub wal_segment_max_size: u64,
}
```
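A `Default` impl matching the documented default values might look like the following. This is a sketch under the stated defaults, not necessarily how the actual crate implements it:

```rust
use std::path::PathBuf;

pub struct StoreConfig {
    pub data_dir: PathBuf,
    pub memtable_flush_size: usize,
    pub wal_segment_max_size: u64,
}

// Sketch of defaults matching the documented values; the actual
// crate's Default impl may differ.
impl Default for StoreConfig {
    fn default() -> Self {
        Self {
            data_dir: PathBuf::from("./parallax-data"),
            memtable_flush_size: 64 * 1024 * 1024,  // 64MB flush threshold
            wal_segment_max_size: 64 * 1024 * 1024, // 64MB WAL rotation
        }
    }
}

fn main() {
    let cfg = StoreConfig::default();
    assert_eq!(cfg.memtable_flush_size, 64 * 1024 * 1024);
}
```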
Data Directory Layout
After starting the server, the data directory has this structure:
parallax-data/
├── wal/
│ ├── wal-00000000.pxw ← WAL segments (binary, PXWA format)
│ ├── wal-00000001.pxw
│ └── ...
└── segments/
├── seg-00000000.pxs ← Immutable segment files (binary, PXSG format)
├── seg-00000001.pxs
└── ...
Both WAL and segment files are binary and not human-readable. Use
parallax stats to inspect the current state.
Query Limits
Default limits applied to all queries:
| Limit | Default | Description |
|---|---|---|
| `max_results` | 10,000 | Maximum entities returned per query |
| `timeout` | 30s | Query execution timeout |
| `max_traversal_depth` | 10 | Maximum traversal hops |
These are currently hardcoded. Per-query overrides and global configuration are planned for v0.2.
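One plausible way to apply these caps (names hypothetical; the `timeout` limit is omitted for brevity) is to let a user-supplied `LIMIT` lower the result cap but never raise it:

```rust
/// Hypothetical sketch of the documented hardcoded limits.
struct QueryLimits {
    max_results: usize,
    max_traversal_depth: usize,
}

impl Default for QueryLimits {
    fn default() -> Self {
        Self { max_results: 10_000, max_traversal_depth: 10 }
    }
}

/// A user-supplied LIMIT can lower the cap but never raise it.
fn effective_limit(requested: Option<usize>, limits: &QueryLimits) -> usize {
    requested.map_or(limits.max_results, |n| n.min(limits.max_results))
}

fn main() {
    let limits = QueryLimits::default();
    assert_eq!(effective_limit(Some(1_000_000), &limits), 10_000);
    assert_eq!(limits.max_traversal_depth, 10);
}
```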
Logging
Parallax uses tracing for structured logging. Configure with RUST_LOG:
# Common patterns
RUST_LOG=info parallax serve # Default
RUST_LOG=debug parallax serve # Verbose
RUST_LOG=parallax_store=trace parallax serve # Trace storage only
RUST_LOG=parallax=debug,tower_http=warn parallax serve # App debug, HTTP quiet
The default log format is human-readable text; structured JSON output is available via the `--log-format json` global flag (see CLI Reference).
Production Recommendations
# Full production configuration
export PARALLAX_HOST=0.0.0.0
export PARALLAX_PORT=7700
export PARALLAX_DATA_DIR=/var/lib/parallax
export PARALLAX_API_KEY=$(openssl rand -hex 32)
export RUST_LOG=info
parallax serve
Storage:
- Mount `PARALLAX_DATA_DIR` on a fast SSD
- Ensure at least 10GB free space for WAL and segments
- Set up log rotation for WAL files (automatic in v0.2)
Security:
- Generate a cryptographically random API key
- Terminate TLS at the load balancer or reverse proxy
- Restrict network access to the parallax port
- Never log the `PARALLAX_API_KEY`
Monitoring:
- Scrape `/metrics` with Prometheus
- Alert on `parallax_errors_total` increases
- Monitor `parallax_entities_total` for unexpected spikes or drops
CLI Reference
The parallax CLI provides commands for serving, querying, and inspecting
the graph engine.
Installation
# From source
cargo install --path crates/parallax-cli
# From release binary (when available)
curl -LO https://releases.parallax.rs/v0.1.0/parallax-linux-amd64
chmod +x parallax-linux-amd64
mv parallax-linux-amd64 /usr/local/bin/parallax
Global Flags
parallax [--log-format FORMAT] <COMMAND>
Global options:
--log-format <FORMAT> Log format: "text" (default) or "json"
--log-format json emits structured JSON logs compatible with log aggregators
(Datadog, Splunk, CloudWatch Logs).
Commands
parallax serve
Start the REST HTTP server.
parallax serve [OPTIONS]
Options:
--host <HOST> Bind address [default: 127.0.0.1]
--port <PORT> Port number [default: 7700]
--data-dir <PATH> Data directory [default: ./parallax-data]
Environment:
PARALLAX_API_KEY API key for authentication (empty = open mode)
Examples:
# Development (open mode)
parallax serve --data-dir ./data
# Production (with auth, all interfaces)
PARALLAX_API_KEY=$(openssl rand -hex 32) \
parallax serve --host 0.0.0.0 --port 7700 --data-dir /var/lib/parallax
parallax query
Execute a PQL query against a local data directory (in-process, no server).
parallax query <PQL> [OPTIONS]
Arguments:
<PQL> The PQL query string (quote it)
Options:
--data-dir <PATH> Data directory [default: ./parallax-data]
--limit <N> Limit results [default: 100]
Examples:
# Find all running hosts
parallax query "FIND host WITH state = 'running'"
# Count all hosts
parallax query "FIND host RETURN COUNT"
# Group by OS
parallax query "FIND host GROUP BY os"
# With custom data dir
parallax query "FIND host" --data-dir /var/lib/parallax
Example output:
Results: 2
[host] Web Server 1 (id: a1b2c3d4...)
[host] Web Server 2 (id: e5f6a7b8...)
parallax stats
Display entity and relationship counts.
parallax stats [OPTIONS]
Options:
--data-dir <PATH> Data directory [default: ./parallax-data]
--json Output as JSON
Example output:
Parallax Graph Statistics
=========================
Total entities: 12,543
Total relationships: 45,210
By type:
host 1,200
user 5,400
service 2,100
aws_s3_bucket 450
aws_iam_role 380
(36 more types...)
By class:
User 5,400
Host 1,200
Service 2,100
DataStore 450
parallax wal dump
Inspect the Write-Ahead Log for debugging and forensics.
parallax wal dump [OPTIONS]
Options:
--data-dir <PATH> Data directory [default: ./parallax-data]
--verbose Show individual operation details (default: summary only)
Example output (summary):
WAL dump — data_dir: ./parallax-data
seq ops segment
--------------------------------------------------
1 50 wal-00000001.pxw
2 120 wal-00000002.pxw
3 30 wal-00000003.pxw
Total: 3 batches, 200 ops
With --verbose (shows each entity/relationship operation):
1 50 wal-00000001.pxw
+ entity [host] web-01 (id: a1b2c3...)
+ entity [host] web-02 (id: d4e5f6...)
+ rel [RUNS] a1b2c3... → 789abc...
- entity id=deadbeef...
parallax version
Print version information.
parallax version
Output:
parallax 0.1.0
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error (engine open failure, query error) |
| 2 | Invalid arguments |
Logging
Set RUST_LOG to control log verbosity, and --log-format for the output format:
# Debug all parallax modules (human-readable)
RUST_LOG=parallax=debug parallax serve
# Only warnings and errors
RUST_LOG=warn parallax serve
# JSON structured logs (for log aggregators)
parallax --log-format json serve
# JSON + custom verbosity
RUST_LOG=info parallax --log-format json serve
Performance Targets
v0.1 Targets
These are the benchmarked targets for v0.1 on modern NVMe hardware:
| Operation | Target p99 | Notes |
|---|---|---|
| Entity lookup by ID (MemTable) | ≤1μs | BTreeMap::get |
| Entity lookup by ID (Segment) | ≤100μs | Linear scan; improves with index in v0.2 |
| Type index scan (1K entities) | ≤500μs | Iterate type index, fetch entities |
| Single-hop traversal (degree ≤100) | ≤500μs | Adjacency index lookup |
| Multi-hop traversal (depth 3, degree 5) | ≤5ms | BFS with visited set |
| Shortest path (10K-node graph) | ≤10ms | Bidirectional BFS |
| Blast radius (depth 4, 1K impacted) | ≤10ms | BFS from origin |
| WAL write throughput | ≥500K ops/sec | Batched, single fsync |
| PQL parse + plan | ≤1ms | Hand-written recursive descent |
| Snapshot acquisition | ≤1μs | Arc::clone |
| Query execution (count, 100K entities) | ≤50ms | Type index + filter |
Benchmarking
Run the benchmark suite:
# Storage engine benchmarks
cargo bench --package parallax-store
# Graph engine benchmarks
cargo bench --package parallax-graph
# Query engine benchmarks
cargo bench --package parallax-query
What Affects Performance
MemTable vs. Segment Reads
Entities in the MemTable (recently written) are served in ≤1μs. Entities that have been flushed to segment files require a linear scan of the segment, adding ~100μs per entity. The segment index (v0.2) will reduce this to ~1μs.
Implication: If your workload has mostly recent data (e.g., fresh ingest followed by queries), performance is excellent. If your graph is heavily segmented, upgrade to v0.2 for segment indexing.
Traversal Depth and Degree
Traversal complexity is O(b^d) where b is the average branching factor and d is the depth. With BFS visited-set deduplication:
- Depth 2, degree 10: 100 entities visited
- Depth 4, degree 10: 10,000 entities visited
- Depth 4, degree 100: 100,000,000 entities — exceeds practical limits
Use max_depth to bound traversals. Set edge_classes to filter edge types
and reduce the branching factor.
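The growth numbers above follow directly from the b^d bound (illustrative arithmetic, worst case for a uniform tree):

```rust
/// Worst-case entities visited at depth `d` with branching factor `b`
/// (the frontier of a uniform tree; real graphs shrink this via the
/// BFS visited set).
fn frontier(b: u64, d: u32) -> u64 {
    b.pow(d)
}

fn main() {
    assert_eq!(frontier(10, 2), 100);           // depth 2, degree 10
    assert_eq!(frontier(10, 4), 10_000);        // depth 4, degree 10
    assert_eq!(frontier(100, 4), 100_000_000);  // depth 4, degree 100
}
```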
Lock Contention
The Arc<Mutex<StorageEngine>> lock is held for:
- ~5μs per entity during diff computation
- ~10μs for WAL fsync + MemTable update
Multiple concurrent syncs queue at the write step but diff in parallel. For more than 10 concurrent connectors syncing simultaneously, expect write latency to increase. WAL group commit (v0.2) will amortize this.
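The "diff in parallel, commit under one lock" pattern can be sketched as follows. This is a stand-alone illustration of the concurrency shape, not Parallax's actual sync code; the function and data are hypothetical:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sketch of the pattern above: each connector computes its diff
/// off-lock in parallel, then takes the single writer lock only for
/// the short commit step.
fn sync_all(connectors: &[&str]) -> Vec<String> {
    let store = Arc::new(Mutex::new(Vec::new()));
    thread::scope(|s| {
        for &connector in connectors {
            let store = Arc::clone(&store);
            s.spawn(move || {
                // Diff computation runs in parallel, lock-free.
                let diff = format!("{connector}: diff ready");
                // Commit: serialized behind the writer lock.
                store.lock().unwrap().push(diff);
            });
        }
    });
    // All scoped threads have joined; unwrap the shared store.
    Arc::try_unwrap(store).unwrap().into_inner().unwrap()
}

fn main() {
    let committed = sync_all(&["aws", "okta", "github"]);
    assert_eq!(committed.len(), 3);
}
```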
Memory Consumption
Rule of thumb: ~1KB per entity in MemTable (struct + property strings on heap). For 100K entities, expect ~100MB MemTable size before flush.
The flush threshold (default: 64MB) triggers a segment flush, freeing MemTable
memory. Tune memtable_flush_size based on your available memory.
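Under the ~1KB-per-entity rule of thumb, the entities held before a flush is simple arithmetic:

```rust
/// Rough number of entities the MemTable holds before a flush,
/// under the documented ~1KB-per-entity rule of thumb.
fn entities_before_flush(flush_size_bytes: usize, bytes_per_entity: usize) -> usize {
    flush_size_bytes / bytes_per_entity
}

fn main() {
    // Default 64MB threshold at ~1KB/entity → roughly 64K entities per flush.
    assert_eq!(entities_before_flush(64 * 1024 * 1024, 1024), 65_536);
}
```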
Scaling to 1M+ Entities
v0.1 is benchmarked and correct at 100K entities. For 1M+ entities:
- Segment indexing (v0.2): reduces lookup from O(n_segment) to O(log n)
- Compaction (v0.2): reclaims space from deleted entities
- WAL group commit (v0.2): 10-100× write throughput improvement
The architecture supports 10M+ entities with the v0.2 improvements. The single-writer model remains correct at any scale — it simply processes writes faster with group commit.
Roadmap
v0.1 delivered the complete vertical slice: ingest → store → graph → query → serve. v0.2 focused on performance, operability, and the policy engine.
v0.2 — Completed
Performance
| Feature | Status | Notes |
|---|---|---|
| WAL group commit | ✅ Done | append_batch() — single fsync per batch |
| Segment sparse index | ✅ Done | Binary-search index, O(log n) lookups |
| Background compaction | ✅ Done | CompactionWorker thread, merge small segments |
| Property index | Deferred | Secondary index deferred to v0.3 |
Query Language
| Feature | Status | Notes |
|---|---|---|
| OR in filters | ✅ Done | WITH state = 'a' OR state = 'b' |
| NOT EXISTS | ✅ Done | WITH NOT owner EXISTS |
| GROUP BY | ✅ Done | FIND host GROUP BY os → QueryResult::Grouped |
| Field projection | Deferred | RETURN field1, field2 parses but returns full entities |
| Parameterized queries | Deferred | |
Connector SDK
| Feature | Status | Notes |
|---|---|---|
| Parallel step execution | ✅ Done | topological_waves() + JoinSet per wave |
| WASM connector sandbox | Deferred | |
| Connector config schema | Deferred | |
Policy Engine
| Feature | Status | Notes |
|---|---|---|
| YAML rule files | ✅ Done | load_rules_from_yaml(), serde_yaml |
| Policy REST API | ✅ Done | GET/POST /v1/policies, POST /v1/policies/evaluate, GET /v1/policies/posture |
| Parallel evaluation | ✅ Done | par_evaluate_all() via std::thread::scope |
| Scheduled evaluation | Deferred | |
Observability
| Feature | Status | Notes |
|---|---|---|
| JSON log format | ✅ Done | --log-format json global CLI flag |
| `parallax wal dump` | ✅ Done | `parallax wal dump [--verbose]` |
| Rich Prometheus metrics | Partial | Basic counters; histograms deferred |
| OpenTelemetry traces | Deferred | |
v0.3 — Planned
- Property secondary index (fast `WITH state = X` without a full scan)
- Field projection (`RETURN display_name, state`)
- Parameterized queries (`FIND host WITH state = $1`)
- Scheduled policy evaluation (cron-based)
- gRPC via tonic
- First-party connectors: `connector-aws`, `connector-okta`, `connector-github`
Known Limitations
- **Field projection parses but is not enforced**: `FIND host RETURN display_name` parses without error but still returns full entity objects. Projection requires an architectural change to the entity return type.
- **Single-node only**: No replication, no clustering. The storage format is designed to support replication, but it is not implemented.
- **No gRPC**: Only REST. gRPC is architecturally planned but not implemented.
- **Soft class/verb enforcement**: Unknown entity classes and relationship verbs produce warnings, not hard errors.
Known Deferred Items
- `crates/parallax-query/src/executor.rs` — field projection (`RETURN` clause returns full entities, not projected fields)
Versioning Policy
- v0.x: Breaking changes between minor versions are allowed.
- v1.0: Stable public API. Breaking changes require a major version bump.
- MSRV: Latest stable Rust minus 2 releases.
The PQL language syntax and the entity/relationship schema are treated as public API even before v1.0 — changes go through a deprecation cycle.