Parallax

"Depth perception for your attack surface."

Parallax is an open-source, Rust-native typed property graph engine built for cyber asset intelligence. It is the infrastructure layer that answers one question:

Given everything we know about our assets, what is connected to what — and what does that imply?

What It Is

Parallax is to security asset data what SQLite is to relational data: embeddable, zero-external-dependency, and correct enough to trust.

It solves the problem that every security team faces: ~400,000 assets spread across ~70 tools, each tool seeing its own slice with no visibility into the relationships between them. A graph model — entities as nodes, relationships as directed edges — is the right abstraction. Parallax makes that graph infrastructure open, embeddable, and fast.

What It Is Not

  • Not a scanner. It consumes telemetry; it does not generate it.
  • Not a SIEM. It does not process event streams or alert on logs.
  • Not a UI product. It is a library and a server; UIs are built on top.
  • Not Neo4j. It is a domain-specific graph engine, not a general-purpose graph database.

The Stack

parallax-cli          ← Command-line interface
parallax-server       ← REST HTTP server (Axum)
parallax-connect      ← Integration SDK (Connector trait + scheduler)
parallax-ingest       ← Sync protocol (diff + atomic commit)
parallax-policy       ← Policy evaluation engine
parallax-query        ← PQL parser, planner, executor
parallax-graph        ← Graph traversal, path-finding, blast radius
parallax-store        ← Storage engine (WAL + MemTable + Segments + MVCC)
parallax-core         ← Shared types (Entity, Relationship, Value, Timestamp)

Dependencies flow strictly downward. No cycles. Cargo enforces this at build time — cyclic crate dependencies fail to resolve.

Version

This documentation covers Parallax v0.1 — the first working vertical slice:

  • Ingest via REST API or connector SDK
  • Durable storage with WAL + MemTable + Segments
  • Graph traversal, shortest path, blast radius, coverage gap analysis
  • PQL query language (FIND / WITH / THAT / RETURN / LIMIT)
  • Policy evaluation with posture scoring
  • REST API with authentication and Prometheus metrics
  • CLI for query, stats, and serve

Quick Start

# Build
cargo build --release --package parallax-cli

# Start the server (no auth key = open mode)
./target/release/parallax serve --data-dir /var/lib/parallax

# Ingest some entities
curl -X POST http://localhost:7700/v1/ingest/sync \
  -H 'Content-Type: application/json' \
  -d '{
    "connector_id": "my-connector",
    "sync_id": "sync-001",
    "entities": [
      {"entity_type": "host", "entity_key": "web-01", "entity_class": "Host",
       "display_name": "Web Server 01", "properties": {"state": "running"}},
      {"entity_type": "service", "entity_key": "nginx", "entity_class": "Service",
       "display_name": "Nginx"}
    ],
    "relationships": [
      {"from_type": "host", "from_key": "web-01", "verb": "RUNS",
       "to_type": "service", "to_key": "nginx"}
    ]
  }'

# Query with PQL
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND host WITH state = '\''running'\''"}'

Reading This Book

Start with Design Principles to understand the "why" behind every decision. Then read Data Model — everything else depends on it.

After that, read in whatever order matches your interest.

Architecture Overview

Parallax is organized as a Cargo workspace of nine crates arranged in a strictly acyclic dependency chain. Each crate has a single responsibility, a minimal public API, and explicit dependencies.

The Nine Crates

Crate              Responsibility
─────              ──────────────
parallax-core      Shared types and error definitions. Zero external deps beyond serde and blake3.
parallax-store     Durable storage: WAL, MemTable, immutable Segments, MVCC snapshots.
parallax-graph     Graph reasoning: traversal, pattern matching, shortest path, blast radius.
parallax-query     PQL parse → plan → execute pipeline.
parallax-policy    Policy rule evaluation and posture scoring.
parallax-ingest    Source-scoped sync protocol: diff, validate, atomic commit.
parallax-connect   Integration SDK: Connector trait, step scheduler, entity/relationship builders.
parallax-server    REST HTTP server with authentication, request-ID middleware, Prometheus metrics.
parallax-cli       Command-line binary: serve, query, stats, version.

Dependency Graph

parallax-core
    │
    └──► parallax-store
             │
             ├──► parallax-graph
             │        │
             │        ├──► parallax-query
             │        └──► parallax-policy
             │
             └──► parallax-ingest
                      │
                      └──► parallax-connect

parallax-server  (depends on: core, store, graph, query, policy, ingest, connect)
    │
    └──► parallax-cli

The rule is absolute: No crate may depend on a crate above it in this graph. Cargo enforces this at build time — a cyclic crate dependency fails to resolve.

Data Flow

A typical request through the full stack:

External Source
      │
      ▼
[REST POST /v1/ingest/sync]    ← parallax-server validates auth, parses JSON
      │
      ▼
[validate_sync_batch()]         ← parallax-ingest checks referential integrity
      │                                          and class/verb constraints
      ▼
[SyncEngine::commit_sync()]     ← parallax-ingest diffs against current snapshot,
      │                                           builds WriteBatch
      ▼
[StorageEngine::write(batch)]   ← parallax-store appends WAL entry, updates
      │                                           MemTable, publishes new snapshot
      ▼
[Snapshot published]            ← All subsequent reads see new state

      ...later...

[REST POST /v1/query]           ← parallax-server
      │
      ▼
[parse(pql)]                    ← parallax-query lexer + recursive descent parser
      │
      ▼
[plan(ast, &stats)]             ← parallax-query planner chooses index strategy
      │
      ▼
[execute(plan, &snapshot)]      ← parallax-query executor calls parallax-graph
      │
      ▼
[GraphReader::find() / traverse()]  ← parallax-graph reads from MVCC snapshot
      │
      ▼
[JSON response]

Key Architectural Decisions

Single-Writer, Multi-Reader

All mutations flow through one serialized write path. This eliminates write-write conflicts, deadlocks, and lock ordering bugs. Readers operate on MVCC snapshots with zero coordination — a reader never blocks a writer.

Owned Storage Engine

No external storage dependency (no Neo4j, no RocksDB, no sled). Parallax owns its storage format, which means:

  • Embeddable in a CLI without a separate daemon
  • Deterministic latency without JVM GC pauses
  • Full control over the read/write path

Deterministic Entity Identity

Entity IDs are 128-bit blake3 hashes of (account_id, entity_type, entity_key). The same logical entity always gets the same ID, across all time and all connectors. This enables idempotent re-ingestion and conflict-free merging.

Hybrid Logical Clocks

Even in single-node mode, Parallax uses HLC timestamps rather than wall clocks. Wall clocks can go backwards (NTP adjustments); two events in the same millisecond need deterministic ordering; and HLC extends naturally when clustering is added.

Design Principles

Parallax's architecture is shaped by seven principles drawn from the systems thinkers whose work directly influenced each layer.

Principle 1: Interfaces Are Forever

Lampson: "Implementation can change; interfaces cannot."

The entity/relationship schema, the query language (PQL), and the integration SDK are public contracts. They are versioned, documented, and maintained with the assumption that downstream consumers depend on them.

Internal storage format, memory layout, concurrency strategy — these are implementation secrets that can change every release without notice.

What this means in practice:

  • EntityId, Entity, Relationship, Value, and PropertyMap in parallax-core are stable API. Breaking changes require a major version bump.
  • The WAL on-disk format (PXWA magic + postcard framing) can change between releases as long as the storage engine handles migration transparently.
  • PQL syntax is a public contract. Adding new keywords is allowed; removing or changing semantics of existing keywords requires a deprecation cycle.

Principle 2: Ownership Is Architecture

Matsakis: "Good abstractions hide complexity."

Every datum in Parallax has exactly one owner at any point in time:

  • The graph store owns entities and relationships.
  • A transaction borrows the store mutably for writes; a snapshot borrows it immutably for reads.
  • A snapshot is a frozen, immutable view that readers hold cheaply via Arc::clone.
  • A connector owns its ingestion state. The engine never reaches into it.

If you cannot draw the ownership graph of a component on a whiteboard, the component is too complex.

In code: GraphReader<'snap> ties every returned reference to the snapshot's lifetime via Rust's borrow checker. No use-after-free. No stale reads. No cloning on the read path.
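A minimal sketch of that lifetime pattern, using stand-in types rather than the real parallax-graph API:

```rust
// Stand-ins for the real Snapshot and GraphReader types; the point is
// the lifetime: every reference a reader returns is tied to 'snap.
struct Snapshot {
    entities: Vec<String>,
}

struct GraphReader<'snap> {
    snap: &'snap Snapshot,
}

impl<'snap> GraphReader<'snap> {
    /// Returned references carry the snapshot's lifetime, so they can
    /// never outlive the snapshot they came from.
    fn find(&self, needle: &str) -> Option<&'snap String> {
        self.snap.entities.iter().find(|e| e.contains(needle))
    }
}

fn main() {
    let snap = Snapshot { entities: vec!["web-01".into(), "db-01".into()] };
    let reader = GraphReader { snap: &snap };
    let hit = reader.find("web");
    assert_eq!(hit.map(String::as_str), Some("web-01"));
    // Dropping `snap` while `hit` is still in use would fail to compile.
}
```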

Principle 3: Single Writer, Many Readers

Bos: "Less sharing means fewer bugs."

All mutations flow through a single serialized write path. This eliminates write-write conflicts, deadlocks, and lock ordering bugs.

Readers operate on MVCC snapshots with zero coordination. A reader never blocks a writer. A writer never waits for a reader.

The numbers: A single writer on modern NVMe processes 500K+ key-value mutations per second after WAL batching. The largest enterprise asset graphs ingest at ~10K entities/sec. We have 50× headroom.

Principle 4: Separate Normal and Worst Case

Lampson: "Optimize for the common case; handle edge cases separately."

Normal case: Ingest a batch of 50–500 entity upserts from a connector. Execute a graph query over 10K–100K entities. Evaluate a policy rule. This path is optimized for latency and throughput.

Worst case: Full re-sync of 2M assets from a new connector. WAL recovery after crash. Compaction of stale segment files. These paths are optimized for correctness and progress, not speed.

The normal and worst case paths may share zero code. That is acceptable.

Principle 5: Correctness Is Non-Negotiable

Lamport: "If you can't specify it precisely, you don't understand it."

Every state machine — transaction lifecycle, sync protocol, compaction — has written invariants before implementation. See the Invariant Reference for the full list.

Safety properties (must never happen):

  • A committed write is never lost.
  • A read snapshot never observes a partial write.
  • An entity ID is never reused for a different logical entity.
  • A relationship never references a non-existent entity in a committed state.

Liveness properties (must eventually happen):

  • A submitted write batch is eventually committed or rejected.
  • A compaction cycle eventually reclaims space from deleted versions.
  • A connector sync eventually converges to the source-of-truth state.

Principle 6: Make Illegal States Unrepresentable

Turon: "The best API is one where the obvious thing to do is the right thing."

If an entity cannot exist without a _type and _class, the Rust type system enforces that — not a runtime validator.

If a relationship requires two valid entity references, the write path enforces referential integrity at commit time (INV-03, INV-04).

If a query cursor cannot outlive its snapshot, the borrow checker enforces it.

In practice:

  • EntityClass::new(s) returns Result — unknown classes are rejected at the API boundary, not silently accepted.
  • EntityId::derive() takes (account_id, entity_type, entity_key) and produces a deterministic ID. There is no EntityId::random().
  • WriteBatch is an opaque builder — you cannot construct a batch that references a non-existent entity. Validation runs at ingest time.
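The shape of the first bullet can be sketched with a three-value stand-in for the real ~40-value closed class set ("parse, don't validate" — reject at the boundary, never store an unknown class):

```rust
// Illustrative stand-in for EntityClass::new. The real closed set has
// ~40 values; three suffice to show the pattern.
#[derive(Debug, Clone, PartialEq)]
struct EntityClass(String);

#[derive(Debug, PartialEq)]
struct UnknownClass(String);

impl EntityClass {
    /// Unknown classes are rejected at the API boundary, so an
    /// EntityClass value is valid by construction.
    fn new(s: &str) -> Result<Self, UnknownClass> {
        const KNOWN: &[&str] = &["Host", "User", "Service"];
        if KNOWN.contains(&s) {
            Ok(EntityClass(s.to_string()))
        } else {
            Err(UnknownClass(s.to_string()))
        }
    }
}

fn main() {
    assert!(EntityClass::new("Host").is_ok());
    assert!(EntityClass::new("Toaster").is_err());
}
```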

Principle 7: Observability as a Requirement

Not optional. Not "later." Now.

Every subsystem exposes:

  • Metrics: operation counts, latencies (p50/p99/max), queue depths — served at GET /metrics in Prometheus text format.
  • Structured logs: every write batch commit, every sync cycle, every error — via tracing with configurable log levels.
  • Request IDs: every HTTP request gets an X-Request-Id header (UUID v4) that is propagated through the stack for distributed tracing.

The engine must be debuggable in production without recompilation.

Crate Dependency Map

Workspace Layout

parallax/
├── Cargo.toml                     # Workspace root
├── crates/
│   ├── parallax-core/             # Shared types, zero external deps
│   ├── parallax-store/            # Storage engine (WAL + MemTable + Segments)
│   ├── parallax-graph/            # Graph traversal and reasoning
│   ├── parallax-query/            # PQL parse + plan + execute
│   ├── parallax-policy/           # Policy rule evaluation
│   ├── parallax-ingest/           # Sync protocol (diff + commit)
│   ├── parallax-connect/          # Integration SDK
│   ├── parallax-server/           # REST HTTP server
│   └── parallax-cli/              # CLI binary
└── specs/                         # Architectural specifications

Dependency Graph

parallax-core          (no workspace deps; blake3, serde, compact_str, thiserror)
       │
       └──► parallax-store     (core; arc-swap, crc32c, lz4_flex, postcard, tracing)
                   │
                   ├──► parallax-graph     (core, store)
                   │           │
                   │           ├──► parallax-query      (core, graph)
                   │           └──► parallax-policy     (core, graph)
                   │
                   └──► parallax-ingest    (core, store)
                               │
                               └──► parallax-connect    (core, ingest)

parallax-server  (core, store, graph, query, policy, ingest, connect;
                  axum, tower, tower-http, tokio, serde_json, uuid, compact_str)
       │
       └──► parallax-cli     (server; clap, anyhow, tokio)

External Dependencies by Crate

parallax-core

Crate         Version   Purpose
─────         ───────   ───────
serde         1         Serialization derive macros
compact_str   0.8       Small-string optimization (entity types/classes are short)
blake3        1         Deterministic ID hashing (SIMD-accelerated)
thiserror     1         Error derive macros

parallax-store

Crate      Version   Purpose
─────      ───────   ───────
arc-swap   1         Lock-free atomic Arc swap for snapshot publishing
crc32c     0.6       WAL entry checksums
lz4_flex   0.11      Block compression for segment files
postcard   1         Compact binary serialization for WAL + Segments
tracing    0.1       Structured logging
tempfile   3         (dev) Temporary directories for tests

parallax-graph

Crate     Purpose
─────     ───────
tracing   Structured logging

parallax-query

Crate     Purpose
─────     ───────
tracing   Structured logging

parallax-policy

Crate     Purpose
─────     ───────
serde     Policy rule serialization
tracing   Structured logging

parallax-ingest

Crate      Purpose
─────      ───────
tracing    Structured logging
tempfile   (dev) Temporary directories for tests

parallax-connect

Crate         Purpose
─────         ───────
async-trait   Async trait support for the Connector trait
serde         Builder serialization
tokio         Async runtime
tracing       Structured logging
compact_str   Small-string optimization
thiserror     Error types

parallax-server

Crate        Purpose
─────        ───────
axum         HTTP/REST server framework
tower        Middleware stack
tower-http   HTTP middleware (trace, request-id, sensitive-headers)
tokio        Async runtime
serde_json   JSON serialization for REST responses
uuid         UUID v4 for request IDs
compact_str  Small-string optimization
tracing      Structured logging
thiserror    Error types

parallax-cli

Crate        Purpose
─────        ───────
clap         Command-line argument parsing
anyhow       Error handling in binary context
tokio        Async runtime
serde_json   JSON output formatting

What We Explicitly Do Not Use

Crate           Why Not
─────           ───────
rocksdb         We own storage. No C++ FFI dependency.
diesel / sqlx   No SQL database.
neo4j-*         The whole point is to not depend on Neo4j.
sled            Correctness concerns in early versions.
tonic / prost   gRPC deferred to v0.2; REST-only for v0.1.

Adding a New External Dependency

Before adding a new crate:

  1. Is there a std equivalent? Prefer std.
  2. Does an existing approved crate already cover this? Reuse it.
  3. Is the crate well-maintained, zero-unsound-unsafe, and Apache/MIT licensed?
  4. Does adding it violate the acyclic dep constraint?

Add a justification comment in Cargo.toml for non-obvious dependencies.
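As an illustration, a justification comment might look like this (the wording is ours, not copied from the actual Cargo.toml):

```toml
[dependencies]
# arc-swap: lock-free atomic pointer swap for snapshot publishing.
# A std RwLock would serialize readers against the writer; see the
# Concurrency Model chapter.
arc-swap = "1"
```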

Data Model

Parallax models the world as a typed property graph:

  • Entities are nodes. Each has a type, a class, and a bag of flat properties.
  • Relationships are directed edges. Each has a verb (class), connects two entities, and may carry properties.
  • Properties are flat key-value pairs. No nesting. If you need structure, model it as another entity and a relationship.

Entity Identity

Design Goals

An entity ID must be:

  1. Stable — the same logical entity always gets the same ID.
  2. Deterministic — given the same inputs, we always produce the same ID.
  3. Collision-free — two different entities never share an ID.
  4. Source-independent — the ID doesn't change if we re-ingest from a different connector.

Implementation

/// EntityId is a 128-bit blake3 hash of (account_id, entity_type, entity_key).
/// It is never randomly generated.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub [u8; 16]);

impl EntityId {
    pub fn derive(account_id: &str, entity_type: &str, entity_key: &str) -> Self {
        let mut hasher = blake3::Hasher::new();
        hasher.update(account_id.as_bytes());
        hasher.update(b":");
        hasher.update(entity_type.as_bytes());
        hasher.update(b":");
        hasher.update(entity_key.as_bytes());
        let hash = hasher.finalize();
        let mut id = [0u8; 16];
        id.copy_from_slice(&hash.as_bytes()[..16]);
        EntityId(id)
    }
}

Why blake3: Fast (SIMD-accelerated), cryptographically strong, deterministic, no seed needed. 128-bit truncation gives collision probability of ~1 in 10^18 at 10 billion entities.

Why not UUID v4: UUIDs are random and not deterministic from source data. We need idempotent re-ingestion — inserting the same entity twice must produce the same ID, not a duplicate.
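The idempotence property can be demonstrated with a dependency-free sketch; std's DefaultHasher stands in for blake3 here, since any deterministic hash of the same tuple gives the same behavior:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for EntityId::derive: hash the (account, type, key) tuple.
// DefaultHasher replaces blake3 only to keep this sketch std-only.
fn derive_id(account: &str, ty: &str, key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (account, ty, key).hash(&mut h);
    h.finish()
}

fn main() {
    // Re-ingesting the same logical entity yields the same ID.
    let a = derive_id("acct-1", "host", "web-01");
    let b = derive_id("acct-1", "host", "web-01");
    assert_eq!(a, b);
    // A different key yields a different ID.
    let c = derive_id("acct-1", "host", "web-02");
    assert_ne!(a, c);
}
```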

Entity Structure

pub struct Entity {
    /// Deterministic, content-addressed identity.
    pub id: EntityId,

    /// Specific type: "aws_ec2_instance", "okta_user", "github_repo".
    /// Open set — connectors define new types freely.
    pub _type: EntityType,

    /// Broad classification: "Host", "User", "DataStore".
    /// Closed set of ~40 values. Enables cross-type queries.
    pub _class: EntityClass,

    /// Human-readable display name. Not unique, not stable.
    pub display_name: CompactString,

    /// Which connector + sync cycle produced this version.
    pub source: SourceTag,

    /// When this entity was first observed by Parallax.
    pub created_at: Timestamp,

    /// When this entity was last modified.
    pub updated_at: Timestamp,

    /// Soft-delete flag. Set by sync diff when entity disappears from source.
    pub _deleted: bool,

    /// Flat key-value property bag.
    pub properties: PropertyMap,
}

Type vs. Class: The Two-Level Hierarchy

Class (broad, ~40 total)     Type (specific, unbounded)
─────────────────────────    ───────────────────────────────
Host                         aws_ec2_instance, azure_vm, host
User                         okta_user, aws_iam_user, user
DataStore                    aws_s3_bucket, aws_dynamodb_table
CodeRepo                     github_repo, gitlab_project
Firewall                     aws_security_group, gcp_firewall_rule
Service                      service, nginx, microservice

Class enables generic queries across cloud providers: FIND Host WITH active = true

Type enables specific queries: FIND aws_ec2_instance WITH instanceType = 'm5.xlarge'

Class is a closed set (defined by Parallax, ~40 values). Type is an open set (defined by connectors, unlimited). See the full class list in Known Entity Classes.

Relationship Structure

pub struct Relationship {
    /// Deterministic identity derived from (from_id, class, to_id).
    pub id: RelationshipId,

    /// The verb: HAS, RUNS, CONTAINS, ALLOWS, etc.
    pub _class: RelationshipClass,

    /// Source entity. Must exist at commit time.
    pub from_id: EntityId,

    /// Target entity. Must exist at commit time.
    pub to_id: EntityId,

    /// Optional properties on the relationship edge.
    pub properties: PropertyMap,

    pub source: SourceTag,
    pub _deleted: bool,
}

Referential integrity (INV-03): The from_id and to_id of every committed relationship must reference entities that exist in the graph. The ingest layer enforces this at commit time — dangling relationships are rejected, not silently stored.
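A hedged sketch of what that commit-time check amounts to (names are illustrative, not the parallax-ingest API):

```rust
use std::collections::BTreeSet;

// Illustrative stand-in for a relationship in a sync batch.
struct Rel {
    from: String,
    to: String,
}

/// Accept a batch of relationships only if both endpoints of every
/// edge are known (in the batch or in the committed graph).
fn validate(rels: &[Rel], known: &BTreeSet<String>) -> Result<(), String> {
    for r in rels {
        if !known.contains(&r.from) || !known.contains(&r.to) {
            return Err(format!("dangling relationship {} -> {}", r.from, r.to));
        }
    }
    Ok(())
}

fn main() {
    let known: BTreeSet<String> =
        ["web-01", "nginx"].iter().map(|s| s.to_string()).collect();
    let good = Rel { from: "web-01".into(), to: "nginx".into() };
    let bad = Rel { from: "web-01".into(), to: "ghost".into() };
    assert!(validate(&[good], &known).is_ok());
    assert!(validate(&[bad], &known).is_err()); // rejected, not stored
}
```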

Property Values

Properties are flat key-value pairs with a small set of value types:

pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(CompactString),
    Timestamp(Timestamp),
    StringArray(Vec<CompactString>),
}

pub type PropertyMap = BTreeMap<CompactString, Value>;

Why no nested objects: Flat properties are indexable and queryable, and they diff cleanly (key-by-key comparison). JSON path expressions inside a graph query create a second query language inside the first. If your source data has nested objects, either flatten them into properties, or model the structure as entities and relationships.
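One possible flattening scheme, sketched with a stand-in Nested type (the dotted-key convention is ours for the example, not a Parallax requirement):

```rust
use std::collections::BTreeMap;

// Stand-in for whatever nested structure a source API returns.
enum Nested {
    Str(String),
    Map(BTreeMap<String, Nested>),
}

/// Flatten nested maps into dotted keys: {"placement": {"region": r}}
/// becomes {"placement.region": r}.
fn flatten(prefix: &str, v: &Nested, out: &mut BTreeMap<String, String>) {
    match v {
        Nested::Str(s) => {
            out.insert(prefix.to_string(), s.clone());
        }
        Nested::Map(m) => {
            for (k, child) in m {
                let key = if prefix.is_empty() {
                    k.clone()
                } else {
                    format!("{prefix}.{k}")
                };
                flatten(&key, child, out);
            }
        }
    }
}

fn main() {
    let mut inner = BTreeMap::new();
    inner.insert("region".to_string(), Nested::Str("us-east-1".into()));
    let mut root = BTreeMap::new();
    root.insert("placement".to_string(), Nested::Map(inner));

    let mut flat = BTreeMap::new();
    flatten("", &Nested::Map(root), &mut flat);
    assert_eq!(flat.get("placement.region").map(String::as_str), Some("us-east-1"));
}
```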

Source Tracking

Every entity and relationship knows which connector produced it:

pub struct SourceTag {
    /// Connector identifier: "connector-aws", "my-scanner", etc.
    pub connector_id: CompactString,
    /// Unique ID for this specific sync execution.
    pub sync_id: CompactString,
    /// When this sync started (HLC timestamp).
    pub sync_timestamp: Timestamp,
}

This enables:

  1. Differential sync: Delete entities from connector X that weren't seen in the latest sync — without touching entities from connector Y.
  2. Provenance: Know which connector is the authority for each entity.
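The source-scoped deletion in point 1 reduces to a set difference; a minimal sketch:

```rust
use std::collections::BTreeSet;

/// Entities previously stored for this connector that the latest sync
/// did not report: candidates for soft-deletion. Entities from other
/// connectors never appear in `previous`, so they are untouched.
fn stale<'a>(
    previous: &'a BTreeSet<String>,
    seen: &'a BTreeSet<String>,
) -> Vec<&'a String> {
    previous.difference(seen).collect()
}

fn main() {
    let previous: BTreeSet<String> =
        ["web-01", "web-02"].iter().map(|s| s.to_string()).collect();
    let seen: BTreeSet<String> =
        ["web-01"].iter().map(|s| s.to_string()).collect();
    let gone = stale(&previous, &seen);
    assert_eq!(gone.len(), 1);
    assert_eq!(gone[0], "web-02"); // disappeared from source → soft-delete
}
```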

Hybrid Logical Clocks

pub struct Timestamp {
    /// Milliseconds since Unix epoch (wall clock component).
    pub wall_ms: u64,
    /// Logical counter — breaks ties when wall_ms is equal.
    pub logical: u32,
    /// Node identifier (0 for single-node; reserved for clustering).
    pub node_id: u16,
}

HLC is used instead of wall-clock SystemTime because:

  • Wall clocks can go backwards (NTP adjustments).
  • Two events in the same millisecond need deterministic ordering.
  • HLC extends naturally when clustering is added.
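The tick rule behind these properties can be sketched as follows (assumed semantics; node_id is omitted for brevity, and this is not the parallax-core implementation):

```rust
// (wall_ms, logical) ordering via derived Ord: wall clock first,
// logical counter breaks ties.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    wall_ms: u64,
    logical: u32,
}

struct Clock {
    last: Hlc,
}

impl Clock {
    /// Issue a timestamp that is strictly greater than the previous
    /// one, even if the wall clock stalls or goes backwards.
    fn now(&mut self, system_ms: u64) -> Hlc {
        let next = if system_ms > self.last.wall_ms {
            Hlc { wall_ms: system_ms, logical: 0 }
        } else {
            // Same millisecond, or NTP stepped the clock backwards:
            // keep the old wall_ms and bump the logical counter.
            Hlc { wall_ms: self.last.wall_ms, logical: self.last.logical + 1 }
        };
        self.last = next;
        next
    }
}

fn main() {
    let mut c = Clock { last: Hlc { wall_ms: 0, logical: 0 } };
    let t1 = c.now(100);
    let t2 = c.now(100); // same millisecond → logical tie-break
    let t3 = c.now(99);  // wall clock went backwards → still monotonic
    assert!(t1 < t2 && t2 < t3);
}
```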

Invariants

INV-01: Every entity has a non-empty _type, _class, and entity_key.
INV-02: EntityId is deterministic: same (account, type, key) → same id.
INV-03: Every relationship's from_id and to_id reference existing entities.
INV-04: No two entities in the same account share (type, key).
INV-05: No two relationships share (from_id, class, to_id) unless
        explicitly keyed with derive_with_key.
INV-06: Timestamps are monotonically increasing per node.
INV-07: Property types are stable within an entity type across versions.

Concurrency Model

Parallax uses a single-writer, multi-reader model with MVCC snapshots. This is the most important architectural decision in the codebase.

The Single-Writer Invariant

All mutations to the graph flow through a single serialized write path. At any point in time, at most one WriteBatch is being committed.

What this eliminates:

  • Write-write conflicts
  • Deadlocks
  • Lock ordering bugs
  • Non-deterministic mutation ordering

What this does not prevent:

  • Multiple concurrent readers (unlimited, lock-free)
  • Multiple concurrent connector syncs (they queue at the ingest layer)

MVCC Snapshots

                    ┌─────────────────────┐
                    │    Writer Path       │
                    │  (single, serial)    │
                    │                      │
  WriteBatch ──────►│  1. Validate         │
  WriteBatch ──────►│  2. WAL append+fsync │
  WriteBatch ──────►│  3. Apply to MemTable│
                    │  4. Update indices   │
                    │  5. Publish snapshot  │
                    └──────────┬───────────┘
                               │ ArcSwap::store (atomic)
                    ┌──────────▼───────────┐
                    │   SnapshotManager     │
                    │                      │
                    │  current: Arc<Snap>  │
                    └──┬─────┬─────┬───────┘
                       │     │     │  Arc::clone (one atomic increment)
                    ┌──▼─┐┌──▼─┐┌──▼──┐
                    │ R1 ││ R2 ││ R3  │  Reader threads (unlimited)
                    └────┘└────┘└─────┘

A snapshot is an immutable view of the graph at a point in time. Readers acquire a snapshot with Arc::clone — one atomic increment, no lock. They hold it as long as needed without blocking writes.

When the writer commits a new batch, it atomically publishes a new snapshot via arc-swap. Existing readers keep their old snapshot until they drop it. The old snapshot's memory is freed when all readers release their Arc.
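The publish/load protocol can be sketched with std types; a Mutex-guarded Arc stands in for arc-swap's lock-free pointer here, but the reader-side behavior is identical:

```rust
use std::sync::{Arc, Mutex};

struct Snapshot {
    version: u64,
}

// Std-only stand-in for the real SnapshotManager (which uses arc-swap
// to make load/store lock-free).
struct SnapshotManager {
    current: Mutex<Arc<Snapshot>>,
}

impl SnapshotManager {
    /// Readers pin the current snapshot: one Arc::clone, no data copy.
    fn load(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.lock().unwrap())
    }
    /// The writer atomically replaces the pointer; existing readers
    /// keep their old snapshot until they drop it.
    fn publish(&self, snap: Snapshot) {
        *self.current.lock().unwrap() = Arc::new(snap);
    }
}

fn main() {
    let mgr = SnapshotManager {
        current: Mutex::new(Arc::new(Snapshot { version: 1 })),
    };
    let pinned = mgr.load();              // reader pins v1
    mgr.publish(Snapshot { version: 2 }); // writer publishes v2
    assert_eq!(pinned.version, 1);        // existing reader still sees v1
    assert_eq!(mgr.load().version, 2);    // new readers see v2
}
```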

Shared State Inventory

Shared State        Accessed By                                          Synchronization
────────────        ───────────                                          ───────────────
current_snapshot    Writer (store), Readers (load)                       arc-swap — lock-free atomic pointer
WAL file            Writer only                                          Single owner, no sync needed
MemTable            Writer (mut), Snapshot (immutable ref)               Writer publishes new snapshot; never mutated after publish
Segment inventory   Writer (during compaction), Readers (via snapshot)   Snapshots hold Arc<Vec<SegmentRef>>; writer builds new Vec
Metrics counters    Writer + Readers                                     Relaxed atomics (counters only)

Key insight: The only truly shared mutable state is the snapshot pointer. Everything else is either single-owner (WAL, MemTable before publish) or immutable-after-publish (snapshots, segments).

The SyncEngine Lock Pattern

The REST server shares the storage engine across concurrent HTTP handlers via Arc<Mutex<StorageEngine>>:

pub struct AppState {
    pub engine: Arc<Mutex<StorageEngine>>,
    pub sync: SyncEngine,       // wraps the same Arc<Mutex<StorageEngine>>
    // ...
}

The ingest path follows this protocol to minimize lock-hold time:

// 1. Lock briefly to take a snapshot for diff computation.
let (existing_entities, existing_rels) = {
    let engine = self.store.lock().expect("engine lock");
    let snap = engine.snapshot();
    validate_sync_batch(&entities, &relationships, &snap)?;
    let ents = snap.entities_by_source(connector_id)...collect();
    let rels = snap.relationships_by_source(connector_id)...collect();
    (ents, rels)  // snap dropped here, lock released
};

// 2. Compute diff without holding any lock (pure CPU work).
let mut batch = WriteBatch::new();
// ... diff logic ...

// 3. Lock briefly again to commit the batch.
if !batch.is_empty() {
    let mut engine = self.store.lock().expect("engine lock");
    engine.write(batch)?;
}

This pattern ensures the lock is held for microseconds, not milliseconds. Multiple connectors can prepare their diffs concurrently; they only contend at the write step.

Async Compatibility

GraphReader<'snap> borrows its snapshot, so it must not be held across an await point: a future that keeps a non-Send borrow (such as a MutexGuard) alive across an await is itself non-Send and cannot be spawned on a multi-threaded runtime. The correct pattern for async handlers:

async fn query_handler(state: State<AppState>) -> Json<Value> {
    // Acquire engine lock, take snapshot, compute result, drop snapshot.
    // All synchronous — no await while holding the borrow.
    let result = {
        let engine = state.engine.lock().unwrap();
        let snap = engine.snapshot();
        let graph = GraphReader::new(&snap);
        graph.find("host").collect::<Vec<_>>()
            .into_iter().map(entity_to_json).collect::<Vec<_>>()
        // snap drops here, lock released
    };
    Json(json!({ "entities": result }))
}

Performance Characteristics

With this model on modern NVMe hardware:

Operation                  Throughput       Notes
─────────                  ──────────       ─────
Snapshot acquisition       ~1ns             Arc::clone = one atomic increment
Entity lookup (MemTable)   ≤1μs p99         BTreeMap lookup
Entity lookup (Segment)    ≤100μs p99       Linear scan; will improve with index in v0.2
WAL write throughput       ≥500K ops/sec    With batching
Write lock contention      <5μs typical     Lock held only for MemTable update

Invariant Reference

Parallax formalizes correctness as numbered invariants across all specs. Every code change should be checked against the invariants it might affect. Invariants are referenced in commit messages using their codes.

Data Model Invariants (INV-01..08)

From specs/01-data-model.md:

Code     Invariant
────     ─────────
INV-01   Every entity has a non-empty _type, _class, and entity key.
INV-02   EntityId is deterministic: same (account_id, type, key) always produces the same ID.
INV-03   Every relationship's from_id and to_id reference entities that exist in the committed graph.
INV-04   No two entities in the same account share (type, key).
INV-05   No two relationships share (from_id, class, to_id) unless explicitly keyed with derive_with_key.
INV-06   Timestamps are monotonically increasing per node.
INV-07   Property types are stable within an entity type across versions (no type changes).
INV-08   Property values are flat — no nested objects or arrays-of-objects.

Storage Engine Invariants (INV-S01..S08)

From specs/02-storage-engine.md:

Code      Invariant
────      ─────────
INV-S01   A committed write is never lost, even after a crash.
INV-S02   A read snapshot never observes a partial write.
INV-S03   Snapshots are monotonically increasing — a newer snapshot always supersedes an older one.
INV-S04   WAL entries are append-only and immutable after write.
INV-S05   A WAL entry with a CRC mismatch is treated as corrupt; recovery stops at the last valid entry.
INV-S06   MemTable flush is atomic: either all entries move to a segment or none do.
INV-S07   Segment files are immutable after creation. Compaction produces new segments, never modifies old ones.
INV-S08   An entity with _deleted = true must never appear in query results.

Graph Engine Invariants (INV-G01..G06)

From specs/03-graph-engine.md:

Code      Invariant
────      ─────────
INV-G01   GraphReader<'snap> references cannot outlive their snapshot (enforced by the borrow checker).
INV-G02   Traversal never follows edges to deleted entities.
INV-G03   BFS traversal visits each entity at most once (no infinite loops in cyclic graphs).
INV-G04   Shortest path returns None when no path exists; it never returns an incorrect path.
INV-G05   Blast radius computation is bounded by max_depth (default: 4).
INV-G06   Coverage gap analysis only returns entities of target_type that have no qualifying neighbor.

Query Language Invariants (INV-Q01..Q06)

From specs/04-query-language.md:

Code      Invariant
────      ─────────
INV-Q01   PQL parsing is deterministic: the same query string always produces the same AST.
INV-Q02   A query never returns results from entities that do not satisfy all specified filters.
INV-Q03   LIMIT n applied to a query returns at most n results.
INV-Q04   A query that times out returns an error, not a partial result.
INV-Q05   FIND SHORTEST PATH FROM A TO B returns the minimum-hop path or None; never a longer path.
INV-Q06   FIND BLAST RADIUS FROM X DEPTH n returns only entities reachable within n hops.

Connector SDK Invariants (INV-C01..C06)

From specs/05-integration-sdk.md:

Code      Invariant
────      ─────────
INV-C01   A sync commit is atomic: either all entities/relationships land or none do.
INV-C02   Entities from connector A are never deleted by a sync from connector B.
INV-C03   Entity IDs are deterministic from (account_id, type, key) — same as INV-02.
INV-C04   A relationship in a sync batch whose from_id or to_id does not exist (in batch or graph) is rejected.
INV-C05   Step dependencies form a DAG — circular step dependencies are rejected at connector load time.
INV-C06   A failed step does not prevent independent steps from running.

API Surface Invariants (INV-A01..A06)

From specs/06-api-surface.md:

Code      Invariant
────      ─────────
INV-A01   All write endpoints require authentication when an API key is configured.
INV-A02   API key comparison uses constant-time equality to prevent timing attacks.
INV-A03   Every committed write is visible to subsequent reads on the same server instance.
INV-A04   Query responses are paginated; no single response exceeds the configured max_results limit.
INV-A05   Every request has an X-Request-Id header (generated or propagated) for tracing.
INV-A06   The /v1/health endpoint is exempt from authentication.

Policy Engine Invariants (INV-P01..P06)

From specs/08-policy-engine.md:

Code     Invariant
INV-P01  A policy rule with an invalid PQL query is rejected at load time, not at evaluation time.
INV-P02  Policy evaluation never modifies the graph. It is read-only.
INV-P03  A rule that errors during evaluation is recorded as an error, not as a pass or fail.
INV-P04  Posture score is computed from all loaded rules, including errored ones (they count as failures).
INV-P05  Framework mapping (CIS, NIST, PCI-DSS) is metadata on rules; it does not affect evaluation logic.
INV-P06  Policy rules are validated against the PQL parser before being accepted into the rule set.

How to Use This Reference

When modifying code, ask: Which invariants does this change touch?

Reference affected invariants in your commit message:

fix(ingest): reject dangling relationships at commit time (INV-C04, INV-03)
feat(store): implement WAL CRC32C verification (INV-S05)
refactor(graph): bound traversal depth to prevent unbounded BFS (INV-G03)

When adding new behavior, ask: Does this require a new invariant? If yes, add it to the appropriate spec file and to this reference page.

Storage Engine Overview

The parallax-store crate is the durable foundation of the entire system. It provides:

  1. Durability — entities and relationships are written to a WAL before being applied to memory.
  2. Point lookups — retrieve an entity by ID in ≤1μs from MemTable.
  3. MVCC snapshots — immutable, frozen views of the graph that readers hold without blocking writes.
  4. Compaction — (v0.2) background reclamation of space from deleted versions.

The storage engine does not understand graph semantics. It stores keyed records. The graph engine (parallax-graph) builds traversal and adjacency on top of the storage snapshot interface.

Architecture

StorageEngine
├── WriteAheadLog (WAL)
│   └── wal-00000000.pxw, wal-00000001.pxw, ...
│
├── MemTable (in-memory)
│   ├── entities: BTreeMap<EntityId, Entity>
│   ├── relationships: BTreeMap<RelationshipId, Relationship>
│   ├── type_index: HashMap<EntityType, Vec<EntityId>>
│   ├── class_index: HashMap<EntityClass, Vec<EntityId>>
│   ├── source_index: HashMap<ConnectorId, Vec<EntityId>>
│   └── adjacency: HashMap<EntityId, (Vec<RelId>, Vec<RelId>)>
│
├── Segments (on-disk, immutable)
│   └── seg-00000000.pxs, seg-00000001.pxs, ...
│
└── SnapshotManager
    └── current: ArcSwap<Snapshot>
        └── Snapshot { memtable_ref, segments: Arc<Vec<SegmentRef>> }

Write Path

Every mutation follows this sequence:

  1. Build a WriteBatch (set of upsert/delete operations)
  2. Serialize and append to WAL with CRC32C checksum
  3. fsync() — durability point; crash here loses nothing already committed
  4. Apply batch to MemTable
  5. Publish new snapshot via ArcSwap::store
#![allow(unused)]
fn main() {
let mut engine = StorageEngine::open(StoreConfig::new("/var/lib/parallax"))?;

let mut batch = WriteBatch::new();
batch.upsert_entity(entity);
batch.upsert_relationship(rel);

engine.write(batch)?;  // Steps 1-5 above
}

Read Path

Reads always go through a Snapshot:

#![allow(unused)]
fn main() {
let snap = engine.snapshot();  // Arc::clone — O(1), no lock
// Entity lookup: MemTable first, then segment scan
if let Some(entity) = snap.get_entity(entity_id) {
    println!("{}", entity.display_name);
}
// snap dropped here; frees the Arc
}

MemTable Flush

When the MemTable exceeds memtable_flush_size (default: 64MB), it is flushed to an immutable segment file:

  1. Serialize all entities and relationships to a .pxs segment file
  2. The new snapshot points to the fresh (empty) MemTable + the new segment
  3. The old MemTable data is freed

The adjacency index is preserved through flushes.

Crash Recovery

On StorageEngine::open(), if WAL segments exist:

  1. Replay WAL entries in order, verifying CRC32C on each
  2. Stop at the first corrupt entry (INV-S05)
  3. Apply all valid entries to rebuild the MemTable
  4. Publish the recovered snapshot

Recovery is the only code path that rebuilds MemTable from WAL. Normal operation never re-reads the WAL.

Storage Engine API

#![allow(unused)]
fn main() {
// Open or create a storage engine
let engine = StorageEngine::open(StoreConfig::new(data_dir))?;

// Write a batch (atomic, durable)
engine.write(batch)?;

// Get an MVCC snapshot (O(1))
let snap = engine.snapshot();

// Access entity counts (for stats/metrics)
let metrics = engine.metrics().snapshot();
}

See StorageEngine API for the full interface.

Write-Ahead Log (WAL)

The WAL is Parallax's durability guarantee. Every mutation is written to the WAL and fsync()'d before being applied to the MemTable. If the process crashes, the WAL replays to reconstruct the MemTable.

On-Disk Format

WAL data lives in {data_dir}/wal/. Each segment file is named wal-{index:08}.pxw (e.g., wal-00000000.pxw).

Each entry in a WAL file has this layout (little-endian):

┌──────────┬──────────┬──────────┬────────────┬──────────┐
│ magic(4) │ len(4)   │ seq(8)   │ payload(N) │ crc32c(4)│
└──────────┴──────────┴──────────┴────────────┴──────────┘
Field    Size     Description
magic    4 bytes  0x50585741 — ASCII "PXWA" (Parallax WAL)
len      4 bytes  Total entry length including all fields
seq      8 bytes  Monotonic sequence number
payload  N bytes  Serialized WriteBatch (postcard format)
crc32c   4 bytes  CRC32C of the seq and payload bytes
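
The framing above can be exercised end to end with only the standard library. This is an illustrative sketch, not the Parallax implementation: the bitwise CRC32C below (Castagnoli polynomial, reversed form 0x82F63B78) stands in for a hardware-accelerated crc32c crate, and the payload is an opaque byte slice rather than a postcard-encoded WriteBatch.

```rust
const WAL_MAGIC: [u8; 4] = *b"PXWA";

/// Bitwise (software) CRC32C — Castagnoli polynomial, reflected form.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = 0u32.wrapping_sub(crc & 1);
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Frame one entry: magic(4) | len(4) | seq(8) | payload(N) | crc32c(4).
fn encode_entry(seq: u64, payload: &[u8]) -> Vec<u8> {
    let len = (4 + 4 + 8 + payload.len() + 4) as u32;
    let mut checked = seq.to_le_bytes().to_vec();
    checked.extend_from_slice(payload); // checksum covers seq + payload
    let crc = crc32c(&checked);

    let mut out = Vec::with_capacity(len as usize);
    out.extend_from_slice(&WAL_MAGIC);
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&seq.to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&crc.to_le_bytes());
    out
}

/// Decode and verify one entry; None signals corruption.
fn decode_entry(bytes: &[u8]) -> Option<(u64, Vec<u8>)> {
    if bytes.len() < 20 || bytes[..4] != WAL_MAGIC {
        return None;
    }
    let len = u32::from_le_bytes(bytes[4..8].try_into().ok()?) as usize;
    if len < 20 || bytes.len() < len {
        return None;
    }
    let seq = u64::from_le_bytes(bytes[8..16].try_into().ok()?);
    let payload = &bytes[16..len - 4];
    let stored = u32::from_le_bytes(bytes[len - 4..len].try_into().ok()?);

    let mut checked = seq.to_le_bytes().to_vec();
    checked.extend_from_slice(payload);
    (crc32c(&checked) == stored).then(|| (seq, payload.to_vec()))
}

fn main() {
    // Standard CRC32C check value for "123456789"
    assert_eq!(crc32c(b"123456789"), 0xE306_9283);

    let entry = encode_entry(42, b"write-batch-bytes");
    assert_eq!(decode_entry(&entry), Some((42, b"write-batch-bytes".to_vec())));

    // A single flipped bit is detected, as INV-S05 requires
    let mut corrupt = entry.clone();
    corrupt[10] ^= 0x01;
    assert_eq!(decode_entry(&corrupt), None);
}
```

A corrupted byte anywhere in seq or payload changes the recomputed checksum, so decode returns None — the same signal recovery uses to stop replay.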

Write Protocol

#![allow(unused)]
fn main() {
pub fn append(&mut self, batch: &WriteBatch) -> Result<u64, WalError> {
    let seq = self.next_seq;
    self.next_seq += 1;

    // Serialize the batch with postcard (compact binary format)
    let payload = postcard::to_allocvec(batch)?;

    // Compute checksum over seq + payload
    let crc = crc32c_combine(seq.to_le_bytes(), &payload);

    // Total entry length: magic(4) + len(4) + seq(8) + payload + crc32c(4)
    let entry_len = 4 + 4 + 8 + payload.len() + 4;

    // Rotate to a new segment if the active file would exceed MAX_SEGMENT_SIZE
    if self.active_size + entry_len > MAX_SEGMENT_SIZE {
        self.rotate()?;
    }

    // Write all fields
    self.active.write_all(&WAL_MAGIC)?;
    self.active.write_all(&(entry_len as u32).to_le_bytes())?;
    self.active.write_all(&seq.to_le_bytes())?;
    self.active.write_all(&payload)?;
    self.active.write_all(&crc.to_le_bytes())?;

    // fsync — this is the durability commit point
    self.active.sync_data()?;

    Ok(seq)
}
}

After sync_data() returns, the entry survives any crash.

Crash Recovery

On startup, WriteAheadLog::recover() replays all WAL segments:

#![allow(unused)]
fn main() {
pub fn recover(&mut self) -> Result<Vec<WriteBatch>, WalError> {
    let mut batches = Vec::new();
    for segment_file in self.sorted_segment_files()? {
        let entries = self.read_segment(&segment_file)?;
        for entry in entries {
            // Verify magic
            if entry.magic != WAL_MAGIC { return Err(WalError::Corruption(...)); }
            // Verify checksum (INV-S05)
            if !verify_crc32c(entry.seq, &entry.payload, entry.crc32c) {
                // Stop recovery entirely at the first corrupt entry —
                // this entry and everything after it is discarded
                tracing::warn!("WAL corruption detected at seq {}", entry.seq);
                return Ok(batches);
            }
            let batch = postcard::from_bytes(&entry.payload)?;
            batches.push(batch);
        }
    }
    Ok(batches)
}
}

INV-S05: A corrupt WAL entry stops recovery at that point. All entries before it are applied; the corrupt entry and everything after are discarded. This may result in losing the most recent write if the process crashed during fsync(). This is correct — the write was not acknowledged to the caller yet.

Segment Rotation

When the active WAL segment exceeds MAX_SEGMENT_SIZE (default: 64MB):

  1. sync_data() the active file
  2. Close the active file handle
  3. Open a new file: wal-{next_index:08}.pxw
  4. Set the new file as active

Old WAL segments are retained until a MemTable flush to a .pxs segment succeeds. At that point, any WAL segment whose entries are fully covered by the flushed segment is safe to delete.
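
The retention rule can be sketched with a hypothetical helper (not a Parallax API; the real engine tracks this via sequence numbers): once a flush persists everything up to sequence S, any WAL segment whose highest sequence number is at most S is redundant.

```rust
/// Each WAL segment summarized as (segment_index, max_seq_in_segment).
/// Returns the indices of segments made redundant by a flush that
/// persisted all entries up to `flushed_through`.
fn deletable_segments(segments: &[(u32, u64)], flushed_through: u64) -> Vec<u32> {
    segments
        .iter()
        .filter(|&&(_, max_seq)| max_seq <= flushed_through)
        .map(|&(idx, _)| idx)
        .collect()
}

fn main() {
    let segments = [(0, 499), (1, 999), (2, 1403)];

    // A flush persisted state up to seq 999: segments 0 and 1 are covered...
    assert_eq!(deletable_segments(&segments, 999), vec![0, 1]);

    // ...but segment 2 holds entries newer than the flush — keep it.
    assert!(!deletable_segments(&segments, 999).contains(&2));
}
```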

Group Commit (v0.2)

For high-throughput ingestion, individual fsync() calls are expensive (~100μs on NVMe). Group commit batches multiple WriteBatches into a single fsync():

100 batches, 1 fsync    → ~100μs of fsync time amortized over all 100 batches (~1μs each)
100 batches, 100 fsyncs → ~100μs of fsync time paid by every batch (~10ms total)

Group commit is planned for v0.2. The current v0.1 implementation does one fsync() per WriteBatch, which is correct and suitable for all but the highest-throughput ingestion scenarios.
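
The amortization arithmetic can be made concrete with a toy calculation (illustrative only; the planned v0.2 implementation would drain a queue of pending batches from a writer thread):

```rust
/// Number of fsyncs needed to commit `batches` with groups of up to `max_group`.
fn fsyncs_needed(batches: usize, max_group: usize) -> usize {
    (batches + max_group - 1) / max_group // ceiling division
}

/// Amortized fsync cost per batch in μs, assuming the ~100μs NVMe figure above.
fn fsync_cost_per_batch_us(group_size: usize) -> f64 {
    100.0 / group_size as f64
}

fn main() {
    // v0.1 behavior: one fsync per WriteBatch
    assert_eq!(fsyncs_needed(100, 1), 100);

    // Group commit: 100 queued batches share a single fsync
    assert_eq!(fsyncs_needed(100, 100), 1);

    // Amortization: ~100μs per batch drops to ~1μs per batch
    assert!(fsync_cost_per_batch_us(100) < fsync_cost_per_batch_us(1));
}
```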

Inspecting WAL Files

WAL files are binary and not human-readable directly. Use the CLI to inspect the current state of the engine rather than reading WAL files:

parallax stats --data-dir /var/lib/parallax

A future parallax wal dump command is planned for v0.2 debugging.

MemTable

The MemTable is the in-memory write buffer. Every write is applied here immediately after the WAL fsync. Reads check the MemTable first, then fall back to segment files.

Structure

#![allow(unused)]
fn main() {
pub struct MemTable {
    // Primary storage
    entities:      BTreeMap<EntityId, Entity>,
    relationships: BTreeMap<RelationshipId, Relationship>,

    // Secondary indices (maintained in sync with primary)
    type_index:    HashMap<EntityType, Vec<EntityId>>,
    class_index:   HashMap<EntityClass, Vec<EntityId>>,
    source_index:  HashMap<CompactString, Vec<EntityId>>,   // connector_id → entities
    adjacency:     HashMap<EntityId, (Vec<RelationshipId>,  // outgoing
                                      Vec<RelationshipId>)>, // incoming
}
}

Operations

Upsert

#![allow(unused)]
fn main() {
pub fn upsert_entity(&mut self, entity: Entity) {
    let id = entity.id;
    let entity_type = entity._type.clone();
    let entity_class = entity._class.clone();
    let connector_id = entity.source.connector_id.clone();

    // Update primary store; insert() returns the previous value on update
    let replaced = self.entities.insert(id, entity).is_some();

    // Update secondary indices only on first insert — on an update the id
    // is already indexed, and type/class/source are stable per entity id
    // (the type is part of the deterministic ID; see INV-C03)
    if !replaced {
        self.type_index.entry(entity_type).or_default().push(id);
        self.class_index.entry(entity_class).or_default().push(id);
        self.source_index.entry(connector_id).or_default().push(id);
    }
    // adjacency is updated by upsert_relationship
}
}

Tombstone (Soft Delete)

When the sync protocol determines that an entity was removed from a source:

#![allow(unused)]
fn main() {
pub fn delete_entity(&mut self, id: EntityId) {
    if let Some(entity) = self.entities.get_mut(&id) {
        entity._deleted = true;  // soft delete — remains for snapshot visibility
    }
}
}

Soft-deleted entities are invisible to queries (INV-S08) but remain in memory until the next compaction cycle removes them.
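
The visibility rule can be sketched with a toy map (illustrative only; Entity here is a stripped-down stand-in for the real type): a soft-deleted entry still occupies memory but is filtered out of every read.

```rust
use std::collections::BTreeMap;

struct Entity {
    display_name: String,
    _deleted: bool,
}

struct MiniTable {
    entities: BTreeMap<u64, Entity>,
}

impl MiniTable {
    /// Soft delete: flip the tombstone flag, keep the record in memory.
    fn delete_entity(&mut self, id: u64) {
        if let Some(e) = self.entities.get_mut(&id) {
            e._deleted = true;
        }
    }

    /// Reads never surface tombstoned entities (the INV-S08 behavior).
    fn get_entity(&self, id: u64) -> Option<&Entity> {
        self.entities.get(&id).filter(|e| !e._deleted)
    }
}

fn main() {
    let mut t = MiniTable { entities: BTreeMap::new() };
    t.entities.insert(1, Entity { display_name: "web-01".into(), _deleted: false });

    assert!(t.get_entity(1).is_some());
    t.delete_entity(1);
    assert!(t.get_entity(1).is_none()); // invisible to queries
    assert_eq!(t.entities.len(), 1);    // but still resident until compaction
}
```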

Adjacency Index

The adjacency index enables O(1) neighbor lookups:

#![allow(unused)]
fn main() {
pub fn upsert_relationship(&mut self, rel: Relationship) {
    let rel_id = rel.id;
    let from_id = rel.from_id;
    let to_id = rel.to_id;

    // insert() returns the previous value when this is a re-upsert
    let replaced = self.relationships.insert(rel_id, rel).is_some();

    // Both endpoints track this relationship; index only on first insert
    // so a re-upsert does not duplicate adjacency entries
    if !replaced {
        self.adjacency.entry(from_id).or_default().0.push(rel_id); // outgoing
        self.adjacency.entry(to_id).or_default().1.push(rel_id);   // incoming
    }
}
}

This is the index that makes graph traversal fast. Without it, every hop would require a full scan of all relationships.
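
A stripped-down version of the index shows why hops are cheap (a sketch with integer IDs standing in for the real ID types): each endpoint lookup is one hash probe, then neighbor iteration is O(degree).

```rust
use std::collections::HashMap;

/// (outgoing, incoming) relationship ids per entity — mirrors the MemTable field.
type Adjacency = HashMap<u64, (Vec<u64>, Vec<u64>)>;

fn upsert_relationship(adj: &mut Adjacency, rel_id: u64, from: u64, to: u64) {
    adj.entry(from).or_default().0.push(rel_id); // outgoing at the source
    adj.entry(to).or_default().1.push(rel_id);   // incoming at the target
}

fn main() {
    let mut adj = Adjacency::new();
    // host(1) -RUNS(10)-> service(2), service(2) -USES(11)-> db(3)
    upsert_relationship(&mut adj, 10, 1, 2);
    upsert_relationship(&mut adj, 11, 2, 3);

    // One hash probe, then O(degree) iteration — no scan of all relationships
    assert_eq!(adj[&1].0, vec![10]);           // outgoing from the host
    assert_eq!(adj[&2], (vec![11], vec![10])); // service: USES out, RUNS in
    assert_eq!(adj[&3].1, vec![11]);           // incoming at the db
}
```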

Flush to Segment

When memtable.approx_bytes() > config.memtable_flush_size (default: 64MB), StorageEngine::maybe_flush() runs:

#![allow(unused)]
fn main() {
fn maybe_flush(&mut self) -> Result<(), StoreError> {
    if self.memtable.approx_bytes() <= self.config.memtable_flush_size {
        return Ok(());
    }

    // Write current MemTable contents to a new .pxs segment
    let segment_path = self.next_segment_path();
    Segment::write(&segment_path, &self.memtable)?;

    // Drain the MemTable: clears entity/rel data, preserves adjacency index
    let new_segment = SegmentRef::open(segment_path)?;
    let drained = self.memtable.drain_to_flush();
    self.segments.push(new_segment);

    // Publish new snapshot pointing to empty MemTable + new segment
    self.publish_snapshot();
    Ok(())
}
}

The drain_to_flush() operation is carefully designed:

  • Entity and relationship data moves to the segment file
  • The adjacency index is preserved (rebuilt from segments during recovery)
  • Secondary indices are cleared (rebuilt from segment scans as needed)

Memory Accounting

#![allow(unused)]
fn main() {
pub fn approx_bytes(&self) -> usize {
    // Rough estimate: sum of entity and relationship sizes
    self.entities.values().map(|e| std::mem::size_of_val(e)).sum::<usize>()
        + self.relationships.values().map(|r| std::mem::size_of_val(r)).sum::<usize>()
}
}

This is an approximation — it counts stack sizes of the structs but not heap-allocated strings. For memory budgeting, assume 2-4× the struct size per entity due to CompactString heap allocations for long strings.
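
If you need a tighter estimate, a sketch that also counts the heap capacity of string fields looks like this (illustrative: plain String stands in for CompactString, and the real Entity has more fields):

```rust
struct Entity {
    display_name: String,
    state: String,
}

/// Stack size plus owned heap capacity — closer to true resident size
/// than size_of_val alone.
fn approx_entity_bytes(e: &Entity) -> usize {
    std::mem::size_of::<Entity>()
        + e.display_name.capacity()
        + e.state.capacity()
}

fn main() {
    let e = Entity {
        display_name: "ip-10-0-3-17.ec2.internal".to_string(),
        state: "running".to_string(),
    };
    // The heap portion is invisible to size_of_val but real to the allocator
    assert!(approx_entity_bytes(&e) > std::mem::size_of::<Entity>());
    assert!(approx_entity_bytes(&e) >= std::mem::size_of::<Entity>() + 25 + 7);
}
```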

Query Methods

The MemTable exposes index-accelerated query methods used by Snapshot:

#![allow(unused)]
fn main() {
// O(1) lookup
pub fn get_entity(&self, id: EntityId) -> Option<&Entity>;
pub fn get_relationship(&self, id: RelationshipId) -> Option<&Relationship>;

// Index-accelerated scans
pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity>;
pub fn entities_by_class(&self, c: &EntityClass) -> Vec<&Entity>;
pub fn entities_by_source(&self, connector_id: &str) -> Vec<&Entity>;
pub fn all_entities(&self) -> impl Iterator<Item = &Entity>;

// Adjacency (O(1) for the lookup, O(degree) for iteration)
pub fn outgoing_relationships(&self, id: EntityId) -> Vec<&Relationship>;
pub fn incoming_relationships(&self, id: EntityId) -> Vec<&Relationship>;
}

Segments

Segment files are immutable, on-disk snapshots of MemTable contents. When the MemTable grows beyond memtable_flush_size, it is flushed to a new segment file and the in-memory data is freed.

On-Disk Format

Segment files live in {data_dir}/segments/ and are named seg-{index:08}.pxs (e.g., seg-00000000.pxs).

┌──────────────┬────────────────────────────────────────┐
│  Header (5)  │         Payload (postcard)              │
├──────────────┼────────────────────────────────────────┤
│ magic (4)    │  SegmentData {                          │
│ version (1)  │    entities: Vec<Entity>,               │
│              │    relationships: Vec<Relationship>,    │
│              │  }                                      │
└──────────────┴────────────────────────────────────────┘
Field    Size     Value
magic    4 bytes  0x50585347 — ASCII "PXSG" (Parallax SeGment)
version  1 byte   1 (current format version)
payload  N bytes  postcard::to_allocvec(SegmentData { entities, relationships })

INV-S07: Segment files are immutable after creation. Compaction produces new segment files — it never modifies existing ones.

Writing a Segment

#![allow(unused)]
fn main() {
pub fn write(path: &Path, memtable: &MemTable) -> Result<(), StoreError> {
    let entities: Vec<Entity> = memtable.all_entities()
        .filter(|e| !e._deleted)
        .cloned()
        .collect();
    let relationships: Vec<Relationship> = memtable.all_relationships()
        .filter(|r| !r._deleted)
        .cloned()
        .collect();

    let data = SegmentData { entities, relationships };
    let payload = postcard::to_allocvec(&data)
        .map_err(|e| StoreError::Corruption(e.to_string()))?;

    let mut file = File::create(path)?;
    file.write_all(&SEGMENT_MAGIC)?;
    file.write_all(&[SEGMENT_VERSION])?;
    file.write_all(&payload)?;
    file.sync_all()?;
    Ok(())
}
}

Soft-deleted entities and relationships are excluded from the segment. This is the mechanism by which deletes are physically reclaimed.

Reading a Segment

#![allow(unused)]
fn main() {
pub fn open(path: &Path) -> Result<SegmentRef, StoreError> {
    let bytes = std::fs::read(path)?;

    // Verify magic
    if bytes.len() < 5 || &bytes[..4] != SEGMENT_MAGIC {
        return Err(StoreError::Corruption("bad segment magic".into()));
    }

    // Check version
    let version = bytes[4];
    if version != SEGMENT_VERSION {
        return Err(StoreError::Corruption(
            format!("unknown segment version {version}")
        ));
    }

    // Deserialize
    let data: SegmentData = postcard::from_bytes(&bytes[5..])?;
    Ok(SegmentRef { path: path.to_owned(), data })
}
}

Snapshot Integration

A Snapshot holds an Arc<Vec<SegmentRef>> in addition to a MemTable reference. When reading, the snapshot:

  1. Checks the MemTable first (most recent data)
  2. Scans segments in reverse order (newest to oldest)
  3. Returns the first match found
#![allow(unused)]
fn main() {
impl Snapshot {
    pub fn get_entity(&self, id: EntityId) -> Option<&Entity> {
        // MemTable first
        if let Some(e) = self.memtable.get_entity(id) {
            return if e._deleted { None } else { Some(e) };
        }
        // Then segments (newest first)
        for segment in self.segments.iter().rev() {
            if let Some(e) = segment.get_entity(id) {
                return if e._deleted { None } else { Some(e) };
            }
        }
        None
    }
}
}

Compaction (v0.2)

In v0.1, segments are never compacted. They accumulate until you delete the data directory. Background segment compaction is planned for v0.2:

  1. Merge multiple small segments into fewer large segments
  2. Remove soft-deleted entities and stale versions
  3. Build a sparse index on segment boundaries for faster lookups

The current linear scan approach is correct for graphs up to ~1M entities. At larger scale, the segment index becomes important for performance.
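
The merge step can be sketched deterministically (a toy, not the planned implementation): fold segments oldest to newest so newer versions win, then physically drop tombstones from the compacted output.

```rust
use std::collections::BTreeMap;

/// Each toy segment maps entity id → (payload, deleted-flag);
/// `segments` is ordered oldest → newest.
fn compact<'a>(segments: &[BTreeMap<u64, (&'a str, bool)>]) -> BTreeMap<u64, &'a str> {
    let mut merged: BTreeMap<u64, (&'a str, bool)> = BTreeMap::new();
    for seg in segments {
        // Later (newer) segments overwrite earlier versions of the same id
        merged.extend(seg.iter().map(|(&k, &v)| (k, v)));
    }
    // Soft-deleted entries are reclaimed here, never carried forward
    merged
        .into_iter()
        .filter(|&(_, (_, deleted))| !deleted)
        .map(|(k, (v, _))| (k, v))
        .collect()
}

fn main() {
    let older = BTreeMap::from([(1, ("host v1", false)), (2, ("svc v1", false))]);
    let newer = BTreeMap::from([(1, ("host v2", false)), (2, ("svc v1", true))]);

    let out = compact(&[older, newer]);
    assert_eq!(out.get(&1), Some(&"host v2")); // newest version wins
    assert_eq!(out.get(&2), None);             // tombstone reclaimed
}
```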

MVCC Snapshots

Snapshots are the read interface to the storage engine. They provide a frozen, immutable view of the graph at a point in time. Readers acquire a snapshot once and hold it for the duration of a read operation — they never block writers, and writers never invalidate their data.

What a Snapshot Is

#![allow(unused)]
fn main() {
pub struct Snapshot {
    /// Immutable reference to MemTable at the time of snapshot creation.
    /// The MemTable is never mutated after the snapshot is published.
    memtable: Arc<MemTable>,

    /// Ordered list of segment references (oldest to newest).
    segments: Arc<Vec<SegmentRef>>,
}
}

The Snapshot is wrapped in Arc<Snapshot> so multiple readers can share the same snapshot cheaply. Acquiring a snapshot is Arc::clone — one atomic increment on the reference count.

Snapshot Manager

The SnapshotManager maintains the current published snapshot using arc-swap for lock-free atomic updates:

#![allow(unused)]
fn main() {
pub struct SnapshotManager {
    current: ArcSwap<Snapshot>,
}

impl SnapshotManager {
    /// Called by readers — O(1), no lock.
    pub fn load(&self) -> Arc<Snapshot> {
        self.current.load_full()
    }

    /// Called by the writer after every commit — O(1) atomic swap.
    pub fn publish(&self, snapshot: Snapshot) {
        self.current.store(Arc::new(snapshot));
    }
}
}

arc-swap guarantees that:

  • A reader loading the snapshot always gets a consistent, complete view.
  • There is no moment when the snapshot pointer is null or partially updated.
  • No reader needs to hold a lock to read the snapshot.

Snapshot Lifetime

#![allow(unused)]
fn main() {
// Acquiring: O(1), no lock, no allocation
let snap = engine.snapshot();   // = manager.load()

// Using: reads go through the snapshot — guaranteed consistent view
let entity = snap.get_entity(id);
let hosts = snap.entities_by_class(&EntityClass::new_unchecked("Host"));

// Releasing: O(1) when Arc reference count drops to zero
drop(snap);
}

When a snapshot is dropped, its reference count decrements. If this was the last reference, the Arc frees the MemTable and segment list it held. Old MemTable data can then be freed.

Snapshot Query Methods

#![allow(unused)]
fn main() {
impl Snapshot {
    // Point lookups (O(1) MemTable, O(n) segment scan)
    pub fn get_entity(&self, id: EntityId) -> Option<&Entity>;
    pub fn get_relationship(&self, id: RelationshipId) -> Option<&Relationship>;

    // Index-accelerated scans
    pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity>;
    pub fn entities_by_class(&self, c: &EntityClass) -> Vec<&Entity>;
    pub fn entities_by_source(&self, connector_id: &str) -> Vec<&Entity>;
    pub fn relationships_by_source(&self, connector_id: &str) -> Vec<&Relationship>;
    pub fn all_entities(&self) -> impl Iterator<Item = &Entity> + '_;

    // Adjacency
    pub fn outgoing(&self, id: EntityId) -> Vec<&Relationship>;
    pub fn incoming(&self, id: EntityId) -> Vec<&Relationship>;

    // Stats
    pub fn entity_count(&self) -> usize;
    pub fn relationship_count(&self) -> usize;
}
}

Consistency Guarantees

Read-your-writes: Within the same StorageEngine instance, a read snapshot acquired after a write() call will always see the written data.

Snapshot isolation: A snapshot acquired at time T will never see writes committed after T, even if those writes happen on the same thread.

No dirty reads: A snapshot only contains data from committed WriteBatches — data written to the WAL but not yet applied to the MemTable is not visible.
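
These guarantees can be demonstrated with std alone. In this sketch a Mutex<Arc<_>> stands in for ArcSwap, which provides the same publish/load semantics without taking a lock; the Snapshot struct is a stand-in, not the real type.

```rust
use std::sync::{Arc, Mutex};

struct Snapshot {
    entity_count: usize,
}

struct SnapshotManager {
    current: Mutex<Arc<Snapshot>>, // ArcSwap<Snapshot> in the real engine
}

impl SnapshotManager {
    /// Readers clone the current Arc — cheap, never invalidated by writers.
    fn load(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.lock().unwrap())
    }

    /// The writer publishes a whole new snapshot after each commit.
    fn publish(&self, snapshot: Snapshot) {
        *self.current.lock().unwrap() = Arc::new(snapshot);
    }
}

fn main() {
    let mgr = SnapshotManager {
        current: Mutex::new(Arc::new(Snapshot { entity_count: 10 })),
    };

    // A reader takes a snapshot at time T...
    let snap_t = mgr.load();

    // ...then a write commits and publishes a new snapshot
    mgr.publish(Snapshot { entity_count: 11 });

    // Snapshot isolation: the old reader still sees the view at T;
    // read-your-writes: a fresh load sees the committed write
    assert_eq!(snap_t.entity_count, 10);
    assert_eq!(mgr.load().entity_count, 11);
}
```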

Using Snapshots in async Code

Snapshot contains Arc references, making it Send + Sync. However, GraphReader<'snap> borrows the snapshot and cannot cross await points. The recommended pattern:

#![allow(unused)]
fn main() {
async fn my_handler(engine: Arc<Mutex<StorageEngine>>) -> Vec<Entity> {
    // Block: acquire lock, snapshot, compute, release
    let results = {
        let engine = engine.lock().unwrap();
        let snap = engine.snapshot();
        // All computation here — no await
        snap.entities_by_class(&EntityClass::new_unchecked("Host"))
            .into_iter()
            .filter(|e| !e._deleted)
            .cloned()
            .collect::<Vec<_>>()
        // snap dropped, lock released
    };

    // Now you can await freely with owned Vec<Entity>
    process_results(results).await
}
}

Alternatively, clone the Arc<Snapshot> and pass it to a spawn_blocking task for CPU-intensive graph operations that would otherwise block the async runtime.

StorageEngine API

StorageEngine is the top-level coordinator that ties together the WAL, MemTable, Segments, and SnapshotManager.

Opening an Engine

#![allow(unused)]
fn main() {
use parallax_store::{StorageEngine, StoreConfig};

// Open an existing engine or create a new one
let engine = StorageEngine::open(StoreConfig::new("/var/lib/parallax"))?;
}

StoreConfig accepts:

#![allow(unused)]
fn main() {
pub struct StoreConfig {
    /// Root directory for all engine data.
    pub data_dir: PathBuf,
    /// Flush MemTable to segment when size exceeds this (default: 64MB).
    pub memtable_flush_size: usize,
    /// Maximum WAL segment size before rotation (default: 64MB).
    pub wal_segment_max_size: u64,
}

impl StoreConfig {
    /// Create a config with default settings.
    pub fn new(data_dir: impl Into<PathBuf>) -> Self { ... }
}
}

Writing Data

All writes go through WriteBatch:

#![allow(unused)]
fn main() {
use parallax_store::WriteBatch;

let mut batch = WriteBatch::new();

// Upsert an entity (insert or update)
batch.upsert_entity(entity);

// Upsert a relationship
batch.upsert_relationship(relationship);

// Soft-delete an entity
batch.delete_entity(entity_id);

// Soft-delete a relationship
batch.delete_relationship(rel_id);

// Commit atomically (WAL + MemTable + snapshot publish)
engine.write(batch)?;
}

WriteBatch is mostly opaque — beyond len() and is_empty(), you cannot inspect the operations it contains. The write is atomic: either all operations succeed or none are applied.

WriteOp Internals

Under the hood, WriteBatch is a Vec<WriteOp>:

#![allow(unused)]
fn main() {
pub enum WriteOp {
    UpsertEntity(Entity),
    UpsertRelationship(Relationship),
    DeleteEntity(EntityId),
    DeleteRelationship(RelationshipId),
}

pub struct WriteBatch {
    pub(crate) ops: Vec<WriteOp>,
}

impl WriteBatch {
    pub fn is_empty(&self) -> bool { self.ops.is_empty() }
    pub fn len(&self) -> usize { self.ops.len() }
}
}

Reading Data

All reads go through a Snapshot:

#![allow(unused)]
fn main() {
// Acquire a snapshot (O(1), no lock)
let snap = engine.snapshot();

// Point lookups
let entity = snap.get_entity(entity_id);
let rel = snap.get_relationship(rel_id);

// Type/class scans (uses MemTable indices)
let hosts = snap.entities_by_type(&EntityType::new_unchecked("host"));
let services = snap.entities_by_class(&EntityClass::new_unchecked("Service"));

// Source-scoped scans (for sync diff)
let from_aws = snap.entities_by_source("connector-aws");

// Counts
let total = snap.entity_count();
}

Metrics

#![allow(unused)]
fn main() {
// Get a snapshot of current engine metrics
let metrics = engine.metrics().snapshot();
println!("Total entities: {}", metrics.entity_count);
println!("Total relationships: {}", metrics.relationship_count);
println!("Writes: {}", metrics.writes_total);
println!("Reads: {}", metrics.reads_total);
}

Metrics are maintained as atomic counters and can be read from any thread without acquiring the engine lock.
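
A minimal version of that pattern (a sketch whose field names mirror the ones above; it is not the Parallax type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct EngineMetrics {
    writes_total: AtomicU64,
    reads_total: AtomicU64,
}

#[derive(Debug, PartialEq)]
struct MetricsSnapshot {
    writes_total: u64,
    reads_total: u64,
}

impl EngineMetrics {
    /// Callable from any thread — no engine lock involved.
    fn record_write(&self) {
        self.writes_total.fetch_add(1, Ordering::Relaxed);
    }

    fn record_read(&self) {
        self.reads_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Point-in-time copy of the counters.
    fn snapshot(&self) -> MetricsSnapshot {
        MetricsSnapshot {
            writes_total: self.writes_total.load(Ordering::Relaxed),
            reads_total: self.reads_total.load(Ordering::Relaxed),
        }
    }
}

fn main() {
    let m = EngineMetrics::default();
    m.record_write();
    m.record_read();
    m.record_read();

    let snap = m.snapshot();
    assert_eq!(snap.writes_total, 1);
    assert_eq!(snap.reads_total, 2);
}
```

Note the methods take &self, not &mut self: atomics make the counters safely shareable behind a plain Arc.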

Engine in a Shared Context

In server mode, the engine is shared via Arc<Mutex<StorageEngine>>:

#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex};
use parallax_store::StorageEngine;

let engine = StorageEngine::open(config)?;
let shared = Arc::new(Mutex::new(engine));

// In a handler:
let snap = {
    let engine = shared.lock().unwrap();
    engine.snapshot()
    // Lock released here — snapshot is Arc-owned, not borrowed from engine
};
// Use snap freely without holding the lock
}

Error Types

#![allow(unused)]
fn main() {
pub enum StoreError {
    /// I/O error from the OS (file not found, permission denied, etc.)
    Io(io::Error),
    /// Data corruption detected (bad magic, CRC mismatch, etc.)
    Corruption(String),
    /// Serialization error (postcard format error)
    Serialization(String),
}
}

StoreError implements std::error::Error and Display. Use ? to propagate it up the call stack.

Thread Safety

StorageEngine is Send but not Sync. Wrap it in Arc<Mutex<StorageEngine>> for shared access. The lock should be held as briefly as possible:

#![allow(unused)]
fn main() {
// Good: lock, snapshot, release, use
let snap = engine.lock().unwrap().snapshot();
let entity = snap.get_entity(id);

// Bad: hold lock for the duration of graph computation
let engine = engine.lock().unwrap();
let entity = engine.snapshot().get_entity(id); // lock held too long
}

Graph Engine Overview

parallax-graph is the reasoning layer between raw storage and the query language. It takes an MVCC snapshot and exposes graph-aware operations: traversal, pattern matching, shortest path, blast radius, and coverage gap.

The GraphReader

All graph operations start with a GraphReader<'snap>, which borrows an immutable snapshot and provides zero-copy access to graph data:

#![allow(unused)]
fn main() {
use parallax_graph::GraphReader;

let snap = engine.snapshot();
let graph = GraphReader::new(&snap);

// Entity finder
let hosts: Vec<&Entity> = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();

// Traversal
let neighbors = graph
    .traverse(start_id)
    .direction(Direction::Outgoing)
    .max_depth(3)
    .collect();

// Shortest path
let path = graph
    .shortest_path(from_id, to_id)
    .find();

// Blast radius
let blast = graph
    .blast_radius(target_id)
    .add_attack_edge("RUNS", Direction::Outgoing)
    .analyze();

// Coverage gap
let uncovered = graph
    .coverage_gap("PROTECTS")
    .target_type("host")
    .neighbor_type("edr_agent")
    .find();
}

Lifetime Discipline

GraphReader<'snap> ties every returned reference to the snapshot's lifetime via Rust's borrow checker. You cannot accidentally hold a reference to an entity after the snapshot is dropped.

#![allow(unused)]
fn main() {
let entity: &Entity = {
    let snap = engine.snapshot();
    let graph = GraphReader::new(&snap);
    graph.get_entity(id).unwrap()
    // ERROR: snap dropped here, but entity borrows from it
};
}

This compile-time guarantee eliminates an entire class of use-after-free and stale-read bugs.

Operations Summary

Operation      Builder                       Description
Entity finder  GraphReader::find()           Filter entities by type, class, properties
All entities   GraphReader::find_all()       Return all non-deleted entities
By class       GraphReader::find_by_class()  Filter by entity class
Traversal      GraphReader::traverse()       BFS/DFS from a starting entity
Shortest path  GraphReader::shortest_path()  Minimum-hop path between two entities
Blast radius   GraphReader::blast_radius()   Attack impact analysis from a target
Coverage gap   GraphReader::coverage_gap()   Find entities with no qualifying neighbor
Direct lookup  GraphReader::get_entity()     O(1) entity lookup by ID
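
Blast radius is essentially a depth-bounded BFS over the adjacency index. A self-contained sketch over a toy graph of integer IDs (the real builder adds edge-verb and direction filters on top of this shape):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Entities reachable from `origin` within `max_depth` hops (origin excluded).
fn blast_radius(edges: &HashMap<u64, Vec<u64>>, origin: u64, max_depth: usize) -> HashSet<u64> {
    let mut visited = HashSet::from([origin]); // visit each entity once (INV-G03)
    let mut reached = HashSet::new();
    let mut queue = VecDeque::from([(origin, 0usize)]);

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // bounded by max_depth (INV-G05)
        }
        for &next in edges.get(&node).into_iter().flatten() {
            if visited.insert(next) {
                reached.insert(next);
                queue.push_back((next, depth + 1));
            }
        }
    }
    reached
}

fn main() {
    // 1 → 2 → 3 → 4, plus a cycle 3 → 1
    let edges = HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![4, 1])]);

    assert_eq!(blast_radius(&edges, 1, 2), HashSet::from([2, 3]));
    assert_eq!(blast_radius(&edges, 1, 4), HashSet::from([2, 3, 4])); // cycle is safe
}
```

The visited set is what makes cyclic graphs terminate; the depth check is what keeps the result bounded.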

Performance

The graph engine is designed for interactive query latency:

Operation                                Target p99
Entity lookup by ID (MemTable)           ≤1μs
Single-hop traversal (degree ≤100)       ≤500μs
Multi-hop traversal (depth 3, degree 5)  ≤5ms
Shortest path (1K-node graph)            ≤10ms
Blast radius (depth 4)                   ≤10ms

These targets assume data in MemTable. Segment reads add ~100μs per entity lookup due to linear scan; segment indexing is planned for v0.2.

GraphReader

GraphReader<'snap> is the primary entry point for all graph operations. It wraps an MVCC snapshot and provides typed, zero-copy access to graph data.

Creating a GraphReader

#![allow(unused)]
fn main() {
use parallax_graph::GraphReader;
use parallax_store::StorageEngine;

let engine = StorageEngine::open(config)?;
let snap = engine.snapshot();      // O(1), no lock after this
let graph = GraphReader::new(&snap);
// graph can now be used freely; snap must not be dropped while graph is in scope
}

Full API Reference

#![allow(unused)]
fn main() {
impl<'snap> GraphReader<'snap> {
    pub fn new(snapshot: &'snap Snapshot) -> Self;

    // ── Entity finding ────────────────────────────────────────────────
    /// Find entities by type. Uses type index.
    pub fn find(&self, entity_type: &str) -> EntityFinder<'snap>;

    /// Find entities by class. Uses class index.
    pub fn find_by_class(&self, class: &str) -> EntityFinder<'snap>;

    /// Find all entities. Full scan.
    pub fn find_all(&self) -> EntityFinder<'snap>;

    /// O(1) point lookup by EntityId.
    pub fn get_entity(&self, id: EntityId) -> Option<&'snap Entity>;

    /// O(1) point lookup by RelationshipId.
    pub fn get_relationship(&self, id: RelationshipId) -> Option<&'snap Relationship>;

    // ── Traversal ─────────────────────────────────────────────────────
    /// Start a traversal from a specific entity.
    pub fn traverse(&self, start: EntityId) -> TraversalBuilder<'snap>;

    // ── Path finding ──────────────────────────────────────────────────
    /// Find the shortest path between two entities.
    pub fn shortest_path(&self, from: EntityId, to: EntityId) -> ShortestPathBuilder<'snap>;

    // ── Blast radius ──────────────────────────────────────────────────
    /// Compute the blast radius from a target entity.
    pub fn blast_radius(&self, origin: EntityId) -> BlastRadiusBuilder<'snap>;

    // ── Coverage analysis ─────────────────────────────────────────────
    /// Find entities missing a qualifying neighbor via the given verb.
    pub fn coverage_gap(&self, verb: &str) -> CoverageGapBuilder<'snap>;
}
}

Lifetime Annotation

The 'snap lifetime on GraphReader<'snap> means:

  • Every &'snap Entity reference returned by the reader borrows from the snapshot
  • The snapshot must outlive the reader and all references derived from it
  • The Rust borrow checker enforces this at compile time — no runtime cost

This design guarantees:

  • Zero-copy reads: Entity data is never cloned from the snapshot
  • No use-after-free: You cannot hold a reference to freed snapshot data
  • No stale reads: You always read from the snapshot you created the reader with

Collecting vs Cloning

Since GraphReader returns references with snapshot lifetimes, you must either use the data within the snapshot's scope or clone it:

#![allow(unused)]
fn main() {
// Use within scope: references only, no entity cloning
let snap = engine.snapshot();
let graph = GraphReader::new(&snap);
let count = graph.find("host").count(); // no allocation
// Bind the Vec first; calling .first() on the temporary returned by
// .collect() would not borrow-check.
let host_refs = graph.find("host").collect(); // Vec<&'snap Entity>
let first_host = host_refs.first().map(|e| e.display_name.as_str());

// Clone if you need owned data past the snapshot
let hosts: Vec<Entity> = graph
    .find("host")
    .collect()
    .into_iter()
    .cloned()  // Clone here when crossing async boundaries
    .collect();
drop(snap); // now safe to drop
}

In Server Handlers

#![allow(unused)]
fn main() {
async fn list_hosts(State(state): State<AppState>) -> Json<Value> {
    let hosts = {
        let engine = state.engine.lock().unwrap();
        let snap = engine.snapshot();
        let graph = GraphReader::new(&snap);
        // Collect into owned Vec before dropping snap + lock
        graph.find("host")
            .collect()
            .into_iter()
            .cloned()
            .collect::<Vec<Entity>>()
    }; // engine lock + snap released here

    Json(json!({ "entities": hosts.iter().map(entity_to_json).collect::<Vec<_>>() }))
}
}

Entity Finder

The entity finder is a fluent builder for filtering entities from a snapshot. It uses secondary indices (type, class) when available and falls back to a full scan for property-level filters.

Basic Usage

#![allow(unused)]
fn main() {
let graph = GraphReader::new(&snap);

// Find all entities of a type
let hosts: Vec<&Entity> = graph.find("host").collect();

// Find all entities of a class
let all_hosts: Vec<&Entity> = graph
    .find_by_class("Host")
    .collect();

// Find with a property filter
let running: Vec<&Entity> = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();
}

EntityFinder Methods

#![allow(unused)]
fn main() {
impl<'snap> EntityFinder<'snap> {
    /// Filter to entities of the specified class.
    /// Uses class index — no full scan.
    pub fn class(self, c: &str) -> Self;

    /// Filter to entities where the named property equals the value.
    /// Uses full scan (no property index in v0.1).
    pub fn with_property(self, key: &str, value: Value) -> Self;

    /// Limit the number of results.
    pub fn limit(self, n: usize) -> Self;

    /// Collect all matching entities.
    pub fn collect(self) -> Vec<&'snap Entity>;

    /// Count matching entities without materializing them.
    pub fn count(self) -> usize;
}
}

GraphReader Finder Variants

#![allow(unused)]
fn main() {
impl<'snap> GraphReader<'snap> {
    /// Find entities by type ("host", "aws_ec2_instance").
    /// Uses the type index.
    pub fn find(&self, entity_type: &str) -> EntityFinder<'snap>;

    /// Find entities by class ("Host", "User", "DataStore").
    /// Uses the class index.
    pub fn find_by_class(&self, class: &str) -> EntityFinder<'snap>;

    /// Find all entities (no type filter). Full scan.
    pub fn find_all(&self) -> EntityFinder<'snap>;

    /// Direct lookup by EntityId. O(1).
    pub fn get_entity(&self, id: EntityId) -> Option<&'snap Entity>;
}
}

Examples

Find running hosts

#![allow(unused)]
fn main() {
let running_hosts = graph
    .find("host")
    .with_property("state", Value::from("running"))
    .collect();
}

Count all services

#![allow(unused)]
fn main() {
let count = graph.find_by_class("Service").count();
}

Find with limit

#![allow(unused)]
fn main() {
// First 100 hosts for pagination
let page = graph.find("host").limit(100).collect();
}

Index Strategy

The planner chooses the access strategy based on the query:

Filter            Access Method               Cost
Type only         Type index lookup           O(n_type)
Class only        Class index lookup          O(n_class)
Type + property   Type index + filter scan    O(n_type)
Class + property  Class index + filter scan   O(n_class)
Property only     Full scan                   O(n_total)

Property-level secondary indexes are planned for v0.2. For v0.1, property filters always require a linear scan over the type/class result set.
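
The table above can be sketched with plain standard-library structures. This is a hypothetical miniature, not the Parallax store: a `HashMap` type index narrows the candidate set, and the property filter is then a linear scan over just those candidates:

```rust
use std::collections::HashMap;

// Hypothetical miniature of the v0.1 access strategies: a type index gives
// O(n_type) candidates; property filters scan linearly over that set.
struct Store {
    entities: Vec<(String, HashMap<String, String>)>, // (type, properties)
    type_index: HashMap<String, Vec<usize>>,          // type → entity offsets
}

impl Store {
    fn new(entities: Vec<(String, HashMap<String, String>)>) -> Self {
        let mut type_index: HashMap<String, Vec<usize>> = HashMap::new();
        for (i, (t, _)) in entities.iter().enumerate() {
            type_index.entry(t.clone()).or_default().push(i);
        }
        Store { entities, type_index }
    }

    // Type + property: index lookup, then filter scan — O(n_type).
    fn find_with_property(&self, t: &str, key: &str, value: &str) -> Vec<usize> {
        self.type_index
            .get(t)
            .map(|ids| {
                ids.iter()
                    .copied()
                    .filter(|&i| self.entities[i].1.get(key).map(String::as_str) == Some(value))
                    .collect()
            })
            .unwrap_or_default()
    }
}

fn main() {
    let props = |s: &[(&str, &str)]| -> HashMap<String, String> {
        s.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
    };
    let store = Store::new(vec![
        ("host".into(), props(&[("state", "running")])),
        ("host".into(), props(&[("state", "stopped")])),
        ("service".into(), props(&[("state", "running")])),
    ]);
    // Only the two "host" entries are scanned; the service is never touched.
    assert_eq!(store.find_with_property("host", "state", "running"), vec![0]);
}
```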

Deleted Entity Handling

The finder automatically excludes soft-deleted entities (_deleted = true). You never see deleted entities in query results (INV-S08).

Graph Traversal

Traversal walks the graph from a starting entity, following edges in a specified direction and applying filters at each step.

Basic Traversal

#![allow(unused)]
fn main() {
let graph = GraphReader::new(&snap);

// Traverse all outgoing edges from start_id, up to depth 3
let results: Vec<TraversalResult> = graph
    .traverse(start_id)
    .direction(Direction::Outgoing)
    .max_depth(3)
    .collect();
}

TraversalBuilder Options

#![allow(unused)]
fn main() {
pub struct TraversalBuilder<'snap> {
    ...
}

impl<'snap> TraversalBuilder<'snap> {
    /// BFS (default) or DFS traversal order.
    pub fn bfs(self) -> Self;
    pub fn dfs(self) -> Self;

    /// Direction of edges to follow.
    pub fn direction(self, dir: Direction) -> Self;
    // Direction::Outgoing, Direction::Incoming, Direction::Both

    /// Only follow edges with these relationship classes (verbs).
    pub fn edge_classes(self, classes: &[&str]) -> Self;

    /// Only visit entities with this type.
    pub fn filter_node_type(self, t: &str) -> Self;

    /// Only visit entities with this class.
    pub fn filter_node_class(self, c: &str) -> Self;

    /// Only visit entities matching this property filter.
    pub fn filter_node_property(self, key: &str, value: Value) -> Self;

    /// Stop after visiting this many hops from the start.
    pub fn max_depth(self, depth: usize) -> Self;

    /// Collect all results.
    pub fn collect(self) -> Vec<TraversalResult<'snap>>;
}
}

TraversalResult

Each visited entity produces a TraversalResult:

#![allow(unused)]
fn main() {
pub struct TraversalResult<'snap> {
    /// The entity reached at this hop.
    pub entity: &'snap Entity,

    /// The relationship that connected the previous hop to this entity.
    pub via: Option<&'snap Relationship>,

    /// How many hops from the start entity.
    pub depth: usize,

    /// The full path from start to this entity.
    pub path: GraphPath,
}
}

GraphPath

#![allow(unused)]
fn main() {
pub struct GraphPath {
    /// Ordered sequence of (entity_id, relationship_id) pairs.
    pub segments: Vec<PathSegment>,
}

pub struct PathSegment {
    pub entity_id: EntityId,
    pub via_relationship: Option<RelationshipId>,
}
}

Paths are returned for every traversal result. For large traversals, omit path tracking by using a custom iterator instead of collect().
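
Path tracking itself is ordinary BFS bookkeeping. A minimal sketch, assuming a plain adjacency list rather than the Parallax structures: record each node's parent during the search, then walk parents back from the goal:

```rust
use std::collections::{HashMap, VecDeque};

// Sketch of path tracking: BFS records each node's parent, and the path is
// reconstructed by walking the parent chain back to the start.
fn bfs_path(adj: &[Vec<usize>], start: usize, goal: usize) -> Option<Vec<usize>> {
    let mut parent: HashMap<usize, usize> = HashMap::new();
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the parent chain back to `start`, then reverse it.
            let mut path = vec![goal];
            let mut cur = goal;
            while let Some(&p) = parent.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in &adj[node] {
            if next != start && !parent.contains_key(&next) {
                parent.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}

fn main() {
    // 0 → {1, 2}, 1 → 3, 2 → 3: two routes to node 3, BFS finds 0-1-3 first.
    let adj = vec![vec![1, 2], vec![3], vec![3], vec![]];
    assert_eq!(bfs_path(&adj, 0, 3), Some(vec![0, 1, 3]));
    assert_eq!(bfs_path(&adj, 3, 0), None);
}
```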

Cycle Handling

BFS traversal maintains a visited set and never visits an entity twice. This prevents infinite loops in cyclic graphs (INV-G03):

#![allow(unused)]
fn main() {
// This terminates even in a graph with cycles:
let results = graph
    .traverse(a_id)
    .direction(Direction::Outgoing)
    .max_depth(100)
    .collect();
}

DFS also maintains the visited set, so it is also cycle-safe (but produces different ordering than BFS).
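
The visited-set mechanism is standard BFS. A standalone sketch over a plain adjacency list (not the Parallax internals), showing termination on a 3-cycle:

```rust
use std::collections::{HashSet, VecDeque};

// Cycle-safe BFS (the INV-G03 idea): a visited set ensures each node is
// enqueued at most once, so traversal terminates even with cycles.
fn bfs(adj: &[Vec<usize>], start: usize, max_depth: usize) -> Vec<(usize, usize)> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    let mut out = Vec::new(); // (node, depth) in BFS order
    visited.insert(start);
    queue.push_back((start, 0));
    while let Some((node, depth)) = queue.pop_front() {
        out.push((node, depth));
        if depth == max_depth {
            continue; // depth limit reached; do not expand further
        }
        for &next in &adj[node] {
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    out
}

fn main() {
    // A 3-cycle: 0 → 1 → 2 → 0. Without the visited set this would loop forever.
    let adj = vec![vec![1], vec![2], vec![0]];
    let order = bfs(&adj, 0, 100);
    assert_eq!(order, vec![(0, 0), (1, 1), (2, 2)]);
}
```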

Direction

#![allow(unused)]
fn main() {
pub enum Direction {
    /// Follow outgoing edges: entity → neighbor
    Outgoing,
    /// Follow incoming edges: entity ← neighbor
    Incoming,
    /// Follow both directions
    Both,
}
}

Examples

Find all services reachable from a host

#![allow(unused)]
fn main() {
let services = graph
    .traverse(host_id)
    .direction(Direction::Outgoing)
    .filter_node_class("Service")
    .max_depth(3)
    .collect();
}

Find all entities that depend on a database

#![allow(unused)]
fn main() {
let dependents = graph
    .traverse(db_id)
    .direction(Direction::Incoming)
    .edge_classes(&["USES", "CONNECTS", "READS"])
    .max_depth(5)
    .collect();
}

BFS vs DFS

Use BFS (default) when you want results ordered by proximity — closer entities first. This is best for blast radius and impact analysis.

Use DFS when you want to follow a chain as deep as possible before backtracking. This is better for path exploration.

#![allow(unused)]
fn main() {
// BFS: finds nearest entities first
let bfs = graph.traverse(start).bfs().max_depth(4).collect();

// DFS: explores deep paths first
let dfs = graph.traverse(start).dfs().max_depth(4).collect();
}

Shortest Path

Shortest path finds the minimum-hop chain of relationships between two specific entities. This answers questions like:

  • "What is the access path between this user and this S3 bucket?"
  • "Is there any connection between these two systems?"
  • "What is the shortest privilege escalation route from this role to that secret?"

Basic Usage

#![allow(unused)]
fn main() {
let graph = GraphReader::new(&snap);

let path = graph
    .shortest_path(user_id, s3_bucket_id)
    .find();

match path {
    Some(path) => {
        println!("Found path with {} hops", path.segments.len());
        for segment in &path.segments {
            println!("  Entity: {:?}", segment.entity_id);
        }
    }
    None => println!("No path exists"),
}
}

ShortestPathBuilder

#![allow(unused)]
fn main() {
impl<'snap> ShortestPathBuilder<'snap> {
    /// Limit the search to this many hops (default: unlimited).
    pub fn max_depth(self, depth: usize) -> Self;

    /// Only follow edges with these verbs.
    pub fn edge_classes(self, classes: &[&str]) -> Self;

    /// Direction of edges to follow (default: Both).
    pub fn direction(self, dir: Direction) -> Self;

    /// Execute the search. Returns None if no path exists.
    pub fn find(self) -> Option<GraphPath>;
}
}

Algorithm

Parallax implements bidirectional BFS — searching from both the source and target simultaneously. This dramatically reduces the search space for sparse graphs:

Unidirectional BFS: O(b^d) nodes visited (b = branching factor, d = depth)
Bidirectional BFS:  O(b^(d/2)) nodes visited

For a graph with branching factor 10 and path length 6:

  • Unidirectional: 10^6 = 1,000,000 nodes
  • Bidirectional: 10^3 = 1,000 nodes (1000× fewer)

The two frontiers meet in the middle to form the path.
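
The meet-in-the-middle search can be sketched in standalone form. This hypothetical version works over a plain undirected adjacency list and returns only the hop count; it expands the smaller frontier each round and takes the minimum over all nodes reached by both searches, which is what keeps the answer minimal:

```rust
use std::collections::HashMap;

// Sketch of bidirectional BFS. Clarity over efficiency: the meet check
// rescans one distance map per round instead of checking incrementally.
fn bidirectional_bfs(adj: &[Vec<usize>], from: usize, to: usize) -> Option<usize> {
    let mut dist_a = HashMap::from([(from, 0usize)]); // depths reached from `from`
    let mut dist_b = HashMap::from([(to, 0usize)]);   // depths reached from `to`
    let mut frontier_a = vec![from];
    let mut frontier_b = vec![to];
    loop {
        // Any node reached by both searches yields a candidate path length;
        // the minimum over those candidates is the true shortest length.
        let meet = dist_a
            .iter()
            .filter_map(|(n, da)| dist_b.get(n).map(|db| da + db))
            .min();
        if let Some(len) = meet {
            return Some(len);
        }
        if frontier_a.is_empty() || frontier_b.is_empty() {
            return None; // one side exhausted: no path exists
        }
        // Expand the smaller side — this is where the b^(d/2) win comes from.
        if frontier_a.len() > frontier_b.len() {
            std::mem::swap(&mut frontier_a, &mut frontier_b);
            std::mem::swap(&mut dist_a, &mut dist_b);
        }
        let mut next = Vec::new();
        for &node in &frontier_a {
            for &nb in &adj[node] {
                if !dist_a.contains_key(&nb) {
                    dist_a.insert(nb, dist_a[&node] + 1);
                    next.push(nb);
                }
            }
        }
        frontier_a = next;
    }
}

fn main() {
    // Path graph 0 - 1 - 2 - 3; node 4 is disconnected.
    let adj = vec![vec![1], vec![0, 2], vec![1, 3], vec![2], vec![]];
    assert_eq!(bidirectional_bfs(&adj, 0, 3), Some(3));
    assert_eq!(bidirectional_bfs(&adj, 0, 4), None);
}
```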

INV-Q05: Shortest path always returns the minimum-hop path or None. It never returns a longer path than the shortest one.

Handling Cycles

The BFS searches maintain visited sets, so cyclic graphs are handled correctly. The search terminates when:

  • The two frontiers meet (path found)
  • Both frontiers are exhausted (no path)
  • max_depth is reached (INV-Q06)

Examples

Privilege escalation path

#![allow(unused)]
fn main() {
let path = graph
    .shortest_path(attacker_id, secret_id)
    .edge_classes(&["ASSIGNED", "ALLOWS", "HAS", "USES"])
    .direction(Direction::Both)
    .max_depth(6)
    .find();
}

Reachability check

#![allow(unused)]
fn main() {
// Is there any connection between these two networks?
let connected = graph
    .shortest_path(network_a_id, network_b_id)
    .edge_classes(&["CONNECTS", "CONTAINS"])
    .find()
    .is_some();
}

PQL Equivalent

-- PQL
FIND SHORTEST PATH FROM user WITH email = 'alice@corp.com'
  TO aws_s3_bucket WITH _key = 'arn:aws:s3:::secrets'

-- Rust equivalent (after resolving entity IDs)
graph.shortest_path(alice_id, bucket_id).find()

Blast Radius

Blast radius analysis answers: "If this entity is compromised, what else is at risk?"

Given a starting entity (e.g., a compromised host or credential), blast radius follows attacker-relevant relationships to identify all entities within attack reach.

Basic Usage

#![allow(unused)]
fn main() {
let graph = GraphReader::new(&snap);

let result = graph
    .blast_radius(compromised_host_id)
    .add_attack_edge("RUNS", Direction::Outgoing)   // host → services
    .add_attack_edge("CONNECTS", Direction::Outgoing) // host → other hosts
    .max_depth(4)
    .analyze();

println!("Impacted entities: {}", result.impacted.len());
println!("Critical paths: {}", result.critical_paths.len());
println!("High-value targets: {}", result.high_value_targets.len());
}

BlastRadiusBuilder

#![allow(unused)]
fn main() {
impl<'snap> BlastRadiusBuilder<'snap> {
    /// Add an attack path edge type with direction.
    /// Call multiple times for multiple attack vectors.
    pub fn add_attack_edge(self, verb: &str, direction: Direction) -> Self;

    /// Maximum hops from the origin entity (default: 4).
    pub fn max_depth(self, depth: usize) -> Self;

    /// Run the analysis and return results.
    pub fn analyze(self) -> BlastRadiusResult<'snap>;
}
}

BlastRadiusResult

#![allow(unused)]
fn main() {
pub struct BlastRadiusResult<'snap> {
    /// The origin entity — the starting point of the attack.
    pub origin: &'snap Entity,

    /// All entities reachable via the specified attack edges.
    pub impacted: Vec<&'snap Entity>,

    /// Paths to entities classified as high-value targets.
    pub critical_paths: Vec<GraphPath>,

    /// High-value targets in the blast radius.
    /// These are entities whose class is in the high-value set.
    pub high_value_targets: Vec<&'snap Entity>,
}
}

High-Value Target Classes

The following entity classes are automatically identified as high-value targets when found in the blast radius:

DataStore   Secret      Key         Database
Credential  Certificate Identity    Account

These classes represent data and access assets that attackers specifically target.

Default Attack Edges

If no attack edges are specified, the blast radius builder uses a default set of attacker-relevant relationship verbs:

RUNS, CONNECTS, TRUSTS, CONTAINS, HAS, USES, EXPLOITS

These cover the most common lateral movement patterns.

Example: Compromised Credential

#![allow(unused)]
fn main() {
// If an attacker has this credential, what can they access?
let result = graph
    .blast_radius(credential_id)
    .add_attack_edge("ALLOWS", Direction::Outgoing)  // credential → policies
    .add_attack_edge("ASSIGNED", Direction::Incoming) // who was assigned this
    .add_attack_edge("HAS", Direction::Incoming)      // who owns this
    .max_depth(5)
    .analyze();

// Check if any datastores are in the blast radius
let exposed_datastores: Vec<_> = result.high_value_targets.iter()
    .filter(|e| e._class.as_str() == "DataStore")
    .collect();

if !exposed_datastores.is_empty() {
    println!("CRITICAL: {} datastores exposed", exposed_datastores.len());
}
}

Example: PQL Equivalent

The blast radius builder corresponds to PQL's FIND BLAST RADIUS FROM syntax:

-- PQL
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4

-- Rust equivalent
graph.blast_radius(host_id).max_depth(4).analyze()

Performance

Blast radius uses BFS internally and visits each entity at most once. For a graph with 100K entities and average degree 10, expect:

  • Depth 2: ~100 entities visited, <1ms
  • Depth 4: ~10,000 entities visited, ~10ms
  • Depth 6: ~100,000 entities visited, ~100ms

Set max_depth conservatively — attacker lateral movement rarely exceeds 4-6 hops in practice.

Coverage Gap

Coverage gap analysis finds entities that are missing a qualifying neighbor. This answers questions like:

  • "Which hosts have no EDR agent protecting them?"
  • "Which services have no scanner scanning them?"
  • "Which databases have no backup agent connected to them?"

Basic Usage

#![allow(unused)]
fn main() {
let graph = GraphReader::new(&snap);

// Find all hosts with no EDR agent protecting them
let unprotected = graph
    .coverage_gap("PROTECTS")
    .target_type("host")
    .neighbor_type("edr_agent")
    .find();

println!("{} hosts have no EDR coverage", unprotected.len());
}

CoverageGapBuilder

#![allow(unused)]
fn main() {
impl<'snap> CoverageGapBuilder<'snap> {
    // The relationship verb that represents coverage (e.g., "PROTECTS",
    // "SCANS", "MANAGES") is the first argument to coverage_gap() itself,
    // not a builder method.

    /// The type of entity to check for coverage gaps.
    pub fn target_type(self, t: &str) -> Self;

    /// The class of entity to check for coverage gaps.
    pub fn target_class(self, c: &str) -> Self;

    /// The type of entity that provides coverage (the "covering" entity).
    pub fn neighbor_type(self, t: &str) -> Self;

    /// The class of the covering entity.
    pub fn neighbor_class(self, c: &str) -> Self;

    /// Direction of the coverage edge (default: Incoming — neighbor → target).
    pub fn direction(self, dir: Direction) -> Self;

    /// Execute and return all entities with no qualifying neighbor.
    pub fn find(self) -> Vec<&'snap Entity>;
}
}

INV-G06

INV-G06: Coverage gap only returns entities of target_type that have no qualifying neighbor via the specified verb. Entities that have at least one qualifying neighbor are excluded.

Examples

Scanner coverage

#![allow(unused)]
fn main() {
// Which hosts have never been scanned?
let unscanned = graph
    .coverage_gap("SCANS")
    .target_type("host")
    .neighbor_class("Scanner")
    .direction(Direction::Incoming)  // scanner → host
    .find();
}

EDR protection

#![allow(unused)]
fn main() {
// Which containers have no security agent?
let unprotected = graph
    .coverage_gap("PROTECTS")
    .target_class("Container")
    .neighbor_class("Agent")
    .find();
}

Backup coverage

#![allow(unused)]
fn main() {
// Which databases have no backup relationship?
let unbackedup = graph
    .coverage_gap("HAS")
    .target_class("Database")
    .neighbor_type("backup_job")
    .find();
}

PQL Equivalent

Coverage gap corresponds to PQL's negated traversal:

-- PQL: find hosts with no EDR protection
FIND host THAT !PROTECTS edr_agent

-- Rust equivalent
graph.coverage_gap("PROTECTS").target_type("host").neighbor_type("edr_agent").find()

Performance

Coverage gap requires:

  1. Fetch all entities of target_type (index scan)
  2. For each entity, check if any verb edge exists to a qualifying neighbor (adjacency lookup)

The adjacency index makes step 2 O(degree) per entity, not O(n_relationships). For 10,000 hosts with average degree 5, expect:

  • 10,000 adjacency lookups × O(5) = 50,000 operations
  • Typically completes in <50ms
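
The two steps above can be sketched with standard-library maps. Everything here is hypothetical structure, not the Parallax store: a type table plus a per-verb incoming-adjacency map, with the O(degree) check in the inner `any`:

```rust
use std::collections::{HashMap, HashSet};

// Sketch of coverage-gap evaluation: scan entities of the target type, then
// do an O(degree) adjacency check per entity for a qualifying neighbor.
struct Graph {
    types: HashMap<u64, &'static str>,                  // entity id → type
    incoming: HashMap<(&'static str, u64), HashSet<u64>>, // (verb, target) → neighbors
}

fn coverage_gap(g: &Graph, verb: &'static str, target_type: &str, neighbor_type: &str) -> Vec<u64> {
    let mut gaps: Vec<u64> = g
        .types
        .iter()
        .filter(|(_, t)| **t == target_type)
        .filter(|(id, _)| {
            // Covered iff at least one incoming `verb` edge from a qualifying neighbor.
            !g.incoming.get(&(verb, **id)).map_or(false, |nbrs| {
                nbrs.iter()
                    .any(|n| g.types.get(n).map_or(false, |t| *t == neighbor_type))
            })
        })
        .map(|(id, _)| *id)
        .collect();
    gaps.sort();
    gaps
}

fn main() {
    // Host 1 is protected by agent 10; host 2 has no agent.
    let mut incoming = HashMap::new();
    incoming.insert(("PROTECTS", 1u64), HashSet::from([10u64]));
    let g = Graph {
        types: HashMap::from([(1, "host"), (2, "host"), (10, "edr_agent")]),
        incoming,
    };
    assert_eq!(coverage_gap(&g, "PROTECTS", "host", "edr_agent"), vec![2]);
}
```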

PQL — Parallax Query Language

PQL is the read-only query language for Parallax. It is designed for security practitioners who are not graph database experts: readable in plain English, learnable in 10 minutes, and predictable in performance.

Design Goals

Goal                        How
Readable by non-engineers   English-like: FIND, THAT, WITH, ALLOWS
Learnable in 10 minutes     Core syntax is 5 clauses; no joins, no subqueries in v0.1
Predictable performance     Every query maps to a known graph operation
Machine-parseable           Clean grammar → easy for AI to generate PQL from natural language

Non-Goals

PQL is read-only. All writes go through the ingest API. There is no INSERT, UPDATE, DELETE, or MERGE in PQL.

PQL is not a general-purpose graph query language. No arbitrary pattern matching with anonymous nodes, no recursive CTEs, no graph algorithms in the language itself.

Core Syntax

Every PQL query is one of three forms:

-- 1. Entity query (most common)
FIND <entity_filter>
  [WITH <property_filters>]
  [THAT <traversal_chain>]
  [RETURN <projection>]
  [LIMIT <n>]

-- 2. Shortest path query
FIND SHORTEST PATH
  FROM <entity_filter> [WITH <property_filters>]
  TO   <entity_filter> [WITH <property_filters>]
  [DEPTH <n>]

-- 3. Blast radius query
FIND BLAST RADIUS
  FROM <entity_filter> [WITH <property_filters>]
  [DEPTH <n>]

The Five Clauses

FIND

Specifies which entities to start with. The argument is either:

  • An entity type: specific (e.g., host, aws_ec2_instance)
  • An entity class: broad (e.g., Host, User, DataStore)
  • * for any entity

FIND host
FIND Host
FIND aws_ec2_instance
FIND *

WITH

Filters entities by property values. Multiple conditions are combined with AND.

FIND host WITH state = 'running'
FIND host WITH state = 'running' AND region = 'us-east-1'
FIND user WITH active = true AND email LIKE '@corp.com'

THAT

Traverses relationships. Can be chained for multi-hop queries. Supports negation with ! to find coverage gaps.

FIND host THAT RUNS service
FIND user THAT ASSIGNED role THAT ALLOWS s3_bucket
FIND host THAT !PROTECTS edr_agent   -- hosts with no EDR

RETURN

Specifies output format. Defaults to full entity objects.

FIND host RETURN COUNT              -- count only
FIND host RETURN display_name, state  -- specific properties

LIMIT

Limits the number of results returned.

FIND host LIMIT 100
FIND host WITH state = 'running' LIMIT 10

Quick Reference

-- All running hosts
FIND host WITH state = 'running'

-- All services on running hosts
FIND host WITH state = 'running' THAT RUNS service

-- Hosts with no EDR
FIND host THAT !PROTECTS edr_agent

-- Count of all hosts
FIND host RETURN COUNT

-- Shortest path from user to secret
FIND SHORTEST PATH FROM user WITH email = 'alice@corp.com'
  TO secret WITH name = 'prod-db-password'

-- Blast radius from compromised host
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4

See Syntax Reference for the complete grammar, and Examples for real-world query patterns.

PQL Syntax Reference

Formal Grammar (EBNF)

query
    = find_query
    | path_query
    | blast_query
    ;

find_query
    = "FIND" entity_filter
      ("WITH" property_expr)?
      ("THAT" traversal_chain)?
      ("GROUP" "BY" identifier)?
      ("RETURN" return_clause)?
      ("LIMIT" integer)?
    ;

path_query
    = "FIND" "SHORTEST" "PATH"
      "FROM" entity_filter ("WITH" property_expr)?
      "TO"   entity_filter ("WITH" property_expr)?
      ("DEPTH" integer)?
    ;

blast_query
    = "FIND" "BLAST" "RADIUS"
      "FROM" entity_filter ("WITH" property_expr)?
      ("DEPTH" integer)?
    ;

entity_filter
    = identifier          (* type: "host", "aws_ec2_instance" *)
    | "*"                 (* any entity *)
    ;

traversal_chain
    = traversal_step ("THAT" traversal_step)*
    ;

traversal_step
    = negation? verb entity_filter ("WITH" property_expr)?
    ;

negation = "!" ;

verb
    = "HAS" | "IS" | "ASSIGNED" | "ALLOWS" | "USES"
    | "CONTAINS" | "MANAGES" | "CONNECTS" | "PROTECTS"
    | "EXPLOITS" | "TRUSTS" | "SCANS" | "RUNS" | "READS" | "WRITES"
    ;

property_expr
    = property_or ("AND" property_or)*
    ;

property_or
    = property_cond ("OR" property_cond)*
    ;

property_cond
    = identifier comparison_op value
    | identifier "IN" "(" value_list ")"
    | identifier "LIKE" string_literal
    | identifier "EXISTS"
    | "NOT" property_cond
    ;

comparison_op = "=" | "!=" | "<" | "<=" | ">" | ">=" ;

return_clause
    = "COUNT"
    | identifier ("," identifier)*
    ;

group_by_clause
    = "GROUP" "BY" identifier
    ;

value
    = string_literal          (* 'single quoted' *)
    | integer                 (* 42 *)
    | float                   (* 3.14 *)
    | "true" | "false"
    | "null"
    ;

value_list
    = value ("," value)*
    ;

identifier
    = [a-zA-Z_][a-zA-Z0-9_]*
    ;

string_literal
    = "'" [^']* "'"           (* single quotes ONLY — no double quotes *)
    ;

Important Syntax Notes

String Literals Use Single Quotes

PQL uses single-quoted string literals. Double quotes are not supported.

-- Correct
FIND host WITH state = 'running'

-- Wrong (parser error)
FIND host WITH state = "running"

Keywords Are Case-Sensitive

PQL keywords (FIND, WITH, THAT, AND, RETURN, LIMIT, etc.) must be uppercase. Identifiers (entity types, property names) are case-sensitive as well.

-- Correct
FIND host WITH state = 'running'

-- Wrong
find Host where State = 'running'

Note: Entity classes are PascalCase (Host, User, DataStore). Entity types are snake_case (host, aws_ec2_instance).

Negation in Traversal

The ! negation before a verb finds entities that do not have a matching neighbor. It cannot be chained.

-- Valid: hosts with no EDR agent
FIND host THAT !PROTECTS edr_agent

-- Invalid: cannot chain after negation
FIND host THAT !PROTECTS edr_agent THAT HAS service   -- syntax error

RETURN Clause

Without RETURN, full entity objects are returned. With RETURN COUNT, only the count is returned (no entity data). With RETURN field1, field2, only the specified property fields are included.

-- Returns full entity objects
FIND host

-- Returns only the count
FIND host RETURN COUNT

-- Returns entities with only display_name and state properties
FIND host RETURN display_name, state

DEPTH in Path Queries

DEPTH limits the maximum number of hops to explore. Without it, the search is unbounded (but bounded by graph diameter in practice).

FIND SHORTEST PATH FROM user TO secret DEPTH 6
FIND BLAST RADIUS FROM host DEPTH 4

Tokenization Rules

Token         Rule
Keywords      FIND, WITH, THAT, RETURN, LIMIT, AND, OR, NOT, IN, LIKE, EXISTS, GROUP, BY, SHORTEST, PATH, FROM, TO, BLAST, RADIUS, DEPTH, COUNT
Verbs         HAS, IS, ASSIGNED, ALLOWS, USES, CONTAINS, MANAGES, CONNECTS, PROTECTS, EXPLOITS, TRUSTS, SCANS, RUNS, READS, WRITES
Identifiers   [a-zA-Z_][a-zA-Z0-9_]*
String        '[^']*'
Integer       [0-9]+
Float         [0-9]+\.[0-9]+
Boolean       true, false
Null          null
Operators     =, !=, <, <=, >, >=
Negation      !
Punctuation   (, ), ,
Wildcard      *
Whitespace    Ignored

Property Filters

The WITH clause filters entities by their properties. All conditions in a WITH clause are combined with AND.

Comparison Operators

-- Equality
FIND host WITH state = 'running'

-- Inequality
FIND host WITH state != 'terminated'

-- Numeric comparisons
FIND host WITH cpu_count > 4
FIND host WITH memory_gb >= 32
FIND host WITH score < 7.5
FIND host WITH version <= 3

String Matching

-- Exact match (case-sensitive)
FIND user WITH email = 'alice@corp.com'

-- Pattern match with LIKE (% = any sequence, _ = any single char)
FIND user WITH email LIKE '%@corp.com'
FIND host WITH hostname LIKE 'web-%'
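
LIKE matching with `%` and `_` can be implemented as a short recursive function. A sketch of the semantics described above (not the Parallax matcher):

```rust
// LIKE semantics: '%' matches any sequence of characters, '_' matches
// exactly one. Recursive match over char slices.
fn like(pattern: &str, text: &str) -> bool {
    fn go(p: &[char], t: &[char]) -> bool {
        match p.split_first() {
            None => t.is_empty(),
            // '%' absorbs zero or more characters: try every split point.
            Some((&c, rest)) if c == '%' => (0..=t.len()).any(|i| go(rest, &t[i..])),
            // '_' consumes exactly one character.
            Some((&c, rest)) if c == '_' => !t.is_empty() && go(rest, &t[1..]),
            // Any other pattern char must match literally.
            Some((&c, rest)) => t.first() == Some(&c) && go(rest, &t[1..]),
        }
    }
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    go(&p, &t)
}

fn main() {
    assert!(like("%@corp.com", "alice@corp.com"));
    assert!(like("web-%", "web-01"));
    assert!(like("web-__", "web-01"));
    assert!(!like("web-%", "db-01"));
}
```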

Boolean

FIND host WITH active = true
FIND aws_s3_bucket WITH public = false
FIND user WITH mfa_enabled = false

Null Checks

-- Field has no value
FIND container WITH cpu_limit = null

-- Field has any value (EXISTS)
FIND host WITH owner EXISTS

-- Field has no value (NOT EXISTS equivalent)
FIND host WITH NOT owner EXISTS

IN List

-- State is one of the listed values
FIND host WITH state IN ('running', 'pending', 'starting')

-- Region is one of the listed values
FIND host WITH region IN ('us-east-1', 'us-west-2', 'eu-west-1')

NOT

-- Negate any condition
FIND host WITH NOT state = 'terminated'
FIND user WITH NOT mfa_enabled = true
FIND host WITH NOT region IN ('us-east-1', 'us-west-2')

Multiple Conditions (AND)

Multiple conditions in a single WITH clause are combined with AND:

FIND host WITH state = 'running' AND region = 'us-east-1'
FIND user WITH active = true AND mfa_enabled = false
FIND host WITH env = 'production' AND cpu_count >= 8 AND state = 'running'

OR Conditions

Use OR to match any of several values for the same property:

-- Hosts running linux or windows
FIND host WITH os = 'linux' OR os = 'windows'

-- Three alternatives
FIND host WITH region = 'us-east-1' OR region = 'eu-west-1' OR region = 'ap-southeast-1'

Combine OR with AND freely; note that in PQL, OR binds tighter than AND (the opposite of SQL's precedence):

-- (os = linux OR os = windows) AND env = prod
FIND host WITH os = 'linux' OR os = 'windows' AND env = 'prod'
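
That grouping rule is easy to see in miniature. The sketch below (a hypothetical simplification in which each condition has already been evaluated to a boolean) splits the token stream on AND and requires every OR-group to contain a true condition:

```rust
// Sketch of WITH-clause precedence: OR groups bind tighter than AND, so
// `a OR b AND c` evaluates as `(a OR b) AND c`.
#[derive(Clone, Copy, PartialEq)]
enum Tok {
    Cond(bool), // a pre-evaluated property condition
    And,
    Or,
}

fn eval(tokens: &[Tok]) -> bool {
    // Split on AND into OR-groups; every group needs at least one true cond.
    tokens
        .split(|t| *t == Tok::And)
        .all(|group| group.iter().any(|t| matches!(*t, Tok::Cond(true))))
}

fn main() {
    use Tok::*;
    // os = 'linux' (false) OR os = 'windows' (true) AND env = 'prod' (true)
    // → (false OR true) AND true → true
    assert!(eval(&[Cond(false), Or, Cond(true), And, Cond(true)]));
    // (false OR false) AND true → false
    assert!(!eval(&[Cond(false), Or, Cond(false), And, Cond(true)]));
}
```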

GROUP BY

Aggregate query results by a property field:

-- Count hosts by OS
FIND host GROUP BY os

-- Count hosts by region, filtered to running state
FIND host WITH state = 'running' GROUP BY region

GROUP BY returns QueryResult::Grouped — a list of (value, count) pairs, one per distinct value of the field. Entities missing the field land in their own group (Value::Null).
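
The grouping behavior can be sketched with a map of counts. A hypothetical miniature, with entities as flat property lists rather than the Parallax `Entity` type; missing fields fall into the `None` bucket, mirroring the `Value::Null` group:

```rust
use std::collections::BTreeMap;

// Sketch of GROUP BY evaluation: bucket entities by a property value and
// count each bucket. Entities missing the field land in their own group.
fn group_by<'a>(
    entities: &[Vec<(&'a str, &'a str)>], // each entity: (property, value) pairs
    field: &str,
) -> BTreeMap<Option<&'a str>, usize> {
    let mut groups: BTreeMap<Option<&'a str>, usize> = BTreeMap::new();
    for entity in entities {
        let value = entity.iter().find(|(k, _)| *k == field).map(|(_, v)| *v);
        *groups.entry(value).or_insert(0) += 1; // None plays the Value::Null role
    }
    groups
}

fn main() {
    // FIND host GROUP BY os — two linux, one windows, one with no `os` field.
    let hosts = vec![
        vec![("os", "linux")],
        vec![("os", "linux")],
        vec![("os", "windows")],
        vec![("state", "running")],
    ];
    let groups = group_by(&hosts, "os");
    assert_eq!(groups[&Some("linux")], 2);
    assert_eq!(groups[&Some("windows")], 1);
    assert_eq!(groups[&None], 1);
}
```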

Traversal Filters

WITH can also appear after a traversal verb to filter the target entity:

-- Running hosts that run a specific service
FIND host WITH state = 'running' THAT RUNS service WITH name = 'nginx'

-- Users assigned to admin roles (filter both ends)
FIND user WITH active = true THAT ASSIGNED role WITH admin = true

Value Type Reference

Value Type   Example        Notes
String       'running'      Single quotes. No double quotes.
Integer      42, 100        No quotes.
Float        3.14, 0.5      Decimal point required.
Boolean      true, false    Lowercase.
Null         null           Lowercase.

Case Sensitivity

  • Property names are case-sensitive: state ≠ State
  • String values are case-sensitive: 'Running' ≠ 'running'
  • Keywords (WITH, AND, NOT, IN, LIKE, EXISTS) are uppercase

Special Property Names

These properties are always present on every entity:

Property       Description
display_name   Human-readable name
_type          Entity type string (e.g., "host")
_class         Entity class string (e.g., "Host")
_key           Source-system key
_deleted       Soft-delete flag (always false in query results)

Traversal Queries

The THAT clause traverses relationships from the entities found by FIND.

Basic Traversal

-- Find hosts that run services
FIND host THAT RUNS service

-- Find users that have been assigned roles
FIND user THAT ASSIGNED role

-- Find security groups that allow internet traffic
FIND security_group THAT ALLOWS internet

Multi-Hop Traversal

Chain multiple THAT clauses for multi-hop queries:

-- user → role → policy → resource (3 hops)
FIND user THAT ASSIGNED role THAT ALLOWS policy THAT USES aws_s3_bucket

Each THAT clause adds one hop. The entity filter after each verb narrows which entities at that hop qualify.

Filtering Traversal Targets

Add WITH after a traversal verb to filter the entities at that hop:

-- Users assigned to admin roles specifically
FIND user THAT ASSIGNED role WITH admin = true

-- Hosts running services that are publicly exposed
FIND host THAT RUNS service WITH public = true

-- Multi-hop with filters at each step
FIND user WITH active = true
  THAT ASSIGNED role WITH admin = true
  THAT ALLOWS aws_s3_bucket WITH public = true

Negated Traversal (Coverage Gaps)

Prefix a verb with ! to find entities that do not have a qualifying neighbor via that relationship:

-- Hosts with no EDR agent protecting them
FIND host THAT !PROTECTS edr_agent

-- Services with no firewall
FIND service THAT !PROTECTS firewall

-- Users who have never been assigned any role
FIND user THAT !ASSIGNED role

Note: Negation (!) cannot appear in a chained traversal. It must be the final THAT step.

-- Invalid: cannot chain after negation
FIND host THAT !PROTECTS edr_agent THAT HAS service   -- syntax error

Available Verbs

PQL supports 15 relationship verbs:

Verb       Semantic Meaning
HAS        Ownership or containment (account HAS bucket)
IS         Identity or equivalence (user IS person)
ASSIGNED   Role or permission assignment (user ASSIGNED role)
ALLOWS     Network or access permission (policy ALLOWS resource)
USES       Active dependency (service USES database)
CONTAINS   Logical grouping (vpc CONTAINS subnet)
MANAGES    Administrative control (team MANAGES repo)
CONNECTS   Network connectivity (vpc CONNECTS vpc)
PROTECTS   Security coverage (edr PROTECTS host)
EXPLOITS   Vulnerability relationship (cve EXPLOITS package)
TRUSTS     Trust relationship (account TRUSTS account)
SCANS      Scanner coverage (scanner SCANS host)
RUNS       Process or service execution (host RUNS service)
READS      Data access, read (app READS database)
WRITES     Data access, write (app WRITES database)

Traversal Direction

PQL traversal follows edges in both directions by default when using the entity-type filter form. To control direction, use the Rust API directly or rely on the verb semantics:

  • FIND host THAT RUNS service — follows RUNS edges outgoing from host
  • FIND service THAT RUNS host — follows RUNS edges incoming to service (i.e., which hosts run this service)

The query executor determines direction based on the verb and entity order.

How Traversal Maps to Graph Operations

PQL                                 Graph Operation
FIND A THAT V B                     Find all A; for each, traverse V edges to B
FIND A THAT !V B                    Find all A that have no V edges to B
FIND A THAT V B THAT W C            Find A→B→C chains
FIND SHORTEST PATH FROM A TO B      Bidirectional BFS from A and B
FIND BLAST RADIUS FROM A DEPTH n    BFS from A up to n hops

Path Queries

Parallax supports two special query forms for path-based analysis.

Shortest Path

FIND SHORTEST PATH finds the minimum-hop chain of relationships between two specific entities.

Syntax

FIND SHORTEST PATH
  FROM <entity_filter> [WITH <property_filters>]
  TO   <entity_filter> [WITH <property_filters>]
  [DEPTH <max_hops>]

Examples

-- Is there any connection between a user and a secret?
FIND SHORTEST PATH
  FROM user WITH email = 'alice@corp.com'
  TO secret WITH name = 'prod-db-password'

-- Privilege escalation path from a guest account to admin
FIND SHORTEST PATH
  FROM user WITH role = 'guest'
  TO role WITH admin = true
  DEPTH 6

-- Network path between two VPCs
FIND SHORTEST PATH
  FROM aws_vpc WITH _key = 'vpc-prod'
  TO aws_vpc WITH _key = 'vpc-dev'
  DEPTH 10

Response

If a path exists, the response includes:

  • The full sequence of entities in the path
  • The relationships connecting them
  • The total number of hops

If no path exists within DEPTH hops (or at all), the response has an empty path with count: 0.

Performance

Shortest path uses bidirectional BFS — exploring from both endpoints simultaneously. This is significantly faster than unidirectional BFS for deep graphs:

Graph Size      Path Length   Typical Latency
10K entities    4 hops        <5ms
100K entities   6 hops        <50ms
1M entities     8 hops        <500ms
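
The idea can be sketched outside Parallax with a toy adjacency list. This is an illustrative, self-contained sketch — `Graph` and `shortest_path_len` are assumptions for the example, not the parallax-graph API. Two BFS frontiers expand level by level, always growing the smaller one first, and the search stops as soon as the frontiers meet:

```rust
use std::collections::{HashMap, VecDeque};

// Toy undirected graph: node → neighbors. The real engine traverses typed,
// directed adjacency indexes; this only shows the bidirectional search.
type Graph = HashMap<u32, Vec<u32>>;

/// Minimum hop count between `a` and `b`, or None if unreachable.
pub fn shortest_path_len(g: &Graph, a: u32, b: u32) -> Option<usize> {
    if a == b {
        return Some(0);
    }
    let mut dist_a: HashMap<u32, usize> = HashMap::from([(a, 0)]);
    let mut dist_b: HashMap<u32, usize> = HashMap::from([(b, 0)]);
    let mut q_a = VecDeque::from([a]);
    let mut q_b = VecDeque::from([b]);
    while !q_a.is_empty() && !q_b.is_empty() {
        // Expand the smaller frontier first — the source of the speedup.
        let (q, dist, other) = if q_a.len() <= q_b.len() {
            (&mut q_a, &mut dist_a, &dist_b)
        } else {
            (&mut q_b, &mut dist_b, &dist_a)
        };
        // Expand one whole level of this side.
        for _ in 0..q.len() {
            let n = q.pop_front().unwrap();
            let d = dist[&n];
            for &next in g.get(&n).into_iter().flatten() {
                if let Some(&d_other) = other.get(&next) {
                    return Some(d + 1 + d_other); // frontiers met
                }
                if !dist.contains_key(&next) {
                    dist.insert(next, d + 1);
                    q.push_back(next);
                }
            }
        }
    }
    None // one side exhausted: no path exists
}

fn main() {
    // Path graph: 1 - 2 - 3 - 4
    let mut g: Graph = HashMap::new();
    for (from, to) in [(1u32, 2u32), (2, 3), (3, 4)] {
        g.entry(from).or_default().push(to);
        g.entry(to).or_default().push(from);
    }
    println!("{:?}", shortest_path_len(&g, 1, 4)); // prints "Some(3)"
}
```

Because each side only explores roughly half the path depth, the explored set grows like 2·b^(d/2) instead of b^d for branching factor b — which is why the latency table above stays flat as paths get deeper.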

Blast Radius

FIND BLAST RADIUS computes the set of entities reachable from a starting point via attacker-relevant relationships.

Syntax

FIND BLAST RADIUS
  FROM <entity_filter> [WITH <property_filters>]
  [DEPTH <max_hops>]

Examples

-- What is at risk if this host is compromised?
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4

-- What can an attacker reach from this credential?
FIND BLAST RADIUS FROM credential WITH name = 'prod-api-key' DEPTH 5

-- Impact of a vulnerable package in production
FIND BLAST RADIUS FROM package WITH cve = 'CVE-2024-1234' DEPTH 3

Response

The blast radius response includes:

  • impacted: all entities reachable within the depth limit
  • high_value_targets: entities of high-value classes (DataStore, Secret, etc.)
  • critical_paths: specific paths to high-value targets
  • count: total number of impacted entities

High-Value Target Classes

The following entity classes are always flagged as high-value targets in blast radius results:

DataStore, Secret, Key, Database, Credential, Certificate, Identity, Account

Depth Guidance

Depth   Coverage                Use Case
2       Immediate neighbors     Quick triage
4       Typical blast radius    Most analyses
6       Extended blast radius   Comprehensive analysis
8+      Near-full graph         Use sparingly — can be slow

Default depth is 4 if not specified.
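
The depth-limited reachability at the core of blast radius can be sketched as a plain BFS with a hop cap. This is an illustrative, self-contained sketch — the adjacency map and `blast_radius` function are assumptions for the example, not the parallax-graph types (which also classify high-value targets and record critical paths):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// All entities reachable from `start` in at most `max_depth` hops.
/// Hypothetical flat adjacency list keyed by entity key.
pub fn blast_radius<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    max_depth: usize,
) -> HashSet<&'a str> {
    let mut seen: HashSet<&'a str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth cap reached: do not expand this node further
        }
        for &next in edges.get(node).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    // In this sketch the starting entity is not counted in its own radius.
    seen.remove(start);
    seen
}

fn main() {
    let edges = HashMap::from([
        ("web-01", vec!["db-01", "svc-01"]),
        ("db-01", vec!["secret-01"]),
    ]);
    // Depth 1: direct neighbors only.
    println!("{}", blast_radius(&edges, "web-01", 1).len()); // prints "2"
}
```

Flagging high-value targets is then just a filter over the result set by entity class (DataStore, Secret, and so on).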

Via REST API

# Shortest path
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND SHORTEST PATH FROM user WITH email = '\''alice@corp.com'\'' TO secret WITH name = '\''prod-db-password'\''"}'

# Blast radius
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND BLAST RADIUS FROM host WITH _key = '\''web-01'\'' DEPTH 4"}'

Query Execution

PQL queries go through a three-stage pipeline: parse → plan → execute.

Stage 1: Parse

The PQL lexer tokenizes the input string, then the recursive descent parser produces an AST.

#![allow(unused)]
fn main() {
use parallax_query::parse;

let ast = parse("FIND host WITH state = 'running'")?;
}

The parser is hand-written (no parser generator) for several reasons:

  • Precise error messages: "Expected '=' after property name 'state', got '<'"
  • Zero additional dependencies
  • Full control over error recovery

INV-Q01: The same query string always produces the same AST (deterministic).

Parse Errors

Parse errors include the position of the unexpected token:

Error: unexpected token '=' at position 23
  FIND host WITH state = = 'running'
                         ^
  expected: a value (string, number, boolean, or null)

Stage 2: Plan

The planner transforms the AST into a QueryPlan — a concrete execution strategy. It consults IndexStats to choose the most efficient access path.

#![allow(unused)]
fn main() {
use parallax_query::{parse, plan};

let ast = parse("FIND host WITH state = 'running'")?;
let query_plan = plan(&ast, &index_stats)?;
}

Index Stats

IndexStats tracks entity counts by type and class:

#![allow(unused)]
fn main() {
pub struct IndexStats {
    pub type_counts: HashMap<String, u64>,   // "host" → 1234
    pub class_counts: HashMap<String, u64>,  // "Host" → 5678
    pub entity_count: u64,
    pub relationship_count: u64,
}
}

Access Strategy Selection

Query Pattern                    Chosen Strategy          Reason
FIND host                        TypeIndexScan            Type index available
FIND Host                        ClassIndexScan           Class index available
FIND *                           FullScan                 No narrowing possible
FIND host WITH state='running'   TypeIndexScan + filter   Type index reduces candidates
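
The selection rule can be sketched as a small function over the query target. The `Access` enum and `choose_access` are illustrative stand-ins, not the parallax-query internals; the rule mirrors the PQL convention that lowercase targets are entity types, capitalized targets are classes, and `*` matches everything:

```rust
// Illustrative planner sketch — not the parallax-query types.
#[derive(Debug, PartialEq)]
pub enum Access {
    TypeIndexScan(String),
    ClassIndexScan(String),
    FullScan,
}

/// `FIND *` forces a full scan; a capitalized target uses the class index;
/// anything else uses the type index. Property filters are applied on top
/// of whichever scan is chosen.
pub fn choose_access(target: &str) -> Access {
    match target {
        "*" => Access::FullScan,
        t if t.chars().next().map_or(false, char::is_uppercase) => {
            Access::ClassIndexScan(t.to_string())
        }
        t => Access::TypeIndexScan(t.to_string()),
    }
}

fn main() {
    println!("{:?}", choose_access("host")); // prints "TypeIndexScan(\"host\")"
    println!("{:?}", choose_access("Host")); // prints "ClassIndexScan(\"Host\")"
}
```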

Stage 3: Execute

The executor runs the query plan against an MVCC snapshot:

#![allow(unused)]
fn main() {
use parallax_query::execute;

let snap = engine.snapshot();
let result = execute(&query_plan, &snap, &limits)?;
}

QueryLimits

#![allow(unused)]
fn main() {
pub struct QueryLimits {
    pub max_results: usize,      // default: 10_000
    pub timeout: Duration,       // default: 30s
    pub max_traversal_depth: usize,  // default: 10
}
}

INV-Q03: max_results is a hard cap — the query returns at most this many results.

INV-Q04: If the query takes longer than timeout, it returns an error, not a partial result.
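
How the two invariants can be enforced is easiest to see in a stripped-down executor loop. This sketch is illustrative: `QueryLimits` mirrors the struct above, but `execute_scan` and its error type are assumptions, not the parallax-query executor:

```rust
use std::time::{Duration, Instant};

pub struct QueryLimits {
    pub max_results: usize,
    pub timeout: Duration,
}

/// Scan candidate IDs, enforcing the hard result cap and the timeout.
pub fn execute_scan(
    candidates: impl IntoIterator<Item = u64>,
    limits: &QueryLimits,
) -> Result<Vec<u64>, &'static str> {
    let started = Instant::now();
    let mut out = Vec::new();
    for id in candidates {
        // INV-Q03: max_results is a hard cap — stop before exceeding it.
        if out.len() == limits.max_results {
            break;
        }
        // INV-Q04: on timeout, return an error — never a partial result.
        if started.elapsed() > limits.timeout {
            return Err("query exceeded timeout");
        }
        out.push(id);
    }
    Ok(out)
}

fn main() {
    let limits = QueryLimits { max_results: 10, timeout: Duration::from_secs(30) };
    let rows = execute_scan(0u64..100, &limits).unwrap();
    println!("{}", rows.len()); // prints "10"
}
```

Note the ordering: the cap truncates silently (the caller asked for at most that many), while the timeout discards everything, because a partial result could silently hide matches.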

QueryResult

#![allow(unused)]
fn main() {
pub enum QueryResult {
    Entities(Vec<Entity>),
    Traversals(Vec<TraversalResult>),
    Paths(Vec<GraphPath>),
    Scalar(u64),   // For RETURN COUNT
}
}

Execution Plan Examples

FIND host WITH state = 'running'

QueryPlan::Find {
    access: TypeIndexScan { entity_type: "host" },
    filters: [PropertyFilter { key: "state", op: Eq, value: "running" }],
    return_: ReturnAll,
    limit: None,
}

Execution:

  1. Load type index entry for "host" → [EntityId1, EntityId2, ...]
  2. For each ID, fetch entity from snapshot
  3. Apply property filter state = 'running'
  4. Return matching entities

FIND host THAT RUNS service

QueryPlan::Traversal {
    start: Find { access: TypeIndexScan { "host" }, filters: [] },
    steps: [TraversalStep { verb: "RUNS", target_type: "service", negated: false }],
    return_: ReturnAll,
}

Execution:

  1. Load all hosts from type index
  2. For each host, follow outgoing RUNS edges via adjacency index
  3. Filter targets to entity_type == "service"
  4. Return matching host + service pairs

FIND host RETURN COUNT

QueryPlan::Find {
    access: TypeIndexScan { "host" },
    filters: [],
    return_: ReturnCount,
    limit: None,
}

Execution:

  1. Load type index entry for "host"
  2. Count entries without fetching entity data
  3. Return scalar count

Performance Characteristics

Query Type                    Typical Latency (10K entities)
Type scan, no filter          <100μs
Type scan + property filter   <1ms
Single-hop traversal          <500μs
3-hop traversal               <5ms
Shortest path                 <10ms
Blast radius (depth 4)        <10ms
RETURN COUNT                  <50μs (index only)

PQL Examples

Real-world query patterns organized by use case.

Asset Discovery

-- All hosts (any type: physical, VM, container)
FIND Host

-- All running cloud instances
FIND host WITH state = 'running'

-- All hosts in a specific region
FIND host WITH region = 'us-east-1'

-- All public-facing S3 buckets
FIND aws_s3_bucket WITH public = true

-- Count all entities by class
FIND Host RETURN COUNT
FIND User RETURN COUNT
FIND DataStore RETURN COUNT

Vulnerability Analysis

-- Find all vulnerable packages
FIND package WITH has_vulnerability = true

-- Find hosts running vulnerable software
FIND host THAT RUNS package WITH has_vulnerability = true

-- Packages that exploit a specific CVE class
FIND package THAT EXPLOITS vulnerability WITH severity = 'critical'

Access and Privilege

-- All users with admin roles
FIND user THAT ASSIGNED role WITH name = 'Administrator'

-- All policies that allow S3 access
FIND policy THAT ALLOWS aws_s3_bucket

-- Users who can access a specific bucket (indirect — via roles and policies)
FIND user THAT ASSIGNED role THAT ALLOWS aws_s3_bucket WITH public = false

-- Service accounts with broad permissions
FIND service_account THAT ASSIGNED role WITH permissions = 'all'

Network Connectivity

-- All services exposed to the internet
FIND service THAT CONNECTS internet

-- Hosts that allow inbound traffic from any IP
FIND firewall WITH allow_all = true THAT PROTECTS host

-- Internal services with no firewall
FIND service THAT !PROTECTS firewall

Security Coverage Gaps

-- Hosts with no EDR agent
FIND host THAT !PROTECTS edr_agent

-- Hosts never scanned
FIND host THAT !SCANS scanner

-- Services with no firewall protection
FIND service THAT !PROTECTS firewall

-- Databases with no backup
FIND Database THAT !HAS backup_job

Container and Cloud Native

-- All pods running as root
FIND pod WITH run_as_root = true

-- Containers with no resource limits
FIND container WITH cpu_limit = null

-- Clusters with nodes that have outdated Kubernetes versions
FIND cluster THAT CONTAINS node WITH k8s_version < '1.28.0'

-- Namespaces with privileged pods
FIND namespace THAT CONTAINS pod WITH privileged = true

Path and Reachability

-- Shortest path from a user to a secret
FIND SHORTEST PATH
  FROM user WITH email = 'alice@corp.com'
  TO secret WITH name = 'prod-db-password'

-- Shortest path between two networks
FIND SHORTEST PATH
  FROM aws_vpc WITH _key = 'vpc-prod'
  TO aws_vpc WITH _key = 'vpc-dev'
  DEPTH 10

-- Is there any connection between these systems?
FIND SHORTEST PATH
  FROM host WITH hostname = 'web-01'
  TO DataStore WITH name = 'customer-data'

Blast Radius

-- If web-01 is compromised, what's at risk?
FIND BLAST RADIUS FROM host WITH _key = 'web-01' DEPTH 4

-- Blast radius from a compromised credential
FIND BLAST RADIUS FROM credential WITH name = 'prod-api-key' DEPTH 5

-- Blast radius from a vulnerable package
FIND BLAST RADIUS FROM package WITH cve = 'CVE-2024-1234' DEPTH 3

Multi-hop Traversal

-- Complete access chain: user → role → policy → resource
FIND user
  THAT ASSIGNED role
  THAT ALLOWS policy
  THAT USES aws_s3_bucket

-- Application stack: user → app → service → database
FIND user
  THAT USES application
  THAT CONNECTS service
  THAT USES database WITH environment = 'production'

-- Supply chain: package → application → host
FIND package WITH has_vulnerability = true
  THAT USES application
  THAT RUNS host WITH environment = 'production'

Compliance Queries

-- CIS: All admin users (should be minimal)
FIND user THAT ASSIGNED role WITH admin = true RETURN COUNT

-- PCI-DSS: Services that touch cardholder data
FIND service THAT USES database WITH contains_card_data = true

-- NIST: Hosts without encryption at rest
FIND host WITH encryption_at_rest = false

-- All external-facing services without WAF
FIND service WITH external = true THAT !PROTECTS waf

Connector SDK Overview

The parallax-connect crate is the extension surface of Parallax — the part third-party developers touch most. It defines the Connector trait and provides all the infrastructure for collecting and publishing data.

What a Connector Does

A connector bridges an external data source (AWS, Okta, GitHub, a scanner, etc.) and the Parallax graph. It:

  1. Authenticates with the external API
  2. Collects entities and relationships (in parallel steps)
  3. Emits them via the SDK
  4. Hands the batch to the SDK, which handles diffing, validation, and atomic commit

The connector author never writes to the storage engine directly. They implement the Connector trait and call ctx.emit_entity() / ctx.emit_relationship() — the framework handles the rest.

The Lifecycle

┌───────────┐   ┌───────────┐   ┌──────────┐   ┌───────────┐   ┌──────────┐
│ Configure │──►│ Discover  │──►│ Collect  │──►│  Publish  │──►│  Commit  │
│ (auth,    │   │ (validate │   │ (fetch   │   │ (SDK diff │   │ (atomic  │
│ settings) │   │  creds)   │   │  assets) │   │  + queue) │   │  write)  │
└───────────┘   └───────────┘   └──────────┘   └───────────┘   └──────────┘

Steps 1–4 are in the connector. Step 5 (commit) is handled by the scheduler or the caller. The separation is deliberate: parallax-connect has no dependency on parallax-store, keeping the dependency graph acyclic.

Quick Example

#![allow(unused)]
fn main() {
use parallax_connect::prelude::*;

pub struct MyConnector;

#[async_trait]
impl Connector for MyConnector {
    fn name(&self) -> &str { "my-connector" }

    fn steps(&self) -> Vec<StepDefinition> {
        vec![
            step("hosts", "Collect hosts").build(),
            step("services", "Collect services")
                .depends_on(&["hosts"])
                .build(),
        ]
    }

    async fn execute_step(
        &self,
        step_id: &str,
        ctx: &mut StepContext,
    ) -> Result<(), ConnectorError> {
        match step_id {
            "hosts" => {
                ctx.emit_entity(
                    entity("host", "web-01")
                        .class("Host")
                        .display_name("Web Server 01")
                        .property("state", "running")
                )?;
                Ok(())
            }
            "services" => {
                ctx.emit_entity(
                    entity("service", "nginx")
                        .class("Service")
                        .display_name("Nginx")
                )?;
                ctx.emit_relationship(
                    relationship("host", "web-01", "RUNS", "service", "nginx")
                )?;
                Ok(())
            }
            _ => Err(ConnectorError::UnknownStep(step_id.to_string())),
        }
    }
}
}

Running a Connector

#![allow(unused)]
fn main() {
use parallax_connect::run_connector;
use parallax_ingest::commit_sync_exclusive;

let output = run_connector(&MyConnector, "my-account", "sync-001", None).await?;
let result = commit_sync_exclusive(
    &mut engine,
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;

println!("Created: {}", result.stats.entities_created);
println!("Updated: {}", result.stats.entities_updated);
println!("Deleted: {}", result.stats.entities_deleted);
}

Or with SyncEngine for shared engine access:

#![allow(unused)]
fn main() {
use parallax_connect::run_connector;
use parallax_ingest::SyncEngine;

let output = run_connector(&connector, "my-account", "sync-002", Some(&event_tx)).await?;
sync_engine.commit_sync(
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;
}

Key Principles

  • Idempotent: Running the same connector twice with the same source data produces no changes (entities_unchanged = n, created = 0, deleted = 0).
  • Source-scoped: Connector A's data is never deleted by connector B's sync.
  • Atomic: Either the entire sync batch lands or none of it does.
  • Fault-tolerant: A failed step does not prevent independent steps from running.

See Writing a Connector for the full guide.

Writing a Connector

A complete guide to implementing the Connector trait.

1. Create a Crate

Connectors are separate crates depending on parallax-connect:

# Cargo.toml
[package]
name = "connector-myservice"
version = "0.1.0"
edition = "2021"

[dependencies]
parallax-connect = { path = "../parallax-connect" }
async-trait = "0.1"
tokio = { version = "1", features = ["full"] }
# ... your API client crate

2. Define the Struct

#![allow(unused)]
fn main() {
use parallax_connect::prelude::*;

pub struct MyServiceConnector {
    api_base_url: String,
    api_token: String,
}

impl MyServiceConnector {
    pub fn new(api_base_url: impl Into<String>, api_token: impl Into<String>) -> Self {
        MyServiceConnector {
            api_base_url: api_base_url.into(),
            api_token: api_token.into(),
        }
    }
}
}

3. Define Steps

Steps are the units of collection. Each step is independent or depends on prior steps. Define them in the steps() method:

#![allow(unused)]
fn main() {
#[async_trait]
impl Connector for MyServiceConnector {
    fn name(&self) -> &str {
        "my-service"
    }

    fn steps(&self) -> Vec<StepDefinition> {
        vec![
            step("users", "Collect users").build(),
            step("hosts", "Collect hosts").build(),
            step("services", "Collect services")
                .depends_on(&["hosts"])     // runs after "hosts"
                .build(),
            step("relationships", "Collect relationships")
                .depends_on(&["users", "hosts", "services"])  // runs last
                .build(),
        ]
    }
    // ...
}
}

INV-C05: Step dependencies must form a DAG. Cycles are rejected at connector load time.

INV-C06: A failed step does not prevent independent steps from running. Steps that don't depend on the failed step still execute.

4. Implement Steps

#![allow(unused)]
fn main() {
#[async_trait]
impl Connector for MyServiceConnector {
    // ... name() and steps() ...

    async fn execute_step(
        &self,
        step_id: &str,
        ctx: &mut StepContext,
    ) -> Result<(), ConnectorError> {
        match step_id {
            "users" => self.collect_users(ctx).await,
            "hosts" => self.collect_hosts(ctx).await,
            "services" => self.collect_services(ctx).await,
            "relationships" => self.collect_relationships(ctx).await,
            _ => Err(ConnectorError::UnknownStep(step_id.to_string())),
        }
    }
}
}

5. Emit Entities

#![allow(unused)]
fn main() {
impl MyServiceConnector {
    async fn collect_users(&self, ctx: &mut StepContext) -> Result<(), ConnectorError> {
        let users = self.fetch_users_from_api().await?;

        for user in users {
            ctx.emit_entity(
                entity("user", &user.id)          // (type, key)
                    .class("User")                // entity class
                    .display_name(&user.name)
                    .property("email", user.email.as_str())
                    .property("active", user.active)
                    .property("mfa_enabled", user.mfa_enabled)
            )?;
        }
        Ok(())
    }
}
}

ctx.emit_entity() returns Result<(), ConnectorError>. Emit errors are non-fatal by default — they log a warning but don't stop the step. To make them fatal, propagate with ?.

6. Emit Relationships

#![allow(unused)]
fn main() {
async fn collect_relationships(
    &self,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    let assignments = self.fetch_role_assignments().await?;

    for assignment in assignments {
        ctx.emit_relationship(
            relationship(
                "user",         // from_type
                &assignment.user_id,  // from_key
                "ASSIGNED",     // verb
                "role",         // to_type
                &assignment.role_id,  // to_key
            )
            .property("assigned_at", assignment.timestamp.to_string().as_str())
        )?;
    }
    Ok(())
}
}

INV-C04: Referential integrity is enforced at commit time. A relationship whose from_key or to_key doesn't exist in the batch or the current graph will be rejected. Emit entities before the relationships that reference them.

7. Access Prior Step Data

Steps can read entities emitted by their dependencies:

#![allow(unused)]
fn main() {
async fn collect_services(
    &self,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    // Read hosts emitted by the "hosts" step
    let host_ids: Vec<String> = ctx.prior
        .entities_by_type("host")
        .iter()
        .map(|e| e.entity_key.to_string())
        .collect();

    // Use host IDs to fetch services from the API
    for host_id in host_ids {
        let services = self.fetch_services_for_host(&host_id).await?;
        for service in services {
            ctx.emit_entity(
                entity("service", &service.id)
                    .class("Service")
                    .display_name(&service.name)
            )?;
        }
    }
    Ok(())
}
}

8. Error Handling

#![allow(unused)]
fn main() {
use parallax_connect::ConnectorError;

// For unrecognized step IDs (always include this)
Err(ConnectorError::UnknownStep(step_id.to_string()))

// For API errors
Err(ConnectorError::Custom(format!("API error: {}", response.status())))

// For configuration errors
Err(ConnectorError::Configuration("API token is empty".to_string()))
}

Connector errors are logged and reported in SyncEvent::StepFailed. They do not abort the entire sync — independent steps still run.

9. Run the Connector

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let connector = MyServiceConnector::new(
        "https://api.myservice.com",
        std::env::var("MYSERVICE_TOKEN")?,
    );

    let mut engine = StorageEngine::open(StoreConfig::new("./data"))?;

    let output = parallax_connect::run_connector(
        &connector,
        "my-account-id",  // account_id
        "sync-001",       // sync_id (unique per run)
        None,             // event_tx (optional observability channel)
    ).await?;

    let result = parallax_ingest::commit_sync_exclusive(
        &mut engine,
        &output.connector_id,
        &output.sync_id,
        output.entities,
        output.relationships,
    )?;

    println!("Sync complete:");
    println!("  Created: {}", result.stats.entities_created);
    println!("  Updated: {}", result.stats.entities_updated);
    println!("  Deleted: {}", result.stats.entities_deleted);
    println!("  Relationships created: {}", result.stats.relationships_created);

    Ok(())
}

Step Definitions

Steps are the units of collection within a connector. They enable parallel execution of independent collection tasks and sequential execution of dependent tasks.

Defining Steps

#![allow(unused)]
fn main() {
fn steps(&self) -> Vec<StepDefinition> {
    vec![
        // Independent steps (no dependencies — can run in parallel in future)
        step("users", "Collect IAM users").build(),
        step("roles", "Collect IAM roles").build(),
        step("hosts", "Collect EC2 instances").build(),

        // Dependent steps (run after their dependencies complete)
        step("policies", "Collect IAM policies")
            .depends_on(&["roles"])
            .build(),

        step("services", "Collect services on hosts")
            .depends_on(&["hosts"])
            .build(),

        // Step that depends on multiple prior steps
        step("relationships", "Wire all relationships")
            .depends_on(&["users", "roles", "policies", "hosts", "services"])
            .build(),
    ]
}
}

StepDefinition Builder

#![allow(unused)]
fn main() {
// Start a step definition
step(id: &str, description: &str)

// Methods
.depends_on(step_ids: &[&str]) -> Self    // declare dependencies
.build() -> StepDefinition                // finalize
}

Execution Order — Parallel Waves

The scheduler groups steps into topological waves. Steps within the same wave have no inter-dependencies and execute concurrently via tokio::task::JoinSet. Waves execute sequentially so downstream steps always see completed upstream data.

Given: users, roles, hosts, policies(→roles), services(→hosts), relationships(→all)

Wave 0 (parallel): users, roles, hosts       ← no dependencies
Wave 1 (parallel): policies, services        ← depend only on wave 0
Wave 2 (sequential): relationships           ← depends on everything

The wave grouping is computed by assigning each step a level: level = max(level of dependencies) + 1, with roots at level 0.
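The level rule above can be sketched as a few lines of Rust. This is an illustrative, self-contained version — `assign_waves` and its input shape are assumptions for the example, not the scheduler's API. A step is assigned a wave once all of its dependencies have one; if any step remains unassigned after as many passes as there are steps, the dependencies contain a cycle (INV-C05):

```rust
use std::collections::HashMap;

/// Assign each step a wave: level = max(level of dependencies) + 1,
/// roots at level 0. Steps are (id, dependency ids) pairs.
pub fn assign_waves(
    steps: &[(&str, Vec<&str>)],
) -> Result<HashMap<String, usize>, String> {
    let mut level: HashMap<String, usize> = HashMap::new();
    // n passes are enough to settle any DAG of n steps.
    for _ in 0..steps.len() {
        for (id, deps) in steps {
            if level.contains_key(*id) {
                continue; // already assigned
            }
            // A step is ready once every dependency has a level.
            if deps.iter().all(|d| level.contains_key(*d)) {
                let wave = deps.iter().map(|d| level[*d] + 1).max().unwrap_or(0);
                level.insert((*id).to_string(), wave);
            }
        }
    }
    if level.len() != steps.len() {
        return Err("cycle detected in step dependencies".to_string());
    }
    Ok(level)
}

fn main() {
    let steps = vec![
        ("users", vec![]),
        ("hosts", vec![]),
        ("services", vec!["hosts"]),
        ("relationships", vec!["users", "hosts", "services"]),
    ];
    let waves = assign_waves(&steps).unwrap();
    println!("{} {} {}", waves["users"], waves["services"], waves["relationships"]);
    // prints "0 1 2"
}
```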

INV-C05: No Cycles

Step dependencies must form a DAG. Circular dependencies are detected at connector load time and return an error:

#![allow(unused)]
fn main() {
// This will fail validation:
step("a", "Step A").depends_on(&["b"]).build(),
step("b", "Step B").depends_on(&["a"]).build(),
// Error: "cycle detected in step dependencies: a -> b -> a"
}

INV-C06: Fault Isolation

A failed step does not prevent sibling steps in the same wave from running. The scheduler logs the error and the wave completes with whatever steps succeeded:

Wave 0:
  Step "hosts" completed: 100 entities
  Step "users" FAILED: API timeout after 30s   ← sibling steps still run
  Step "roles" completed: 25 entities           ← unaffected by "users" failure

Wave 1:
  Step "policies" completed: 50 entities        ← runs despite "users" failure

Steps that depend on a failed step are not skipped automatically — they run but see an incomplete prior_entities set (the failed step contributed nothing to it).

Prior Step Data

Downstream steps can read entities emitted by their dependencies via ctx.prior:

#![allow(unused)]
fn main() {
async fn collect_services(
    &self,
    ctx: &mut StepContext,
) -> Result<(), ConnectorError> {
    // Access entities from prior steps
    for host in ctx.prior.entities_by_type("host") {
        let services = self.api_client
            .get_services_for_host(&host.entity_key)
            .await?;
        // emit services...
    }
    Ok(())
}
}

ctx.prior is a snapshot of all entities emitted by all successfully completed prior steps (not just direct dependencies — all prior steps).

Step Naming Conventions

Convention             Example
Lowercase kebab-case   "iam-users", "ec2-instances"
Noun or noun-phrase    "users", "role-assignments"
No spaces              "security-groups" not "security groups"

Step IDs are used in log output and SyncEvent messages, so choose descriptive names.

Entity & Relationship Builders

The builder API is the primary way to construct entities and relationships in connector code. It uses a fluent interface designed to be readable and hard to misuse.

Entity Builder

entity(type, key) — Start a Builder

#![allow(unused)]
fn main() {
use parallax_connect::builder::entity;

let builder = entity("host", "web-01");
}

type is the entity type (open set, snake_case, e.g., "host", "aws_ec2_instance"). key is the source-system's unique identifier for this entity.

Builder Methods

#![allow(unused)]
fn main() {
entity("host", "web-01")
    // Required: entity class (closed set of ~40 values)
    .class("Host")

    // Optional: human-readable name
    .display_name("Web Server 01")

    // Add a single property (value can be any Value type)
    .property("state", "running")        // String
    .property("cpu_count", 4i64)         // Integer
    .property("memory_gb", 32.0f64)      // Float
    .property("active", true)            // Boolean
    .property("terminated_at", Value::Null)  // Null

    // Add multiple properties at once
    .properties([
        ("region", Value::from("us-east-1")),
        ("az", Value::from("us-east-1a")),
    ])
}

.property() Value Types

The .property() method accepts anything that implements Into<Value>:

Rust Type         PQL Value Type   Example
&str, String      String           .property("state", "running")
i64, i32, usize   Int              .property("port", 443i64)
f64, f32          Float            .property("score", 9.8f64)
bool              Bool             .property("active", true)
Value::Null       Null             .property("deleted_at", Value::Null)

Emitting the Entity

#![allow(unused)]
fn main() {
ctx.emit_entity(
    entity("host", "web-01")
        .class("Host")
        .display_name("Web Server 01")
        .property("state", "running")
)?;
}

emit_entity() returns Result<(), ConnectorError>. The error is returned if the entity builder is invalid (e.g., missing class).

Relationship Builder

relationship(from_type, from_key, verb, to_type, to_key) — Start a Builder

#![allow(unused)]
fn main() {
use parallax_connect::builder::relationship;

let builder = relationship("host", "web-01", "RUNS", "service", "nginx");
}

Builder Methods

#![allow(unused)]
fn main() {
relationship("host", "web-01", "RUNS", "service", "nginx")
    // Add properties to the edge (optional)
    .property("since", "2024-01-15")
    .property("port", 8080i64)
}

Emitting the Relationship

#![allow(unused)]
fn main() {
ctx.emit_relationship(
    relationship("host", "web-01", "RUNS", "service", "nginx")
        .property("port", 443i64)
)?;
}

Important: Both endpoints of the relationship must exist in the current batch or in the committed graph. Emitting a relationship to a non-existent entity is not an error at emit time, but it will be rejected at commit time (INV-C04).

Full Example

#![allow(unused)]
fn main() {
async fn collect_hosts_and_services(ctx: &mut StepContext) -> Result<(), ConnectorError> {
    // Emit host
    ctx.emit_entity(
        entity("host", "web-01")
            .class("Host")
            .display_name("Web Server 01")
            .property("state", "running")
            .property("region", "us-east-1")
            .property("cpu_count", 8i64)
            .property("memory_gb", 32.0f64)
    )?;

    // Emit service
    ctx.emit_entity(
        entity("service", "nginx-web-01")
            .class("Service")
            .display_name("Nginx on web-01")
            .property("port", 443i64)
            .property("protocol", "https")
    )?;

    // Emit relationship (host RUNS service)
    ctx.emit_relationship(
        relationship("host", "web-01", "RUNS", "service", "nginx-web-01")
            .property("since", "2024-01-15")
    )?;

    Ok(())
}
}

StepContext Metrics

After emitting, you can inspect how many entities/relationships were emitted:

#![allow(unused)]
fn main() {
ctx.emit_entity(...)?;
ctx.emit_entity(...)?;
ctx.emit_relationship(...)?;

// Metrics for this step
println!("Entities emitted: {}", ctx.metrics.entities_emitted);
println!("Relationships emitted: {}", ctx.metrics.relationships_emitted);
}

Sync Protocol

The sync protocol handles the transition from collected data to committed graph state. It diffs the new batch against the existing data and commits the delta atomically.

The Diff Algorithm

For each connector sync, the engine computes a diff between:

  • Emitted: entities and relationships from the current connector run
  • Existing: entities and relationships already in the graph from the same connector
Emitted entities:  {A, B, C}
Existing entities: {A, B, D}  ← D was in the last sync but not emitted this time

Delta:
  Upsert A (unchanged if properties match)
  Upsert B (unchanged if properties match)
  Upsert C (new)
  Delete D (not seen in this sync)

Source Scope (INV-C02)

The diff is scoped to the connector's source. Connector B's sync never deletes entities created by connector A:

Graph state:
  Host web-01 (source: aws-connector)
  Host web-02 (source: aws-connector)
  User alice   (source: okta-connector)

Okta sync: emits [alice-v2]
Result:
  Host web-01 (source: aws-connector) — UNCHANGED
  Host web-02 (source: aws-connector) — UNCHANGED
  User alice   (source: okta-connector) — UPDATED
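
The scoping rule can be sketched as a filter over delete candidates. This is an illustrative, self-contained sketch — the graph is modeled as entity key → owning connector id, and `scoped_deletes` is an assumption for the example, not the parallax-ingest types:

```rust
use std::collections::{HashMap, HashSet};

/// Entities to delete for this sync: owned by `source` and absent from the
/// emitted batch. Other connectors' entities are never delete candidates.
pub fn scoped_deletes(
    graph: &HashMap<String, String>, // entity key → owning connector id
    source: &str,
    emitted: &HashSet<String>,
) -> Vec<String> {
    graph
        .iter()
        // INV-C02: only this connector's entities can be deleted by its sync.
        .filter(|(key, owner)| {
            owner.as_str() == source && !emitted.contains(key.as_str())
        })
        .map(|(key, _)| key.clone())
        .collect()
}

fn main() {
    let graph = HashMap::from([
        ("web-01".to_string(), "aws-connector".to_string()),
        ("alice".to_string(), "okta-connector".to_string()),
    ]);
    // Okta syncs an empty batch: only the okta-owned entity is deleted.
    let deletes = scoped_deletes(&graph, "okta-connector", &HashSet::new());
    println!("{:?}", deletes); // prints ["alice"]
}
```

The same filter is also what makes the diff idempotent per source: re-emitting an unchanged batch leaves nothing to delete.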

Atomic Commit (INV-C01)

The entire delta (creates + updates + deletes) is committed as a single WriteBatch. Either all changes land or none do:

#![allow(unused)]
fn main() {
// SyncEngine::commit_sync internals:
let mut batch = WriteBatch::new();

for entity in &entities {
    match existing_entities.iter().find(|e| e.id == entity.id) {
        None => batch.upsert_entity(entity.clone()),           // create
        Some(ex) if ex.properties != entity.properties =>
            batch.upsert_entity(entity.clone()),               // update
        Some(_) => {}                                          // unchanged
    }
}

for prev in &existing_entities {
    if !seen_ids.contains(&prev.id) {
        batch.delete_entity(prev.id);                      // delete
    }
}

// Atomic commit
engine.write(batch)?;
}

Referential Integrity (INV-C04)

Before committing, the ingest layer validates that every relationship's endpoints exist. The check considers:

  1. Entities in the current batch (being committed now)
  2. Entities already in the graph from any connector
#![allow(unused)]
fn main() {
// validate_sync_batch checks each relationship:
let batch_ids: HashSet<EntityId> = batch_entities.iter().map(|e| e.id).collect();
let snapshot_ids: HashSet<EntityId> = snapshot_entities.iter().map(|e| e.id).collect();
let available: HashSet<EntityId> = batch_ids.union(&snapshot_ids).copied().collect();

for rel in &relationships {
    if !available.contains(&rel.from_id) {
        return Err(SyncError::DanglingRelationship { ... });
    }
    if !available.contains(&rel.to_id) {
        return Err(SyncError::DanglingRelationship { ... });
    }
}
}

A sync batch with a dangling relationship returns SyncError::DanglingRelationship and no data is committed.

SyncResult

pub struct SyncResult {
    pub sync_id: String,
    pub stats: SyncStats,
}

pub struct SyncStats {
    pub entities_created: u64,
    pub entities_updated: u64,
    pub entities_unchanged: u64,
    pub entities_deleted: u64,
    pub relationships_created: u64,
    pub relationships_updated: u64,
    pub relationships_unchanged: u64,
    pub relationships_deleted: u64,
}

Two Commit Modes

commit_sync_exclusive — Exclusive Engine Access

Used when you hold &mut StorageEngine. Best for single-threaded usage or CLI tools:

let result = commit_sync_exclusive(
    &mut engine,
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;

SyncEngine::commit_sync — Shared Engine Access

Used in server mode where multiple connectors share one engine via Arc<Mutex<StorageEngine>>:

let sync_engine = SyncEngine::new(Arc::clone(&engine));
let result = sync_engine.commit_sync(
    &output.connector_id,
    &output.sync_id,
    output.entities,
    output.relationships,
)?;

SyncEngine::commit_sync holds the engine lock only during the brief write step, not during diff computation. Multiple connectors can diff concurrently; they only serialize at the write step.
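This diff-outside-the-lock discipline can be sketched with plain std types. The store below is a toy HashMap stand-in, not the real StorageEngine API: the snapshot is cloned under a brief lock, the (potentially slow) diff runs without any lock, and the lock is re-acquired only to apply the batch.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Store = HashMap<String, String>; // entity key -> serialized state (stand-in)

// The diff runs against a snapshot clone -- no lock held.
fn diff(snapshot: &Store, incoming: &Store) -> Vec<(String, Option<String>)> {
    let mut batch = Vec::new();
    for (k, v) in incoming {
        if snapshot.get(k) != Some(v) {
            batch.push((k.clone(), Some(v.clone()))); // create or update
        }
    }
    for k in snapshot.keys() {
        if !incoming.contains_key(k) {
            batch.push((k.clone(), None)); // delete
        }
    }
    batch
}

fn commit(store: &Arc<Mutex<Store>>, incoming: &Store) {
    // Snapshot under a brief lock, then release it for the diff.
    let snapshot = store.lock().unwrap().clone();
    let batch = diff(&snapshot, incoming);
    // Re-acquire the lock only to apply the batch.
    let mut guard = store.lock().unwrap();
    for (k, v) in batch {
        match v {
            Some(v) => { guard.insert(k, v); }
            None => { guard.remove(&k); }
        }
    }
}

fn main() {
    let store = Arc::new(Mutex::new(Store::from([("a".into(), "1".into())])));
    let incoming = Store::from([("a".into(), "1".into()), ("b".into(), "2".into())]);
    commit(&store, &incoming);
    assert_eq!(store.lock().unwrap().len(), 2);
    println!("{}", store.lock().unwrap().len()); // prints 2
}
```

Multiple callers of commit can compute their diffs concurrently; they serialize only on the second, brief lock acquisition.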

Idempotency

Running the same sync twice with identical data is safe and efficient:

First run:   entities_created = 5, entities_deleted = 0
Second run:  entities_unchanged = 5, entities_created = 0, entities_deleted = 0

The diff detects that nothing changed and the WriteBatch is empty, so no WAL write occurs.

Connector Observability

The connector SDK provides structured events and logging for monitoring sync executions.

SyncEvent Stream

Pass a tokio::sync::mpsc::Sender<SyncEvent> to run_connector to receive real-time events during the sync:

use tokio::sync::mpsc;
use parallax_connect::{run_connector, SyncEvent};

let (tx, mut rx) = mpsc::channel(100);

// Spawn event consumer
tokio::spawn(async move {
    while let Some(event) = rx.recv().await {
        match event {
            SyncEvent::Started { connector_id, sync_id } => {
                tracing::info!(%connector_id, %sync_id, "Sync started");
            }
            SyncEvent::StepStarted { step_id } => {
                tracing::info!(%step_id, "Step started");
            }
            SyncEvent::StepCompleted { step_id, entities, relationships, duration } => {
                tracing::info!(
                    %step_id,
                    entities_emitted = entities,
                    relationships_emitted = relationships,
                    duration_ms = duration.as_millis(),
                    "Step completed"
                );
            }
            SyncEvent::StepFailed { step_id, error } => {
                tracing::warn!(%step_id, %error, "Step failed");
            }
            SyncEvent::Completed { connector_id, sync_id } => {
                tracing::info!(%connector_id, %sync_id, "Sync completed");
            }
        }
    }
});

let output = run_connector(&connector, "account", "sync-001", Some(&tx)).await?;

SyncEvent Variants

pub enum SyncEvent {
    Started {
        connector_id: String,
        sync_id: String,
    },
    StepStarted {
        step_id: String,
    },
    StepCompleted {
        step_id: String,
        entities: usize,
        relationships: usize,
        duration: Duration,
    },
    StepFailed {
        step_id: String,
        error: ConnectorError,
    },
    Completed {
        connector_id: String,
        sync_id: String,
    },
}

Structured Logging

The scheduler uses tracing for structured logs. Enable them with any tracing subscriber:

tracing_subscriber::fmt::init();

Relevant log fields:

  • connector_id — identifies the connector
  • sync_id — identifies this specific run
  • step_id — identifies the current step
  • entities — count of entities emitted in a step
  • relationships — count of relationships emitted

Metrics Integration

After a sync, the SyncStats from commit_sync / commit_sync_exclusive provides counters suitable for Prometheus export:

let stats = result.stats;

// Export to your metrics system
metrics::counter!("parallax_entities_created_total")
    .increment(stats.entities_created);
metrics::counter!("parallax_entities_deleted_total")
    .increment(stats.entities_deleted);
metrics::counter!("parallax_relationships_created_total")
    .increment(stats.relationships_created);

Or use the built-in Prometheus endpoint when running parallax-server: GET /metrics returns engine-wide counters.

Tracing Context Propagation

When running connectors inside an existing trace context, the connector steps inherit the current span:

use tracing::Instrument;

let span = tracing::info_span!("sync_run", connector = "my-service");
let output = async move {
    run_connector(&connector, "account", "sync-001", None).await
}
.instrument(span)
.await?;

All log output from within run_connector will be nested under this span.

REST API Overview

parallax-server exposes a REST HTTP API over Axum. All responses are JSON.

Base URL

http://localhost:7700   (default)

Configure with --host and --port flags or the PARALLAX_HOST / PARALLAX_PORT environment variables.

Versioning

All endpoints are prefixed with /v1/. The version is part of the URL path, not a header.

Endpoint Summary

Method   Path                       Description
GET      /v1/health                 Health check (auth-exempt)
GET      /v1/stats                  Entity and relationship counts
POST     /v1/query                  Execute a PQL query
GET      /v1/entities/:id           Fetch an entity by ID
GET      /v1/relationships/:id      Fetch a relationship by ID
POST     /v1/ingest/sync            Connector sync batch
POST     /v1/ingest/write           Direct write batch
GET      /v1/connectors             List registered connectors
POST     /v1/connectors/:id/sync    Trigger a connector sync
GET      /metrics                   Prometheus metrics exposition

Content Type

All request bodies must be Content-Type: application/json. All responses have Content-Type: application/json (except /metrics).

Request IDs

Every request receives a X-Request-Id header in the response (INV-A05). The server either generates a UUID v4 or propagates the value from the incoming request's X-Request-Id header. Use this for log correlation.

curl -v http://localhost:7700/v1/health
# < X-Request-Id: 550e8400-e29b-41d4-a716-446655440000
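The propagate-or-generate rule itself is small. A sketch with a stand-in generator in place of a real UUID v4 library (the server's actual middleware will differ):

```rust
// Propagate the client's X-Request-Id if present and non-empty,
// otherwise mint a fresh one. `mint` stands in for a UUID v4 generator.
fn request_id(incoming: Option<&str>, mint: impl Fn() -> String) -> String {
    match incoming {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => mint(),
    }
}

fn main() {
    let mint = || "550e8400-e29b-41d4-a716-446655440000".to_string();
    assert_eq!(request_id(Some("client-id-1"), mint), "client-id-1");
    assert_eq!(request_id(None, mint), "550e8400-e29b-41d4-a716-446655440000");
    println!("ok"); // prints ok
}
```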

Common Response Codes

Status                       Meaning
200 OK                       Success
400 Bad Request              Invalid request body or parameters
401 Unauthorized             Missing or invalid API key
404 Not Found                Entity/relationship not found
500 Internal Server Error    Storage or processing error

Starting the Server

# No auth (development)
parallax serve --data-dir ./data

# With API key
PARALLAX_API_KEY=my-secret-key parallax serve --data-dir ./data

# Custom host and port
parallax serve --host 0.0.0.0 --port 8080 --data-dir /var/lib/parallax

Authentication

Parallax uses API key authentication. When no API key is configured, the server runs in open mode (all requests accepted).

Configuration

Set the PARALLAX_API_KEY environment variable before starting the server:

export PARALLAX_API_KEY="your-secret-api-key"
parallax serve --data-dir ./data

If PARALLAX_API_KEY is not set or is empty, the server starts in open mode and accepts all requests without authentication.
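Putting the two modes together with the health exemption, the per-request decision looks roughly like the following sketch (a simplification, not the server's actual middleware):

```rust
// Constant-time comparison, as described under Security Properties below.
fn ct_eq(a: &str, b: &str) -> bool {
    a.len() == b.len()
        && a.bytes().zip(b.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true if the request is allowed. `configured` is None in open mode;
/// `bearer` is the token from the Authorization header, if any.
fn allow(path: &str, configured: Option<&str>, bearer: Option<&str>) -> bool {
    if path == "/v1/health" {
        return true; // always exempt (INV-A06)
    }
    match configured {
        None => true, // open mode: no key configured, accept everything
        Some(key) => bearer.map_or(false, |t| ct_eq(t, key)),
    }
}

fn main() {
    assert!(allow("/v1/health", Some("k"), None));      // exempt even with auth on
    assert!(allow("/v1/stats", None, None));            // open mode
    assert!(allow("/v1/stats", Some("k"), Some("k")));  // valid key
    assert!(!allow("/v1/stats", Some("k"), Some("x"))); // wrong key -> 401
    println!("ok"); // prints ok
}
```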

Making Authenticated Requests

Include the API key as a Bearer token in the Authorization header:

curl -H "Authorization: Bearer your-secret-api-key" \
     http://localhost:7700/v1/stats

Or with an explicit header:

curl -H "Authorization: Bearer your-secret-api-key" \
     -X POST http://localhost:7700/v1/query \
     -H "Content-Type: application/json" \
     -d '{"pql": "FIND host"}'

Exemptions

The /v1/health endpoint is always exempt from authentication (INV-A06). This allows load balancers and health check systems to verify server liveness without credentials.

# Always works, even with auth enabled
curl http://localhost:7700/v1/health

The /metrics Prometheus endpoint is not exempt — protect it from unauthenticated access in production.

Security Properties

Constant-time comparison (INV-A02): API key verification uses constant-time string comparison to prevent timing attacks. An attacker cannot determine the correct key by measuring response time differences:

fn ct_eq(a: &str, b: &str) -> bool {
    if a.len() != b.len() { return false; }
    a.bytes().zip(b.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

Keys are never logged: The API key is stored in memory as Arc<String>. It is never written to log output, error messages, or response bodies.

401 Response

When authentication fails:

{
  "error": "Unauthorized",
  "message": "missing or invalid API key"
}

HTTP status: 401 Unauthorized

Best Practices

  1. Use a random, high-entropy key: openssl rand -hex 32
  2. Rotate keys regularly in production
  3. Use TLS in production (TLS termination planned for v0.2)
  4. Restrict network access to the server — treat the API key as a secondary defense, not the primary

Open Mode vs. Protected Mode

Mode       Config                      Use Case
Open       PARALLAX_API_KEY not set    Local development, CLI usage, testing
Protected  PARALLAX_API_KEY=<key>      Production, shared deployments

In open mode, all endpoints are accessible without credentials. This is suitable for local development and single-user CLI usage where network access is already restricted.

Query Endpoints

POST /v1/query

Execute a PQL query and return results.

Request Body

{
  "pql": "FIND host WITH state = 'running'"
}

Field  Type    Required  Description
pql    string  Yes       The PQL query to execute

Response: Entity Query (200 OK)

{
  "count": 2,
  "entities": [
    {
      "id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
      "entity_type": "host",
      "entity_class": "Host",
      "display_name": "Web Server 01",
      "properties": {
        "state": "running",
        "region": "us-east-1"
      },
      "source": {
        "connector_id": "my-connector",
        "sync_id": "sync-001"
      }
    },
    {
      "id": "b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7",
      "entity_type": "host",
      "entity_class": "Host",
      "display_name": "Database Server 01",
      "properties": {
        "state": "running",
        "region": "us-west-2"
      }
    }
  ]
}

Response: Count Query (200 OK)

For FIND ... RETURN COUNT:

{
  "count": 42
}

Response: Traversal Query (200 OK)

For FIND A THAT VERB B:

{
  "count": 1,
  "entities": [
    {
      "id": "...",
      "entity_type": "host",
      "entity_class": "Host",
      "display_name": "Web Server 01"
    }
  ]
}

Response: Path Query (200 OK)

For FIND SHORTEST PATH FROM A TO B:

{
  "count": 1,
  "path": [
    {"entity_id": "user-alice-id"},
    {"entity_id": "role-admin-id", "via_relationship": "rel-assigned-id"},
    {"entity_id": "secret-prod-id", "via_relationship": "rel-allows-id"}
  ]
}

If no path exists:

{
  "count": 0,
  "path": null
}

Error: Parse Error (400)

{
  "error": "ParseError",
  "message": "unexpected token '=' at position 18: expected comparison operator"
}

Error: Execution Error (500)

{
  "error": "ExecutionError",
  "message": "query timed out after 30000ms"
}

Examples

# Find all running hosts
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND host WITH state = '\''running'\''"}'

# Count all services
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND Service RETURN COUNT"}'

# Find hosts with no EDR
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND host THAT !PROTECTS edr_agent"}'

# Blast radius
curl -X POST http://localhost:7700/v1/query \
  -H 'Content-Type: application/json' \
  -d '{"pql": "FIND BLAST RADIUS FROM host WITH _key = '\''web-01'\'' DEPTH 4"}'

Ingest Endpoints

POST /v1/ingest/sync

Submit a connector sync batch. This is the primary ingest endpoint.

The server runs the sync protocol: validates referential integrity, diffs against the current graph state for this connector, and commits the delta atomically (INV-C01).

Request Body

{
  "connector_id": "my-connector",
  "sync_id": "sync-2024-01-15-001",
  "entities": [
    {
      "entity_type": "host",
      "entity_key": "web-01",
      "entity_class": "Host",
      "display_name": "Web Server 01",
      "properties": {
        "state": "running",
        "region": "us-east-1",
        "cpu_count": 8
      }
    },
    {
      "entity_type": "service",
      "entity_key": "nginx-web-01",
      "entity_class": "Service",
      "display_name": "Nginx on web-01"
    }
  ],
  "relationships": [
    {
      "from_type": "host",
      "from_key": "web-01",
      "verb": "RUNS",
      "to_type": "service",
      "to_key": "nginx-web-01"
    }
  ]
}

Request Fields

Field          Type    Required  Description
connector_id   string  Yes       Identifies the connector. Used for source-scoped diff.
sync_id        string  Yes       Unique ID for this sync run. Included in source tracking.
entities       array   Yes       Entities to upsert. Can be empty.
relationships  array   Yes       Relationships to upsert. Can be empty.

Entity Fields

Field         Type    Required  Description
entity_type   string  Yes       Type identifier (snake_case, e.g., "host")
entity_key    string  Yes       Source-system unique key
entity_class  string  Yes       Class from the known classes list
display_name  string  No        Human-readable name
properties    object  No        Flat key-value property bag

Relationship Fields

Field       Type    Required  Description
from_type   string  Yes       Source entity type
from_key    string  Yes       Source entity key
verb        string  Yes       Relationship verb from the known verbs list
to_type     string  Yes       Target entity type
to_key      string  Yes       Target entity key
properties  object  No        Flat key-value property bag on the edge

Response (200 OK)

{
  "sync_id": "sync-2024-01-15-001",
  "entities_created": 2,
  "entities_updated": 0,
  "entities_unchanged": 0,
  "entities_deleted": 0,
  "relationships_created": 1,
  "relationships_updated": 0,
  "relationships_unchanged": 0,
  "relationships_deleted": 0
}

Error: Dangling Relationship (500)

If a relationship references an entity that doesn't exist in the batch or the current graph:

{
  "error": "DanglingRelationship",
  "message": "relationship from host:web-01 RUNS service:ghost — to_id not found in batch or graph"
}

Example

curl -X POST http://localhost:7700/v1/ingest/sync \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer your-key' \
  -d '{
    "connector_id": "my-connector",
    "sync_id": "sync-001",
    "entities": [
      {"entity_type": "host", "entity_key": "web-01", "entity_class": "Host",
       "display_name": "Web Server 01", "properties": {"state": "running"}}
    ],
    "relationships": []
  }'

POST /v1/ingest/write

Direct write batch — bypass the sync protocol. Use this when you want to write entities without source-scoped diffing.

Unlike /v1/ingest/sync, this endpoint does not:

  • Diff against previous state from the same connector
  • Delete entities not present in the batch

It simply upserts all provided entities and relationships.

Request Body

{
  "write_id": "write-001",
  "entities": [
    {
      "entity_type": "host",
      "entity_key": "h1",
      "entity_class": "Host",
      "display_name": "Server 1"
    }
  ],
  "relationships": []
}

Response (200 OK)

{
  "write_id": "write-001",
  "entities_written": 1,
  "relationships_written": 0
}

When to Use Write vs. Sync

Use Case                                          Endpoint
Connector that owns its entities                  /v1/ingest/sync
One-time bulk import                              /v1/ingest/write
Incremental updates (no deletions)                /v1/ingest/write
Full re-sync with deletion of departed entities   /v1/ingest/sync
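The distinction reduces to: write is a plain upsert, while sync is upsert plus deletion of anything missing from the batch. A toy model over a map (assuming the map holds a single connector's entities, so deletion scoping is trivial):

```rust
use std::collections::HashMap;

type Graph = HashMap<String, String>; // entity key -> value (stand-in)

// /v1/ingest/write: upsert only -- entities absent from the batch survive.
fn apply_write(graph: &mut Graph, batch: &Graph) {
    for (k, v) in batch {
        graph.insert(k.clone(), v.clone());
    }
}

// /v1/ingest/sync: upsert, then delete this source's entities not in the batch.
fn apply_sync(graph: &mut Graph, batch: &Graph) {
    apply_write(graph, batch);
    graph.retain(|k, _| batch.contains_key(k));
}

fn main() {
    let old = Graph::from([("web-01".into(), "running".into()),
                           ("web-02".into(), "running".into())]);
    let batch = Graph::from([("web-01".into(), "stopped".into())]);

    let mut g1 = old.clone();
    apply_write(&mut g1, &batch);
    assert_eq!(g1.len(), 2); // web-02 survives a write

    let mut g2 = old.clone();
    apply_sync(&mut g2, &batch);
    assert_eq!(g2.len(), 1); // web-02 deleted by a full sync
    println!("ok"); // prints ok
}
```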

Entity & Relationship Endpoints

GET /v1/entities/:id

Fetch a single entity by its EntityId.

URL Parameters

Parameter  Description
:id        Hex-encoded EntityId (32 hex characters = 16 bytes)

Computing an EntityId

Entity IDs are deterministic from (account_id, entity_type, entity_key). You can compute them client-side using the same blake3 derivation:

use parallax_core::entity::EntityId;

// account_id is "default" when using the REST API without multi-tenancy
let id = EntityId::derive("default", "host", "web-01");
let hex = format!("{id}");  // 32 hex characters

Or copy the hex ID directly from a /v1/query response.

Response (200 OK)

{
  "id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "type": "host",
  "class": "Host",
  "display_name": "Web Server 01",
  "properties": {
    "state": "running",
    "region": "us-east-1",
    "cpu_count": 8
  },
  "source": {
    "connector_id": "my-connector",
    "sync_id": "sync-001"
  },
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:05:00Z"
}

Response (404 Not Found)

{
  "error": "NotFound",
  "message": "entity a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 not found"
}

Example

curl http://localhost:7700/v1/entities/a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6

GET /v1/relationships/:id

Fetch a single relationship by its RelationshipId.

Response (200 OK)

{
  "id": "d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9",
  "class": "RUNS",
  "from_id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "to_id": "b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7",
  "properties": {
    "port": 443
  },
  "source": {
    "connector_id": "my-connector",
    "sync_id": "sync-001"
  }
}

Response (404 Not Found)

{
  "error": "NotFound",
  "message": "relationship d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9 not found"
}

Admin Endpoints

GET /v1/health

Health check endpoint. Always returns 200 OK if the server is running. This endpoint is exempt from authentication (INV-A06).

Response (200 OK)

{
  "status": "ok",
  "version": "0.1.0"
}

Use this for load balancer health checks and readiness probes.


GET /v1/stats

Engine statistics including entity and relationship counts.

Response (200 OK)

{
  "total_entities": 12543,
  "total_relationships": 45210,
  "version": "0.1.0",
  "uptime_seconds": 3600,
  "type_counts": {
    "host": 1200,
    "user": 5400,
    "service": 2100,
    "aws_s3_bucket": 450,
    "aws_iam_role": 380
  },
  "class_counts": {
    "Host": 1200,
    "User": 5400,
    "Service": 2100,
    "DataStore": 450
  }
}

Example

curl http://localhost:7700/v1/stats

GET /v1/connectors

List registered connectors.

In v0.1, this returns the list of connector IDs that have submitted at least one sync batch to this server instance.

Response (200 OK)

{
  "connectors": [
    {
      "id": "aws-connector",
      "last_sync_id": "sync-2024-01-15-003",
      "entity_count": 8421
    },
    {
      "id": "okta-connector",
      "last_sync_id": "sync-2024-01-15-001",
      "entity_count": 4122
    }
  ]
}

POST /v1/connectors/:id/sync

Trigger a sync for a registered connector.

In v0.1, this is a stub endpoint that returns 501 Not Implemented. Full server-side connector scheduling is planned for v0.2.

Response (501 Not Implemented)

{
  "error": "NotImplemented",
  "message": "server-side connector scheduling is planned for v0.2"
}

For now, run connectors out-of-process and use POST /v1/ingest/sync to submit the results.


GET /v1/policies

List the currently loaded policy rules.

Response (200 OK)

{
  "count": 2,
  "rules": [
    {
      "id": "edr-coverage-001",
      "name": "EDR coverage gap",
      "severity": "high",
      "description": "Hosts without EDR protection.",
      "query": "FIND host THAT !PROTECTS edr_agent",
      "enabled": true,
      "frameworks": [
        { "framework": "CIS-Controls-v8", "control": "10.1" }
      ]
    }
  ]
}

POST /v1/policies

Replace the loaded rule set. PQL in each enabled rule is validated at load time (INV-P06) — invalid PQL returns 400.

Request body

{
  "rules": [
    {
      "id": "edr-coverage-001",
      "name": "EDR coverage gap",
      "severity": "high",
      "description": "Hosts without EDR protection.",
      "query": "FIND host THAT !PROTECTS edr_agent",
      "frameworks": [{ "framework": "CIS-Controls-v8", "control": "10.1" }],
      "schedule": "manual",
      "remediation": "Deploy EDR agent.",
      "enabled": true
    }
  ]
}

Response (200 OK)

{ "loaded": 1 }

Response (400 Bad Request) — invalid PQL

{ "error": "policy validation failed: Rule 'edr-coverage-001' contains invalid PQL: ..." }

POST /v1/policies/evaluate

Evaluate all enabled rules against the current graph snapshot.

Response (200 OK)

{
  "total": 2,
  "pass": 1,
  "fail": 1,
  "results": [
    {
      "rule_id": "edr-coverage-001",
      "status": "Fail",
      "violation_count": 3,
      "error": null
    },
    {
      "rule_id": "mfa-all-users",
      "status": "Pass",
      "violation_count": 0,
      "error": null
    }
  ]
}

Status values: "Pass", "Fail", "Error", "Skipped".


GET /v1/policies/posture

Compute compliance posture for a framework.

Query parameters

Parameter  Default          Description
framework  CIS-Controls-v8  Framework name to report on

Response (200 OK)

{
  "framework": "CIS-Controls-v8",
  "overall_score": 0.75,
  "controls": [
    {
      "control_id": "10.1",
      "status": "Fail",
      "rule_count": 1,
      "pass_count": 0,
      "fail_count": 1
    },
    {
      "control_id": "6.5",
      "status": "Pass",
      "rule_count": 1,
      "pass_count": 1,
      "fail_count": 0
    }
  ]
}

overall_score is the fraction of controls with status Pass (0.0–1.0). Controls with no mapped rules are not included.
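Reproducing the score is straightforward: count the controls whose status is Pass and divide by the number of controls that have at least one mapped rule. A sketch (toy types, not the crate's API):

```rust
#[derive(PartialEq)]
enum Status { Pass, Fail }

// overall_score = passing controls / controls with at least one mapped rule
fn overall_score(controls: &[(&str, Status)]) -> f64 {
    if controls.is_empty() {
        return 0.0;
    }
    let pass = controls.iter().filter(|(_, s)| *s == Status::Pass).count();
    pass as f64 / controls.len() as f64
}

fn main() {
    // 3 of 4 mapped controls passing gives a score of 0.75.
    let controls = [("10.1", Status::Fail), ("6.5", Status::Pass),
                    ("3.3", Status::Pass), ("PR.AC-7", Status::Pass)];
    assert_eq!(overall_score(&controls), 0.75);
    println!("{}", overall_score(&controls)); // prints 0.75
}
```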

Prometheus Metrics

GET /metrics

Returns engine metrics in Prometheus text exposition format. Use this endpoint to integrate with Prometheus, Grafana, or any compatible metrics system.

Response (200 OK)

# HELP parallax_entities_total Total number of entities in the graph
# TYPE parallax_entities_total gauge
parallax_entities_total 12543

# HELP parallax_relationships_total Total number of relationships in the graph
# TYPE parallax_relationships_total gauge
parallax_relationships_total 45210

# HELP parallax_writes_total Total number of write batches committed
# TYPE parallax_writes_total counter
parallax_writes_total 1024

# HELP parallax_reads_total Total number of snapshot reads
# TYPE parallax_reads_total counter
parallax_reads_total 98432

# HELP parallax_uptime_seconds Server uptime in seconds
# TYPE parallax_uptime_seconds gauge
parallax_uptime_seconds 3601

Content Type

Content-Type: text/plain; version=0.0.4; charset=utf-8

Prometheus Scrape Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'parallax'
    static_configs:
      - targets: ['localhost:7700']
    metrics_path: '/metrics'
    # Add auth if PARALLAX_API_KEY is set:
    authorization:
      credentials: 'your-api-key'

Available Metrics

Metric                        Type     Description
parallax_entities_total      Gauge    Current entity count (includes soft-deleted)
parallax_relationships_total  Gauge    Current relationship count
parallax_writes_total         Counter  Total committed write batches
parallax_reads_total          Counter  Total snapshot reads
parallax_uptime_seconds       Gauge    Server uptime in seconds

Future Metrics (v0.2)

Planned additions:

  • parallax_query_duration_seconds — histogram of query latencies
  • parallax_wal_bytes_total — WAL bytes written
  • parallax_segment_count — number of on-disk segments
  • parallax_sync_duration_seconds — histogram of sync durations per connector
  • parallax_sync_errors_total — count of sync errors by connector

Example

curl http://localhost:7700/metrics

Or with authentication:

curl -H 'Authorization: Bearer your-key' http://localhost:7700/metrics

Error Responses

All API errors follow a consistent JSON structure.

Error Response Format

{
  "error": "ErrorType",
  "message": "Human-readable description of what went wrong"
}

The X-Request-Id header is always present in error responses, making it easy to correlate errors with server logs.

Error Types

400 Bad Request

Error                    Cause
ParseError               PQL query has a syntax error
InvalidRequest           Request body is missing required fields or has invalid types
InvalidEntityClass       Entity class is not in the known classes list
InvalidRelationshipVerb  Relationship verb is not in the known verbs list

{
  "error": "ParseError",
  "message": "unexpected token 'WHERE' at position 12: use 'WITH' instead"
}

401 Unauthorized

{
  "error": "Unauthorized",
  "message": "missing or invalid API key"
}

404 Not Found

{
  "error": "NotFound",
  "message": "entity a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 not found"
}

500 Internal Server Error

Error                 Cause
StoreError            Storage engine error (I/O failure, corruption)
DanglingRelationship  Ingest batch contains a relationship referencing a non-existent entity
ExecutionError        Query execution failed (timeout, resource limit)
SyncError             Sync commit failed

{
  "error": "DanglingRelationship",
  "message": "relationship from host:web-01 RUNS service:ghost — entity not found in batch or graph"
}

{
  "error": "StoreError",
  "message": "I/O error writing to WAL: No space left on device"
}

Common Mistakes

"WHERE" instead of "WITH"

PQL uses WITH for property filters, not WHERE:

# Wrong
FIND host WHERE state = 'running'

# Correct
FIND host WITH state = 'running'

Double quotes in string literals

PQL uses single quotes, not double quotes:

# Wrong
FIND host WITH state = "running"

# Correct
FIND host WITH state = 'running'

Dangling relationship in ingest

The from_key and to_key in relationships must reference entities that exist in the same batch or the current graph:

{
  "entities": [
    {"entity_type": "host", "entity_key": "web-01", "entity_class": "Host"}
  ],
  "relationships": [
    {
      "from_type": "host", "from_key": "web-01",
      "verb": "RUNS",
      "to_type": "service", "to_key": "ghost-service"
    }
  ]
}

This returns a DanglingRelationship error because service:ghost-service doesn't exist. Either include the service entity in the batch or remove the relationship.

Logging

Server-side errors are logged with the request ID and full error context:

ERROR parallax_server::routes: query execution failed
  request_id = "550e8400-e29b-41d4-a716-446655440000"
  error = "ParseError: unexpected token '=' at position 18"
  pql = "FIND host WITH state = = 'running'"

Use the X-Request-Id from the response to search server logs.

Policy Engine Overview

parallax-policy evaluates security policy rules against the live graph. Policies are PQL-powered rules that identify compliance violations, security gaps, and posture issues.

What It Does

  1. Load rules: Accept PolicyRule definitions with PQL queries
  2. Validate queries: Reject rules whose PQL is invalid at load time (INV-P06)
  3. Evaluate: Run all rules against the current graph snapshot
  4. Posture scoring: Compute per-control status and an overall security posture score

What It Is Not

  • Not a mutation engine. Policy evaluation is read-only (INV-P02).
  • Not a real-time alerting system. It evaluates on-demand or on schedule.
  • Not a SIEM. It doesn't process event streams.

Core Types

pub struct PolicyRule {
    /// Unique rule identifier
    pub id: String,

    /// Human-readable name
    pub name: String,

    /// Severity of the violation
    pub severity: Severity,

    /// Long-form description
    pub description: String,

    /// PQL query that finds violating entities
    /// (empty result = compliant; any result = violation)
    pub query: String,

    /// Framework controls this rule maps to
    pub frameworks: Vec<FrameworkMapping>,

    /// Evaluation schedule
    pub schedule: Schedule,

    /// Remediation guidance (markdown)
    pub remediation: String,

    pub enabled: bool,
}

pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
    Info,
}

pub struct FrameworkMapping {
    pub framework: String,  // e.g. "CIS-Controls-v8", "NIST-CSF", "SOC2"
    pub control: String,    // e.g. "10.1", "PR.AC-7"
}

Quick Example

use parallax_policy::{FrameworkMapping, PolicyEvaluator, PolicyRule, Schedule, Severity};

let rules = vec![
    PolicyRule {
        id: "no-unprotected-hosts".to_string(),
        name: "All hosts must have EDR protection".to_string(),
        severity: Severity::High,
        description: "Hosts without EDR protection.".to_string(),
        query: "FIND host THAT !PROTECTS edr_agent".to_string(),
        frameworks: vec![
            FrameworkMapping {
                framework: "CIS-Controls-v8".to_string(),
                control: "10.1".to_string(),
            }
        ],
        schedule: Schedule::Manual,
        remediation: "Deploy EDR agent to all hosts.".to_string(),
        enabled: true,
    },
];

// `stats` is an IndexStats built from engine counters (see Policy Evaluation).
let evaluator = PolicyEvaluator::load(rules, &stats)?;

let snap = engine.snapshot();
let graph = GraphReader::new(&snap);
let results = evaluator.evaluate_all(&graph, QueryLimits::default());

for result in &results {
    println!("{}: {:?} ({} violations)",
        result.rule_id, result.status, result.violation_count);
}

let posture = evaluator.compute_posture(&snap)?;
println!("Overall posture score: {:.1}%", posture.overall_score * 100.0);

See Policy Rules, Evaluation, and Posture Scoring for complete documentation.

Policy Rules

A policy rule is a named PQL query that finds security violations. An empty result means compliant; any results mean violation.

PolicyRule Structure

pub struct PolicyRule {
    /// Unique identifier (e.g. `edr-coverage-001`)
    pub id: String,
    /// Human-readable name
    pub name: String,
    pub severity: Severity,
    /// Long-form description
    pub description: String,
    /// The PQL query that finds violations (0 results = PASS, >0 = FAIL)
    pub query: String,
    /// Compliance framework mappings
    pub frameworks: Vec<FrameworkMapping>,
    /// Evaluation schedule
    pub schedule: Schedule,
    /// Remediation guidance (markdown)
    pub remediation: String,
    pub enabled: bool,
}

Severity Levels

pub enum Severity {
    Critical,  // Immediate action required
    High,      // Resolve within 24 hours
    Medium,    // Resolve within 7 days
    Low,       // Track and resolve at next sprint
    Info,      // Informational, no SLA
}

Severities are serialized as lowercase strings in YAML/JSON: "critical", "high", "medium", "low", "info".

Schedule

pub enum Schedule {
    Manual,               // Only on explicit request
    Every(Duration),      // Periodic: "every:5m", "every:2h", "every:1d"
    OnSync(Vec<String>),  // After named connectors sync: "on_sync:aws,gcp"
}
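A hedged sketch of how the string forms could map onto the enum; the crate's actual parser and signatures may differ:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Schedule {
    Manual,
    Every(Duration),
    OnSync(Vec<String>),
}

// Parse "manual", "every:5m" / "every:2h" / "every:1d", "on_sync:aws,gcp".
fn parse_schedule(s: &str) -> Option<Schedule> {
    if s == "manual" {
        return Some(Schedule::Manual);
    }
    if let Some(spec) = s.strip_prefix("every:") {
        if spec.is_empty() {
            return None;
        }
        let (num, unit) = spec.split_at(spec.len() - 1);
        let n: u64 = num.parse().ok()?;
        let secs = match unit {
            "m" => n * 60,
            "h" => n * 3600,
            "d" => n * 86_400,
            _ => return None,
        };
        return Some(Schedule::Every(Duration::from_secs(secs)));
    }
    if let Some(list) = s.strip_prefix("on_sync:") {
        return Some(Schedule::OnSync(list.split(',').map(str::to_string).collect()));
    }
    None
}

fn main() {
    assert_eq!(parse_schedule("every:5m"),
               Some(Schedule::Every(Duration::from_secs(300))));
    assert_eq!(parse_schedule("on_sync:aws,gcp"),
               Some(Schedule::OnSync(vec!["aws".into(), "gcp".into()])));
    assert_eq!(parse_schedule("manual"), Some(Schedule::Manual));
    println!("ok"); // prints ok
}
```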

Framework Mapping

pub struct FrameworkMapping {
    pub framework: String,  // e.g. "CIS-Controls-v8", "NIST-CSF", "SOC2"
    pub control: String,    // e.g. "10.1", "DE.CM-4"
}

YAML Rule Files

Rules are defined in YAML and loaded with load_rules_from_yaml(path):

rules:
  - id: edr-coverage-001
    name: EDR coverage gap
    severity: high
    description: Hosts without EDR protection.
    query: "FIND host THAT !PROTECTS edr_agent"
    frameworks:
      - framework: CIS-Controls-v8
        control: "10.1"
    schedule: "manual"          # or "every:5m", "on_sync:aws,gcp"
    remediation: "Deploy EDR agent to all hosts."
    enabled: true

  - id: mfa-enforcement-001
    name: MFA not enforced
    severity: critical
    description: Active users without MFA.
    query: "FIND user WITH mfa_enabled = false AND active = true"
    frameworks:
      - framework: NIST-CSF
        control: "PR.AC-7"
      - framework: CIS-Controls-v8
        control: "6.5"
    schedule: "every:1h"
    remediation: "Enable MFA for all active user accounts."
    enabled: true

use std::path::Path;

use parallax_policy::load_rules_from_yaml;

let rules = load_rules_from_yaml(Path::new("rules/security.yaml"))?;

INV-P06: Validation at Load Time

Policy rules with invalid PQL are rejected when loaded into the evaluator, not at evaluation time:

let result = PolicyEvaluator::load(vec![
    PolicyRule::new("bad", "Invalid PQL", "FIND host WHERE state = 'running'"),
], &stats);
// Err(PolicyError::InvalidQuery { rule_id: "bad", parse_error: "..." })

This prevents silent failures where a broken rule never catches violations.
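The fail-fast constructor pattern is easy to sketch. Here a toy validity check (queries must start with FIND) stands in for the real PQL parser:

```rust
struct Rule { id: String, query: String }

#[allow(dead_code)]
struct Evaluator { rules: Vec<Rule> }

impl Evaluator {
    // Reject the whole rule set if any rule fails to parse (fail fast at load).
    // The check below is a toy stand-in for a real parser.
    fn load(rules: Vec<Rule>) -> Result<Evaluator, String> {
        for r in &rules {
            if !r.query.starts_with("FIND ") {
                return Err(format!("rule '{}' contains invalid PQL", r.id));
            }
        }
        Ok(Evaluator { rules })
    }
}

fn main() {
    let good = Rule { id: "ok".into(), query: "FIND host".into() };
    let bad = Rule { id: "bad".into(), query: "SELECT host".into() };
    assert!(Evaluator::load(vec![good]).is_ok());
    assert!(Evaluator::load(vec![bad]).is_err());
    println!("ok"); // prints ok
}
```

Because loading is all-or-nothing, a typo in one rule surfaces immediately instead of producing a rule that silently never fails.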

Query Design Patterns

Pattern                                    Query
Find X that lack Y                         FIND X THAT !VERB Y
Find X with bad property                   FIND X WITH property = 'bad_value'
Find X in bad state OR another bad state   FIND X WITH state = 'a' OR state = 'b'
Find X connected to dangerous Y            FIND X THAT VERB Y WITH dangerous = true
Count of X (threshold check)               FIND X RETURN COUNT
Multi-hop risk path                        FIND X THAT V1 Y THAT V2 Z WITH prop = val

Example Rules

EDR Coverage

- id: edr-coverage-001
  name: EDR coverage gap
  severity: high
  query: "FIND host THAT !PROTECTS edr_agent"
  frameworks:
    - framework: CIS-Controls-v8
      control: "10.1"
  schedule: "on_sync:aws"
  remediation: "Deploy EDR agent."
  enabled: true

MFA Enforcement

- id: mfa-all-users
  name: MFA not enforced
  severity: critical
  query: "FIND user WITH mfa_enabled = false AND active = true"
  frameworks:
    - framework: NIST-CSF
      control: "PR.AC-7"
  schedule: "every:1h"
  remediation: "Enable MFA for all active users."
  enabled: true

Public Cloud Storage

- id: no-public-buckets
  name: Public S3 buckets
  severity: critical
  query: "FIND aws_s3_bucket WITH public = true"
  frameworks:
    - framework: CIS-Controls-v8
      control: "3.3"
  schedule: "on_sync:aws"
  remediation: "Disable public access on all S3 buckets."
  enabled: true

Policy Evaluation

Loading an Evaluator

#![allow(unused)]
fn main() {
use std::path::Path;
use parallax_policy::{PolicyEvaluator, load_rules_from_yaml};
use parallax_query::IndexStats;

let rules = load_rules_from_yaml(Path::new("rules/security.yaml"))?;
let stats = IndexStats::new(type_counts, class_counts, total, rel_total);
let evaluator = PolicyEvaluator::load(rules, &stats)?;
// Err if any rule contains invalid PQL (INV-P06)
}

Running an Evaluation

Sequential

#![allow(unused)]
fn main() {
let snap = engine.snapshot();
let graph = GraphReader::new(&snap);
let results = evaluator.evaluate_all(&graph, QueryLimits::default());
}

Parallel (3E)

#![allow(unused)]
fn main() {
// Each rule runs on its own OS thread; results collected in definition order.
let results = evaluator.par_evaluate_all(&graph, QueryLimits::default());
}

Both methods return identical results in the same order. par_evaluate_all is preferred when evaluating many rules — it uses std::thread::scope so non-'static lifetimes (including GraphReader<'snap>) work correctly.
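The scoped-thread pattern behind par_evaluate_all can be sketched with stand-in types. Rule and eval below are placeholders, not the real parallax-policy types; the sketch shows why std::thread::scope permits borrowed data and how joining in spawn order preserves definition order:

```rust
/// Sketch of the par_evaluate_all pattern: one scoped thread per rule,
/// joined in definition order. Rule and eval are stand-ins, not the
/// real parallax-policy types.
struct Rule {
    id: &'static str,
}

fn eval(rule: &Rule, graph: &[&str]) -> usize {
    // Stand-in "evaluation": count entities whose name starts with the rule id.
    graph.iter().filter(|e| e.starts_with(rule.id)).count()
}

fn par_eval(rules: &[Rule], graph: &[&str]) -> Vec<usize> {
    // std::thread::scope lets spawned threads borrow `rules` and `graph`
    // without 'static lifetimes; joining handles in spawn order keeps
    // results in definition order, matching the sequential API.
    std::thread::scope(|s| {
        let handles: Vec<_> = rules
            .iter()
            .map(|r| s.spawn(move || eval(r, graph)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```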

RuleResult

#![allow(unused)]
fn main() {
pub struct RuleResult {
    pub rule_id: String,
    pub status: RuleStatus,
    pub violations: Vec<Violation>,
    pub error: Option<String>,      // set on RuleStatus::Error
    pub evaluated_at: Timestamp,
    pub duration: Duration,
}

pub enum RuleStatus {
    Pass,     // query returned 0 results
    Fail,     // query returned ≥1 results
    Error,    // rule evaluation errored (INV-P03: others still run)
    Skipped,  // rule.enabled = false
}

pub struct Violation {
    pub entity_id: EntityId,
    pub entity_type: EntityType,
    pub display_name: CompactString,
    pub details: String,
}
}

INV-P01: Snapshot Atomicity

All rules in one evaluate_all / par_evaluate_all call read the same snapshot. Rules see a consistent point-in-time view of the graph even if new data is being ingested concurrently.

INV-P02: Read-Only

Policy evaluation never modifies the graph.

INV-P03: Error Isolation

A rule that errors during evaluation is recorded with RuleStatus::Error and does not abort evaluation of other rules:

#![allow(unused)]
fn main() {
for result in &results {
    match result.status {
        RuleStatus::Pass    => println!("✓ {} — PASS", result.rule_id),
        RuleStatus::Fail    => println!("✗ {} — {} violations",
                                   result.rule_id, result.violations.len()),
        RuleStatus::Error   => println!("! {} — {}", result.rule_id,
                                   result.error.as_deref().unwrap_or("?")),
        RuleStatus::Skipped => println!("- {} — skipped", result.rule_id),
    }
}
}

Performance

Each rule runs one PQL query. For N rules, evaluate_all makes N sequential graph reads; par_evaluate_all runs all N concurrently on separate threads.

Typical throughput (100 rules, 100K entities):

  • Sequential: ~1–5 seconds (depends on rule complexity and graph structure)
  • Parallel: ~100–500ms (bounded by the slowest rule)

Posture Scoring

Posture scoring aggregates rule evaluation results into a per-framework compliance score and an overall posture score.

Computing Posture

#![allow(unused)]
fn main() {
let posture = evaluator.compute_posture(&snap)?;

println!("Overall: {:.1}%", posture.overall_score * 100.0);
for (framework, score) in &posture.framework_scores {
    println!("{}: {:.1}%", framework, score.score * 100.0);
}
}

FrameworkPosture

#![allow(unused)]
fn main() {
pub struct FrameworkPosture {
    /// Overall score: 0.0 (no rules passing) to 1.0 (all rules passing)
    pub overall_score: f64,

    /// Per-framework scores
    pub framework_scores: HashMap<String, ControlStatus>,

    /// Breakdown by severity
    pub by_severity: HashMap<Severity, SeverityBreakdown>,

    /// All rule results (pass, fail, error)
    pub results: Vec<RuleEvaluationResult>,
}

pub struct ControlStatus {
    pub framework: String,
    pub score: f64,            // 0.0 to 1.0
    pub passing_controls: u32,
    pub failing_controls: u32,
    pub error_controls: u32,
}

pub struct SeverityBreakdown {
    pub passing: u32,
    pub failing: u32,
    pub errored: u32,
}
}

Scoring Algorithm

Overall score:

score = passing_rules / total_rules

Where:

  • passing_rules = rules with EvaluationStatus::Pass
  • total_rules = all rules (pass + fail + error)

INV-P04: Errored rules count as failures for posture scoring. A rule that can't evaluate is treated as if it found violations.

Per-framework score:

For each framework (CIS, NIST, PCI-DSS, etc.), compute the ratio of passing controls to total controls mapped to that framework:

framework_score = controls_passing / controls_mapped_to_framework

If a rule maps to multiple frameworks, it contributes to each framework's score independently.
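The two formulas reduce to simple ratios. A minimal sketch with stand-in types (not the actual parallax-policy implementation), including the INV-P04 rule that errored rules count against the score:

```rust
/// Sketch of the posture scoring arithmetic (stand-in types, not the
/// actual parallax-policy code). Per INV-P04, Error counts as not-passing.
#[derive(Clone, Copy, PartialEq)]
enum Status {
    Pass,
    Fail,
    Error,
}

/// Overall score: passing_rules / total_rules.
fn overall_score(statuses: &[Status]) -> f64 {
    if statuses.is_empty() {
        return 1.0; // convention chosen for this sketch: no rules = fully passing
    }
    let passing = statuses.iter().filter(|s| **s == Status::Pass).count();
    passing as f64 / statuses.len() as f64
}

/// Per-framework score over the rules mapped to that framework.
fn framework_score(results: &[(&str, Status)], framework: &str) -> f64 {
    let (mut passing, mut total) = (0usize, 0usize);
    for (f, s) in results {
        if *f == framework {
            total += 1;
            if *s == Status::Pass {
                passing += 1;
            }
        }
    }
    if total == 0 {
        return 1.0; // sketch convention: no rules mapped to this framework
    }
    passing as f64 / total as f64
}
```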

Example Report

Security Posture Report
=======================
Overall Score: 73.3% (11/15 rules passing)

By Severity:
  Critical: 2 passing, 2 failing    (50.0%)
  High:     4 passing, 1 failing    (80.0%)
  Medium:   5 passing, 0 failing   (100.0%)
  Low:      0 passing, 1 failing    (0.0%)
  Info:     0 passing, 0 failing      (N/A)

By Framework:
  CIS:     8/10 controls passing   (80.0%)
  NIST:    6/8  controls passing   (75.0%)
  PCI-DSS: 4/7  controls passing   (57.1%)

Failing Rules:
  ✗ [CRITICAL] All users must have MFA enabled
    → 47 users without MFA
  ✗ [CRITICAL] No S3 buckets should be publicly accessible
    → 3 public buckets: logs-bucket, public-assets, backup-2023
  ✗ [HIGH] All hosts must have EDR protection
    → 12 unprotected hosts
  ✗ [LOW] Minimize admin role assignments
    → 23 admin assignments (threshold: 10)

Framework Context

| Framework | Focus Area | Common Controls |
|---|---|---|
| CIS | Technical security benchmarks | EDR coverage, patch levels, network segmentation |
| NIST | Broad cybersecurity framework | Access control, identity, protect/detect/respond |
| PCI-DSS | Payment card data security | Network isolation, encryption, access logging |
| SOC 2 | Service organization controls | Availability, confidentiality, integrity |
| HIPAA | Healthcare data protection | PHI access, audit trails, encryption |

The framework mapping in PolicyRule is metadata — it doesn't change how the rule is evaluated, only how it's categorized in the posture report.

Known Entity Classes

Entity classes are a closed set defined by Parallax. An entity submitted with an unknown class is flagged at ingest time (a warning in v0.1; a hard error in v0.2).

The class is the broad category that enables cross-type queries: FIND Host matches EC2 instances, Azure VMs, containers, and any other entity whose class is Host.

Full Class List (41 classes)

| Class | Description | Example Types |
|---|---|---|
| Host | Compute hosts — servers, VMs, containers | aws_ec2_instance, azure_vm, host |
| User | Human or service user accounts | okta_user, aws_iam_user, user |
| DataStore | Storage systems | aws_s3_bucket, database, datastore |
| CodeRepo | Source code repositories | github_repo, gitlab_project |
| Firewall | Network access control | aws_security_group, firewall |
| AccessPolicy | Authorization policies | aws_iam_policy, access_policy |
| NetworkSegment | Network segments/subnets | aws_vpc, aws_subnet, network |
| Service | Running services or processes | service, microservice |
| Certificate | TLS/SSL certificates | certificate, tls_cert |
| Secret | Secrets and tokens | secret, aws_secret, vault_secret |
| Credential | Credentials and API keys | credential, api_key |
| Key | Encryption keys | aws_kms_key, key |
| Container | Container instances | docker_container, container |
| Pod | Kubernetes pods | k8s_pod, pod |
| Cluster | Kubernetes clusters | k8s_cluster, eks_cluster |
| Namespace | Kubernetes namespaces | k8s_namespace, namespace |
| Function | Serverless functions | aws_lambda, function |
| Queue | Message queues | aws_sqs_queue, queue |
| Topic | Message topics | aws_sns_topic, topic |
| Database | Database instances | aws_rds_instance, postgres_db |
| Application | Applications or services | application, web_app |
| Package | Software packages | npm_package, python_package |
| Vulnerability | Security vulnerabilities | cve, vulnerability |
| Identity | Identity providers/identities | identity, saml_identity |
| Process | Running processes | process, daemon |
| File | Files and filesystems | file, s3_object |
| Registry | Container/package registries | ecr_repo, docker_registry |
| Policy | Generic policies | policy, network_policy |
| Account | Cloud or service accounts | aws_account, gcp_project |
| Organization | Organizations or tenants | organization, company |
| Team | Teams or groups | team, department |
| Role | Roles or job functions | aws_iam_role, okta_group |
| Group | Groups of entities | group, ad_group |
| Device | Physical or virtual devices | device, workstation |
| Endpoint | Network endpoints | endpoint, api_endpoint |
| Scanner | Security scanners | scanner, qualys_scanner |
| Agent | Security agents | agent, edr_agent |
| Sensor | Telemetry sensors | sensor, network_tap |
| Ticket | Tickets and issues | jira_issue, ticket |
| Event | Security events | event, alert |
| Generic | Catch-all for unlisted types | generic |

Using Classes in PQL

-- All hosts (regardless of type: EC2, Azure VM, container, etc.)
FIND Host

-- All datastores not accessible publicly
FIND DataStore WITH public = false

-- All users without MFA
FIND User WITH mfa_enabled = false

-- Hosts with no EDR agent protecting them
FIND Host THAT !PROTECTS Agent

Requesting a New Class

The class list is curated and intentionally kept small (~40 values). Before requesting a new class, consider:

  1. Can it be modeled with an existing class? (e.g., use Generic for truly novel entity types)
  2. Is it used by multiple connectors, or just one? (connector-specific types should use an entity type, not a class)
  3. Would it enable useful cross-type queries that aren't possible today?

Open an issue on GitHub to propose new classes. New classes require a spec update and a minor version bump.

In Code

#![allow(unused)]
fn main() {
use parallax_core::entity::KNOWN_CLASSES;

// Validate a class string
if KNOWN_CLASSES.contains(&"Host") {
    let class = EntityClass::new("Host").unwrap();
}

// Get a &[&str] of all known classes
println!("Known classes: {:?}", KNOWN_CLASSES);
}

Known Relationship Verbs

Relationship verbs are a closed set of 15 values. A relationship with an unknown verb is flagged at ingest time (a warning in v0.1; a hard error in v0.2).

The verb is the semantic label on a directed edge. A closed, curated set ensures that queries like FIND * THAT ALLOWS internet have consistent semantics across all connectors.

Full Verb List

| Verb | Direction | Semantic Meaning | Example |
|---|---|---|---|
| HAS | A → B | Ownership or containment | aws_account HAS aws_s3_bucket |
| IS | A ↔ B | Identity or equivalence | okta_user IS person |
| ASSIGNED | A → B | Role or permission assignment | user ASSIGNED role |
| ALLOWS | A → B | Grants network or access permission | security_group ALLOWS internet |
| USES | A → B | Active dependency | service USES database |
| CONTAINS | A → B | Logical grouping (strong containment) | aws_vpc CONTAINS aws_subnet |
| MANAGES | A → B | Administrative control | team MANAGES github_repo |
| CONNECTS | A ↔ B | Network-level connectivity | aws_vpc CONNECTS aws_vpc |
| PROTECTS | A → B | Security control coverage | edr_agent PROTECTS host |
| EXPLOITS | A → B | Vulnerability exploitation | cve EXPLOITS software_package |
| TRUSTS | A → B | Trust relationship | aws_account TRUSTS aws_account |
| SCANS | A → B | Scanner coverage | qualys_scanner SCANS host |
| RUNS | A → B | Process or service execution | host RUNS service |
| READS | A → B | Data access (read) | application READS database |
| WRITES | A → B | Data access (write) | application WRITES database |

Using Verbs in PQL

-- Direct verb queries
FIND host THAT RUNS service
FIND user THAT ASSIGNED role
FIND security_group THAT ALLOWS internet

-- Negated (coverage gap)
FIND host THAT !PROTECTS edr_agent
FIND service THAT !SCANS scanner

-- Multi-hop
FIND user THAT ASSIGNED role THAT ALLOWS aws_s3_bucket
FIND cve THAT EXPLOITS package THAT USES service THAT RUNS host

Verb Semantics in Blast Radius

For blast radius analysis, these verbs are considered attack-relevant by default:

RUNS, CONNECTS, TRUSTS, CONTAINS, HAS, USES, EXPLOITS

These cover the most common lateral movement patterns:

  • RUNS: compromise a host → compromise its services
  • CONNECTS: network path between hosts
  • TRUSTS: cross-account / cross-system trust
  • CONTAINS: moving from outer to inner containers
  • HAS: ownership chain traversal
  • USES: dependency exploitation
  • EXPLOITS: CVE to affected system

Verb Selection Guide

| Situation | Recommended Verb |
|---|---|
| Cloud resource ownership | HAS |
| IAM/RBAC assignment | ASSIGNED |
| Network access rules | ALLOWS |
| Service-to-database | USES or READS/WRITES |
| Host-to-service | RUNS |
| VPC peering | CONNECTS |
| Scanner-to-target | SCANS |
| EDR-to-host | PROTECTS |
| CVE-to-package | EXPLOITS |
| Organizational grouping | CONTAINS |
| Logical equivalence | IS |

In Code

#![allow(unused)]
fn main() {
use parallax_core::relationship::KNOWN_VERBS;

// Validate a verb string
if KNOWN_VERBS.contains(&"RUNS") {
    let verb = RelationshipClass::new("RUNS").unwrap();
}

// Get all known verbs
println!("Known verbs: {:?}", KNOWN_VERBS);
}

Proposing a New Verb

New verbs require a spec change and community discussion. The bar is high: a new verb must:

  1. Be semantically distinct from all existing verbs
  2. Be used by at least 3 different connector types
  3. Enable new query patterns not possible with existing verbs

Open an issue on GitHub to propose new verbs.

Value Types

Parallax properties use a flat, fixed set of value types.

The Value Enum

#![allow(unused)]
fn main() {
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(CompactString),
    Timestamp(Timestamp),
    StringArray(Vec<CompactString>),
}
}

Type Details

Null

Represents the absence of a value. Use Null for optional properties that are not set.

#![allow(unused)]
fn main() {
// Rust
entity.properties.insert("terminated_at".into(), Value::Null);

// REST API JSON
"properties": { "terminated_at": null }

// PQL filter
FIND host WITH terminated_at = null
}

Bool

#![allow(unused)]
fn main() {
// Rust
Value::from(true)
Value::Bool(false)

// REST API JSON
"properties": { "active": true, "mfa_enabled": false }

// PQL filter
FIND user WITH active = true
FIND user WITH mfa_enabled = false
}

Int (i64)

Integer values in the range [-2^63, 2^63 - 1].

#![allow(unused)]
fn main() {
// Rust
Value::from(443i64)
Value::Int(8080)

// REST API JSON — must be a JSON integer
"properties": { "port": 443, "cpu_count": 8 }

// PQL filter
FIND host WITH cpu_count > 4
FIND service WITH port = 443
}

Float (f64)

Double-precision floating point.

#![allow(unused)]
fn main() {
// Rust
Value::from(9.8f64)
Value::Float(0.5)

// REST API JSON
"properties": { "score": 9.8, "utilization": 0.75 }

// PQL filter
FIND host WITH cpu_utilization > 0.8
}

String

Short-to-medium strings, backed by CompactString (stack-allocated for strings ≤24 bytes; heap-allocated for longer strings).

#![allow(unused)]
fn main() {
// Rust
Value::from("running")
Value::String(CompactString::new("us-east-1"))

// REST API JSON
"properties": { "state": "running", "region": "us-east-1" }

// PQL filter (single quotes only)
FIND host WITH state = 'running'
FIND user WITH email LIKE '%@corp.com'
}

Timestamp

Hybrid Logical Clock timestamp. Primarily used for audit fields like created_at, updated_at, sync_timestamp.

#![allow(unused)]
fn main() {
// Rust
Value::Timestamp(Timestamp::now())

// REST API JSON — ISO 8601 string
"properties": { "last_seen": "2024-01-15T10:30:00Z" }
}

Not directly filterable in PQL v0.1. Use string properties for time-based filtering in the current version.

StringArray

A flat array of strings. Common for tags, labels, and group memberships. No nesting within arrays (no arrays of objects).

#![allow(unused)]
fn main() {
// Rust
Value::StringArray(vec!["web".into(), "production".into(), "us-east-1".into()])

// REST API JSON
"properties": { "tags": ["web", "production", "us-east-1"] }
}

StringArray is not filterable in PQL v0.1. Filter via scalar properties.
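Until array filtering lands, a connector can mirror each tag into a scalar boolean property at ingest so it becomes filterable in v0.1. This workaround sketch is illustrative, and the tag_* naming scheme is an assumption, not a Parallax convention:

```rust
use std::collections::HashMap;

/// v0.1 workaround sketch: mirror each tag into a scalar boolean property
/// ("tag_<name> = true") at ingest, so entities can be filtered with e.g.
/// FIND host WITH tag_production = true. The naming scheme is an assumption.
fn denormalize_tags(tags: &[&str]) -> HashMap<String, bool> {
    tags.iter()
        // Replace '-' so the property name is a plain identifier.
        .map(|t| (format!("tag_{}", t.replace('-', "_")), true))
        .collect()
}
```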

Type Stability (INV-07/08)

INV-07: Property types must be stable within an entity type. If port is an Int for aws_security_group_rule, it must always be Int for that type. A connector that sends it as a String gets a warning in v0.1 and a hard error in v0.2.

INV-08: Properties are flat — no nested objects or arrays-of-objects.

JSON Type Mapping

| JSON Type | Parallax Value |
|---|---|
| null | Value::Null |
| true / false | Value::Bool |
| Integer (no decimal) | Value::Int |
| Number (with decimal) | Value::Float |
| String | Value::String |
| Array of strings | Value::StringArray |
| Object | Rejected — no nested objects |
| Array of objects | Rejected — no nested arrays |

Configuration

Server Configuration

Configuration via environment variables (recommended) or CLI flags.

| Env Var | CLI Flag | Default | Description |
|---|---|---|---|
| PARALLAX_HOST | --host | 127.0.0.1 | Bind address |
| PARALLAX_PORT | --port | 7700 | HTTP port |
| PARALLAX_DATA_DIR | --data-dir | ./parallax-data | Data directory |
| PARALLAX_API_KEY | — | (empty) | API key (empty = open mode) |
| RUST_LOG | — | info | Log level (trace/debug/info/warn/error) |

Storage Engine Configuration

StoreConfig controls engine behavior. Set it in code when embedding; file-based configuration is planned for a future release.

#![allow(unused)]
fn main() {
pub struct StoreConfig {
    /// Root directory for WAL, segments, and index files.
    pub data_dir: PathBuf,

    /// Flush MemTable to segment when in-memory size exceeds this threshold.
    /// Default: 64MB. Larger = fewer flushes, more memory usage.
    pub memtable_flush_size: usize,

    /// Maximum WAL segment file size before rotation.
    /// Default: 64MB. After rotation, a new wal-{n}.pxw file starts.
    pub wal_segment_max_size: u64,
}
}

Data Directory Layout

After starting the server, the data directory has this structure:

parallax-data/
├── wal/
│   ├── wal-00000000.pxw    ← WAL segments (binary, PXWA format)
│   ├── wal-00000001.pxw
│   └── ...
└── segments/
    ├── seg-00000000.pxs    ← Immutable segment files (binary, PXSG format)
    ├── seg-00000001.pxs
    └── ...

Both WAL and segment files are binary and not human-readable. Use parallax stats to inspect the current state.

Query Limits

Default limits applied to all queries:

| Limit | Default | Description |
|---|---|---|
| max_results | 10,000 | Maximum entities returned per query |
| timeout | 30s | Query execution timeout |
| max_traversal_depth | 10 | Maximum traversal hops |

These are currently hardcoded. Per-query overrides and global configuration are planned for v0.2.
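The defaults above correspond to a struct shaped roughly like the following sketch. The field names are assumptions; only QueryLimits::default() appears in the public examples:

```rust
use std::time::Duration;

/// Sketch of the hardcoded v0.1 query limits; field names are assumptions.
pub struct QueryLimits {
    pub max_results: usize,
    pub timeout: Duration,
    pub max_traversal_depth: usize,
}

impl Default for QueryLimits {
    fn default() -> Self {
        QueryLimits {
            max_results: 10_000,                 // entities per query
            timeout: Duration::from_secs(30),    // execution timeout
            max_traversal_depth: 10,             // traversal hops
        }
    }
}
```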

Logging

Parallax uses tracing for structured logging. Configure with RUST_LOG:

# Common patterns
RUST_LOG=info parallax serve           # Default
RUST_LOG=debug parallax serve          # Verbose
RUST_LOG=parallax_store=trace parallax serve  # Trace storage only
RUST_LOG=parallax=debug,tower_http=warn parallax serve  # App debug, HTTP quiet

The default log format is human-readable text; structured JSON output is available with the global --log-format json flag (see the CLI Reference).

Production Recommendations

# Full production configuration
export PARALLAX_HOST=0.0.0.0
export PARALLAX_PORT=7700
export PARALLAX_DATA_DIR=/var/lib/parallax
export PARALLAX_API_KEY=$(openssl rand -hex 32)
export RUST_LOG=info

parallax serve

Storage:

  • Mount PARALLAX_DATA_DIR on a fast SSD
  • Ensure at least 10GB free space for WAL and segments
  • Set up log rotation for WAL files (automatic in v0.2)

Security:

  • Generate a cryptographically random API key
  • Terminate TLS at the load balancer or reverse proxy
  • Restrict network access to the parallax port
  • Never log the PARALLAX_API_KEY

Monitoring:

  • Scrape /metrics with Prometheus
  • Alert on parallax_errors_total increases
  • Monitor parallax_entities_total for unexpected spikes or drops

CLI Reference

The parallax CLI provides commands for serving, querying, and inspecting the graph engine.

Installation

# From source
cargo install --path crates/parallax-cli

# From release binary (when available)
curl -LO https://releases.parallax.rs/v0.1.0/parallax-linux-amd64
chmod +x parallax-linux-amd64
mv parallax-linux-amd64 /usr/local/bin/parallax

Global Flags

parallax [--log-format FORMAT] <COMMAND>

Global options:
  --log-format <FORMAT>   Log format: "text" (default) or "json"

--log-format json emits structured JSON logs compatible with log aggregators (Datadog, Splunk, CloudWatch Logs).

Commands

parallax serve

Start the REST HTTP server.

parallax serve [OPTIONS]

Options:
  --host <HOST>       Bind address [default: 127.0.0.1]
  --port <PORT>       Port number [default: 7700]
  --data-dir <PATH>   Data directory [default: ./parallax-data]

Environment:
  PARALLAX_API_KEY    API key for authentication (empty = open mode)

Examples:

# Development (open mode)
parallax serve --data-dir ./data

# Production (with auth, all interfaces)
PARALLAX_API_KEY=$(openssl rand -hex 32) \
parallax serve --host 0.0.0.0 --port 7700 --data-dir /var/lib/parallax

parallax query

Execute a PQL query against a local data directory (in-process, no server).

parallax query <PQL> [OPTIONS]

Arguments:
  <PQL>               The PQL query string (quote it)

Options:
  --data-dir <PATH>   Data directory [default: ./parallax-data]
  --limit <N>         Limit results [default: 100]

Examples:

# Find all running hosts
parallax query "FIND host WITH state = 'running'"

# Count all hosts
parallax query "FIND host RETURN COUNT"

# Group by OS
parallax query "FIND host GROUP BY os"

# With custom data dir
parallax query "FIND host" --data-dir /var/lib/parallax

Example output:

Results: 2
  [host] Web Server 1  (id: a1b2c3d4...)
  [host] Web Server 2  (id: e5f6a7b8...)

parallax stats

Display entity and relationship counts.

parallax stats [OPTIONS]

Options:
  --data-dir <PATH>   Data directory [default: ./parallax-data]
  --json              Output as JSON

Example output:

Parallax Graph Statistics
=========================
Total entities:      12,543
Total relationships: 45,210

By type:
  host              1,200
  user              5,400
  service           2,100
  aws_s3_bucket       450
  aws_iam_role        380
  (36 more types...)

By class:
  User              5,400
  Host              1,200
  Service           2,100
  DataStore           450

parallax wal dump

Inspect the Write-Ahead Log for debugging and forensics.

parallax wal dump [OPTIONS]

Options:
  --data-dir <PATH>   Data directory [default: ./parallax-data]
  --verbose           Show individual operation details (default: summary only)

Example output (summary):

WAL dump — data_dir: ./parallax-data
       seq        ops  segment
  --------------------------------------------------
         1         50  wal-00000001.pxw
         2        120  wal-00000002.pxw
         3         30  wal-00000003.pxw

  Total: 3 batches, 200 ops

With --verbose (shows each entity/relationship operation):

         1         50  wal-00000001.pxw
    + entity  [host] web-01  (id: a1b2c3...)
    + entity  [host] web-02  (id: d4e5f6...)
    + rel     [RUNS] a1b2c3... → 789abc...
    - entity  id=deadbeef...

parallax version

Print version information.

parallax version

Output:

parallax 0.1.0

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error (engine open failure, query error) |
| 2 | Invalid arguments |

Logging

Set RUST_LOG to control log verbosity, and --log-format for the output format:

# Debug all parallax modules (human-readable)
RUST_LOG=parallax=debug parallax serve

# Only warnings and errors
RUST_LOG=warn parallax serve

# JSON structured logs (for log aggregators)
parallax --log-format json serve

# JSON + custom verbosity
RUST_LOG=info parallax --log-format json serve

Performance Targets

v0.1 Targets

These are the benchmarked targets for v0.1 on modern NVMe hardware:

| Operation | Target p99 | Notes |
|---|---|---|
| Entity lookup by ID (MemTable) | ≤1μs | BTreeMap::get |
| Entity lookup by ID (Segment) | ≤100μs | Linear scan; improves with index in v0.2 |
| Type index scan (1K entities) | ≤500μs | Iterate type index, fetch entities |
| Single-hop traversal (degree ≤100) | ≤500μs | Adjacency index lookup |
| Multi-hop traversal (depth 3, degree 5) | ≤5ms | BFS with visited set |
| Shortest path (10K-node graph) | ≤10ms | Bidirectional BFS |
| Blast radius (depth 4, 1K impacted) | ≤10ms | BFS from origin |
| WAL write throughput | ≥500K ops/sec | Batched, single fsync |
| PQL parse + plan | ≤1ms | Hand-written recursive descent |
| Snapshot acquisition | ≤1μs | Arc::clone |
| Query execution (count, 100K entities) | ≤50ms | Type index + filter |

Benchmarking

Run the benchmark suite:

# Storage engine benchmarks
cargo bench --package parallax-store

# Graph engine benchmarks
cargo bench --package parallax-graph

# Query engine benchmarks
cargo bench --package parallax-query

What Affects Performance

MemTable vs. Segment Reads

Entities in the MemTable (recently written) are served in ≤1μs. Entities that have been flushed to segment files require a linear scan of the segment, adding ~100μs per entity. The segment index (v0.2) will reduce this to ~1μs.

Implication: If your workload has mostly recent data (e.g., fresh ingest followed by queries), performance is excellent. If your graph is heavily segmented, upgrade to v0.2 for segment indexing.
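A segment sparse index is, in essence, a sorted (key, offset) table searched with binary search. This simplified exact-match sketch (not the actual PXSG on-disk format) shows why the lookup drops from a linear scan to O(log n):

```rust
/// Simplified sketch of a segment index: a sorted (entity_key, file_offset)
/// table searched with binary search. Exact-match only; not the PXSG format.
fn index_lookup(index: &[(u64, u64)], key: u64) -> Option<u64> {
    index
        .binary_search_by_key(&key, |&(k, _)| k) // O(log n) instead of a linear scan
        .ok()
        .map(|i| index[i].1)
}
```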

Traversal Depth and Degree

Traversal complexity is O(b^d) where b is the average branching factor and d is the depth. With BFS visited-set deduplication:

  • Depth 2, degree 10: 100 entities visited
  • Depth 4, degree 10: 10,000 entities visited
  • Depth 4, degree 100: 100,000,000 entities — exceeds practical limits

Use max_depth to bound traversals. Set edge_classes to filter edge types and reduce the branching factor.
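The visited-set deduplication and max_depth bound described above can be sketched with a plain std-only BFS. The adjacency map here is a stand-in, not the parallax-graph API:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Depth-bounded BFS with a visited set, as described above.
/// `adj` is a plain adjacency map, not the parallax-graph API.
fn bfs_visited(adj: &HashMap<u32, Vec<u32>>, start: u32, max_depth: usize) -> HashSet<u32> {
    let mut visited = HashSet::from([start]);
    let mut frontier = VecDeque::from([(start, 0usize)]);
    while let Some((node, depth)) = frontier.pop_front() {
        if depth == max_depth {
            continue; // max_depth bounds the O(b^d) blowup
        }
        for &next in adj.get(&node).into_iter().flatten() {
            // The visited set guarantees each node is expanded at most once,
            // so shared neighbors are not revisited exponentially.
            if visited.insert(next) {
                frontier.push_back((next, depth + 1));
            }
        }
    }
    visited
}
```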

Lock Contention

The Arc<Mutex<StorageEngine>> lock is held for:

  • ~5μs per entity during diff computation
  • ~10μs for WAL fsync + MemTable update

Multiple concurrent syncs queue at the write step but diff in parallel. With more than 10 connectors syncing at once, expect write latency to increase. WAL group commit (v0.2) will amortize this.

Memory Consumption

Rule of thumb: ~1KB per entity in MemTable (struct + property strings on heap). For 100K entities, expect ~100MB MemTable size before flush.

The flush threshold (default: 64MB) triggers a segment flush, freeing MemTable memory. Tune memtable_flush_size based on your available memory.
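Combining the 1KB-per-entity rule of thumb with the flush threshold gives a quick capacity estimate:

```rust
/// Back-of-envelope MemTable sizing from the rule of thumb above.
fn entities_before_flush(flush_size_bytes: usize, bytes_per_entity: usize) -> usize {
    flush_size_bytes / bytes_per_entity
}
```

With the defaults (64MB threshold, ~1KB per entity) that is roughly 65,536 entities in the MemTable before a flush; raise memtable_flush_size if your working set is larger.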

Scaling to 1M+ Entities

v0.1 is benchmarked and correct at 100K entities. For 1M+ entities:

  1. Segment indexing (v0.2): reduces lookup from O(n_segment) to O(log n)
  2. Compaction (v0.2): reclaims space from deleted entities
  3. WAL group commit (v0.2): 10-100× write throughput improvement

The architecture supports 10M+ entities with the v0.2 improvements. The single-writer model remains correct at any scale — it simply processes writes faster with group commit.

Roadmap

v0.1 delivered the complete vertical slice: ingest → store → graph → query → serve. v0.2 focused on performance, operability, and the policy engine.

v0.2 — Completed

Performance

| Feature | Status | Notes |
|---|---|---|
| WAL group commit | ✅ Done | append_batch() — single fsync per batch |
| Segment sparse index | ✅ Done | Binary-search index, O(log n) lookups |
| Background compaction | ✅ Done | CompactionWorker thread, merge small segments |
| Property index | Deferred | Secondary index deferred to v0.3 |

Query Language

| Feature | Status | Notes |
|---|---|---|
| OR in filters | ✅ Done | WITH state = 'a' OR state = 'b' |
| NOT EXISTS | ✅ Done | WITH NOT owner EXISTS |
| GROUP BY | ✅ Done | FIND host GROUP BY os → QueryResult::Grouped |
| Field projection | Deferred | RETURN field1, field2 parses but returns full entities |
| Parameterized queries | Deferred | |

Connector SDK

| Feature | Status | Notes |
|---|---|---|
| Parallel step execution | ✅ Done | topological_waves() + JoinSet per wave |
| WASM connector sandbox | Deferred | |
| Connector config schema | Deferred | |

Policy Engine

| Feature | Status | Notes |
|---|---|---|
| YAML rule files | ✅ Done | load_rules_from_yaml(), serde_yaml |
| Policy REST API | ✅ Done | GET/POST /v1/policies, POST /v1/policies/evaluate, GET /v1/policies/posture |
| Parallel evaluation | ✅ Done | par_evaluate_all() via std::thread::scope |
| Scheduled evaluation | Deferred | |

Observability

| Feature | Status | Notes |
|---|---|---|
| JSON log format | ✅ Done | --log-format json global CLI flag |
| parallax wal dump | ✅ Done | parallax wal dump [--verbose] |
| Rich Prometheus metrics | Partial | Basic counters; histograms deferred |
| OpenTelemetry traces | Deferred | |

v0.3 — Planned

  • Property secondary index (fast WITH state=X without full scan)
  • Field projection (RETURN display_name, state)
  • Parameterized queries (FIND host WITH state = $1)
  • Scheduled policy evaluation (cron-based)
  • gRPC via tonic
  • First-party connectors: connector-aws, connector-okta, connector-github

Known Limitations

  1. Field projection parses but is not enforced: FIND host RETURN display_name parses without error but still returns full entity objects. Projection requires an architectural change to the entity return type.

  2. Single-node only: No replication, no clustering. The storage format is designed to support replication but it is not implemented.

  3. No gRPC: Only REST. gRPC is architecturally planned but not implemented.

  4. Soft class/verb enforcement: Unknown entity classes and relationship verbs produce warnings, not hard errors.

Known Deferred Items

  • crates/parallax-query/src/executor.rs — field projection (RETURN clause returns full entities, not projected fields)

Versioning Policy

  • v0.x: Breaking changes between minor versions are allowed.
  • v1.0: Stable public API. Breaking changes require a major version bump.
  • MSRV: Latest stable Rust minus 2 releases.

The PQL language syntax and the entity/relationship schema are treated as public API even before v1.0 — changes go through a deprecation cycle.