Architecture

Vajra is a Rust workspace of 17 crates. Each crate has a single responsibility. Dependencies flow downward. Nothing cycles.


The 17-Crate Workspace

vajra/
├── vajra-types/          Shared types, traits, contracts
├── vajra-core/           Parsing, traversal, canonicalization, path extraction
├── vajra-fingerprint/    BLAKE3 hashing, Merkle trees, MinHash, SimHash, LSH
├── vajra-stats/          CMS, Space-Saving, DDSketch, MAD, entropy, frequency
├── vajra-anomaly/        Outlier scoring, instability, rarity, structural anomaly
├── vajra-drift/          JSD, Wasserstein, path diff, drift classification
├── vajra-motif/          Motif counting, near-motif grouping, motif compression
├── vajra-essence/        Profiles, scoring, ranking, rendering, templates
├── vajra-query/          Expression parsing, path filtering, analysis functions
├── vajra-source/         Source code parsing via tree-sitter (Rust, Python, Go, JS, +5)
├── vajra-cli/            CLI argument parsing, command dispatch, output formatting
├── vajra-domain-med/     Medical/EDI type recognizers (ICD-10, CPT, NPI, NDC, HCPCS)
├── vajra-domain-sec/     Security type recognizers (CVE, MITRE ATT&CK, IPs, hashes, JWT)
├── vajra-domain-devops/  DevOps type recognizers (K8s, Docker, Terraform, ARN, semver)
├── vajra-domain-source/  Source code recognizers (naming conventions, import paths)
├── vajra-domain-encoding/ Encoding detection (Base64, hex, URL, PEM, layers)
└── Cargo.toml            Workspace root

Dependency Graph

                    vajra-types
                   /     |     \
                  /      |      \
           vajra-core    |    vajra-domain-{med,sec,devops}
            /    \       |       /
           /      \      |      /
  vajra-fingerprint  vajra-stats
       |          \   /   |
       |           \ /    |
       |      vajra-anomaly
       |           |
       |      vajra-drift
       |           |
       |      vajra-motif
       |         / |
       |        /  |
       vajra-essence
            |
       vajra-query
            |
       vajra-cli

Root crate (no internal dependencies):

  • vajra-types: shared types, trait definitions, result contracts

vajra-core sits directly above it and depends only on vajra-types.

Top-level crate (nothing depends on it; it depends on everything else, directly or transitively):

  • vajra-cli: the binary. It orchestrates all other crates.

Crate Responsibilities

vajra-types

The foundation. Shared types that every crate depends on.

  • Document — the parsed document model (value tree + path trie + metadata)
  • WildcardPath — normalized path representation with [*] array indices
  • PathTrie — trie data structure for efficient path storage and lookup
  • FeatureStore — per-path feature vectors
  • JsonType — enum of JSON types (object, array, string, number, boolean, null)
  • Core traits: Analyzer, StreamAnalyzer, FeatureExtractor, ConcernProfile, Fingerprinter, DriftDetector

pub trait Analyzer {
    type Output;
    fn analyze(&self, doc: &Document) -> Result<Self::Output>;
}

pub trait StreamAnalyzer {
    type Accumulator: Default;
    type Output;
    fn on_event(&self, event: &JsonEvent, acc: &mut Self::Accumulator) -> Result<()>;
    fn finalize(&self, acc: Self::Accumulator) -> Result<Self::Output>;
}
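
To make the Analyzer contract concrete, here is a minimal sketch of an implementation. The `Document` struct, the `Result` alias and the `TypeCounter` analyzer are simplified stand-ins invented for this example; the real definitions in vajra-types are richer and may differ.

```rust
use std::collections::BTreeMap;

// Stand-ins for the real vajra-types definitions (assumptions for this sketch).
pub type Result<T> = std::result::Result<T, String>;

pub struct Document {
    /// Flattened (wildcard path, JSON type name) pairs.
    pub paths: Vec<(String, &'static str)>,
}

pub trait Analyzer {
    type Output;
    fn analyze(&self, doc: &Document) -> Result<Self::Output>;
}

/// Hypothetical analyzer: counts how many paths hold each JSON type.
pub struct TypeCounter;

impl Analyzer for TypeCounter {
    type Output = BTreeMap<&'static str, usize>;

    fn analyze(&self, doc: &Document) -> Result<Self::Output> {
        if doc.paths.is_empty() {
            return Err("document has no paths".to_string());
        }
        let mut counts = BTreeMap::new();
        for (_, ty) in &doc.paths {
            *counts.entry(*ty).or_insert(0) += 1;
        }
        Ok(counts)
    }
}
```

The associated `Output` type lets each analyzer return its own typed result while callers still handle every analyzer through the same `analyze` entry point.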

vajra-core

Parsing, traversal, and the foundational index.

  • simd-json integration for DOM-mode parsing
  • Multi-format input support (JSON, NDJSON, YAML, CSV, TSV, Markdown, PDF)
  • Compression handling (gzip, zstd)
  • HTTP URL fetching
  • RFC 8785 canonicalization
  • DFS path extraction and path trie construction
  • Unicode NFC normalization
  • Redaction engine (vajra_core::redact)
  • Input hardening (depth limits, string length limits, size limits)
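
The DFS path extraction and depth limit listed above can be sketched as follows. This is an illustration only: the `Value` enum, `extract_paths` and `walk` are hypothetical names, and the real crate works on simd-json values and builds a path trie rather than a set of strings.

```rust
use std::collections::BTreeSet;

/// Minimal JSON value model for this sketch (not the real parser's type).
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Depth-first walk collecting every path, with array indices normalized to `[*]`.
/// Returns an error when nesting exceeds `max_depth` (a simple hardening check).
pub fn extract_paths(root: &Value, max_depth: usize) -> Result<BTreeSet<String>, String> {
    let mut out = BTreeSet::new();
    walk(root, "$".to_string(), 0, max_depth, &mut out)?;
    Ok(out)
}

fn walk(
    v: &Value,
    path: String,
    depth: usize,
    max_depth: usize,
    out: &mut BTreeSet<String>,
) -> Result<(), String> {
    if depth > max_depth {
        return Err(format!("depth limit {max_depth} exceeded at {path}"));
    }
    match v {
        Value::Object(fields) => {
            for (key, child) in fields {
                walk(child, format!("{path}.{key}"), depth + 1, max_depth, out)?;
            }
        }
        Value::Array(items) => {
            // Every element maps to the same wildcard path.
            for child in items {
                walk(child, format!("{path}[*]"), depth + 1, max_depth, out)?;
            }
        }
        _ => {}
    }
    out.insert(path);
    Ok(())
}
```

Because all array elements collapse onto `[*]`, two documents with different array lengths but the same element shape yield the same path set.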

vajra-fingerprint

Structural identity.

  • BLAKE3 path set fingerprint
  • BLAKE3 typed path fingerprint
  • Merkle subtree hashing (shape fingerprint)
  • MinHash signature computation (k = 128)
  • SimHash for near-motif detection
  • LSH bucketing for scalable similarity search
  • Cluster computation from LSH candidates
  • StreamingFingerprintAccumulator for streaming mode
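
As a rough illustration of the MinHash step with k = 128, the sketch below computes a signature over a set of paths and estimates Jaccard similarity from two signatures. It uses the standard library's `DefaultHasher` with a seed in place of BLAKE3 so that it runs without dependencies; the function names are hypothetical and not the crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub const K: usize = 128;

/// Hash `item` under the `seed`-th hash function.
fn seeded_hash(seed: u64, item: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each of K hash functions, keep the minimum hash over the set.
pub fn minhash(paths: &[&str]) -> [u64; K] {
    let mut sig = [u64::MAX; K];
    for (i, slot) in sig.iter_mut().enumerate() {
        for p in paths {
            *slot = (*slot).min(seeded_hash(i as u64, p));
        }
    }
    sig
}

/// The fraction of matching slots estimates the Jaccard similarity of the two sets.
pub fn estimate_jaccard(a: &[u64; K], b: &[u64; K]) -> f64 {
    let same = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    same as f64 / K as f64
}
```

The signature depends only on set membership, so path order does not matter, and comparing two signatures costs K comparisons regardless of document size.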

vajra-stats

The statistical engine.

  • Shannon entropy (exact and CMS-approximate)
  • Normalized entropy
  • Count-Min Sketch with conservative update
  • Space-Saving top-k
  • DDSketch for streaming quantiles
  • MAD and modified z-scores
  • Frequency analysis (key, path, value)
  • Missingness profiling (null rate, absent rate, empty rate)
  • Numeric distribution summary (min, max, mean, median, percentiles)
  • Co-occurrence and PMI computation
  • Benford’s Law leading digit analysis
  • StreamingStatsAccumulator for streaming mode
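
For reference, exact Shannon entropy and its normalized form can be sketched in a few lines. The function names are illustrative, not the crate's API, and the CMS-approximate variant is not shown.

```rust
use std::collections::{HashMap, HashSet};

/// Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over observed values.
pub fn shannon_entropy(values: &[&str]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    let n = values.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy divided by its maximum, log2(distinct count), giving a value in [0, 1].
pub fn normalized_entropy(values: &[&str]) -> f64 {
    let distinct = values.iter().collect::<HashSet<_>>().len();
    if distinct <= 1 {
        return 0.0;
    }
    shannon_entropy(values) / (distinct as f64).log2()
}
```

Normalizing makes fields with different cardinalities comparable: a value near 1 means the observed values are spread almost evenly, and a value near 0 means one value dominates.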

vajra-anomaly

Deviation detection.

  • Numeric outlier detection (MAD-based z-scores)
  • Rarity scoring (self-information)
  • Structural deviation detection (Jaccard distance from mode)
  • Type instability detection
  • Composite anomaly scoring
  • Anomaly report generation
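
The MAD-based outlier score can be sketched as below, using the common modified z-score formula 0.6745 · (x − median) / MAD. The function name and the choice to return `None` when the MAD is zero are assumptions for this example.

```rust
/// Median of an already sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Modified z-scores: 0.6745 * (x - median) / MAD.
/// Returns None for empty input or when MAD is zero (the score is undefined).
pub fn modified_z_scores(data: &[f64]) -> Option<Vec<f64>> {
    if data.is_empty() {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let med = median(&sorted);

    // MAD is the median of absolute deviations from the median.
    let mut dev: Vec<f64> = data.iter().map(|x| (x - med).abs()).collect();
    dev.sort_by(|a, b| a.total_cmp(b));
    let mad = median(&dev);
    if mad == 0.0 {
        return None;
    }
    Some(data.iter().map(|x| 0.6745 * (x - med) / mad).collect())
}
```

For the input `[1, 2, 3, 4, 100]` the median is 3 and the MAD is 1, so the value 100 scores 0.6745 × 97 ≈ 65.4, far above the conventional 3.5 cutoff. Unlike a mean and standard deviation, the median and MAD are barely moved by the outlier itself.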

vajra-drift

Change detection between documents.

  • Path set symmetric difference (structural drift)
  • Type drift detection
  • Jensen-Shannon Divergence for distributional drift
  • 1D Wasserstein distance for numeric drift magnitude
  • Drift classification (additive, subtractive, type-mutative, distributional, cardinality-shift, null-rate-shift)
  • Severity scoring with profile-dependent weights
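
Two of the measures above are easy to sketch: structural drift as a path set difference, and distributional drift as Jensen-Shannon Divergence. The function names are hypothetical, and the sketch assumes both distributions are already aligned over the same categories and sum to 1.

```rust
use std::collections::BTreeSet;

/// Paths present only in `new` (additive drift) and only in `old` (subtractive drift).
pub fn path_diff<'a>(
    old: &BTreeSet<&'a str>,
    new: &BTreeSet<&'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let added = new.difference(old).copied().collect();
    let removed = old.difference(new).copied().collect();
    (added, removed)
}

/// Kullback-Leibler divergence in bits, skipping terms where p_i is zero.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(pi, _)| **pi > 0.0)
        .map(|(pi, qi)| pi * (pi / qi).log2())
        .sum()
}

/// JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2.
/// With log base 2 the result lies in [0, 1]. Returns None on length mismatch.
pub fn jsd(p: &[f64], q: &[f64]) -> Option<f64> {
    if p.len() != q.len() {
        return None;
    }
    let m: Vec<f64> = p.iter().zip(q).map(|(a, b)| (a + b) / 2.0).collect();
    Some(0.5 * kl(p, &m) + 0.5 * kl(q, &m))
}
```

JSD is symmetric and stays finite even when one distribution assigns zero probability to a category, which is why it suits comparing value distributions between two document versions.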

vajra-motif

Repeated structure analysis.

  • Motif counting from Merkle subtree hash frequencies
  • Near-motif grouping via SimHash Hamming distance
  • Motif ranking (frequency × subtree size)
  • Motif compression for essence generation
  • Array morphology analysis (homogeneity, uniqueness, shape diversity)
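
Motif counting and ranking can be illustrated with a small shape-hashing sketch. It assumes a toy `Node` tree, hashes shapes with the standard library's `DefaultHasher` rather than BLAKE3, and tallies only object subtrees; all of these are simplifications chosen for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub enum Node {
    Leaf(&'static str),                // JSON type name, e.g. "string"
    Array(Vec<Node>),
    Object(Vec<(&'static str, Node)>), // (key, child)
}

/// Returns (shape hash, node count) for `node` and tallies every object
/// subtree in `counts` as hash -> (frequency, subtree size).
pub fn shape_hash(node: &Node, counts: &mut HashMap<u64, (usize, usize)>) -> (u64, usize) {
    let mut h = DefaultHasher::new();
    let mut size = 1;
    match node {
        Node::Leaf(ty) => ("leaf", ty).hash(&mut h),
        Node::Array(items) => {
            "array".hash(&mut h);
            for item in items {
                let (child, n) = shape_hash(item, counts);
                child.hash(&mut h);
                size += n;
            }
        }
        Node::Object(fields) => {
            "object".hash(&mut h);
            let mut parts: Vec<(&str, u64)> = Vec::new();
            for (key, child) in fields {
                let (child_hash, n) = shape_hash(child, counts);
                parts.push((*key, child_hash));
                size += n;
            }
            parts.sort(); // key order must not affect the shape
            parts.hash(&mut h);
        }
    }
    let digest = h.finish();
    if matches!(node, Node::Object(_)) {
        counts.entry(digest).or_insert((0, size)).0 += 1;
    }
    (digest, size)
}

/// Motifs ranked by frequency x subtree size, highest first: (hash, frequency, size).
pub fn rank_motifs(counts: &HashMap<u64, (usize, usize)>) -> Vec<(u64, usize, usize)> {
    let mut v: Vec<_> = counts.iter().map(|(h, (f, s))| (*h, *f, *s)).collect();
    v.sort_by(|a, b| (b.1 * b.2).cmp(&(a.1 * a.2)).then(a.0.cmp(&b.0)));
    v
}
```

Hashing values out of the picture and sorting keys means two objects with the same keys and value types share a hash, so a repeated record shape shows up as one motif with a high frequency.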

vajra-essence

The rendering engine.

  • Built-in profiles: StaffProfile, EngineerProfile, AuditorProfile, AiProfile, FraudProfile
  • Custom profile loading from TOML
  • Six-dimensional scoring model
  • Candidate collection and ranking
  • Token budget enforcement (greedy knapsack)
  • Text, JSON, Markdown, and compact-AI renderers
  • Motif collapsing
  • --explain score decomposition
  • Provenance metadata attachment
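
The token budget step can be pictured as a greedy knapsack over scored candidates. The sketch below orders candidates by score per token, which is one common greedy heuristic; the document does not say which ordering vajra-essence uses, and the `Candidate` struct and function name are invented for this example.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub label: &'static str,
    pub score: f64,
    pub tokens: usize,
}

/// Greedy knapsack: take candidates in descending score-per-token order
/// for as long as they still fit in the remaining budget.
pub fn select_within_budget(mut candidates: Vec<Candidate>, budget: usize) -> Vec<Candidate> {
    // Zero-cost candidates would divide by zero below; drop them in this sketch.
    candidates.retain(|c| c.tokens > 0);
    candidates.sort_by(|a, b| {
        let da = a.score / a.tokens as f64;
        let db = b.score / b.tokens as f64;
        db.total_cmp(&da)
    });

    let mut used = 0;
    let mut picked = Vec::new();
    for c in candidates {
        if used + c.tokens <= budget {
            used += c.tokens;
            picked.push(c);
        }
    }
    picked
}
```

Greedy selection is not guaranteed to be optimal, but it is fast and deterministic, and a candidate that does not fit is skipped rather than ending the scan, so smaller items can still fill the remaining space.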

vajra-query

Path-based query engine.

  • Expression parser for path filters and analysis functions
  • entropy(path), rarity(path, value), instability(path), null_rate(path), stats(path), anomaly_score(path), motif(path)
  • Conditional expression evaluation (e.g., entropy($.status) > 0.5)
  • Integration with stats, anomaly, and motif analyzers
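
A conditional expression such as `entropy($.status) > 0.5` can be handled in two steps: parse it into a function, path, operator and threshold, then compare against the analyzer's result. The sketch below is a deliberately small stand-in for the real expression parser; the `Condition` struct, the supported operators and the precomputed `metrics` map are all assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct Condition {
    pub func: String,
    pub path: String,
    pub op: String,
    pub threshold: f64,
}

/// Parses conditions of the form `func(path) OP number`.
pub fn parse_condition(expr: &str) -> Result<Condition, String> {
    let open = expr.find('(').ok_or("missing '('")?;
    let close = expr.find(')').ok_or("missing ')'")?;
    if close < open {
        return Err("')' appears before '('".to_string());
    }
    let func = expr[..open].trim();
    let path = expr[open + 1..close].trim();
    if func.is_empty() || path.is_empty() {
        return Err("empty function name or path".to_string());
    }
    let mut rest = expr[close + 1..].split_whitespace();
    let op = rest.next().ok_or("missing operator")?;
    let threshold = rest
        .next()
        .ok_or("missing threshold")?
        .parse::<f64>()
        .map_err(|e| format!("bad threshold: {e}"))?;
    Ok(Condition { func: func.into(), path: path.into(), op: op.into(), threshold })
}

/// Evaluates a condition against precomputed metrics keyed by (function, path).
pub fn evaluate(
    cond: &Condition,
    metrics: &HashMap<(String, String), f64>,
) -> Result<bool, String> {
    let key = (cond.func.clone(), cond.path.clone());
    let value = *metrics
        .get(&key)
        .ok_or_else(|| format!("no metric for {}({})", cond.func, cond.path))?;
    match cond.op.as_str() {
        ">" => Ok(value > cond.threshold),
        ">=" => Ok(value >= cond.threshold),
        "<" => Ok(value < cond.threshold),
        "<=" => Ok(value <= cond.threshold),
        other => Err(format!("unsupported operator {other}")),
    }
}
```

Keeping parsing separate from evaluation means a malformed query is rejected before any analysis runs.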

vajra-cli

The command-line interface.

  • Clap-based argument parsing
  • Command dispatch (inspect, stats, anomalies, fingerprint, essence, drift, cluster, invariants, query, batch, profiles)
  • Output format rendering (text, JSON, Markdown, compact-AI)
  • Redaction integration
  • Streaming mode selection
  • Custom profile loading
  • Batch processing with Rayon parallelism

vajra-domain-med

The medical/EDI domain plugin.

  • ICD-10-CM and ICD-10-PCS pattern recognizers
  • CPT and HCPCS code recognizers
  • NDC (National Drug Code) recognizer
  • NPI (National Provider Identifier) recognizer with Luhn check
  • Denial reason code recognizer (CO, PR, OA, PI, CR)
  • Claim, service line, patient, provider, and adjudication relationship hints
  • Implements VajraPlugin trait
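
The NPI Luhn check mentioned above works on the 10-digit identifier prefixed with 80840, the prefix the NPI standard assigns for this purpose. The sketch below shows the check on its own; the function name is illustrative and the real recognizer may be structured differently.

```rust
/// Validates a 10-digit NPI with the Luhn algorithm applied to "80840" + NPI.
pub fn is_valid_npi(npi: &str) -> bool {
    if npi.len() != 10 || !npi.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let full = format!("80840{npi}");
    let sum: u32 = full
        .bytes()
        .rev()
        .enumerate()
        .map(|(i, b)| {
            let d = u32::from(b - b'0');
            // Counting from the right, double every second digit;
            // a two-digit product contributes the sum of its digits.
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

For example, `1234567893` passes the check, while `1234567890` fails. A checksum pass only shows the value is well formed, not that the provider exists.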

Core Traits

The trait system is the architectural backbone. Each trait is small, composable, and independently testable.

| Trait | Defined In | Purpose |
|---|---|---|
| Analyzer | vajra-types | DOM-mode analysis: document in, typed output out |
| StreamAnalyzer | vajra-types | Streaming analysis: events in, accumulator maintained, output finalized |
| FeatureExtractor | vajra-types | Extract features into the shared feature store |
| ConcernProfile | vajra-types | Define scoring weights and rendering behavior |
| Fingerprinter | vajra-types | Compute structural fingerprints |
| DriftDetector | vajra-types | Compare two analyzed documents for drift |
| VajraPlugin | vajra-types | Plugin extension point |
| TypeRecognizer | vajra-types | Domain-specific value type recognition |

“I want to understand how parsing works.”
Start at vajra-core/src/. The input module handles multi-format loading. The parse module handles JSON parsing. The canon module handles canonicalization.

“I want to understand the statistical engine.”
Start at vajra-stats/src/. Each statistical primitive has its own module. StatsAnalyzer composes them.

“I want to add a new profile.”
Look at vajra-essence/src/. The built-in profiles (StaffProfile, EngineerProfile, etc.) implement ConcernProfile. Follow the pattern.

“I want to add a domain plugin.”
Look at vajra-domain-med/ as the reference implementation. Implement VajraPlugin in a new crate.
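
As a rough picture of what a domain plugin involves, the sketch below pairs a plugin with one recognizer. The method signatures of `VajraPlugin` and `TypeRecognizer` are not shown in this document, so the trait shapes here are guesses made for illustration; check vajra-types and vajra-domain-med for the real ones. `CveRecognizer`, `SecPlugin` and `classify` are likewise hypothetical.

```rust
// Hypothetical trait shapes; the real traits live in vajra-types and may differ.
pub trait TypeRecognizer {
    fn name(&self) -> &'static str;
    fn recognize(&self, value: &str) -> bool;
}

pub trait VajraPlugin {
    fn id(&self) -> &'static str;
    fn recognizers(&self) -> Vec<Box<dyn TypeRecognizer>>;
}

/// Recognizes CVE identifiers of the form CVE-YYYY-NNNN (four or more digits).
pub struct CveRecognizer;

impl TypeRecognizer for CveRecognizer {
    fn name(&self) -> &'static str {
        "cve"
    }

    fn recognize(&self, value: &str) -> bool {
        let parts: Vec<&str> = value.split('-').collect();
        if parts.len() != 3 {
            return false;
        }
        let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
        parts[0] == "CVE"
            && parts[1].len() == 4
            && all_digits(parts[1])
            && parts[2].len() >= 4
            && all_digits(parts[2])
    }
}

/// Example plugin that bundles its recognizers.
pub struct SecPlugin;

impl VajraPlugin for SecPlugin {
    fn id(&self) -> &'static str {
        "example-sec"
    }

    fn recognizers(&self) -> Vec<Box<dyn TypeRecognizer>> {
        vec![Box::new(CveRecognizer)]
    }
}

/// Returns the name of the first recognizer that accepts `value`.
pub fn classify(plugin: &dyn VajraPlugin, value: &str) -> Option<&'static str> {
    plugin
        .recognizers()
        .iter()
        .find(|r| r.recognize(value))
        .map(|r| r.name())
}
```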

“I want to add a new command.”
Start at vajra-cli/src/main.rs. Each command is a function (cmd_inspect, cmd_stats, etc.). Add a new variant to the Command enum and implement the handler.
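
The enum-plus-handler pattern looks roughly like the sketch below. It is dependency-free and parses arguments by hand, whereas the real CLI derives its parser with Clap; the `Schema` command, `cmd_schema` and the handler return types are hypothetical and exist only to show where a new command plugs in.

```rust
/// Illustrative subset of commands; `Schema` stands in for the new one.
#[derive(Debug, PartialEq)]
pub enum Command {
    Inspect { input: String },
    Stats { input: String },
    Schema { input: String },
}

/// Hand-rolled argument parsing (the real CLI uses Clap for this step).
pub fn parse_args(args: &[&str]) -> Result<Command, String> {
    match args {
        ["inspect", input] => Ok(Command::Inspect { input: (*input).to_string() }),
        ["stats", input] => Ok(Command::Stats { input: (*input).to_string() }),
        ["schema", input] => Ok(Command::Schema { input: (*input).to_string() }),
        [] => Err("missing command".to_string()),
        [other, ..] => Err(format!("unknown command or wrong arity: {other}")),
    }
}

// One handler function per command; the bodies are placeholders.
fn cmd_inspect(input: &str) -> Result<String, String> {
    Ok(format!("inspect {input}"))
}

fn cmd_stats(input: &str) -> Result<String, String> {
    Ok(format!("stats {input}"))
}

fn cmd_schema(input: &str) -> Result<String, String> {
    Ok(format!("schema {input}"))
}

/// The exhaustive match forces every new variant to be wired to a handler.
pub fn dispatch(cmd: &Command) -> Result<String, String> {
    match cmd {
        Command::Inspect { input } => cmd_inspect(input),
        Command::Stats { input } => cmd_stats(input),
        Command::Schema { input } => cmd_schema(input),
    }
}
```

Because `dispatch` matches exhaustively, adding a variant without a handler is a compile error rather than a silent gap.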

“I want to understand how essences are built.”
Start at vajra-essence/src/. The EssenceBuilder collects observations from stats, anomaly, and motif analyzers, scores them, and renders the result.


Build and Run

# Build the entire workspace
cargo build --release

# Run tests across all crates
cargo test --workspace

# Run the CLI
./target/release/vajra inspect claim.json

# Run benchmarks
cargo bench --workspace

External Dependencies

| Dependency | Version | Purpose |
|---|---|---|
| serde / serde_json | 1.x | Serialization |
| serde_yaml | 0.9 | YAML input format |
| csv | 1.x | CSV/TSV input format |
| blake3 | 1.x | All hashing |
| clap | 4.x | CLI argument parsing |
| ryu | 1.x | Deterministic float formatting |
| unicode-normalization | 0.1 | Unicode NFC normalization |
| toml | 0.8 | Config and profile loading |
| regex | 1.x | Pattern matching (redaction, type recognition) |
| rayon | 1.x | Parallel batch processing |
| thiserror / anyhow | 2.x / 1.x | Error handling |
| flate2 | 1.x | Gzip decompression |
| zstd | 0.13 | Zstd decompression |
| pulldown-cmark | 0.12 | Markdown input parsing |
| pdf-extract | 0.10 | PDF text extraction |
| ureq | 2.x | HTTP URL fetching |
| proptest | 1.x | Property-based testing |
| criterion | 0.5 | Benchmarks |

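All dependencies come from crates.io and need no preinstalled system libraries beyond a standard Rust toolchain. Most are pure Rust. One exception is zstd: the crate wraps the C zstd library through zstd-sys and compiles it from bundled sources, so a C compiler is needed at build time.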


Lints

The workspace enforces strict Clippy lints:

[workspace.lints.clippy]
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
unwrap_used = "deny"    # No .unwrap() — use Result
expect_used = "deny"    # No .expect() — use Result
panic = "deny"          # No panic!() — ever

No panics on any input. No unwraps. No expects. Every error path returns a Result.
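
Under these lints, fallible code has to surface failures as values. The sketch below shows the pattern on a small example: parsing a depth limit without `.unwrap()` or `.expect()`. The `ConfigError` type, the function name and the 1024 upper bound are made up for illustration (the workspace itself lists thiserror and anyhow for error handling).

```rust
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
pub enum ConfigError {
    NotANumber(ParseIntError),
    OutOfRange(usize),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotANumber(e) => write!(f, "depth limit is not a number: {e}"),
            Self::OutOfRange(n) => write!(f, "depth limit {n} is outside 1..=1024"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Parses a depth limit; every failure becomes a typed error instead of a panic.
pub fn parse_depth_limit(raw: &str) -> Result<usize, ConfigError> {
    let depth = raw
        .trim()
        .parse::<usize>()
        .map_err(ConfigError::NotANumber)?;
    // The 1024 ceiling is an arbitrary bound chosen for this sketch.
    if depth == 0 || depth > 1024 {
        return Err(ConfigError::OutOfRange(depth));
    }
    Ok(depth)
}
```

The `?` operator keeps the happy path as short as an `.unwrap()` would, while the caller decides how to report the error.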