VAJRA

Deterministic Semantic Reduction Engine

Break noise. Preserve truth.
761 Tests
12 Crates
11 Commands
22K Lines of Rust
0 Failures

What Vajra Does

Feed it any structured data. Get back shape, signal, anomalies, and truth.

Vajra analyzes JSON, YAML, CSV, NDJSON, Markdown, and PDF. It extracts structural fingerprints, computes entropy and statistical profiles, detects anomalies and schema drift, discovers cross-field relationships, and renders deterministic essences tuned for humans, auditors, or AI pipelines.

Inspect

vajra inspect claim.json

Full structural analysis — paths, types, fingerprints, domain recognition.
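To illustrate what "paths" and "types" mean here, the sketch below walks a tiny hand-built document tree. It records each JSONPath-style path along with the set of types seen at that path. Array indices collapse to `[*]`, matching the `$.claims[*].allowed` style used elsewhere on this page. The `Value` enum and `collect_paths` function are hypothetical stand-ins, not Vajra's API. The sketch uses only the Rust standard library.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical in-memory JSON tree; Vajra's own document model is not shown here.
#[allow(dead_code)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

fn type_name(v: &Value) -> &'static str {
    match v {
        Value::Null => "null",
        Value::Bool(_) => "boolean",
        Value::Num(_) => "number",
        Value::Str(_) => "string",
        Value::Arr(_) => "array",
        Value::Obj(_) => "object",
    }
}

/// Record every path and the set of types observed there.
/// All array elements share one `[*]` path, so 1,000 claims yield one path, not 1,000.
fn collect_paths(v: &Value, path: &str, out: &mut BTreeMap<String, BTreeSet<&'static str>>) {
    out.entry(path.to_string()).or_default().insert(type_name(v));
    match v {
        Value::Arr(items) => {
            let child = format!("{path}[*]");
            for item in items {
                collect_paths(item, &child, out);
            }
        }
        Value::Obj(fields) => {
            for (key, child) in fields {
                collect_paths(child, &format!("{path}.{key}"), out);
            }
        }
        _ => {}
    }
}

fn main() {
    // {"claims": [{"allowed": 120}, {"allowed": "120"}]}
    let doc = Value::Obj(vec![(
        "claims".to_string(),
        Value::Arr(vec![
            Value::Obj(vec![("allowed".to_string(), Value::Num(120.0))]),
            Value::Obj(vec![("allowed".to_string(), Value::Str("120".to_string()))]),
        ]),
    )]);
    let mut paths = BTreeMap::new();
    collect_paths(&doc, "$", &mut paths);
    for (path, types) in &paths {
        println!("{path}: {types:?}");
    }
}
```

A path that reports more than one type, like `$.claims[*].allowed` in this example, is the raw material for a type-instability finding.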

Essence

vajra essence data.json --profile staff

Concern-oriented reduction. 7 profiles. Token budgets. Compact-AI output for LLMs.

Drift

vajra drift v1.json v2.json

Schema drift detection with JSD, Wasserstein distance, severity classification.
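For a numeric field, the Wasserstein-1 (earth mover's) distance equals the area between the two samples' empirical CDFs. The sketch below is a minimal version of that textbook computation for two 1-D samples. The function name is hypothetical, and this is not Vajra's implementation. The page does not describe Vajra's binning or severity thresholds.

```rust
/// Wasserstein-1 distance between two 1-D empirical samples,
/// computed as the area between their empirical CDFs.
fn wasserstein1(a: &[f64], b: &[f64]) -> f64 {
    assert!(!a.is_empty() && !b.is_empty(), "both samples must be non-empty");
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort_by(|x, y| x.total_cmp(y));
    b.sort_by(|x, y| x.total_cmp(y));

    // Every observed value is a breakpoint where at least one CDF can jump.
    let mut points: Vec<f64> = a.iter().chain(b.iter()).copied().collect();
    points.sort_by(|x, y| x.total_cmp(y));
    points.dedup();

    let (mut i, mut j) = (0, 0);
    let mut area = 0.0;
    for w in points.windows(2) {
        // Advance each cursor past values <= w[0]; i / a.len() is then F_a(w[0]).
        while i < a.len() && a[i] <= w[0] {
            i += 1;
        }
        while j < b.len() && b[j] <= w[0] {
            j += 1;
        }
        let fa = i as f64 / a.len() as f64;
        let fb = j as f64 / b.len() as f64;
        // Both CDFs are constant on [w[0], w[1]), so this strip is a rectangle.
        area += (fa - fb).abs() * (w[1] - w[0]);
    }
    area
}

fn main() {
    let v1 = [100.0, 120.0, 150.0, 200.0];
    let v2 = [110.0, 130.0, 160.0, 210.0];
    println!("W1 = {}", wasserstein1(&v1, &v2));
}
```

Shifting every value by 10 yields a distance of exactly 10. This is why Wasserstein suits numeric drift: it reports how far the mass moved. JSD, by contrast, only reports how much the two distributions stopped overlapping.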

Anomalies

vajra anomalies batch.ndjson

MAD-based outliers, rarity scoring, type instability. Deterministic. Explainable.

Query

vajra query data.json 'entropy($.status) > 0.5'

Path expressions with analysis functions. Entropy, rarity, null rate, instability.
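This page does not define the query functions for rarity, null rate, and instability. The sketch below shows one plausible reading of each, computed over the type tags or values observed at a single path:

- null rate as the share of nulls
- instability as one minus the share of the most common non-null type
- rarity as surprisal in bits

All three definitions and the function names are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Share of observations at a path whose type is JSON null.
fn null_rate(types: &[&str]) -> f64 {
    if types.is_empty() {
        return 0.0;
    }
    types.iter().filter(|t| **t == "null").count() as f64 / types.len() as f64
}

/// Hypothetical definition: 1 minus the share of the most common non-null type.
/// 0.0 means the path always holds one type; larger values mean it wobbles.
fn type_instability(types: &[&str]) -> f64 {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for t in types.iter().filter(|t| **t != "null") {
        *counts.entry(*t).or_insert(0) += 1;
    }
    let total: usize = counts.values().sum();
    match counts.values().max() {
        Some(&top) => 1.0 - top as f64 / total as f64,
        None => 0.0,
    }
}

/// Hypothetical rarity: surprisal, -log2 of the value's observed frequency, in bits.
/// Returns None if the value was never observed.
fn rarity(value: &str, observed: &[&str]) -> Option<f64> {
    let hits = observed.iter().filter(|v| **v == value).count();
    if hits == 0 {
        return None;
    }
    Some(-(hits as f64 / observed.len() as f64).log2())
}

fn main() {
    // Types seen at one path across five records.
    let types = ["number", "number", "number", "string", "string"];
    println!("instability = {:.2}", type_instability(&types));
    println!("null rate   = {:.2}", null_rate(&types));
    let statuses = ["paid", "paid", "paid", "denied"];
    println!("rarity(denied) = {:?} bits", rarity("denied", &statuses));
}
```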

Cluster

vajra cluster batch/*.json

MinHash + LSH similarity clustering. Finds payload families in seconds.


Forged for the Agent Gods

Vajra was not designed for casual use. It was forged as a weapon — an instrument of precision for AI systems that need to understand structured data at scale.

The compact-ai output compresses a 1000-node JSON document into a token-efficient essence that preserves every anomaly, every structural motif, every statistical signal — in a format an LLM can parse in a single pass.

The chain-ready drill section tells the downstream model exactly which paths have deeper analysis available, enabling multi-turn investigation without re-processing.

The determinism guarantee means the same input always produces the same output. No run-to-run variation. No randomness. No surprises. An AI pipeline that depends on Vajra can depend on Vajra.

vajra essence massive.json --profile ai --format compact-ai --budget 500
{
  "v": "vajra/1",
  "doc": {"nodes": 847, "paths": 23, "depth": 6},
  "anomalies": [
    {"p": "$.claims[*].allowed", "t": "type_instability", "s": 0.4},
    {"p": "$.claims[*].charge", "t": "numeric_outlier", "v": 350, "z": 4.2}
  ],
  "drill": [
    {"path": "$.claims[*].service_lines", "available": ["stats", "anomalies", "motifs"]}
  ],
  "meta": {"profile": "ai", "truncated": false}
}
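The `--budget 500` flag and the `truncated` field suggest that the essence is trimmed to fit a token budget. The page does not describe how Vajra ranks and trims findings, so the sketch below assumes a simple scheme:

- Each finding carries a priority.
- Tokens are estimated at roughly four characters per token.
- Findings are kept in priority order while they fit.

`Item`, `estimate_tokens`, and `select_within_budget` are hypothetical names.

```rust
/// A finding competing for space in the essence (hypothetical type).
#[derive(Debug, Clone)]
struct Item {
    priority: u32,
    text: String,
}

/// Hypothetical token estimate: about four characters per token, rounded up.
fn estimate_tokens(s: &str) -> usize {
    (s.chars().count() + 3) / 4
}

/// Visit items from highest to lowest priority and keep each one that still fits.
/// An item that does not fit is skipped (and marks the result truncated), but
/// smaller, lower-priority items after it may still be kept.
fn select_within_budget(items: &[Item], budget: usize) -> (Vec<Item>, bool) {
    let mut order = items.to_vec();
    // Ties break on text so the same input always yields the same selection.
    order.sort_by(|a, b| b.priority.cmp(&a.priority).then_with(|| a.text.cmp(&b.text)));
    let mut used = 0;
    let mut kept = Vec::new();
    let mut truncated = false;
    for item in order {
        let cost = estimate_tokens(&item.text);
        if used + cost <= budget {
            used += cost;
            kept.push(item);
        } else {
            truncated = true;
        }
    }
    (kept, truncated)
}

fn main() {
    let findings = vec![
        Item { priority: 90, text: r#"{"p":"$.claims[*].charge","t":"numeric_outlier"}"#.to_string() },
        Item { priority: 80, text: r#"{"p":"$.claims[*].allowed","t":"type_instability"}"#.to_string() },
        Item { priority: 10, text: "x".repeat(200) },
    ];
    let (kept, truncated) = select_within_budget(&findings, 30);
    println!("kept {} of {}, truncated = {}", kept.len(), findings.len(), truncated);
}
```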

The Engine

BLAKE3 Fingerprinting

Merkle subtree hashing. Path set signatures. Motif detection falls out for free. O(n).
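Below is a minimal sketch of Merkle subtree hashing over a simple JSON tree type. To stay dependency-free, it uses the standard library's `DefaultHasher` as a stand-in for BLAKE3; in a real build, the `blake3` crate would replace `digest`. That hasher is 64-bit and non-cryptographic, and its output may change between Rust releases, so the sketch only illustrates the structure. Sorting object keys before hashing is an assumption about canonicalization. A container's digest depends only on its children's digests, so each node is hashed once. Identical subtrees collide on purpose, and that is how repeated motifs fall out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Hypothetical JSON tree. Numbers are i64 here because f64 does not implement Hash.
#[allow(dead_code)]
enum Value {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Stand-in for BLAKE3: std's 64-bit SipHash-based DefaultHasher (not cryptographic).
fn digest<T: Hash>(x: &T) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

/// Merkle-style subtree hash. A container's digest is built from its children's
/// digests, so each node is hashed once and equal subtrees get equal digests.
/// Every container digest is tallied in `motifs`.
fn subtree_hash(v: &Value, motifs: &mut BTreeMap<u64, usize>) -> u64 {
    let d = match v {
        Value::Null => digest(&"null"),
        Value::Bool(b) => digest(&("bool", b)),
        Value::Num(n) => digest(&("num", n)),
        Value::Str(s) => digest(&("str", s)),
        Value::Arr(items) => {
            let kids: Vec<u64> = items.iter().map(|c| subtree_hash(c, motifs)).collect();
            digest(&("arr", kids))
        }
        Value::Obj(fields) => {
            let mut kids: Vec<(&str, u64)> = fields
                .iter()
                .map(|(k, c)| (k.as_str(), subtree_hash(c, motifs)))
                .collect();
            kids.sort(); // assumed canonical form: key order does not affect the digest
            digest(&("obj", kids))
        }
    };
    if matches!(v, Value::Arr(_) | Value::Obj(_)) {
        *motifs.entry(d).or_insert(0) += 1;
    }
    d
}

fn main() {
    // Two service lines with identical content, written with different key order.
    let first = Value::Obj(vec![
        ("code".to_string(), Value::Str("A1".to_string())),
        ("units".to_string(), Value::Num(1)),
    ]);
    let second = Value::Obj(vec![
        ("units".to_string(), Value::Num(1)),
        ("code".to_string(), Value::Str("A1".to_string())),
    ]);
    let doc = Value::Arr(vec![first, second]);
    let mut motifs = BTreeMap::new();
    subtree_hash(&doc, &mut motifs);
    let repeated = motifs.values().filter(|&&n| n > 1).count();
    println!("repeated subtrees: {repeated}");
}
```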

Shannon Entropy

Distinguishes boilerplate from signal without domain knowledge. The strongest universal primitive.
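The Shannon entropy of the values observed at one path is H = -Σ p·log2(p). A field that always holds the same value scores 0 bits. A field spread evenly over k values scores log2(k) bits. The sketch below computes raw entropy in bits and also a version divided by log2(k), so it falls between 0 and 1. Whether Vajra's `entropy()` query function normalizes this way is an assumption, and the function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Shannon entropy in bits of the values observed at one path: H = -sum(p * log2 p).
/// A BTreeMap keeps the summation order, and so the float result, stable across runs.
fn entropy_bits(values: &[&str]) -> f64 {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    let n = values.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy divided by its maximum, log2(k) for k distinct values, so the result is in [0, 1].
fn normalized_entropy(values: &[&str]) -> f64 {
    let k = values.iter().collect::<BTreeSet<_>>().len();
    if k < 2 {
        0.0
    } else {
        entropy_bits(values) / (k as f64).log2()
    }
}

fn main() {
    let boilerplate = ["v1"; 8];
    let status = ["paid", "denied", "paid", "pending", "paid", "denied", "paid", "paid"];
    println!("boilerplate: {:.3}", normalized_entropy(&boilerplate));
    println!("status:      {:.3}", normalized_entropy(&status));
}
```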

MAD Outliers

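50% breakdown point. Fewer than half of the values can be corrupted arbitrarily and the MAD still stays bounded, whereas a single extreme value can drag the standard deviation as far as it likes.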
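The sketch below flags outliers using the modified z-score of Iglewicz and Hoaglin, 0.6745·(x - median)/MAD, with the conventional 3.5 cutoff. Both the constant and the cutoff come from that textbook method, not from Vajra's documentation. The function names are hypothetical.

```rust
/// Median of a copy of `xs` (assumes `xs` is non-empty).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}

/// Indices whose modified z-score |0.6745 * (x - median) / MAD| exceeds `threshold`.
fn mad_outliers(data: &[f64], threshold: f64) -> Vec<usize> {
    if data.is_empty() {
        return Vec::new();
    }
    let med = median(data);
    let deviations: Vec<f64> = data.iter().map(|x| (x - med).abs()).collect();
    let mad = median(&deviations);
    data.iter()
        .enumerate()
        .filter(|&(_, &x)| {
            if mad == 0.0 {
                // At least half the values equal the median; treat any other value as an outlier.
                x != med
            } else {
                (0.6745 * (x - med) / mad).abs() > threshold
            }
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Two of five charges are corrupted, yet the median (12) and MAD (2) barely move.
    let charges = [10.0, 11.0, 12.0, 900.0, 950.0];
    println!("outliers at {:?}", mad_outliers(&charges, 3.5));
}
```

On `[10, 11, 12, 900, 950]`, both corrupted values are still flagged. A mean and standard deviation computed on the same five values would be dragged toward the corrupted ones (mean 376.6), leaving neither with a z-score above 1.3.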

Jensen-Shannon Divergence

Symmetric. Bounded by 1 when computed in bits. Its square root is a proper metric. The right way to measure distribution drift.
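Jensen-Shannon divergence first averages the two distributions into M = (P + Q)/2, then takes the mean of KL(P‖M) and KL(Q‖M). With base-2 logarithms the result stays in [0, 1], and its square root is the Jensen-Shannon distance. The sketch below applies it to the category shares of one field in two snapshots. `distributions`, `jsd_bits`, and `js_distance` are hypothetical names, and the page does not specify how Vajra bins values before comparing them.

```rust
use std::collections::BTreeSet;

/// Fraction of `xs` equal to `c`.
fn share(xs: &[&str], c: &str) -> f64 {
    xs.iter().filter(|x| **x == c).count() as f64 / xs.len() as f64
}

/// Align two categorical samples on the union of their categories (both must be non-empty).
fn distributions(a: &[&str], b: &[&str]) -> (Vec<f64>, Vec<f64>) {
    let cats: BTreeSet<&str> = a.iter().chain(b.iter()).copied().collect();
    let p: Vec<f64> = cats.iter().map(|c| share(a, *c)).collect();
    let q: Vec<f64> = cats.iter().map(|c| share(b, *c)).collect();
    (p, q)
}

/// KL divergence in bits; terms with p_i == 0 contribute nothing.
fn kl_bits(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(pi, _)| **pi > 0.0)
        .map(|(pi, qi)| pi * (pi / qi).log2())
        .sum()
}

/// JSD in bits: 0.5 * KL(P||M) + 0.5 * KL(Q||M), with M the average of P and Q.
fn jsd_bits(p: &[f64], q: &[f64]) -> f64 {
    assert_eq!(p.len(), q.len());
    let m: Vec<f64> = p.iter().zip(q).map(|(a, b)| 0.5 * (a + b)).collect();
    0.5 * kl_bits(p, &m) + 0.5 * kl_bits(q, &m)
}

/// Jensen-Shannon distance: the square root of JSD, which satisfies the triangle inequality.
fn js_distance(p: &[f64], q: &[f64]) -> f64 {
    jsd_bits(p, q).max(0.0).sqrt()
}

fn main() {
    let v1 = ["paid", "paid", "paid", "denied"];
    let v2 = ["paid", "denied", "denied", "pending"];
    let (p, q) = distributions(&v1, &v2);
    println!("JSD = {:.4} bits, distance = {:.4}", jsd_bits(&p, &q), js_distance(&p, &q));
}
```

Whenever p_i > 0, m_i is at least p_i / 2, so the divisions inside `kl_bits` never hit zero. Plain KL(P‖Q) lacks this guarantee whenever a category disappears between snapshots.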

DDSketch

Relative-error quantile estimation. Mergeable. O(1) per insert. Streams terabytes in megabytes of RAM.
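DDSketch maps each positive value x to bucket ceil(log_γ x), with γ = (1 + α)/(1 - α), and answers quantile queries from the bucket counts. This bounds the relative error by α. Merging two sketches just adds their counts bucket by bucket. This sketch has two simplifications. It handles positive values only. It also keeps buckets in a `BTreeMap` for brevity, which makes inserts O(log b) in the number of buckets b; a dense array of buckets brings that to O(1). The `DDSketch` struct here is a hypothetical illustration, not Vajra's type.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal DDSketch over positive values with relative accuracy `alpha`.
struct DDSketch {
    gamma: f64,
    ln_gamma: f64,
    buckets: BTreeMap<i32, u64>, // bucket index -> count
    count: u64,
}

impl DDSketch {
    fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha < 1.0);
        let gamma = (1.0 + alpha) / (1.0 - alpha);
        DDSketch { gamma, ln_gamma: gamma.ln(), buckets: BTreeMap::new(), count: 0 }
    }

    /// Bucket i covers (gamma^(i-1), gamma^i].
    fn insert(&mut self, x: f64) {
        assert!(x > 0.0, "this sketch only handles positive values");
        let i = (x.ln() / self.ln_gamma).ceil() as i32;
        *self.buckets.entry(i).or_insert(0) += 1;
        self.count += 1;
    }

    /// Merging sketches built with the same alpha is per-bucket addition.
    fn merge(&mut self, other: &DDSketch) {
        assert!((self.gamma - other.gamma).abs() < 1e-12, "alpha must match");
        for (&i, &c) in &other.buckets {
            *self.buckets.entry(i).or_insert(0) += c;
        }
        self.count += other.count;
    }

    /// Walk buckets in index order until the cumulative count passes rank q*(n-1),
    /// then return that bucket's representative value 2*gamma^i/(gamma+1).
    fn quantile(&self, q: f64) -> Option<f64> {
        if self.count == 0 || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let rank = q * (self.count - 1) as f64;
        let mut seen = 0u64;
        for (&i, &c) in &self.buckets {
            seen += c;
            if seen as f64 > rank {
                return Some(2.0 * self.gamma.powi(i) / (self.gamma + 1.0));
            }
        }
        None
    }
}

fn main() {
    let mut sketch = DDSketch::new(0.01);
    for x in 1..=100_000 {
        sketch.insert(x as f64);
    }
    println!(
        "buckets = {}, p50 ~ {:?}, p99 ~ {:?}",
        sketch.buckets.len(),
        sketch.quantile(0.5),
        sketch.quantile(0.99)
    );
}
```

Memory stays small because the bucket count depends on the value range, not on how many values arrive. With α = 0.01, covering values from 1 to 10^9 takes about ln(10^9)/ln(γ) ≈ 1,036 buckets.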

MinHash + LSH

Sublinear similarity search. Cluster 10K documents in seconds. Candidate pairs come only from shared LSH buckets, with no O(n^2) all-pairs comparison.
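The standard construction behind this technique is sketched below. A MinHash signature keeps, for each of k seeded hash functions, the minimum hash over a document's tokens; here the tokens are the document's path set. The share of matching signature slots estimates the Jaccard similarity of two documents. Banding LSH then splits each signature into bands and pairs only those documents that share a band bucket. The seeded `DefaultHasher`, the choice of tokens, and the band and row counts are assumptions, not Vajra's settings.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// One member of a family of hash functions, selected by `seed`.
fn seeded_hash(seed: u64, token: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    token.hash(&mut h);
    h.finish()
}

fn token_set(tokens: &[&str]) -> BTreeSet<String> {
    tokens.iter().map(|t| t.to_string()).collect()
}

/// MinHash signature: slot k holds the smallest hash of any token under hash function k.
fn minhash(tokens: &BTreeSet<String>, k: u64) -> Vec<u64> {
    (0..k)
        .map(|seed| tokens.iter().map(|t| seeded_hash(seed, t)).min().unwrap_or(u64::MAX))
        .collect()
}

/// P(slot matches) equals the Jaccard similarity, so the match rate estimates it.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// Banding LSH: documents with an identical band land in the same bucket and
/// become candidate pairs; no document is compared against every other.
fn lsh_candidates(sigs: &[Vec<u64>], bands: usize, rows: usize) -> BTreeSet<(usize, usize)> {
    let mut buckets: BTreeMap<(usize, u64), Vec<usize>> = BTreeMap::new();
    for (doc, sig) in sigs.iter().enumerate() {
        assert_eq!(sig.len(), bands * rows);
        for b in 0..bands {
            let mut h = DefaultHasher::new();
            sig[b * rows..(b + 1) * rows].hash(&mut h);
            buckets.entry((b, h.finish())).or_default().push(doc);
        }
    }
    let mut pairs = BTreeSet::new();
    for docs in buckets.values() {
        for i in 0..docs.len() {
            for j in i + 1..docs.len() {
                pairs.insert((docs[i], docs[j]));
            }
        }
    }
    pairs
}

fn main() {
    let docs = [
        token_set(&["$.id", "$.claims[*].charge", "$.claims[*].allowed", "$.meta.source"]),
        token_set(&["$.id", "$.claims[*].charge", "$.claims[*].allowed", "$.meta.batch"]),
        token_set(&["$.user.name", "$.user.email", "$.user.roles[*]"]),
    ];
    let sigs: Vec<Vec<u64>> = docs.iter().map(|d| minhash(d, 64)).collect();
    println!("J(0,1) ~ {:.2}", estimate_jaccard(&sigs[0], &sigs[1]));
    println!("candidates: {:?}", lsh_candidates(&sigs, 16, 4));
}
```

Within a single bucket, the pairing work is still quadratic in that bucket's size. The savings therefore depend on buckets staying small, which holds when most documents are dissimilar.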


Install

cargo install vajra-cli

Or from source:

git clone https://github.com/copyleftdev/vajra
cd vajra
cargo build --release

First useful output in under 30 seconds:

echo '{"hello": "world"}' | vajra inspect -