VAJRA
Deterministic Semantic Reduction Engine
What Vajra Does
Feed it any structured data. Get back shape, signal, anomalies, and truth.
Vajra analyzes JSON, YAML, CSV, NDJSON, Markdown, and PDF. It extracts structural fingerprints, computes entropy and statistical profiles, detects anomalies and schema drift, discovers cross-field relationships, and renders deterministic essences tuned for humans, auditors, or AI pipelines.
Inspect
vajra inspect claim.json
Full structural analysis — paths, types, fingerprints, domain recognition.
Essence
vajra essence data.json --profile staff
Concern-oriented reduction. 7 profiles. Token budgets. Compact-AI output for LLMs.
Drift
vajra drift v1.json v2.json
Schema drift detection with Jensen-Shannon divergence (JSD), Wasserstein distance, and severity classification.
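Here is what the Wasserstein half of drift detection looks like for a single numeric field. The Wasserstein-1 (earth mover's) distance between two equal-sized samples is the mean absolute difference of their sorted values. The Rust sketch below assumes each document version has already been reduced to such a sample. `wasserstein_1d` is a hypothetical name, not part of Vajra's API, and binning and severity classification are not shown.

```rust
/// Wasserstein-1 (earth mover's) distance between two equal-sized 1-D samples.
/// Hypothetical helper for illustration; not Vajra's API.
/// With equal sample sizes, the optimal transport plan pairs the i-th
/// smallest value of `a` with the i-th smallest value of `b`.
fn wasserstein_1d(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None; // this sketch only handles equal, non-empty samples
    }
    let mut xs = a.to_vec();
    let mut ys = b.to_vec();
    // total_cmp is a total order on f64 (NaN sorts last), so sorting is deterministic.
    xs.sort_by(|p, q| p.total_cmp(q));
    ys.sort_by(|p, q| p.total_cmp(q));
    let total: f64 = xs.iter().zip(&ys).map(|(x, y)| (x - y).abs()).sum();
    Some(total / xs.len() as f64)
}
```

Shifting every value by a constant c gives a distance of exactly |c|. That is why it pairs well with JSD. JSD saturates at its maximum once two distributions stop overlapping, however far apart they are. Wasserstein keeps growing with the shift.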
Anomalies
vajra anomalies batch.ndjson
MAD-based outliers, rarity scoring, type instability. Deterministic. Explainable.
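The text above does not define rarity and type instability precisely. The following Rust sketch therefore adopts two plausible definitions as explicit assumptions:

- Type instability is the share of observations whose type differs from the path's most common type.
- Rarity is surprisal, -log2(p), in bits.

The `Kind` enum, `type_instability`, and `rarity_bits` are hypothetical names. Vajra's own scoring may differ.

```rust
use std::collections::HashMap;

/// Hypothetical scalar type tags for the values observed at one path.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Kind { Null, Bool, Number, String }

/// Assumed definition: the share of observations that do NOT have the
/// dominant (most frequent) type. 0.0 means the path is perfectly stable.
fn type_instability(kinds: &[Kind]) -> f64 {
    if kinds.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<Kind, usize> = HashMap::new();
    for k in kinds {
        *counts.entry(*k).or_insert(0) += 1;
    }
    // The maximum count does not depend on HashMap iteration order.
    let dominant = counts.values().copied().max().unwrap_or(0);
    1.0 - dominant as f64 / kinds.len() as f64
}

/// Assumed rarity score: surprisal in bits, -log2(p), where p is the
/// value's relative frequency among all observations at the path.
fn rarity_bits(values: &[&str], target: &str) -> Option<f64> {
    let hits = values.iter().filter(|v| **v == target).count();
    if hits == 0 {
        return None; // value never observed; surprisal is undefined here
    }
    let p = hits as f64 / values.len() as f64;
    Some(-p.log2())
}
```

Here is how the definitions behave on small inputs:

- A path observed as three numbers, one string, and one null scores 1 - 3/5 = 0.4.
- A status seen once in four observations has a rarity of 2 bits.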
Query
vajra query data.json 'entropy($.status) > 0.5'
Path expressions with analysis functions. Entropy, rarity, null rate, instability.
Cluster
vajra cluster batch/*.json
MinHash + LSH similarity clustering. Finds payload families in seconds.
Forged for the Agent Gods
Vajra was not designed for casual use. It was forged as a weapon — an instrument of precision for AI systems that need to understand structured data at scale.
The compact-ai output compresses a 1000-node JSON document into a token-efficient essence that preserves every anomaly, every structural motif, every statistical signal — in a format an LLM can parse in a single pass.
The chain-ready drill section tells the downstream model exactly which paths have deeper analysis available, enabling multi-turn investigation without re-processing.
The determinism guarantee means the same input always produces the same output. No drift. No randomness. No surprises. An AI pipeline that depends on Vajra can depend on Vajra.
vajra essence massive.json --profile ai --format compact-ai --budget 500
{
"v": "vajra/1",
"doc": {"nodes": 847, "paths": 23, "depth": 6},
"anomalies": [
{"p": "$.claims[*].allowed", "t": "type_instability", "s": 0.4},
{"p": "$.claims[*].charge", "t": "numeric_outlier", "v": 350, "z": 4.2}
],
"drill": [
{"path": "$.claims[*].service_lines", "available": ["stats", "anomalies", "motifs"]}
],
"meta": {"profile": "ai", "truncated": false}
}
The Engine
BLAKE3 Fingerprinting
Merkle subtree hashing. Path set signatures. Motif detection falls out for free. O(n).
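A Merkle-style structural fingerprint hashes each node from its type tag plus its children's hashes. Every subtree therefore gets a hash in one bottom-up pass, and identical shapes collide on purpose, which is how repeated motifs surface. The Rust sketch below is an illustration under three assumptions:

- It uses the standard library's `DefaultHasher` in place of BLAKE3, which would need an external crate.
- It uses a hypothetical `Node` type instead of a real JSON parser.
- It hashes shape only and ignores scalar values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// A minimal JSON-like tree (hypothetical; a real tool would use a JSON parser).
enum Node {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Node>),
    Object(BTreeMap<String, Node>), // BTreeMap: keys are visited in sorted order
}

/// Structural (shape-only) Merkle hash: scalars hash only their type tag,
/// containers hash their tag plus their children's hashes. Each node is
/// visited once, so this is O(n). Every subtree hash is counted in `seen`,
/// so shapes with a count above 1 are repeated motifs.
/// NOTE: DefaultHasher stands in for BLAKE3 in this sketch.
fn shape_hash(node: &Node, seen: &mut BTreeMap<u64, usize>) -> u64 {
    let mut h = DefaultHasher::new();
    match node {
        Node::Null => 0u8.hash(&mut h),
        Node::Bool(_) => 1u8.hash(&mut h),
        Node::Number(_) => 2u8.hash(&mut h),
        Node::Str(_) => 3u8.hash(&mut h),
        Node::Array(items) => {
            4u8.hash(&mut h);
            for item in items {
                shape_hash(item, seen).hash(&mut h);
            }
        }
        Node::Object(fields) => {
            5u8.hash(&mut h);
            for (key, value) in fields {
                key.hash(&mut h);
                shape_hash(value, seen).hash(&mut h);
            }
        }
    }
    let digest = h.finish();
    *seen.entry(digest).or_insert(0) += 1;
    digest
}
```

Two claims with different charges but the same keys and types produce the same hash, and `seen` records that shape twice. A stable cryptographic hash such as BLAKE3 matters for persisted fingerprints, because `DefaultHasher` output is not guaranteed to stay the same across Rust releases.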
Shannon Entropy
Distinguishes boilerplate from signal without domain knowledge. The strongest universal primitive.
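Shannon entropy over a path's observed values is H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of each distinct value. A constant field scores 0 bits, and a field split evenly across k values scores log2(k) bits.

The Rust sketch below is a minimal illustration, not Vajra's implementation. It uses a hypothetical `shannon_entropy` function over string values. It counts with a `BTreeMap`, so the floating-point sum is accumulated in the same order on every run.

```rust
use std::collections::BTreeMap;

/// Shannon entropy in bits of the values observed at one path (hypothetical helper).
/// H = -sum(p * log2(p)) over distinct values; 0.0 for a constant field.
fn shannon_entropy(values: &[&str]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    // BTreeMap iterates in key order, so the sum below is taken in the
    // same order on every run (HashMap iteration order can vary).
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let n = values.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```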
MAD Outliers
50% breakdown point. Just under half the data can be replaced with arbitrary values before the median absolute deviation can be pushed arbitrarily far off.
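A common way to turn MAD into an outlier test is the modified z-score of Iglewicz and Hoaglin, 0.6745 * (x - median) / MAD. Values scoring above about 3.5 are flagged.

The Rust sketch below adopts that convention as an assumption, since Vajra's exact scaling and threshold are not stated. `median` and `mad_outliers` are hypothetical names.

```rust
/// Median of a non-empty slice (sorted copy; averages the two middle values
/// for even lengths). Callers must not pass an empty slice.
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    if v.len() % 2 == 0 { (v[mid - 1] + v[mid]) / 2.0 } else { v[mid] }
}

/// Indices and scores of values whose modified z-score exceeds `threshold`.
/// Assumed convention (Iglewicz & Hoaglin): 0.6745 * (x - median) / MAD,
/// where MAD = median(|x - median|); 3.5 is a commonly used threshold.
fn mad_outliers(xs: &[f64], threshold: f64) -> Vec<(usize, f64)> {
    if xs.is_empty() {
        return Vec::new();
    }
    let med = median(xs);
    let deviations: Vec<f64> = xs.iter().map(|x| (x - med).abs()).collect();
    let mad = median(&deviations);
    if mad == 0.0 {
        return Vec::new(); // more than half the values equal the median; score undefined
    }
    xs.iter()
        .enumerate()
        .map(|(i, x)| (i, 0.6745 * (x - med) / mad))
        .filter(|(_, z)| z.abs() > threshold)
        .collect()
}
```

For example, take the charges 10, 12, 11, 13, 12, 350. The median is 12 and the MAD is 1, so only 350 is flagged. Raising that outlier to 350000 leaves both the median and the MAD unchanged, which is the breakdown-point property in action. A mean and standard deviation would be dragged toward the outlier instead.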
Jensen-Shannon Divergence
Symmetric. Bounded. Its square root is a proper metric. The right way to measure distribution drift.
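For two discrete distributions P and Q over the same categories, JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2. With base-2 logarithms, JSD lies in [0, 1]. Its square root, the Jensen-Shannon distance, satisfies the triangle inequality.

The Rust sketch below assumes already-normalized histograms, for example the value frequencies at one path in v1.json and v2.json. `jsd` and `js_distance` are hypothetical names.

```rust
/// Jensen-Shannon divergence (base 2, so bounded in [0, 1]) between two
/// normalized histograms over the same, identically ordered categories.
/// JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
fn jsd(p: &[f64], q: &[f64]) -> f64 {
    assert_eq!(p.len(), q.len(), "distributions must share categories");
    let mut total = 0.0;
    for (&pi, &qi) in p.iter().zip(q) {
        let mi = 0.5 * (pi + qi);
        // Zero-probability terms contribute 0 by the convention 0 * log 0 = 0.
        if pi > 0.0 {
            total += 0.5 * pi * (pi / mi).log2();
        }
        if qi > 0.0 {
            total += 0.5 * qi * (qi / mi).log2();
        }
    }
    total
}

/// Jensen-Shannon distance: sqrt(JSD), which is a true metric.
fn js_distance(p: &[f64], q: &[f64]) -> f64 {
    jsd(p, q).max(0.0).sqrt() // clamp tiny negative rounding error before sqrt
}
```

Identical histograms score 0. Histograms with no overlap, such as [1, 0] and [0, 1], score exactly 1.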
DDSketch
Relative-error quantile estimation. Mergeable. O(1) per insert. Streams terabytes in megabytes of RAM.
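DDSketch works in three steps:

- Insert: with gamma = (1 + alpha) / (1 - alpha), each positive value x goes to bucket index ceil(log_gamma(x)).
- Query: a quantile estimate is 2 * gamma^i / (gamma + 1), where i is the bucket holding the target rank. This keeps the relative error within alpha.
- Merge: combining two sketches with the same alpha is bucket-wise addition.

The Rust sketch below is simplified under stated assumptions. It accepts positive values only and does no bucket collapsing. It also uses a `BTreeMap` store, so inserts here cost O(log m) in the number of occupied buckets m, rather than the O(1) of a dense array store. The `DDSketch` struct is hypothetical, not a Vajra or library type.

```rust
use std::collections::BTreeMap;

/// Simplified DDSketch for strictly positive values (the full algorithm also
/// handles zero and negatives with extra stores, and can collapse buckets).
struct DDSketch {
    gamma: f64,
    log_gamma: f64,
    buckets: BTreeMap<i32, u64>, // bucket index -> count, kept in index order
    count: u64,
}

impl DDSketch {
    /// `alpha` is the relative-accuracy target, e.g. 0.01 for 1%.
    fn new(alpha: f64) -> Self {
        let gamma = (1.0 + alpha) / (1.0 - alpha);
        DDSketch { gamma, log_gamma: gamma.ln(), buckets: BTreeMap::new(), count: 0 }
    }

    /// Bucket i covers (gamma^(i-1), gamma^i].
    fn insert(&mut self, x: f64) {
        assert!(x > 0.0, "this sketch only accepts positive values");
        let idx = (x.ln() / self.log_gamma).ceil() as i32;
        *self.buckets.entry(idx).or_insert(0) += 1;
        self.count += 1;
    }

    /// Merging adds counts bucket by bucket (both sketches need the same alpha).
    fn merge(&mut self, other: &DDSketch) {
        assert!((self.gamma - other.gamma).abs() < 1e-12, "alpha must match");
        for (&idx, &c) in &other.buckets {
            *self.buckets.entry(idx).or_insert(0) += c;
        }
        self.count += other.count;
    }

    /// Estimate the q-quantile for 0 <= q <= 1; None if the sketch is empty.
    fn quantile(&self, q: f64) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        let rank = q * (self.count - 1) as f64;
        let mut seen = 0u64;
        for (&idx, &c) in &self.buckets {
            seen += c;
            if seen as f64 > rank {
                // Estimate lies within relative error alpha of any value in bucket idx.
                return Some(2.0 * self.gamma.powi(idx) / (self.gamma + 1.0));
            }
        }
        None
    }
}
```

Memory depends on the range of the values, not on how many arrive. With alpha = 0.01, every value from 1 to 1000 fits in roughly 350 buckets.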
MinHash + LSH
Sublinear similarity search. Cluster 10K documents in seconds. No O(n^2) anywhere.
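MinHash compresses each document's feature set into k minimum hash values. With ideal hash functions, the chance that two signatures agree at a position equals the sets' Jaccard similarity. LSH then splits each signature into b bands of r rows and buckets documents by band, so only documents sharing a bucket get compared.

The Rust sketch below rests on these assumptions:

- Each document is represented by its set of structural paths.
- The standard library's `DefaultHasher`, seeded per position, stands in for independent hash functions.
- The names `path_set`, `minhash`, `estimate_jaccard`, and `lsh_candidates` are hypothetical, not Vajra's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hypothetical helper: a document represented as the set of its structural paths.
fn path_set(paths: &[&str]) -> BTreeSet<String> {
    paths.iter().map(|p| p.to_string()).collect()
}

/// Hash an item under a seed; each seed acts as a separate hash function.
fn seeded_hash(seed: u64, item: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each of `k` hash functions, the minimum hash over the set.
fn minhash(items: &BTreeSet<String>, k: u64) -> Vec<u64> {
    (0..k)
        .map(|seed| items.iter().map(|it| seeded_hash(seed, it)).min().unwrap_or(u64::MAX))
        .collect()
}

/// Estimated Jaccard similarity: the fraction of signature positions that agree.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// LSH banding: bucket documents by (band, hash of that band's rows). Documents
/// sharing any bucket become candidate pairs; pairs are only formed within a
/// bucket, so dissimilar documents are never compared against each other.
fn lsh_candidates(sigs: &[Vec<u64>], bands: usize, rows: usize) -> BTreeSet<(usize, usize)> {
    let mut buckets: BTreeMap<(usize, u64), Vec<usize>> = BTreeMap::new();
    for (doc, sig) in sigs.iter().enumerate() {
        assert!(sig.len() >= bands * rows, "signature too short for bands * rows");
        for band in 0..bands {
            let mut h = DefaultHasher::new();
            sig[band * rows..(band + 1) * rows].hash(&mut h);
            buckets.entry((band, h.finish())).or_default().push(doc);
        }
    }
    let mut pairs = BTreeSet::new();
    for docs in buckets.values() {
        for i in 0..docs.len() {
            for j in i + 1..docs.len() {
                pairs.insert((docs[i], docs[j]));
            }
        }
    }
    pairs
}
```

Take b = 16 bands of r = 4 rows. Two documents with Jaccard similarity s become candidates with probability 1 - (1 - s^4)^16. Raising r makes the filter stricter, and raising b makes it more permissive.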
Install
cargo install vajra-cli
Or from source:
git clone https://github.com/copyleftdev/vajra
cd vajra
cargo build --release
First useful output in under 30 seconds:
echo '{"hello": "world"}' | vajra inspect -