essence
essence is the command Vajra was built for. It takes a JSON document, runs the full analysis pipeline, scores every observation against a concern profile’s weight vector, and renders a compressed, ranked, faithful representation — shaped for whoever is reading it.
An essence is not a summary. A summary loses information probabilistically. An essence compresses information deterministically, preserving everything above a configurable importance threshold while collapsing structural noise.
Usage
```bash
vajra essence <input> [flags]
```
Arguments:
| Argument | Description |
|---|---|
| `<input>` | Path to a JSON file, `-` for stdin, a directory, or an HTTP URL |
Flags:
| Flag | Description | Default |
|---|---|---|
| `--format <fmt>` | Output format: `text`, `json`, `markdown`, `compact-ai` | `text` |
| `--profile <name>` | Concern profile: `staff`, `engineer`, `auditor`, `ai`, `fraud`, or a custom profile | `engineer` |
| `--budget <N>` | Approximate token budget for the output | unlimited |
| `--config <path>` | Path to a TOML file with custom profile definitions | none |
| `--input-format <fmt>` | Override the auto-detected input format | auto |
| `--streaming` | Force streaming mode | off |
| `--redact` | Apply built-in redaction before rendering | off |
| `--explain` | Include a score decomposition for each observation | off |
| `--quiet` | Suppress progress output | off |
How Essence Construction Works
1. Collect candidates. All observations from the analysis pipeline (notable fields, motifs, anomalies, relationship discoveries) become candidates.
2. Score each candidate using the active profile's six-dimensional weight vector:
   - `rarity`: self-information of the observation
   - `instability`: type instability at the path
   - `entropy_signal`: distance from 0.5 normalized entropy (both constants and noise score high)
   - `structural_coverage`: fraction of total nodes under this path
   - `anomaly_strength`: maximum anomaly score across dimensions
   - `concern_relevance`: profile-specific boost for this path or observation type
3. Collapse motifs. Repeated structural patterns are represented once, with a count and any specific variations noted.
4. Rank by composite score with deterministic tie-breaking (shallower paths first, then lexicographic).
5. Apply the token budget (if `--budget` is set). Observations are selected greedily by score-per-token (a fractional-knapsack-style approximation).
6. Render using the profile's vocabulary and rendering style.
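Steps 2 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not Vajra's implementation: `Candidate`, `composite_score`, `rank`, `self_information`, and `entropy_signal` are hypothetical names; the engineer weights are copied from the `--explain` example on this page; the composite is assumed to be a plain weighted sum; and the ×2 scaling in `entropy_signal` is an assumption. Motif collapsing (step 3) is left out.

```python
import math
from dataclasses import dataclass

# The six scoring dimensions, in the order listed above.
DIMENSIONS = ("rarity", "instability", "entropy_signal",
              "structural_coverage", "anomaly_strength", "concern_relevance")

# Engineer-profile weights as shown in the --explain example (assumed complete).
ENGINEER_WEIGHTS = {"rarity": 0.15, "instability": 0.25, "entropy_signal": 0.15,
                    "structural_coverage": 0.15, "anomaly_strength": 0.15,
                    "concern_relevance": 0.15}


def self_information(p: float) -> float:
    """Raw self-information in bits; how Vajra normalizes rarity is not shown here."""
    return -math.log2(p)


def entropy_signal(normalized_entropy: float) -> float:
    """Distance from 0.5, scaled to [0, 1] (the x2 scaling is an assumption).
    Constants (entropy 0) and noise (entropy 1) both score 1.0."""
    return abs(normalized_entropy - 0.5) * 2


@dataclass
class Candidate:
    path: str    # JSONPath of the observation
    depth: int   # nesting depth, used for tie-breaking
    dims: dict   # dimension name -> value in [0, 1]


def composite_score(c: Candidate, weights: dict) -> float:
    """Assumed composite: weighted sum over the six dimensions."""
    return sum(c.dims[d] * weights[d] for d in DIMENSIONS)


def rank(candidates: list, weights: dict) -> list:
    """Highest score first; ties go to shallower paths, then lexicographic path order."""
    return sorted(candidates,
                  key=lambda c: (-composite_score(c, weights), c.depth, c.path))
```

Sorting on a tuple key makes the ordering fully deterministic: two runs over the same input always produce the same ranking, regardless of the order in which candidates were collected.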
Profiles at a Glance
| Profile | Vocabulary | Rendering | Emphasizes |
|---|---|---|---|
| `staff` | Plain language | Narrative sections | Anomalies, structural coverage |
| `engineer` | Technical, JSONPath | Tabular, list-based | Type instability, all dimensions balanced |
| `auditor` | Formal | Completeness-focused | Instability, concern relevance, missingness |
| `ai` | Compact, terse | Machine-readable | Entropy signal, structural coverage, anomalies |
| `fraud` | Investigative | Outlier-focused | Rarity, anomaly strength |
See Profiles for full weight vectors and customization.
Example: Staff Profile
```bash
vajra essence claim.json --profile staff
```

```text
=== Essence (staff profile) ===

Document Summary:
  1 claim with 14 service lines, 1 patient, 2 diagnosis codes.
  Primary status: partially adjudicated.

What Stands Out:
  - 3 service lines are missing allowed amounts (lines 2, 7, 11).
    This field is present in 79% of service lines — its absence is notable.
  - Adjustment reason code "CO-45" repeats across 8 of 14 lines.
    Repetition at this frequency suggests a systematic pattern, not random variation.
  - 1 diagnosis structure differs from the other.
    The second diagnosis carries an extra "qualifier" field.
  - Provider taxonomy code is absent.
    This field is expected in 94% of claims in typical batches.

What This Likely Means:
  - Most of the claim is consistent and well-formed.
  - A subset of service lines appears incomplete or differently processed.
  - The repeated adjustment code points to a systematic issue.
```
No JSONPath. No z-scores. No jargon. The staff member gets what they need to act.
Example: Engineer Profile
```bash
vajra essence claim.json --profile engineer
```

```text
=== Essence (engineer profile) ===

Structure: 847 nodes, 23 distinct paths, max depth 6
Fingerprint (path set): a1b2c3d4...
Dominant motif: $.claims[*].service_lines[*] (14 instances, 8 fields each)

Notable paths:
  $.claims[*].service_lines[*].allowed_amount
    null_rate: 0.214, entropy: 3.12, type: number (100%)
    absent in 3 of 14 service lines (indices 2, 7, 11)
  $.claims[*].service_lines[*].adjustment.reason
    entropy: 1.56, cardinality: 4
    dominant value: "CO-45" (57.1%, 8 of 14)
  $.claims[*].diagnosis[1]
    structural deviation: extra field "qualifier" (not in diagnosis[0])

Type stability: 100% across all paths
Array homogeneity: service_lines 100% (1 shape hash), diagnosis 50% (2 shape hashes)
```
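The per-path figures in this output (presence, null rate, dominant-value share, array homogeneity) are simple ratios over the elements of an array. A minimal sketch of how they could be computed, under assumptions: `shape_hash`, `homogeneity`, and `field_stats` are hypothetical helpers; a "shape" is taken to be the set of key names; homogeneity is taken to be the share of elements with the most common shape (which reproduces the 100% and 50% figures above); and `missing_rate` counts absent and null values together, which may differ from how Vajra defines `null_rate`.

```python
import hashlib
import json
from collections import Counter


def shape_hash(obj: dict) -> str:
    """Hypothetical shape hash: SHA-256 of the sorted key names, first 8 hex digits."""
    keys = json.dumps(sorted(obj))
    return hashlib.sha256(keys.encode("utf-8")).hexdigest()[:8]


def homogeneity(items: list) -> tuple[float, int]:
    """Share of elements with the most common shape, and the number of distinct shapes."""
    shapes = Counter(shape_hash(x) for x in items)
    return shapes.most_common(1)[0][1] / len(items), len(shapes)


def field_stats(items: list, field: str) -> dict:
    """Presence, missing rate (absent or null), and dominant-value share for one field."""
    n = len(items)
    present = sum(1 for x in items if field in x)
    values = [x[field] for x in items if x.get(field) is not None]
    top = Counter(values).most_common(1)  # assumes hashable scalar values
    value, count = top[0] if top else (None, 0)
    return {
        "presence_rate": present / n,
        "missing_rate": (n - len(values)) / n,
        "dominant_value": value,
        "dominant_share": count / n,
    }
```

With 14 service lines of which 3 lack the field, `presence_rate` is 11/14 (about 79%) and `missing_rate` is 3/14 (about 0.214); a code that appears in 8 of 14 lines has a `dominant_share` of about 0.571.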
Example: AI Profile with Token Budget
```bash
vajra essence claim.json --profile ai --format json --budget 500
```

```json
{
  "vajra_essence": {
    "version": "0.1.0",
    "profile": "ai",
    "input_hash": "b3a7f2c1d4e5...",
    "structure": {
      "root_type": "object",
      "total_nodes": 847,
      "distinct_paths": 23,
      "max_depth": 6
    },
    "dominant_motif": {
      "path": "$.claims[0].service_lines[*]",
      "count": 14,
      "shape_hash": "f2c1d4e5...",
      "fields": ["procedure_code", "service_date", "charge_amount", "allowed_amount", "status", "adjustment"]
    },
    "anomalies": [
      {
        "path": "$.claims[0].service_lines[2,7,11].allowed_amount",
        "type": "missing",
        "severity": 4.2
      },
      {
        "path": "$.claims[0].diagnosis[1]",
        "type": "structural_deviation",
        "severity": 3.1
      }
    ],
    "notable": [
      {
        "path": "$.claims[0].service_lines[*].adjustment.reason_code",
        "observation": "value 'CO-45' in 8/14 instances (57%)"
      }
    ],
    "meta": {
      "budget_tokens": 500,
      "truncated": false,
      "observations_included": 4,
      "observations_total": 7
    }
  }
}
```
The AI profile collapses aggressively. Motifs are represented once with counts. Observations are sorted by score-per-token. The `meta.truncated` field tells the downstream model whether anything was cut.
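A downstream pipeline typically reads only a few fields from this JSON. The sketch below is a hypothetical consumer (the function name `anomaly_lines` is illustrative, not part of Vajra) that relies only on the keys shown in the example above: it lists anomalies from most to least severe and surfaces the `meta.truncated` flag so the caller can decide whether to request more detail.

```python
import json


def anomaly_lines(essence_json: str) -> tuple[list[str], bool]:
    """Return anomalies (highest severity first) and the meta.truncated flag."""
    essence = json.loads(essence_json)["vajra_essence"]
    anomalies = sorted(essence.get("anomalies", []),
                       key=lambda a: a["severity"], reverse=True)
    lines = [f"{a['type']} at {a['path']} (severity {a['severity']})" for a in anomalies]
    return lines, essence["meta"]["truncated"]
```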
Example: Compact-AI Format
```bash
vajra essence claim.json --profile ai --format compact-ai --budget 300
```

```json
{"v":"vajra/1","n":847,"p":23,"d":6,"motif":{"p":"$.claims[0].service_lines[*]","c":14},"a":[{"p":"$.claims[0].service_lines[2,7,11].allowed_amount","t":"miss","s":4.2},{"p":"$.claims[0].diagnosis[1]","t":"struct","s":3.1}],"drill":[{"p":"$.claims[*].service_lines","avail":["stats","anomalies","motifs"]}]}
```
Maximum compression: every key is shortened. The `drill` section tells the LLM which paths have deeper analysis available for follow-up queries.
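When debugging a pipeline it can help to expand the short keys back into readable names. The mapping below is inferred by comparing this output with the `--format json` example and is an assumption, not a documented schema; in particular, `v` is mapped to a hypothetical `format_version` because its value (`vajra/1`) differs from the JSON `version` field. The names `expand` and `expand_item` are illustrative. Unknown keys pass through unchanged.

```python
import json

# Inferred from the json vs. compact-ai examples; not an official mapping.
TOP_KEYS = {"v": "format_version", "n": "total_nodes", "p": "distinct_paths",
            "d": "max_depth", "motif": "dominant_motif", "a": "anomalies"}
ITEM_KEYS = {"p": "path", "c": "count", "t": "type", "s": "severity"}
TYPE_CODES = {"miss": "missing", "struct": "structural_deviation"}


def expand_item(item: dict) -> dict:
    """Rename short keys inside a motif, anomaly, or drill entry."""
    out = {ITEM_KEYS.get(k, k): v for k, v in item.items()}
    if "type" in out:
        out["type"] = TYPE_CODES.get(out["type"], out["type"])
    return out


def expand(compact: str) -> dict:
    """Expand a compact-ai essence into long key names."""
    out = {}
    for key, value in json.loads(compact).items():
        if isinstance(value, list):
            value = [expand_item(x) if isinstance(x, dict) else x for x in value]
        elif isinstance(value, dict):
            value = expand_item(value)
        out[TOP_KEYS.get(key, key)] = value
    return out
```

Nested `p` means "path" while top-level `p` means "distinct paths", so the sketch keeps two separate key tables rather than one global mapping.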
Example: With --explain
```bash
vajra essence claim.json --profile engineer --explain
```

```text
Notable paths:
  $.claims[*].service_lines[*].allowed_amount
    null_rate: 0.214, entropy: 3.12
    [score: 0.37]
      rarity:              0.42 x weight 0.15 = 0.063
      instability:         0.00 x weight 0.25 = 0.000
      entropy_signal:      0.24 x weight 0.15 = 0.036
      structural_coverage: 0.18 x weight 0.15 = 0.027
      anomaly_strength:    0.89 x weight 0.15 = 0.134
      concern_relevance:   0.75 x weight 0.15 = 0.113
```
Every score decomposed into its six dimensions. Nothing hidden. Nothing magic.
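Assuming the composite is the plain sum of the weighted terms, as the decomposition implies, with $x_d$ the value of dimension $d$ and $w_d$ its profile weight, the score for this path works out as:

$$
\text{score} = \sum_{d} w_d\,x_d = 0.15\,(0.42 + 0.24 + 0.18 + 0.89 + 0.75) + 0.25 \times 0.00 = 0.15 \times 2.48 = 0.372
$$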
The Token Budget
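When `--budget N` is specified, Vajra estimates the token cost of each observation (word count × 1.3) and selects observations greedily, in descending order of score-per-token, until the budget is exhausted. This is the classic greedy heuristic for the knapsack problem: it is optimal only for the fractional variant, where items can be split, and is an approximation here because each observation is either included whole or left out.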
The budget is approximate, not exact. It prevents bloated output without requiring precise token counting.
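A minimal sketch of the greedy selection, assuming observations arrive as `(score, text)` pairs. The helper names are hypothetical, the estimate is rounded up (an assumption; integer arithmetic keeps `words × 1.3` exact), and the loop skips an observation that does not fit and keeps scanning for smaller ones; whether Vajra instead stops at the first miss is not specified here. The returned flag plays the role of `meta.truncated` under the assumption that it means "something was left out".

```python
def estimate_tokens(text: str) -> int:
    """Word count x 1.3, rounded up; integer math avoids float rounding surprises."""
    words = len(text.split())
    return (words * 13 + 9) // 10


def select_within_budget(observations: list, budget: int) -> tuple[list, bool]:
    """Greedy knapsack heuristic over (score, text) pairs.

    Sorts by score per estimated token, then takes every observation that
    still fits. Returns the chosen observations and whether any were left out.
    """
    def density(obs):
        score, text = obs
        return score / max(estimate_tokens(text), 1)

    chosen, used = [], 0
    for obs in sorted(observations, key=density, reverse=True):
        cost = estimate_tokens(obs[1])
        if used + cost <= budget:
            chosen.append(obs)
            used += cost
    return chosen, len(chosen) < len(observations)
```

For example, with a budget of 10, a high-scoring 10-word observation (13 estimated tokens) is dropped in favor of two short ones costing 3 and 4 tokens, because their score per token is higher.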
When to Use It
- Non-technical stakeholders. `--profile staff` translates the data into plain language.
- AI pipelines. `--profile ai --format compact-ai --budget 500` compresses a 1000-node document into a token-efficient context.
- Audits. `--profile auditor` emphasizes completeness, missingness, and traceability.
- Fraud screening. `--profile fraud` amplifies anomalies and rare patterns.
- Documentation. `--format markdown` renders the essence as publishable documentation.