drift
drift detects and quantifies structural, type, and distributional changes between two JSON documents. It answers the question every engineer asks when something breaks: what changed?
It focuses not on changes to individual values but on changes to the shape, types, and statistical behavior of the data.
Usage
```sh
vajra drift <baseline> <candidate> [flags]
```
Arguments:
| Argument | Description |
|---|---|
| `<baseline>` | The reference document (the “before”) |
| `<candidate>` | The comparison document (the “after”) |
Flags:
| Flag | Description | Default |
|---|---|---|
| `--format <fmt>` | Output format: text, json, markdown, compact-ai | text |
| `--profile <name>` | Concern profile for severity weighting | engineer |
| `--input-format <fmt>` | Override auto-detected input format | auto |
| `--redact` | Apply built-in redaction before output | off |
| `--quiet` | Suppress progress output | off |
| `--group-by <path>` | JSONPath for population-level comparison (e.g., '$.author_type') | off |
Population-Level Comparison
When --group-by is specified, drift partitions records by the field value and computes pairwise drift between all groups. Instead of comparing two documents, you compare two (or more) subpopulations within the same dataset.
```sh
vajra drift prs.ndjson --group-by '$.author_type'
```

```
Drift Report (grouped by $.author_type)
Groups: bot (412 records), human (835 records)

Pairwise drift: bot vs human
  Structural similarity: 0.91 (Jaccard)

  Distribution shifts:
    $.files_changed      JSD: 0.42 (high)
      bot:   median 1.0, p95 3.0
      human: median 4.0, p95 18.0
    $.review_comments    JSD: 0.38 (moderate)
      bot:   median 0.0, p95 1.0
      human: median 2.0, p95 8.0

  Overall severity: HIGH (significant distributional divergence)
```
This is useful for comparing behavioral subgroups — bot vs. human PRs, different teams, production vs. staging, before vs. after a policy change — without needing separate files.
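The grouping step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vajra's implementation: `group_records` and `group_pairs` are hypothetical names, only simple top-level `$.field` paths are handled, and records that lack the field fall into a "None" group. With N groups there are N(N-1)/2 unordered pairs to compare.

```python
import itertools
import json
from collections import defaultdict


def group_records(ndjson_text: str, field: str) -> dict:
    """Partition NDJSON records by the value of one top-level field.

    Hypothetical helper: handles only simple '$.field' paths, not full JSONPath.
    Records that lack the field are grouped under the key "None".
    """
    groups = defaultdict(list)
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        groups[str(record.get(field))].append(record)
    return dict(groups)


def group_pairs(groups: dict) -> list:
    """Every unordered pair of groups; drift would be computed once per pair."""
    return list(itertools.combinations(sorted(groups), 2))


prs = "\n".join([
    '{"author_type": "bot", "files_changed": 1}',
    '{"author_type": "human", "files_changed": 4}',
    '{"author_type": "bot", "files_changed": 2}',
])
groups = group_records(prs, "author_type")
print({name: len(records) for name, records in groups.items()})  # {'bot': 2, 'human': 1}
print(group_pairs(groups))  # [('bot', 'human')]
```

Each pair of groups would then be fed through the same structural, type, and distributional comparisons used for two separate documents.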
Drift Dimensions
Structural Drift
Path set symmetric difference:
```
added_paths   = paths(candidate) \ paths(baseline)
removed_paths = paths(baseline) \ paths(candidate)
```
Added paths are new fields; removed paths are fields that have disappeared. Together they are the most visible form of schema evolution.
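A minimal Python sketch of structural drift, assuming paths use the same `$.a.b[*]` notation the reports use and assuming the Jaccard similarity shown in the reports is computed over the two path sets (the exact definition is not documented here). `json_paths` and `structural_drift` are hypothetical names.

```python
def json_paths(node, prefix="$"):
    """Collect every path in a decoded JSON value, writing array elements as [*]."""
    paths = {prefix}
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= json_paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for item in node:
            paths |= json_paths(item, f"{prefix}[*]")
    return paths


def structural_drift(baseline, candidate):
    """Added paths, removed paths, and Jaccard similarity of the two path sets."""
    base, cand = json_paths(baseline), json_paths(candidate)
    added = sorted(cand - base)
    removed = sorted(base - cand)
    # Both sets contain the root "$", so the union is never empty.
    jaccard = len(base & cand) / len(base | cand)
    return added, removed, jaccard


baseline = {"response": {"items": [{"quantity": "3"}]}}
candidate = {"response": {"items": [{"quantity": 3}], "metadata": {"api_version": "v2"}}}
added, removed, jaccard = structural_drift(baseline, candidate)
print(added)              # ['$.response.metadata', '$.response.metadata.api_version']
print(removed)            # []
print(round(jaccard, 3))  # 0.714
```

Note that the quantity type change does not register here: structural drift only looks at which paths exist, not what they hold.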
Type Drift
For each path present in both documents, the dominant type (the JSON type observed most often at that path) is compared. Any path whose dominant type changed (e.g., string to number, array to object) is flagged.
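One way to compute dominant types, sketched in Python. The type names and the tie-breaking rule (the first type encountered wins, which is how `Counter.most_common` orders equal counts) are assumptions for illustration; `collect_types` and `type_changes` are hypothetical helpers.

```python
from collections import Counter


def json_type(value):
    """Map a value decoded from JSON onto a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def collect_types(node, prefix="$", acc=None):
    """Count the JSON types observed at every path ([*] for array elements)."""
    if acc is None:
        acc = {}
    acc.setdefault(prefix, Counter())[json_type(node)] += 1
    if isinstance(node, dict):
        for key, value in node.items():
            collect_types(value, f"{prefix}.{key}", acc)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, f"{prefix}[*]", acc)
    return acc


def type_changes(baseline, candidate):
    """(path, before, after) for shared paths whose dominant type differs."""
    base, cand = collect_types(baseline), collect_types(candidate)
    changes = []
    for path in sorted(base.keys() & cand.keys()):
        before = base[path].most_common(1)[0][0]
        after = cand[path].most_common(1)[0][0]
        if before != after:
            changes.append((path, before, after))
    return changes


baseline = {"items": [{"quantity": "3"}, {"quantity": "5"}]}
candidate = {"items": [{"quantity": 3}, {"quantity": "5"}, {"quantity": 7}]}
print(type_changes(baseline, candidate))  # [('$.items[*].quantity', 'string', 'number')]
```

Using the dominant type rather than "any type seen" means a single stray value does not flag a path; the candidate above still contains one string quantity, but numbers now dominate.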
Distributional Drift
Jensen-Shannon Divergence (JSD) measures how much value distributions shifted between baseline and candidate:
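```
JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
```
where M = 0.5 * (P + Q) is the average (mixture) of the two distributions and KL is the Kullback-Leibler divergence.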
JSD is symmetric, always finite, and bounded to [0, 1] when computed with base-2 logarithms; its square root (the Jensen-Shannon distance) is a proper metric. This means drift magnitudes can be meaningfully compared and accumulated across paths.
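The formula translates directly into code for categorical distributions given as `{value: probability}` maps. This sketch uses base-2 logarithms so results fall in [0, 1]. Vajra's exact estimator (binning of numeric values, smoothing, log base) is not specified here, so numbers from this sketch need not match the report values shown on this page; the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in bits; terms where p(x) == 0 contribute nothing."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def jensen_shannon(p, q):
    """JSD between two categorical distributions given as {value: probability}."""
    support = set(p) | set(q)
    p = {x: p.get(x, 0.0) for x in support}
    q = {x: q.get(x, 0.0) for x in support}
    # M is the mixture; wherever p(x) > 0 or q(x) > 0, m(x) > 0, so KL stays finite.
    m = {x: 0.5 * (p[x] + q[x]) for x in support}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(jensen_shannon({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}))  # 0.0
print(jensen_shannon({"a": 1.0}, {"b": 1.0}))                      # 1.0
```

The second call shows the upper bound: two distributions with disjoint support (for example, a status field whose values were entirely replaced) reach exactly 1.0 bit.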
For numeric paths, Vajra also computes the 1D Wasserstein distance (earth mover’s distance), which captures how far values moved, not just that they moved. It is expressed in the same units as the field's values.
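For two numeric samples, the 1D Wasserstein-1 distance equals the area between their empirical CDFs. A minimal pure-Python sketch follows (the function name is hypothetical; `scipy.stats.wasserstein_distance` computes the same quantity if SciPy is available):

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two non-empty numeric samples.

    Integrates |F_u(x) - F_v(x)| over x, where F is the empirical CDF.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


print(wasserstein_1d([0], [5]))                           # 5.0
print(round(wasserstein_1d([1, 2, 3], [11, 12, 13]), 6))  # 10.0
```

Shifting every value by 10 yields a distance of 10, which is why the metric reads as "how far values moved" in the field's own units.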
Drift Classification
Each drifted path receives a classification:
| Class | Meaning |
|---|---|
| additive | New path appeared in candidate |
| subtractive | Path present in baseline, absent in candidate |
| type-mutative | Dominant type changed |
| distributional | Value distribution shifted (JSD > threshold) |
| cardinality-shift | Array lengths changed significantly |
| null-rate-shift | Null/missing ratio changed significantly |
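A sketch of how per-path statistics could map onto these classes. Everything here is hypothetical: `PathStats`, `classify`, and all three thresholds are illustrative choices, since the actual cut-offs (including the JSD threshold) are not documented. Array cardinality is summarized by a mean length, and a path can receive more than one class.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; Vajra's real cut-offs are not documented.
JSD_THRESHOLD = 0.1
NULL_RATE_DELTA = 0.05
CARDINALITY_RATIO = 1.5


@dataclass
class PathStats:
    """Hypothetical per-path summary for one document."""
    dominant_type: str
    null_rate: float
    mean_array_length: Optional[float] = None


def classify(base: Optional[PathStats], cand: Optional[PathStats],
             jsd: float = 0.0) -> List[str]:
    """Drift classes for one path; None means the path is absent."""
    if base is None and cand is None:
        return []
    if base is None:
        return ["additive"]
    if cand is None:
        return ["subtractive"]
    classes = []
    if base.dominant_type != cand.dominant_type:
        classes.append("type-mutative")
    if jsd > JSD_THRESHOLD:
        classes.append("distributional")
    # Skip the ratio test when either mean length is missing or zero.
    if base.mean_array_length and cand.mean_array_length:
        ratio = cand.mean_array_length / base.mean_array_length
        if ratio > CARDINALITY_RATIO or ratio < 1 / CARDINALITY_RATIO:
            classes.append("cardinality-shift")
    if abs(cand.null_rate - base.null_rate) > NULL_RATE_DELTA:
        classes.append("null-rate-shift")
    return classes


print(classify(PathStats("string", 0.0), PathStats("string", 0.2), jsd=0.34))
# ['distributional', 'null-rate-shift']
```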
Severity Scoring
The overall drift severity is a weighted sum of drift dimensions, tuned by the active profile:
- Auditor profiles weight subtractive drift highest (missing data is critical for compliance)
- Engineer profiles weight type-mutative drift highest (breaking changes)
- Fraud profiles weight distributional drift highest (behavioral shifts)
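A minimal sketch of profile-weighted scoring. Only the ordering comes from the list above (which class each profile weights highest); the numeric weights, the use of per-class drift counts as inputs, and the score-to-label cut-offs are assumptions, as are the `LOW` and `NONE` labels, which do not appear in the examples on this page.

```python
from typing import Dict

# Hypothetical weights: only "which class is weighted highest" reflects the docs.
PROFILE_WEIGHTS = {
    "auditor":  {"subtractive": 3.0, "type-mutative": 1.5, "distributional": 1.0, "additive": 0.5},
    "engineer": {"subtractive": 1.5, "type-mutative": 3.0, "distributional": 1.0, "additive": 0.5},
    "fraud":    {"subtractive": 1.0, "type-mutative": 1.0, "distributional": 3.0, "additive": 0.5},
}


def severity_score(counts: Dict[str, int], profile: str) -> float:
    """Weighted sum of drift counts per class; unlisted classes get weight 1.0."""
    weights = PROFILE_WEIGHTS[profile]
    return sum(weights.get(cls, 1.0) * n for cls, n in counts.items())


def severity_label(score: float) -> str:
    """Hypothetical cut-offs mapping a score to a label."""
    if score >= 3.0:
        return "HIGH"
    if score >= 1.5:
        return "MEDIUM"
    if score > 0:
        return "LOW"
    return "NONE"


one_removal = {"subtractive": 1}
print(severity_label(severity_score(one_removal, "auditor")))   # HIGH
print(severity_label(severity_score(one_removal, "engineer")))  # MEDIUM
```

With these made-up numbers, one subtractive drift scores HIGH under the auditor profile but only MEDIUM under the engineer profile, which mirrors the intent of profile weighting.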
Example: Text Output
```sh
vajra drift yesterday.json today.json
```

```
Drift Report: yesterday.json -> today.json

Structural similarity: 0.94 (Jaccard)

Added paths (2):
  $.response.metadata.processing_flags   [array of strings]
  $.response.metadata.api_version        [string]

Removed paths (0): none

Type changes (1):
  $.response.items[*].quantity   string -> number (clean type migration)

Distribution shifts (1):
  $.response.items[*].status   JSD: 0.34 (moderate)
    before: {"active": 0.82, "pending": 0.15, "error": 0.03}
    after:  {"active": 0.61, "pending": 0.12, "error": 0.27}
    note: "error" rate increased 9x

Null rate changes (0): none

Overall severity: MEDIUM (structural additions + significant distribution shift)
```
Example: JSON Output
```sh
vajra drift yesterday.json today.json --format json
```

```json
{
  "baseline": "yesterday.json",
  "candidate": "today.json",
  "jaccard_similarity": 0.94,
  "overall_severity": "medium",
  "added_paths": [
    {
      "path": "$.response.metadata.processing_flags",
      "type": "array"
    },
    {
      "path": "$.response.metadata.api_version",
      "type": "string"
    }
  ],
  "removed_paths": [],
  "type_changes": [
    {
      "path": "$.response.items[*].quantity",
      "baseline_type": "string",
      "candidate_type": "number",
      "jsd": 0.0
    }
  ],
  "distribution_shifts": [
    {
      "path": "$.response.items[*].status",
      "jsd": 0.34,
      "baseline_distribution": {
        "active": 0.82,
        "pending": 0.15,
        "error": 0.03
      },
      "candidate_distribution": {
        "active": 0.61,
        "pending": 0.12,
        "error": 0.27
      }
    }
  ],
  "null_rate_changes": []
}
```
Example: Medical Claim Drift
```sh
vajra drift baseline_claim.json updated_claim.json --profile auditor
```

```
Drift Report: baseline_claim.json -> updated_claim.json

Structural similarity: 0.87 (Jaccard)

Added paths (3):
  $.claims[*].service_lines[*].modifier_codes   [array of strings]
  $.claims[*].rendering_provider                [object]
  $.claims[*].rendering_provider.npi            [string]

Removed paths (1):
  $.claims[*].provider.taxonomy   [string]
    ** SUBTRACTIVE: field present in baseline, absent in candidate **

Type changes (0): none

Distribution shifts (2):
  $.claims[*].service_lines[*].status   JSD: 0.22
    before: {"adjudicated": 0.85, "pending": 0.15}
    after:  {"adjudicated": 0.64, "pending": 0.21, "denied": 0.15}
    note: new value "denied" appeared
  $.claims[*].service_lines[*].charge_amount   Wasserstein: 125.40
    before: median 285.00, p95 890.00
    after:  median 410.00, p95 1350.00
    note: charges shifted upward

Overall severity: HIGH (subtractive drift in auditor profile)
```
The auditor profile flags the removed taxonomy path as high severity because subtractive drift — data that was present and is now absent — is the most dangerous form of schema evolution for compliance.
When to Use It
- API version migration. Compare the response shape before and after a deploy.
- Vendor data monitoring. Compare this week’s feed to last week’s. Detect undocumented schema changes before they break your pipeline.
- Regulatory compliance. Prove that the data structure has not drifted outside acceptable bounds.
- CI integration. Gate deploys on drift severity. If drift exceeds a threshold, fail the build and require review.
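For CI gating, a small wrapper can run drift with the documented `--format json` and `--quiet` flags and fail when `overall_severity` exceeds a chosen level. This is a sketch under assumptions not confirmed by this page: the severity ordering (including "none" and "low" levels, which do not appear in the examples here) and that `vajra` exits with status 0 after a successful run. `exceeds` and `main` are hypothetical names.

```python
import json
import subprocess
import sys

# Assumed ordering; only "medium" and "high" appear in the examples on this page.
SEVERITY_ORDER = ["none", "low", "medium", "high"]


def exceeds(report_json: str, max_allowed: str = "low") -> bool:
    """True if the report's overall_severity is above max_allowed."""
    severity = json.loads(report_json)["overall_severity"].lower()
    return SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(max_allowed)


def main(baseline: str, candidate: str) -> int:
    # check=True assumes vajra exits 0 whenever it produced a report.
    result = subprocess.run(
        ["vajra", "drift", baseline, candidate, "--format", "json", "--quiet"],
        capture_output=True, text=True, check=True,
    )
    if exceeds(result.stdout):
        print(f"drift gate failed: {baseline} -> {candidate}", file=sys.stderr)
        return 1
    return 0


# Only run the gate when invoked with exactly two file arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Invoked as `python drift_gate.py yesterday.json today.json` (a hypothetical file name), the JSON report shown earlier on this page, with `"overall_severity": "medium"`, would fail the default gate.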
Pairs Well With
- fingerprint: quick structural same-or-different check before detailed drift analysis
- inspect: understand each document’s structure before comparing
- anomalies: drift detects changes between versions; anomalies detect deviations within a version
- essence: drift observations feed into essence generation when a baseline is provided