Domain Plugins

Core Vajra is domain-agnostic. It analyzes structure, statistics, and deviation from norms — without knowing what the data represents. Domain intelligence enters through plugins that extend the engine without contaminating it.

A plugin does not change what Vajra computes. It enriches what Vajra knows.


The Plugin Architecture

Plugins contribute four kinds of extensions:

  1. Type recognizers — pattern matchers that identify domain-specific value types (ICD-10 codes, NPIs, SWIFT codes)
  2. Concern profiles — custom scoring weight vectors and rendering templates
  3. Relationship hints — domain knowledge about which fields form logical groups
  4. Custom renderers — domain-specific essence rendering templates

Plugins cannot modify the core analysis pipeline, access the filesystem beyond their own configuration, make network calls, or mutate the input document. They are additive. They are isolated.


The VajraPlugin Trait

pub trait VajraPlugin: Send + Sync {
    /// Plugin identifier.
    fn name(&self) -> &str;

    /// Plugin version string.
    fn version(&self) -> &str;

    /// Additional type recognizers beyond the core DFA bank.
    /// These run alongside the core recognizers during semantic lifting.
    fn type_recognizers(&self) -> Vec<Box<dyn TypeRecognizer>> {
        vec![]
    }

    /// Additional concern profile definitions.
    /// These appear alongside built-in profiles in `vajra profiles`.
    fn concern_profiles(&self) -> Vec<Box<dyn ConcernProfile>> {
        vec![]
    }

    /// Field relationship heuristics.
    /// Example: "code + description + system = coded concept"
    fn relationship_hints(&self) -> Vec<RelationshipHint> {
        vec![]
    }

    /// Custom rendering templates for essence output.
    fn renderers(&self) -> Vec<Box<dyn EssenceRenderer>> {
        vec![]
    }
}

Apart from name and version, every method has a default implementation that returns an empty vector. A plugin implements only the capabilities it needs.
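A minimal sketch of what relying on those defaults looks like. The two traits below are trimmed local stand-ins so the example compiles on its own (a real plugin would import them from vajra-types), and EmptyPlugin, CurrencyPlugin, and CurrencyCode are hypothetical names used only for illustration.

```rust
// Local stand-ins for the traits shown above, trimmed to what this sketch needs.
pub trait TypeRecognizer: Send + Sync {
    fn type_name(&self) -> &str;
    fn matches(&self, value: &str) -> bool;
}

pub trait VajraPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn version(&self) -> &str;
    fn type_recognizers(&self) -> Vec<Box<dyn TypeRecognizer>> {
        vec![]
    }
}

/// Hypothetical recognizer: a loose structural check for three uppercase
/// ASCII letters, the shape of an ISO 4217 currency code such as "USD".
pub struct CurrencyCode;

impl TypeRecognizer for CurrencyCode {
    fn type_name(&self) -> &str { "ISO-4217" }

    fn matches(&self, value: &str) -> bool {
        value.len() == 3 && value.bytes().all(|b| b.is_ascii_uppercase())
    }
}

/// Implements only the two required methods; everything else is inherited.
pub struct EmptyPlugin;

impl VajraPlugin for EmptyPlugin {
    fn name(&self) -> &str { "empty" }
    fn version(&self) -> &str { "0.0.1" }
}

/// Overrides a single capability and leaves the rest at their defaults.
pub struct CurrencyPlugin;

impl VajraPlugin for CurrencyPlugin {
    fn name(&self) -> &str { "currency" }
    fn version(&self) -> &str { "0.0.1" }

    fn type_recognizers(&self) -> Vec<Box<dyn TypeRecognizer>> {
        vec![Box::new(CurrencyCode)]
    }
}
```

EmptyPlugin contributes nothing, while CurrencyPlugin contributes exactly one recognizer. Neither has to mention the capabilities it does not use.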


TypeRecognizer

Type recognizers extend Vajra’s semantic lifting layer. They match raw string values against domain-specific patterns.

pub trait TypeRecognizer: Send + Sync {
    /// The name of the recognized type (e.g., "ICD-10-CM", "CPT", "NPI").
    fn type_name(&self) -> &str;

    /// Returns true if the value matches this type's pattern.
    fn matches(&self, value: &str) -> bool;

    /// Optional confidence level for the match.
    fn confidence(&self, value: &str) -> f64 {
        if self.matches(value) { 1.0 } else { 0.0 }
    }
}

Type recognizers run during Layer 4 (Semantic Lifting) of the engine pipeline. They are evaluated after the core DFA bank, allowing domain-specific patterns to augment — not override — the core type inference.
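As an illustration of overriding confidence, here is a hand-written recognizer for the ICD-10-CM pattern listed later on this page, [A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?. The trait is a local copy so the sketch stands alone. The 0.6 score for bare three-character categories is an assumption made for the example, not a value taken from Vajra.

```rust
// Local copy of the trait shown above.
pub trait TypeRecognizer: Send + Sync {
    fn type_name(&self) -> &str;
    fn matches(&self, value: &str) -> bool;
    fn confidence(&self, value: &str) -> f64 {
        if self.matches(value) { 1.0 } else { 0.0 }
    }
}

pub struct Icd10Cm;

impl TypeRecognizer for Icd10Cm {
    fn type_name(&self) -> &str { "ICD-10-CM" }

    fn matches(&self, value: &str) -> bool {
        let b = value.as_bytes();
        // Category: one uppercase letter followed by two digits.
        if b.len() < 3
            || !b[0].is_ascii_uppercase()
            || !b[1].is_ascii_digit()
            || !b[2].is_ascii_digit()
        {
            return false;
        }
        // Optional subcategory: a dot followed by 1 to 4 digits or uppercase letters.
        match &b[3..] {
            [] => true,
            [b'.', rest @ ..] => {
                (1..=4).contains(&rest.len())
                    && rest.iter().all(|c| c.is_ascii_digit() || c.is_ascii_uppercase())
            }
            _ => false,
        }
    }

    /// Assumed scoring: a dotted code is unambiguous, while a bare category
    /// such as "M54" could collide with other short identifiers.
    fn confidence(&self, value: &str) -> f64 {
        if !self.matches(value) {
            0.0
        } else if value.contains('.') {
            1.0
        } else {
            0.6
        }
    }
}
```

Working on bytes rather than string slices keeps the matcher panic-free on non-ASCII input, which matters because recognizers see arbitrary values from the document.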


RelationshipHint

Relationship hints tell Vajra that certain field combinations form logical groups:

pub struct RelationshipHint {
    /// Fields that form a logical group when co-located.
    pub field_patterns: Vec<String>,

    /// Name for this relationship.
    pub name: String,

    /// Description of what the group represents.
    pub description: String,
}

Example from the medical plugin:

RelationshipHint {
    field_patterns: vec![
        "code".to_string(),
        "system".to_string(),
        "display".to_string(),
    ],
    name: "coded-concept".to_string(),
    description: "A coded value with its coding system and human-readable display".to_string(),
}

When Vajra finds code, system, and display as sibling keys in an object, the medical plugin’s relationship hint identifies this as a coded concept — not three independent strings.
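The sibling-key check described above can be sketched as a set-containment test. This is an assumption about how hints might be applied, not Vajra's actual matcher: it treats each entry in field_patterns as an exact key name, whereas the engine may support richer patterns. The matching_hints and coded_concept functions are hypothetical helpers.

```rust
use std::collections::BTreeSet;

// Local copy of the struct shown above.
pub struct RelationshipHint {
    pub field_patterns: Vec<String>,
    pub name: String,
    pub description: String,
}

/// Returns the names of all hints whose fields are all present among the
/// sibling keys of one object. Extra keys on the object are allowed.
pub fn matching_hints<'a>(hints: &'a [RelationshipHint], keys: &[&str]) -> Vec<&'a str> {
    let present: BTreeSet<&str> = keys.iter().copied().collect();
    hints
        .iter()
        .filter(|h| {
            // An empty pattern list would match everything, so reject it.
            !h.field_patterns.is_empty()
                && h.field_patterns.iter().all(|p| present.contains(p.as_str()))
        })
        .map(|h| h.name.as_str())
        .collect()
}

/// The coded-concept hint from the medical plugin example.
pub fn coded_concept() -> RelationshipHint {
    RelationshipHint {
        field_patterns: vec!["code".into(), "system".into(), "display".into()],
        name: "coded-concept".into(),
        description: "A coded value with its coding system and human-readable display".into(),
    }
}
```

An object with keys code, system, display, and version matches the hint. An object with only code and display does not.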


The Medical Plugin: vajra-domain-med

The medical plugin is the reference implementation. It demonstrates every plugin capability.

Type Recognizers

| Recognized Type | Pattern | Example Values |
|---|---|---|
| ICD-10-CM | [A-Z][0-9]{2}(\.[0-9A-Z]{1,4})? | E11.9, J44.1, M54.5 |
| ICD-10-PCS | [0-9A-HJ-NP-Z]{7} | 0SG00ZJ |
| CPT | [0-9]{5} (with known range validation) | 99213, 99214, 27447 |
| HCPCS | [A-V][0-9]{4} | J0129, G0438 |
| NDC | [0-9]{4,5}-[0-9]{3,4}-[0-9]{1,2} | 0069-0770-01 |
| NPI | [0-9]{10} (with Luhn check) | 1234567893 |
| Denial Reason | (CO\|PR\|OA\|PI\|CR)-[0-9]{1,3} | CO-45, PR-1, OA-23 |
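The "Luhn check" on the NPI row is what separates a real identifier from any ten-digit number. A minimal sketch follows, assuming the standard NPI scheme in which the Luhn sum is computed over the ten digits prefixed with the constant 80840. The function name is_valid_npi is hypothetical, and the plugin's own implementation may differ.

```rust
/// Returns true if `value` is ten ASCII digits whose Luhn checksum, computed
/// over "80840" followed by the digits, is a multiple of ten.
pub fn is_valid_npi(value: &str) -> bool {
    if value.len() != 10 || !value.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let sum: u32 = "80840"
        .bytes()
        .chain(value.bytes())
        .rev() // Luhn walks from the rightmost (check) digit
        .enumerate()
        .map(|(i, b)| {
            let d = u32::from(b - b'0');
            if i % 2 == 1 {
                // Double every second digit; subtracting 9 sums the digits of
                // a two-digit product.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

The example value in the table, 1234567893, passes this check. Changing its last digit to 0 makes it fail.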

Relationship Hints

| Hint | Fields | Meaning |
|---|---|---|
| Coded Concept | code, system, display | A value from a terminology system |
| Service Line | procedure_code, charge_amount, service_date, status | A line item on a claim |
| Patient Identity | patient.id, patient.name, patient.dob | Patient demographic group |
| Provider Identity | provider.npi, provider.name, provider.taxonomy | Provider identification group |
| Adjudication | allowed_amount, paid_amount, status, adjustment | Payment determination group |

What It Enables

With the medical plugin loaded, vajra inspect on a medical claim produces:

=== Domain Type Recognition ===
  $.claims[*].diagnosis[*].code           E11.9      ICD-10-CM
  $.claims[*].diagnosis[*].code           J44.1      ICD-10-CM
  $.claims[*].service_lines[*].procedure_code  99213  CPT
  $.claims[*].provider.npi                1234567893 NPI
  $.claims[*].service_lines[*].adjustment.reason  CO-45  Denial Reason

Without the plugin, those values are just strings. With it, they are clinically meaningful codes.


Building Your Own Plugin

Step 1: Create a Crate

cargo new vajra-domain-finance --lib

Step 2: Depend on vajra-types

# Cargo.toml
[dependencies]
vajra-types = { version = "0.1", path = "../vajra-types" }

Step 3: Implement the Trait

use vajra_types::traits::{VajraPlugin, TypeRecognizer, RelationshipHint};

pub struct FinancePlugin;

impl VajraPlugin for FinancePlugin {
    fn name(&self) -> &str { "finance" }
    fn version(&self) -> &str { "0.1.0" }

    fn type_recognizers(&self) -> Vec<Box<dyn TypeRecognizer>> {
        // IbanRecognizer and CurrencyCodeRecognizer are not shown here; each
        // implements TypeRecognizer the same way SwiftCodeRecognizer does below.
        vec![
            Box::new(SwiftCodeRecognizer),
            Box::new(IbanRecognizer),
            Box::new(CurrencyCodeRecognizer),
        ]
    }

    fn relationship_hints(&self) -> Vec<RelationshipHint> {
        vec![
            RelationshipHint {
                field_patterns: vec![
                    "amount".to_string(),
                    "currency".to_string(),
                ],
                name: "monetary-value".to_string(),
                description: "Amount with its currency denomination".to_string(),
            },
        ]
    }
}

struct SwiftCodeRecognizer;

impl TypeRecognizer for SwiftCodeRecognizer {
    fn type_name(&self) -> &str { "SWIFT/BIC" }

    fn matches(&self, value: &str) -> bool {
        // Work on bytes: slicing a &str at a fixed offset panics when the
        // offset falls inside a multi-byte character.
        let b = value.as_bytes();
        (b.len() == 8 || b.len() == 11)
            // Bank code (4 letters) and country code (2 letters).
            && b[..6].iter().all(|c| c.is_ascii_uppercase())
            // Location code (2 chars) and optional branch code (3 chars).
            && b[6..].iter().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
    }
}
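Step 3 references an IbanRecognizer without showing it. One way its matches method could be backed is the standard IBAN mod-97 check, sketched below. The function name is_valid_iban is hypothetical, the sketch expects the compact form without spaces, and it does not validate country-specific lengths.

```rust
/// Returns true if `value` has the IBAN shape (two letters, two check digits,
/// then uppercase alphanumerics, 15 to 34 characters in total) and passes the
/// mod-97 checksum.
pub fn is_valid_iban(value: &str) -> bool {
    let b = value.as_bytes();
    if !(15..=34).contains(&b.len())
        || !b[..2].iter().all(|c| c.is_ascii_uppercase())
        || !b[2..4].iter().all(|c| c.is_ascii_digit())
        || !b[4..].iter().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
    {
        return false;
    }
    // Move the first four characters to the end, read the result as one large
    // decimal number (A = 10 ... Z = 35), and reduce modulo 97 along the way
    // so the running value always fits in a u32.
    let remainder = b[4..].iter().chain(&b[..4]).fold(0u32, |rem, &c| {
        if c.is_ascii_digit() {
            (rem * 10 + u32::from(c - b'0')) % 97
        } else {
            (rem * 100 + u32::from(c - b'A') + 10) % 97
        }
    });
    remainder == 1
}
```

A checksum-backed recognizer like this produces far fewer false positives than a pure shape match, which is the same reason the medical plugin pairs its NPI pattern with a Luhn check.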

Step 4: Register the Plugin

Static plugins are compiled into the binary at build time by adding the crate to vajra-cli’s dependencies.

Dynamic plugins are loaded at runtime via libloading from the plugin directory (default: ~/.vajra/plugins/).


Error Isolation

Plugins run in an isolation boundary. If a plugin panics or returns an error:

  1. The panic is caught at the plugin boundary (via std::panic::catch_unwind).
  2. Core analysis continues without the plugin’s contributions.
  3. The plugin failure is recorded in the output’s provenance metadata.
  4. A diagnostic message is emitted to stderr.

vajra: plugin "finance" failed during type recognition: index out of bounds
vajra: continuing analysis without finance plugin contributions

No plugin failure can crash Vajra. No plugin can corrupt the core analysis. The isolation is structural, not aspirational.
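The boundary described above can be sketched with std::panic::catch_unwind. This is an illustrative wrapper under stated assumptions, not Vajra's actual code: run_isolated is a hypothetical name, and the real engine also records the failure in provenance metadata, which is omitted here.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs one plugin call. On panic, logs a diagnostic to stderr and returns
/// None so the caller can continue without the plugin's contribution.
pub fn run_isolated<T>(plugin: &str, stage: &str, call: impl FnOnce() -> T) -> Option<T> {
    // AssertUnwindSafe is acceptable here because the result is discarded on
    // panic, so no partially updated state is observed afterwards.
    match panic::catch_unwind(AssertUnwindSafe(call)) {
        Ok(value) => Some(value),
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            eprintln!("vajra: plugin \"{plugin}\" failed during {stage}: {message}");
            eprintln!("vajra: continuing analysis without {plugin} plugin contributions");
            None
        }
    }
}
```

Note that catch_unwind only intercepts unwinding panics. A binary built with panic = "abort", or a plugin that aborts the process directly, would not be contained by a wrapper like this one.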


Plugin Constraints

A plugin may:

  • Register type recognizers, profiles, relationship hints, and renderers
  • Read its own configuration files
  • Use any safe Rust code internally

A plugin may not:

  • Modify the core analysis pipeline
  • Access the filesystem beyond its own config directory
  • Make network calls
  • Mutate the input document
  • Introduce nondeterminism (all plugin methods must be deterministic)

Shipped Plugins

Six domain plugins ship with Vajra, all enabled by default via feature flags:

| Domain | Plugin | Type Recognizers | Hints |
|---|---|---|---|
| Medical / EDI | vajra-domain-med | ICD-10, CPT, HCPCS, NDC, NPI, Diagnosis Code | 6 (claim service line, diagnosis, patient, provider, adjudication, denial) |
| Security | vajra-domain-sec | CVE, IPv4, IPv6, CIDR, MAC, SHA-256, SHA-1, MD5, JWT, MITRE ATT&CK Technique, MITRE Tactic, CVSS | 6 (network flow, alert classification, vulnerability, auth, process execution, DNS) |
| DevOps | vajra-domain-devops | Container ID, Semver, Git SHA, Docker Image, AWS ARN, GCP Resource, CIDR, Cron, K8s Namespace, Terraform Resource | 6 (K8s pod spec, deployment metadata, service endpoint, Terraform, CI pipeline, container spec) |
| Source Code | vajra-domain-source | snake_case, camelCase, PascalCase, SCREAMING_SNAKE, import paths, source file paths | 6 (function definition, class definition, import statement, parameter list, conditional, loop) |
| Encoding | vajra-domain-encoding | Base64, Base64URL, hex, URL-encoded, HTML entities, Unicode escapes, PEM, data URI, quoted-printable, MIME encoded word, Punycode, double-encoded, mixed-encoding | 3 (content+encoding, transfer encoding, encoded/decoded pairs) |
| GitHub | vajra-domain-github | PR number, issue number, GitHub username, repo slug, commit SHA, branch name, label, milestone, review state, merge method | 7 (pull request, issue, review, commit, release, workflow run, discussion) |

Feature Flags

# vajra-cli/Cargo.toml
[features]
default = ["medical", "security", "devops", "source", "encoding", "github"]
medical = ["vajra-domain-med"]
security = ["vajra-domain-sec"]
devops = ["vajra-domain-devops"]
source = ["vajra-source", "vajra-domain-source"]
encoding = ["vajra-domain-encoding"]
github = ["vajra-domain-github"]
all-plugins = ["medical", "security", "devops", "source", "encoding", "github"]

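To build with only a subset of plugins, disable the defaults and name the features you want (here, security and devops only):

cargo build --no-default-features --features security,devops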


The Security Plugin: vajra-domain-sec

The security plugin recognizes types commonly found in SIEM events, vulnerability scans, threat intelligence feeds, and network flow data.

Type Recognizers

| Recognized Type | Pattern | Example Values |
|---|---|---|
| CVE ID | CVE-YYYY-NNNN (sequence of 4 or more digits) | CVE-2024-3400, CVE-2023-44487 |
| IPv4 | Dotted-quad, each octet 0-255 | 192.168.1.1, 10.0.0.1 |
| IPv6 | Full, compressed, mixed notation | 2001:db8::1, ::1 |
| CIDR | IPv4/prefix (0-32) | 10.0.0.0/8, 192.168.1.0/24 |
| MAC Address | Colon or hyphen separated | aa:bb:cc:dd:ee:ff |
| SHA-256 | 64 lowercase hex chars | e3b0c44298fc1c14... |
| SHA-1 | 40 lowercase hex chars | da39a3ee5e6b4b0d... |
| MD5 | 32 lowercase hex chars | d41d8cd98f00b204... |
| JWT | eyJ...\.eyJ...\.sig | JSON Web Tokens |
| MITRE ATT&CK Technique | T\d{4}(\.\d{3})? | T1059, T1059.001 |
| MITRE ATT&CK Tactic | TA\d{4} | TA0001, TA0040 |
| CVSS Vector | CVSS:3.x/AV:.../... | Full CVSS v3 vector strings |
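The IPv4 and CIDR rows describe range checks that a plain regular expression handles poorly. A minimal sketch using the standard library's address parser is shown below. The function names is_ipv4 and is_ipv4_cidr are hypothetical, and the plugin itself may implement these checks differently.

```rust
use std::net::Ipv4Addr;

/// Dotted-quad check: the standard parser already enforces four octets in 0-255.
pub fn is_ipv4(value: &str) -> bool {
    value.parse::<Ipv4Addr>().is_ok()
}

/// IPv4 address, a slash, then a prefix length between 0 and 32.
pub fn is_ipv4_cidr(value: &str) -> bool {
    match value.split_once('/') {
        Some((addr, prefix)) => {
            is_ipv4(addr)
                // Require plain digits: integer parsing alone would accept "+8".
                && !prefix.is_empty()
                && prefix.len() <= 2
                && prefix.bytes().all(|b| b.is_ascii_digit())
                && prefix.parse::<u8>().map_or(false, |p| p <= 32)
        }
        None => false,
    }
}
```

Delegating octet validation to Ipv4Addr keeps the recognizer short and avoids reimplementing the 0-255 bound by hand.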

The DevOps Plugin: vajra-domain-devops

The DevOps plugin recognizes types in Kubernetes manifests, Terraform state, CI/CD pipeline output, Docker configurations, and cloud infrastructure JSON.

Type Recognizers

| Recognized Type | Pattern | Example Values |
|---|---|---|
| Container ID | 12 or 64 lowercase hex chars | a1b2c3d4e5f6 |
| Semver | v?MAJOR.MINOR.PATCH(-pre)?(+build)? | v1.2.3, 1.0.0-beta.1 |
| Git SHA | 7-12 or 40 lowercase hex chars | a1b2c3d, full 40-char SHA |
| Docker Image | [registry/]repo:tag or repo@sha256:digest | nginx:latest, gcr.io/proj/img:v1 |
| AWS ARN | arn:aws:service:region:account:resource | arn:aws:s3:::my-bucket |
| GCP Resource | projects/*/... or organizations/*/... | projects/my-proj/topics/t |
| CIDR Block | IPv4/prefix (0-32) | 10.0.0.0/16 |
| Cron Expression | 5-field cron pattern | 0 */6 * * * |
| K8s Namespace | DNS-1123 labels, known system namespaces | kube-system, my-app-staging |
| Terraform Resource | provider_type.name | aws_instance.web |
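Several of these patterns overlap by construction: a 12-character lowercase hex string satisfies both the Container ID and Git SHA rows, and a 40-character one also fits the security plugin's SHA-1 row. The sketch below makes that overlap concrete. The function names are hypothetical, and how Vajra ranks competing matches is not specified in this chapter.

```rust
/// True for a non-empty string of lowercase hexadecimal digits.
fn is_lower_hex(value: &str) -> bool {
    !value.is_empty() && value.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Container ID: 12 (short form) or 64 (full form) lowercase hex characters.
pub fn is_container_id(value: &str) -> bool {
    is_lower_hex(value) && matches!(value.len(), 12 | 64)
}

/// Git SHA: 7 to 12 (abbreviated) or 40 (full) lowercase hex characters.
pub fn is_git_sha(value: &str) -> bool {
    is_lower_hex(value) && matches!(value.len(), 7..=12 | 40)
}

/// Every type name from the two rows above that accepts the value.
pub fn candidate_types(value: &str) -> Vec<&'static str> {
    let mut found = Vec::new();
    if is_container_id(value) {
        found.push("Container ID");
    }
    if is_git_sha(value) {
        found.push("Git SHA");
    }
    found
}
```

The table's own Container ID example, a1b2c3d4e5f6, yields both candidates, so a value's surrounding field context has to break the tie.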

The Source Code Plugin: vajra-domain-source

The source code plugin recognizes patterns in the JSON trees produced by vajra-source (tree-sitter CST-to-JSON output). It works alongside vajra-source, which handles the parsing.

Type Recognizers

| Recognized Type | Pattern | Example Values |
|---|---|---|
| snake_case identifier | [a-z][a-z0-9]*(_[a-z0-9]+)+ | my_function, get_value |
| camelCase identifier | [a-z]...[A-Z]... | myFunction, getValue |
| PascalCase identifier | [A-Z][a-zA-Z0-9]+ | MyClass, HttpClient |
| SCREAMING_SNAKE_CASE | [A-Z][A-Z0-9]*(_[A-Z0-9]+)+ | MAX_SIZE, HTTP_STATUS |
| Import path | mod::path or pkg.Class or @scope/pkg | std::collections::HashMap |
| Source file path | Path ending in .rs, .py, .go, etc. | src/main.rs, lib/utils.py |
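The snake_case pattern requires at least one underscore, so a single lowercase word does not qualify. A dependency-free sketch of that pattern, [a-z][a-z0-9]*(_[a-z0-9]+)+, is shown below. The function name is_snake_case is hypothetical, and the plugin may well use a compiled regular expression or DFA instead.

```rust
/// True for a non-empty run of lowercase ASCII letters and digits.
fn is_lower_alnum(segment: &str) -> bool {
    !segment.is_empty()
        && segment.bytes().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
}

/// Hand-written equivalent of [a-z][a-z0-9]*(_[a-z0-9]+)+.
pub fn is_snake_case(value: &str) -> bool {
    let mut segments = value.split('_');

    // First segment: must start with a letter, then letters or digits.
    let first_ok = segments.next().map_or(false, |s| {
        is_lower_alnum(s) && s.as_bytes()[0].is_ascii_lowercase()
    });

    // Remaining segments: at least one, each non-empty (rejects "a__b" and "a_").
    let mut rest_count = 0;
    let rest_ok = segments.all(|s| {
        rest_count += 1;
        is_lower_alnum(s)
    });

    first_ok && rest_ok && rest_count >= 1
}
```

Under this pattern get_value and http_2 match, while value, myFunction, and _private do not.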

Relationship Hints

| Hint | Pattern | Meaning |
|---|---|---|
| Function definition | name + parameters + body | A function or method |
| Class definition | name + body + inheritance | A class or struct |
| Import statement | path + optional alias | A use/import declaration |
| Parameter list | type + name pairs | Function parameters |
| Conditional block | condition + consequence + alternative | An if/else construct |
| Loop block | condition/iterator + body | A for/while loop |

The Encoding Plugin: vajra-domain-encoding

The encoding plugin detects data encodings embedded in JSON string values. It identifies Base64, hex, URL encoding, HTML entities, PEM certificates, and more — including adversarial patterns like double encoding and mixed encoding used for evasion.

Type Recognizers (3 Tiers)

Tier 1 — Definite confidence (structural markers, near-zero false positives):

| Recognized Type | Pattern | Example Values |
|---|---|---|
| PEM block | -----BEGIN ...----- prefix/suffix | Certificates, private keys |
| Data URI | data:mime;base64,... | Embedded images, payloads |
| MIME encoded word | =?charset?B/Q?...?= | Email header encoding |
| Punycode | xn-- prefix | Internationalized domain names |

Tier 2 — Dominant confidence (strong patterns, low false positives):

| Recognized Type | Pattern | Example Values |
|---|---|---|
| URL encoded | 2+ %XX sequences + trial decode | hello%20world%21 |
| Quoted-printable | 3+ =XX sequences | MIME email encoding |
| HTML entity | 2+ &...; entities | &lt;script&gt; |
| Unicode escape | 2+ \uXXXX or \xNN | \u0048\u0065 |
| Base64URL | 16+ chars, URL-safe alphabet | API tokens, URL-safe data |

Tier 3 — Heuristic (aggressive false positive gating):

| Recognized Type | Detection | Security Signal |
|---|---|---|
| Base64 | 24+ chars, div-by-4, trial decode, entropy gate | Obfuscated payloads, exfiltration |
| Hex encoded | 32+ chars, excludes known hash lengths | Shellcode, binary blobs |
| Double encoded | Decode reveals another encoding | Evasion technique (%253C → %3C → <) |
| Mixed encoding | 2+ encoding types in one value | Obfuscation, WAF bypass |
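To make the Base64 gating concrete, the sketch below applies the length, divisibility, alphabet, and entropy checks from the table. It is an approximation under stated assumptions: the 3.0-bit entropy threshold is a guess chosen for illustration, the trial-decode step is omitted, and looks_like_base64 and shannon_entropy are hypothetical names.

```rust
use std::collections::HashMap;

/// Shannon entropy of the byte distribution, in bits per character.
pub fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for b in s.bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let n = s.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Heuristic gate for standard-alphabet Base64 (trial decode not included).
pub fn looks_like_base64(value: &str) -> bool {
    const MIN_LEN: usize = 24;
    const MIN_ENTROPY_BITS: f64 = 3.0; // assumed threshold, not Vajra's

    let body = value.trim_end_matches('=');
    let padding = value.len() - body.len();

    value.len() >= MIN_LEN
        && value.len() % 4 == 0
        && padding <= 2
        && body.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
        // Low-entropy strings such as "aaaa..." pass the alphabet check but
        // are very unlikely to be encoded binary data.
        && shannon_entropy(body) >= MIN_ENTROPY_BITS
}
```

The entropy gate is what keeps long ordinary identifiers and repeated characters out of Tier 3. Without it, any 24-character alphanumeric string with a length divisible by four would be flagged.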

Layer Peeling API

Beyond type recognition, the plugin provides detect_encoding_layers() for recursive analysis:

use vajra_domain_encoding::detect_encoding_layers;

let layers = detect_encoding_layers("%2548ello%2520world", 5);
// Returns: [url_encoded(depth=0), url_encoded(depth=1)]

Peeling is bounded at depth 5, and each layer's decoded output is capped at 4KB. That is enough to catch nested encodings such as base64(url(hex(payload))).
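The peeling idea can be illustrated for the URL-encoding case alone. The sketch below is not detect_encoding_layers: peel_url_layers and its helpers are hypothetical, only percent-encoding is handled, and the 4KB cap is left out for brevity.

```rust
/// Value of one hexadecimal digit, if the byte is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes every %XX sequence once. Returns None when nothing was decoded
/// or when the decoded bytes are not valid UTF-8.
fn percent_decode_once(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut decoded_any = false;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                decoded_any = true;
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    if decoded_any { String::from_utf8(out).ok() } else { None }
}

/// Repeatedly decodes until nothing changes or `max_depth` layers have been
/// removed. Returns the number of layers peeled and the final text.
pub fn peel_url_layers(value: &str, max_depth: usize) -> (usize, String) {
    let mut current = value.to_string();
    let mut layers = 0;
    while layers < max_depth {
        match percent_decode_once(&current) {
            Some(next) => {
                current = next;
                layers += 1;
            }
            None => break,
        }
    }
    (layers, current)
}
```

For the input in the example above, "%2548ello%2520world", this peels two layers and ends at "Hello world", matching the two url_encoded entries the API reports. The depth bound guarantees termination even on adversarial inputs built to decode indefinitely.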


The GitHub Plugin: vajra-domain-github

The GitHub plugin recognizes types commonly found in GitHub API responses, webhook payloads, and exported repository data (PRs, issues, commits, reviews, releases, workflow runs).

Type Recognizers

| Recognized Type | Pattern | Priority | Confidence | Example Values |
|---|---|---|---|---|
| PR Number | #\d+ or bare integer in PR context | 10 | 0.90 | #142, 1587 |
| Issue Number | #\d+ or bare integer in issue context | 10 | 0.90 | #23, 456 |
| GitHub Username | [a-zA-Z0-9](-?[a-zA-Z0-9]){0,38} | 20 | 0.75 | copyleftdev, octocat |
| Repo Slug | owner/repo pattern | 15 | 0.85 | copyleftdev/vajra, rust-lang/rust |
| Commit SHA | 7-40 hex chars in commit context | 10 | 0.95 | a1b2c3d, full 40-char SHA |
| Branch Name | Ref-like strings with / separators | 25 | 0.70 | main, feature/cascade-cmd |
| Label | Known label patterns (bug, enhancement, etc.) | 30 | 0.65 | bug, good first issue |
| Milestone | Version-like or sprint-like strings | 30 | 0.60 | v1.0, Sprint 12 |
| Review State | One of: approved, changes_requested, commented, dismissed | 5 | 1.00 | approved, changes_requested |
| Merge Method | One of: merge, squash, rebase | 5 | 1.00 | squash, rebase |
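This chapter does not say how Priority and Confidence are combined when several recognizers accept the same value. One plausible reading, given that the most specific types (Review State, Merge Method) carry the lowest numbers, is that a lower priority number wins and confidence breaks ties. The sketch below encodes that assumption only; Candidate and pick are hypothetical names, and the real resolution rule may differ.

```rust
/// One recognizer's claim on a value, with the numbers from the table above.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub type_name: &'static str,
    pub priority: u32,
    pub confidence: f64,
}

/// Assumed rule: lowest priority number first, then highest confidence.
pub fn pick(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| {
        a.priority
            .cmp(&b.priority)
            // Reverse the operands so that higher confidence sorts first.
            .then(b.confidence.total_cmp(&a.confidence))
    })
}
```

Under this reading, the string approved would resolve to Review State (5, 1.00) even though it also fits the GitHub Username pattern (20, 0.75).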

Relationship Hints

| Hint | Field Patterns | Meaning |
|---|---|---|
| Pull Request | number, title, state, author, base, head | A pull request record |
| Issue | number, title, state, labels, assignees | An issue record |
| Review | author, state, body, submitted_at | A PR review |
| Commit | sha, message, author, date | A commit record |
| Release | tag_name, name, published_at, assets | A release record |
| Workflow Run | name, status, conclusion, run_number | A CI workflow run |
| Discussion | title, author, category, answer | A GitHub discussion |

Future Plugin Domains

The architecture supports any domain:

| Domain | Plugin | Type Recognizers |
|---|---|---|
| Financial | vajra-domain-finance | SWIFT, IBAN, CUSIP, currency codes |
| Telecom | vajra-domain-telecom | E.164 numbers, IMSI, CDR fields |
| IoT / Sensor | vajra-domain-iot | Sensor types, unit patterns, device IDs |