Introduction

Run history on any Linux machine. You'll see a flat list of commands, numbered sequentially, stripped of all context. No record of what you were trying to accomplish. No link between the grep that found the bug and the git commit that fixed it. No memory of which deployment sequence worked last Thursday when the same thing broke.

The shell history command has been essentially unchanged since 1979. Forty-five years of the same append-only text file.

TALA changes this.

TALA is an intent-native narrative execution layer. It reimagines shell history as a causality-aware, graph-structured narrative of intent. Every action is captured not as a string, but as a structured node in a directed acyclic graph — linked to what caused it, what it depended on, what it produced, and how confident the system is in the outcome.

The Design Hypothesis

Systems that model intent + causality + outcome outperform systems that model commands + sequence.

Traditional systems — history, shell logs, audit trails — exhibit:

  • Linear, append-only structures
  • No semantic interpretation
  • No representation of causality or intent
  • No ability to generalize or adapt past actions
  • Human-centric readability with no machine reasoning

TALA replaces this with a system where intent is a first-class primitive:

I = f(C, X, P)

Where C is the command, X is the execution context, and P is prior knowledge from historical embeddings.
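As a toy illustration of the formula (types and names here are hypothetical, not TALA's actual API), an intent can be assembled from the command C, a stable hash of the execution context X, and references to prior similar intents P:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the execution context (X) into a stable u64.
/// Illustrative sketch; TALA's real context hashing may differ.
fn context_hash(cwd: &str, user: &str, session: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (cwd, user, session).hash(&mut h);
    h.finish()
}

/// A simplified intent: I = f(C, X, P).
struct Intent {
    raw_command: String, // C: the command
    context_hash: u64,   // X: hashed execution context
    prior_ids: Vec<u64>, // P: ids of semantically similar past intents
}

fn main() {
    let intent = Intent {
        raw_command: "systemctl restart nginx".to_string(),
        context_hash: context_hash("/etc/nginx", "ops", "sess-42"),
        prior_ids: vec![101, 87],
    };
    // Same command, different context => different intent identity.
    let elsewhere = context_hash("/home/dev", "dev", "sess-43");
    assert_ne!(intent.context_hash, elsewhere);
    println!("captured intent for {:?}", intent.raw_command);
}
```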

What This Enables

  • Semantic recall — search by meaning, not regex
  • Adaptive replay — re-execute workflows that adapt to changed context
  • Pattern detection — identify recurring failure clusters automatically
  • Prediction — anticipate the next action from historical embeddings
  • Narrative extraction — pull coherent stories from thousands of interactions

Who This Is For

TALA is built for systems engineers, SREs, and anyone who operates Linux systems. If you've ever wished your shell remembered why you did something — not just what you typed — this is for you.

How This Book Is Organized

  • Getting Started walks you through installation, building, and launching the observatory demo.
  • Core Concepts explains the fundamental ideas: intent, narrative graphs, outcomes, edges, semantic recall, and adaptive replay.
  • Architecture covers the system design, crate layout, data flow, and storage engine.
  • Crate Reference provides per-crate API documentation.
  • Operations covers deployment, configuration, the observatory dashboard, and chaos engineering.
  • Performance documents benchmark targets and how to run them.
  • Design explains the settled architectural decisions and the reasoning behind them.
  • Contributing describes the development workflow and Rust conventions.

Quick Start

Get TALA running in under five minutes.

Prerequisites

  • Rust 1.82+ (recommended via rustup)
  • Docker and Docker Compose (for the observatory demo)

Clone and Build

git clone https://github.com/copyleftdev/tala.git
cd tala
cargo build --release

The workspace compiles 12 crates. The first build downloads dependencies and takes a few minutes.

Run the Test Suite

cargo test --workspace

Run Benchmarks

cargo bench

Criterion benchmarks live in crates/<name>/benches/. Results are written to target/criterion/ with HTML reports.

Launch the Observatory Demo

The fastest way to see TALA in action is the Docker Compose demo. It runs four simulated Linux operations domains — Incident Response, Continuous Deployment, Observability, and System Provisioning — each generating realistic shell commands, structuring them into intent narratives, and exporting metrics to Prometheus.

cd deploy
cp .env.example .env     # adjust settings if desired
docker compose up -d

Open your browser:

Service                  URL
Observatory Dashboard    http://localhost:8080
Prometheus               http://localhost:9090
Grafana                  http://localhost:3000

The observatory landing page tells the TALA story, then lets you enter a live topology view of the system. Click any node to drill into its telemetry.

Tear Down

docker compose down -v

The -v flag removes named volumes. Omit it to preserve data between restarts.

Building from Source

Toolchain

TALA pins its Rust toolchain via rust-toolchain.toml at the workspace root. Running any cargo command in the workspace will automatically install the correct version if you use rustup.

cat rust-toolchain.toml

The minimum supported Rust version is 1.82.

Workspace Build

cargo build --workspace

This builds all 12 crates in dependency order. For a release-optimized build:

cargo build --release

Building Individual Crates

Each crate can be built independently:

cargo build -p tala-core
cargo build -p tala-embed
cargo build -p tala-sim

The Simulator Binary

The tala-sim crate produces the only binary in the workspace:

cargo build --release --bin tala-sim

The binary is written to target/release/tala-sim.

Feature Flags

Optional capabilities are gated behind feature flags:

Flag         Crate       Description
cuda         tala-embed  CUDA GPU acceleration for batch operations
vulkan       tala-embed  Vulkan compute shader support
cluster      tala-net    Distributed clustering features
encryption   tala-net    TLS encryption for mesh transport

Default features cover the common single-node case. Enable extras with:

cargo build -p tala-embed --features cuda

Docker Build

The deploy/Dockerfile produces a minimal container image:

docker build -t tala-sim -f deploy/Dockerfile .

The multi-stage build compiles in a full Rust image and copies only the binary into a slim Debian base.

Verifying the Build

cargo test --workspace
cargo clippy --workspace -- -D warnings

All library code is warning-free under clippy. Tests validate serialization roundtrips, graph invariants, and storage durability.

Running the Observatory

The TALA Intent Observatory is a topology-first dashboard that visualizes the system in real time. It ships as a static HTML/CSS/JS application served by nginx, with Prometheus as the metrics backend.

Architecture

┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ sim-incident │  │ sim-deploy  │  │ sim-observe  │  │sim-provision│
│   :9101      │  │   :9102     │  │   :9103      │  │   :9104     │
└──────┬───────┘  └──────┬──────┘  └──────┬───────┘  └──────┬──────┘
       │                 │                │                  │
       └────────┬────────┴────────┬───────┴──────────┬───────┘
                │                 │                   │
         ┌──────▼──────┐   ┌─────▼─────┐   ┌────────▼────────┐
         │ Prometheus   │   │  Grafana  │   │   Observatory   │
         │   :9090      │   │  :3000    │   │   (nginx) :8080 │
         └─────────────┘   └───────────┘   └─────────────────┘

Starting

cd deploy
cp .env.example .env
docker compose up -d

What You'll See

Landing Page

The landing page tells the TALA story — what it is, what problem it solves, and why it exists. A live topology visualization shows the four operational domains with real-time metrics flowing, even before you enter the dashboard.

Dashboard Topology

After clicking Enter Observatory, you see a full-viewport topology graph:

  • Four domain nodes — Incident Response, Continuous Deployment, Observability, Provisioning — each showing live intent counts, ingest rates, and patterns detected
  • Four capability nodes — Extract (command to intent), Remember (HNSW semantic index), Persist (WAL durable log), Connect (causal edge formation)
  • Central narrative layer — the TALA hub showing the total edge count of the growing narrative graph
  • Animated edges with flowing particles representing intent being structured in real time

Detail Panels

Click any node to open a detail drawer with deep telemetry:

  • Domain nodes show narrative structure (graph nodes, causal edges, connectivity), what TALA learned (patterns, clusters, replays, insights), and outcome distribution
  • Capability nodes show subsystem-specific metrics: pipeline waterfall, HNSW capacity gauge, WAL/hot buffer/segment stats, or graph relation breakdown
  • The narrative layer hub shows aggregate intelligence across all domains plus lock contention health

Chaos Mode Indicator

When the chaos engine injects faults, a floating indicator appears at the bottom of the topology showing:

  • Current mode: Failure Injection, Latency Storm, Retry Cascade, Mixed Chaos, or Stampede
  • Event rate in events/minute
  • Which domains are affected
  • Visual disruption on affected edges and nodes

Configuration

See Configuration for tuning ingest rates, chaos probability, and other parameters.

Intent as a Primitive

In traditional systems, the shell records commands. TALA records intent.

The Difference

A command is a string: kubectl rollout restart deployment/api-gateway -n production. It tells you what was typed. It doesn't tell you why. Was this a routine deployment? A rollback after a failed canary? A panic restart during an incident at 2am? The string is identical in all three cases.

An intent is a structured representation of the desired outcome behind that command:

struct Intent {
    id: IntentId,
    timestamp: u64,           // nanosecond epoch
    raw_command: String,      // the original command
    embedding: Vec<f32>,      // 384-dim semantic vector
    context_hash: u64,        // hashed execution environment
    parent_ids: Vec<IntentId>,// causal predecessors
    outcome: Option<Outcome>, // what happened
    confidence: f32,          // system's confidence in classification
}

The Formula

TALA models intent as a function:

I = f(C, X, P)

Symbol   Meaning
C        The command or input signal
X        Context — working directory, environment, shell, user, session
P        Prior knowledge — historical embeddings from semantically similar past actions

The same command in different contexts produces different intents. systemctl restart nginx during a deployment and during an incident are different intents with different causal chains, different expected outcomes, and different semantic neighborhoods.

The Extraction Pipeline

When a raw command enters TALA, it passes through the intent extraction pipeline:

Raw command
  → Tokenization (command, args, flags, pipes, redirects)
  → Embedding (384-dimensional L2-normalized vector)
  → Classification (Build, Deploy, Debug, Configure, Query, Navigate)
  → Context attachment (cwd, user, shell, session, env hash)
  → Intent node creation

The embedding captures meaning, not syntax. Commands that do similar things — even with completely different syntax — land near each other in the embedding space. This is what enables semantic recall: searching by what you meant, not what you typed.
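A toy sketch of the first two pipeline stages, tokenization and classification — the keyword rules here are illustrative assumptions, not TALA's real classifier:

```rust
/// Toy pipeline stages; the category keywords are assumptions
/// for illustration, not tala-intent's actual rules.
#[derive(Debug, PartialEq)]
enum Category { Build, Deploy, Debug, Configure, Query, Navigate }

fn tokenize(raw: &str) -> Vec<&str> {
    // Real tokenization also separates pipes, flags, and redirects.
    raw.split_whitespace().collect()
}

fn classify(tokens: &[&str]) -> Category {
    match tokens.first().copied() {
        Some("cargo") | Some("make") => Category::Build,
        Some("kubectl") | Some("systemctl") => Category::Deploy,
        Some("journalctl") | Some("strace") => Category::Debug,
        Some("vim") | Some("sed") => Category::Configure,
        Some("cd") | Some("ls") => Category::Navigate,
        _ => Category::Query,
    }
}

fn main() {
    let tokens = tokenize("systemctl restart nginx.service");
    assert_eq!(classify(&tokens), Category::Deploy);
    println!("{:?} -> {:?}", tokens, classify(&tokens));
}
```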

Why This Matters

When intent is a first-class primitive:

  • The system can connect actions that are causally related, even if they happened minutes or hours apart
  • The system can recall past actions by meaning, not string matching
  • The system can replay sequences of intent, adapting commands to changed context
  • The system can predict what you're likely to do next based on the semantic trajectory of your session
  • The system can learn — detecting patterns, clustering similar workflows, surfacing insights

None of this is possible with a flat list of strings.

The Narrative Graph

The narrative graph is TALA's central data structure. It's a directed acyclic probabilistic graph where nodes are intents and edges are causal relationships between them.

Definition

G = (V, E, W)

Symbol   Meaning
V        Intent nodes
E        Directed causal edges
W        Edge weights (probability / confidence)

Why a Graph

A flat list of commands loses structure. A tree is too rigid — real workflows branch, merge, and fork. A DAG captures the actual causal topology of operational work:

  • An incident investigation spawns multiple diagnostic branches
  • A deployment pipeline has linear stages but also parallel verification steps
  • A provisioning sequence has dependencies that form a partial order, not a total order

The narrative graph preserves all of this.

Narratives

A narrative is a coherent subgraph — a connected sequence of intents that tells a story. Given any intent node, TALA can extract its narrative by traversing the graph forward (consequences) and backward (causes).

N ⊆ G

For example, starting from a systemctl restart nginx node:

  • Backward traversal reveals the chain that led to the restart: an alert fired, an engineer SSH'd in, checked logs, found an OOM event, resized memory limits
  • Forward traversal reveals the consequences: a health check passed, the alert resolved, a post-incident summary was generated

This entire chain is the narrative. It's queryable, replayable, and durable.

Graph Operations

Operation          Method                                   Description
Insert node        insert_node(id, timestamp, confidence)   Add an intent to the graph
Form edges         form_edges(node, similarities, k)        Connect to the top-K semantically similar nodes
Forward BFS        bfs_forward(start, depth)                Find all consequences of an intent
Backward BFS       bfs_backward(start, depth)               Find all causes of an intent (root-cause analysis)
Extract narrative  extract_narrative(root, depth)           Pull the connected subgraph as (nodes, edges)
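The two traversal operations reduce to a depth-bounded BFS over an adjacency list. This sketch (not tala-graph's actual code) covers forward traversal; running it over the reversed edge map gives backward traversal for root-cause analysis:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Depth-bounded BFS over an adjacency list. With `edges` holding
/// forward (cause -> consequence) links this finds consequences;
/// pass the reversed map to walk back toward causes. Sketch only.
fn bfs(edges: &HashMap<u32, Vec<u32>>, start: u32, depth: u32) -> Vec<u32> {
    let mut seen = HashSet::from([start]);
    let mut out = Vec::new();
    let mut q = VecDeque::from([(start, 0u32)]);
    while let Some((node, d)) = q.pop_front() {
        if d == depth { continue; } // stop expanding at the depth bound
        for &next in edges.get(&node).into_iter().flatten() {
            if seen.insert(next) {
                out.push(next);
                q.push_back((next, d + 1));
            }
        }
    }
    out
}

fn main() {
    // alert(1) -> ssh(2) -> check logs(3) -> restart(4)
    let fwd = HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![4])]);
    assert_eq!(bfs(&fwd, 1, 2), vec![2, 3]); // consequences within 2 hops
    println!("forward from 1: {:?}", bfs(&fwd, 1, 2));
}
```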

Edge Formation

When a new intent node is created, TALA uses the HNSW semantic index to find the most similar existing intents. The top-K candidates become edge targets, weighted by cosine similarity. This means edges form based on meaning, not just temporal proximity.

The edge formation algorithm:

for each new node v:
    candidates = HNSW.search(v.embedding, k=10, ef=50)
    for (candidate, similarity) in candidates:
        if similarity > threshold:
            graph.add_edge(v, candidate, relation_type, weight=similarity)

Relation types are inferred from the semantic and temporal relationship between nodes. See Edge Relations and Causality.

Outcomes and Confidence

Every intent in TALA has an outcome — what actually happened when the action was executed.

The Outcome Triple

struct Outcome {
    status: Status,      // Success, Failure, Partial, Pending
    latency_ns: u64,     // execution time in nanoseconds
    exit_code: i32,      // process exit code
}

Outcomes are attached asynchronously. An intent is created when the command is issued; the outcome is recorded when execution completes. This separation is important — it allows TALA to capture the intent even if the command never finishes.

Status

Status    Meaning
Pending   Command issued, outcome not yet recorded
Success   Completed with exit code 0
Failure   Completed with non-zero exit code
Partial   Completed but with incomplete or degraded results
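A minimal sketch of mapping a recorded exit code to a status — Partial detection needs richer signals than an exit code alone, so this only covers the simple cases:

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Status { Pending, Success, Failure, Partial }

/// `None` means the command has not finished yet. Partial (degraded
/// results) would require additional signals, so this illustrative
/// sketch never produces it.
fn status_from_exit(exit_code: Option<i32>) -> Status {
    match exit_code {
        None => Status::Pending,
        Some(0) => Status::Success,
        Some(_) => Status::Failure,
    }
}

fn main() {
    assert_eq!(status_from_exit(None), Status::Pending);
    assert_eq!(status_from_exit(Some(0)), Status::Success);
    assert_eq!(status_from_exit(Some(1)), Status::Failure);
    println!("exit 1 -> {:?}", status_from_exit(Some(1)));
}
```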

Confidence

Each intent carries a confidence score (f32, 0.0–1.0) representing the system's confidence in its classification and embedding. Factors that affect confidence:

  • Command complexity — simple commands like ls have high confidence; ambiguous multi-pipe chains may have lower confidence
  • Context richness — commands with clear context (known cwd, established session) score higher
  • Embedding quality — commands with close semantic neighbors in the index score higher than novel, unseen patterns

Confidence propagates through the graph. An edge from a high-confidence node carries more weight than one from a low-confidence node.

Why Outcomes Matter

Outcomes close the loop between intent and result. Without them, TALA would capture what you tried to do but not what happened. With outcomes:

  • Pattern detection can identify commands that frequently fail in specific contexts
  • Adaptive replay can skip steps that are known to succeed in the current state
  • Narrative summarization can report success rates across a workflow
  • Root-cause analysis can trace backward from a failure to the chain of intents that led to it

Edge Relations and Causality

Edges in the narrative graph are typed and weighted. Each edge represents a specific kind of relationship between two intents.

Relation Types

#[repr(u8)]
enum RelationType {
    Causal     = 0,  // A caused B
    Temporal   = 1,  // A happened before B in the same session
    Dependency = 2,  // B depends on A's output
    Retry      = 3,  // B is a retry of A after failure
    Branch     = 4,  // B is an alternative approach to the same goal as A
}

Causal

The strongest relation. Intent A directly caused intent B. Example: an alert fires (A), triggering an SSH session to diagnose the issue (B).

Temporal

Intent A preceded intent B within the same session. Weaker than causal — it captures sequence without asserting direct causation.

Dependency

Intent B depends on the output or side effects of intent A. Example: a build step (A) must complete before a deploy step (B) can proceed.

Retry

Intent B is a retry of intent A after A failed. TALA detects retries by identifying semantically similar commands with a preceding failure outcome.

Branch

Intent B represents an alternative approach to the same goal as intent A. Detected when two semantically similar intents follow the same predecessor but take different paths.
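One way to sketch relation-type inference from these signals — the thresholds and rules are assumptions for illustration, not TALA's documented heuristics:

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum RelationType { Causal, Temporal, Dependency, Retry, Branch }

/// Illustrative rules: a near-duplicate command following a failure
/// looks like a retry; two similar commands sharing a predecessor look
/// like branches; otherwise fall back to a plain temporal link.
/// Thresholds are made up for this sketch.
fn infer_relation(similarity: f32, prev_failed: bool, shares_parent: bool) -> RelationType {
    if similarity > 0.9 && prev_failed {
        RelationType::Retry
    } else if similarity > 0.8 && shares_parent {
        RelationType::Branch
    } else {
        RelationType::Temporal
    }
}

fn main() {
    assert_eq!(infer_relation(0.95, true, false), RelationType::Retry);
    assert_eq!(infer_relation(0.85, false, true), RelationType::Branch);
    assert_eq!(infer_relation(0.50, false, false), RelationType::Temporal);
    println!("relation rules ok");
}
```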

Edge Weights

Every edge carries a weight (f32, 0.0–1.0) representing the strength of the relationship. Weight is derived from:

  • Cosine similarity between the embeddings of the connected intents
  • Temporal proximity — closer in time means stronger weight
  • Confidence of both nodes

High-weight edges indicate strong, confident causal relationships. Low-weight edges indicate weaker associations — possibly temporal coincidence rather than true causation.
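A hypothetical blend of the three factors — the exact weighting formula is an assumption made for this sketch, not TALA's documented one:

```rust
/// Combine the three weight factors into one score in [0, 1].
/// The blend (time decay times geometric-mean confidence times
/// cosine) is an illustrative assumption.
fn edge_weight(cosine: f32, dt_secs: f32, conf_a: f32, conf_b: f32) -> f32 {
    let temporal = 1.0 / (1.0 + dt_secs / 60.0); // decays with minutes apart
    let confidence = (conf_a * conf_b).sqrt();   // geometric mean of node confidences
    (cosine * temporal * confidence).clamp(0.0, 1.0)
}

fn main() {
    let near = edge_weight(0.9, 10.0, 0.9, 0.9);
    let far = edge_weight(0.9, 3600.0, 0.9, 0.9);
    assert!(near > far); // closer in time => stronger edge
    println!("near={near:.3} far={far:.3}");
}
```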

How Edges Form

Edge formation happens automatically during ingest. When a new intent is created:

  1. The HNSW index finds the top-K semantically similar existing intents
  2. Each candidate is evaluated for relation type based on temporal order, outcome status, and embedding distance
  3. Edges are added with appropriate types and weights
  4. The graph remains a DAG — cycle detection prevents invalid edges

This means the graph grows organically. You don't declare relationships. TALA infers them from the structure of your work.
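Step 4's cycle prevention can be sketched as a reachability check before each edge insert (illustrative only; tala-graph's real mechanism may differ):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reject an edge u -> v if v already reaches u, because the new edge
/// would close a cycle. A reachability BFS is one simple way to keep
/// the graph a DAG; sketch, not TALA's actual implementation.
fn reaches(edges: &HashMap<u32, Vec<u32>>, from: u32, to: u32) -> bool {
    let mut seen = HashSet::from([from]);
    let mut q = VecDeque::from([from]);
    while let Some(n) = q.pop_front() {
        if n == to { return true; }
        for &next in edges.get(&n).into_iter().flatten() {
            if seen.insert(next) { q.push_back(next); }
        }
    }
    false
}

fn try_add_edge(edges: &mut HashMap<u32, Vec<u32>>, u: u32, v: u32) -> bool {
    if reaches(edges, v, u) { return false; } // would create a cycle
    edges.entry(u).or_default().push(v);
    true
}

fn main() {
    let mut g = HashMap::new();
    assert!(try_add_edge(&mut g, 1, 2));
    assert!(try_add_edge(&mut g, 2, 3));
    assert!(!try_add_edge(&mut g, 3, 1)); // rejected: 1 -> 2 -> 3 -> 1
    println!("graph stays a DAG: {:?}", g);
}
```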

Semantic Recall

Semantic recall is the ability to search your history by meaning rather than by string matching.

The Problem with Regex

Today, if you want to find that command you ran last week to fix the nginx issue, you do:

history | grep nginx

You get every command that contains the string "nginx" — including irrelevant ones. You miss the restart that actually fixed the problem if it targeted a differently named unit, and you miss the journalctl command that diagnosed the root cause, because it references a different service name.

String matching finds syntax. It doesn't find meaning.

How Semantic Recall Works

TALA embeds every command into a 384-dimensional vector space. Commands with similar meaning — regardless of syntax — land near each other in this space.

A semantic query works like this:

  1. Your search query is embedded into the same vector space
  2. The HNSW index finds the nearest neighbors by cosine similarity
  3. Results are ranked by semantic distance, not string overlap

query: "how did I fix the nginx issue?"
→ embedding: [0.12, -0.34, 0.56, ...]
→ HNSW search: top-10, ef=50
→ results:
    0.94  systemctl restart nginx.service
    0.91  journalctl -u nginx --since '1 hour ago'
    0.87  nginx -t
    0.85  vim /etc/nginx/sites-available/app.conf
    0.82  curl -s http://localhost/healthz

The search found the entire diagnostic and fix sequence — not because the commands contain the word "nginx", but because they're semantically related to the concept of fixing an nginx issue.
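Under the hood, ranking is cosine similarity over L2-normalized vectors. A brute-force sketch makes the ordering concrete — HNSW approximates exactly this ranking, only much faster (toy 2-dimensional embeddings stand in for the real 384-dimensional ones):

```rust
/// Cosine similarity; for L2-normalized vectors it is just the dot
/// product. Brute force stands in for the HNSW index here.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn top_k<'a>(query: &[f32], items: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<_> = items
        .iter()
        .map(|(cmd, emb)| (*cmd, cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // best match first
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 2-dim "embeddings" for illustration only.
    let items = vec![
        ("systemctl restart nginx.service", vec![0.98f32, 0.20]),
        ("ls -la", vec![0.10, 0.99]),
    ];
    let query = vec![1.0f32, 0.0]; // embedded "how did I fix the nginx issue?"
    let hits = top_k(&query, &items, 1);
    assert_eq!(hits[0].0, "systemctl restart nginx.service");
    println!("best match: {} ({:.2})", hits[0].0, hits[0].1);
}
```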

HNSW Index

TALA uses a Hierarchical Navigable Small World (HNSW) index for approximate nearest neighbor search. Key properties:

  • Sub-millisecond search — 139 microseconds for 10K vectors, top-10, ef=50
  • High recall — HNSW provides excellent recall at practical ef values
  • Incremental inserts — new intents are indexed immediately, no batch rebuilds
  • Configurable quality — the ef parameter trades search time for recall accuracy

Combining Semantic and Temporal

Semantic recall can be combined with temporal filtering. Find semantically similar commands within a time range:

query_semantic(embedding, k=10)  →  nearest by meaning
query_temporal(TimeRange)        →  all intents in a window

This lets you ask questions like "what did I do that was similar to this deploy, but only during last week's incident?"

Adaptive Replay

Adaptive replay is TALA's ability to re-execute a sequence of intents, adapted to the current context.

Beyond Scripts

A shell script is a static sequence of commands. If the environment changes — different hostnames, different paths, different versions — the script breaks. You edit it manually and try again.

TALA's replay engine works differently. It operates on intents, not commands. It understands the dependency structure, can substitute variables, and can skip steps that have already been completed.

How Replay Works

  1. Select a narrative — choose a root intent and extract its forward narrative from the graph
  2. Build a plan — topologically sort the intents by their dependency edges (Kahn's algorithm)
  3. Substitute variables — apply context-specific variable replacements (${HOST}, ${VERSION}, etc.)
  4. Filter completed — skip intents whose outcomes are already satisfied in the current state
  5. Execute — run each step in dependency order, recording new outcomes

// Build a replay plan
let plan = build_plan(&graph, &intent_ids, &commands)?;

// Configure replay
let config = ReplayConfig {
    vars: HashMap::from([("HOST", "web-04.prod"), ("VERSION", "2.1.0")]),
    completed: HashSet::new(),
    dry_run: false,
};

// Execute
let results = engine.execute(&graph, &intent_ids, &commands)?;
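The plan-building steps above can be sketched as Kahn's algorithm plus the completed-step filter and ${VAR} substitution; names and data shapes here are illustrative, not tala-weave's real API:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Kahn's algorithm over dependency edges (dep -> dependent), with the
/// idempotency filter folded in. Illustrative sketch only.
fn plan(
    deps: &HashMap<u32, Vec<u32>>, // edge a -> b means b depends on a
    nodes: &[u32],
    completed: &HashSet<u32>,
) -> Vec<u32> {
    let mut indeg: HashMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for targets in deps.values() {
        for &t in targets { *indeg.entry(t).or_default() += 1; }
    }
    let mut q: VecDeque<u32> =
        nodes.iter().copied().filter(|n| indeg[n] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = q.pop_front() {
        if !completed.contains(&n) { order.push(n); } // idempotent skip
        for &t in deps.get(&n).into_iter().flatten() {
            let d = indeg.get_mut(&t).unwrap();
            *d -= 1;
            if *d == 0 { q.push_back(t); }
        }
    }
    order
}

/// Replace ${VAR} placeholders with context-specific values.
fn substitute(cmd: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = cmd.to_string();
    for (k, v) in vars {
        out = out.replace(&format!("${{{k}}}"), v);
    }
    out
}

fn main() {
    let deps = HashMap::from([(1, vec![2]), (2, vec![3])]); // build -> deploy -> verify
    let order = plan(&deps, &[1, 2, 3], &HashSet::from([1])); // build already done
    assert_eq!(order, vec![2, 3]);
    let vars = HashMap::from([("HOST", "web-04.prod")]);
    assert_eq!(substitute("ssh ${HOST}", &vars), "ssh web-04.prod");
    println!("plan: {order:?}");
}
```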

Dry Run

Before executing, you can preview the plan:

let steps = engine.dry_run(&graph, &intent_ids, &commands)?;
for step in &steps {
    println!("{}: {} (deps: {:?})", step.intent_id, step.command, step.deps);
}

This shows the exact sequence that would be executed, with dependencies resolved and variables substituted, without running anything.

Idempotency

The replay engine supports idempotent re-execution. If you provide a set of already-completed intent IDs, those steps are skipped:

let remaining = filter_completed(plan, &completed_ids);

This makes replay safe to re-run. If a replay fails halfway through, you can re-run it and only the incomplete steps will execute.

Why This Matters

Traditional approaches to repeatable operations — scripts, runbooks, playbooks — are static artifacts that drift from reality. TALA's replay is derived from what actually worked, adapted to what's true now. The replay plan is a living thing, generated from the narrative graph, not a file someone wrote six months ago.

System Overview

TALA replaces the flat command log with a causality-aware, graph-structured narrative of intent. Where history records what was typed, TALA models why it was typed, what happened as a result, and how that result connects to everything before and after it.

The system is built from four subsystems, each responsible for a distinct phase of the intent lifecycle.

Subsystems

talad (the daemon) is the central orchestrator. It captures raw commands from shell hooks and agent APIs, routes them through the intent extraction pipeline, manages storage, and serves queries over a Unix socket. Every other subsystem communicates through talad.

weave (the replay engine) reconstructs and re-executes narratives. Given a subgraph of the intent DAG, weave schedules commands in dependency order, adapts them to the current environment, detects idempotent operations, and handles failures with configurable recovery strategies (retry, skip, abort).

kai (the insight engine) operates over the accumulated narrative graph to detect patterns: recurring failure clusters, repeated command motifs, and predictive next-intent suggestions. It groups related intents by embedding-space proximity and surfaces structured reports.

tala-cli (the user interface) provides the command-line surface. Commands like tala find, tala replay, tala diff, tala why, and tala stitch translate user queries into requests against talad and format the results for human or machine consumption.

Architecture

                         ┌─────────────────────┐
                         │  User / Agent Input  │
                         │  (shell hook, API)   │
                         └──────────┬───────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │       talad          │
                         │  (tala-daemon)       │
                         │                      │
                         │  ┌───────────────┐   │
                         │  │    Intent      │   │
                         │  │ Normalization  │   │
                         │  │ (tala-intent)  │   │
                         │  └───────┬───────┘   │
                         │          │           │
                         │          ▼           │
                         │  ┌───────────────┐   │
                         │  │    Graph       │   │
                         │  │ Construction   │   │
                         │  │ (tala-graph)   │   │
                         │  └───────┬───────┘   │
                         │          │           │
                         │          ▼           │
                         │  ┌───────────────┐   │
                         │  │  Persistent    │   │
                         │  │    Store       │   │
                         │  │ (tala-store)   │   │
                         │  └───────────────┘   │
                         └───┬──────┬──────┬────┘
                             │      │      │
                    ┌────────┘      │      └────────┐
                    ▼               ▼               ▼
             ┌───────────┐  ┌───────────┐  ┌───────────┐
             │ tala-cli  │  │   weave   │  │    kai    │
             │ (query,   │  │ (replay,  │  │ (insight, │
             │  browse)  │  │  adapt)   │  │  predict) │
             └───────────┘  └───────────┘  └───────────┘

Layered Design

The architecture separates concerns into three layers, each with distinct runtime characteristics.

Capture Layer

Raw input enters through shell hooks (bash PROMPT_COMMAND, zsh precmd) or agent APIs. The capture layer is intentionally thin: it records the raw command string, execution context (working directory, environment hash, session ID), and timestamp, then forwards everything to talad over a Unix socket. No processing happens here. Latency budget: under 1 millisecond.

Processing Layer

talad orchestrates three operations on every incoming command:

  1. Intent extraction (tala-intent). The raw command is tokenized, classified into a category (build, deploy, debug, configure, query, navigate), and enriched with context metadata. An embedding model generates a dense vector representation. The output is a fully structured Intent node.

  2. Storage (tala-store). The intent is durably written to the WAL, inserted into the HNSW index for semantic search, and buffered in the hot store. When the buffer reaches capacity, it flushes to an immutable TBF segment on disk.

  3. Graph construction (tala-graph). HNSW approximate nearest-neighbor search identifies semantically related intents. Edge candidates are re-ranked by exact cosine similarity, and the top-k connections are formed as weighted causal edges in the narrative DAG.

Query Layer

Consumers interact with the narrative graph through three interfaces:

  • tala-cli issues semantic search (tala find), temporal range queries, graph traversals (tala why), and narrative diffs (tala diff).
  • weave reads subgraphs and replays them, adapting commands to new environments and skipping idempotent steps.
  • kai runs background analysis: clustering failures by embedding proximity, detecting recurring narrative motifs, and generating next-action predictions.

Design Principles

The architecture enforces several invariants:

Intent is a first-class primitive. An intent is not a command wrapper. It is a structured representation of a desired outcome: I = f(C, X, P), where C is the command, X is the execution context, and P is prior knowledge from historical embeddings. This representation survives across sessions, machines, and users.

DAG, not tree. The narrative graph is a directed acyclic graph with probabilistic edges. A single intent may have multiple causal parents (merge points) and multiple children (branch points). Tree structures cannot represent this.

Binary-first storage. TBF segments are columnar, SIMD-aligned, and zero-copy readable via mmap. No JSON serialization on the hot path.

Trait boundaries between subsystems. Every inter-crate interface is defined as a trait in tala-core. Concrete implementations live in their respective crates. This enforces that subsystems can be tested in isolation and swapped without cascading changes.

Append-only durability. Segments are immutable once flushed. The WAL provides crash recovery. No in-place mutation of on-disk data.

Crate Dependency Graph

TALA is structured as a Cargo workspace of 12 crates. The dependency graph is a strict DAG: dependencies flow downward from foundation types to application binaries. No cycles exist. cargo-deny enforces this invariant via [bans] configuration.

Dependency Hierarchy

                        tala-core
                       /    |    \
                      /     |     \
               tala-wire  tala-embed  (no inter-dependency)
                  |    \    /   |
                  |  tala-store |
                  |      |      |
             tala-graph  |  tala-intent
               /    \    |    /
        tala-weave  tala-kai
                  \   |   /
                tala-net
                    |
              tala-daemon
                    |
               tala-cli

Governing Invariants

No dependency cycles. The workspace resolver enforces this structurally. A crate may only depend on crates above it in the hierarchy.

Traits define boundaries. Inter-crate communication happens through trait objects and generic bounds defined exclusively in tala-core. No concrete types from sibling crates appear in public APIs. tala-store implements IntentStore; tala-embed provides the similarity functions consumed through the Embedder trait; tala-graph implements GraphEngine. Each crate can be compiled, tested, and benchmarked in isolation.

tala-core defines, others implement. All shared vocabulary types (Intent, Edge, IntentId, RelationType, TalaError) and all trait definitions (IntentStore, Embedder, GraphEngine, IntentExtractor, Transport) live in tala-core. Implementation code lives in the crate that owns the concern.
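The boundary pattern looks roughly like this — a core module defines the trait, a sibling module implements it, and consumers depend only on the trait. The signature is a simplified assumption, not tala-core's real IntentStore:

```rust
/// Simplified shape of the boundary pattern. The trait name mirrors
/// the text (IntentStore) but its methods are assumptions.
mod core_api {
    pub trait IntentStore {
        fn put(&mut self, id: u64, raw: &str);
        fn len(&self) -> usize;
    }
}

mod store_impl {
    use super::core_api::IntentStore;
    use std::collections::HashMap;

    /// Stand-in implementation; the real one would be WAL-backed.
    #[derive(Default)]
    pub struct MemStore { items: HashMap<u64, String> }

    impl IntentStore for MemStore {
        fn put(&mut self, id: u64, raw: &str) { self.items.insert(id, raw.into()); }
        fn len(&self) -> usize { self.items.len() }
    }
}

// Consumers depend only on the trait, so implementations can be
// swapped (and tested in isolation) without cascading changes.
fn ingest(store: &mut dyn core_api::IntentStore) {
    store.put(1, "cargo build --release");
}

fn main() {
    use core_api::IntentStore;
    let mut store = store_impl::MemStore::default();
    ingest(&mut store);
    assert_eq!(store.len(), 1);
    println!("stored {} intent(s)", store.len());
}
```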

Crate Reference

Crate        Layer        Description
tala-core    Foundation   Types, traits, error taxonomy. Zero dependencies beyond std. Every other crate imports this.
tala-wire    Storage      TBF binary format reader/writer: columnar serialization, CSR edges, bloom filters, B+ tree index.
tala-embed   Compute      SIMD-accelerated vector operations (AVX2, NEON), HNSW approximate nearest-neighbor index, quantization (f32/f16/int8).
tala-store   Storage      Storage engine: WAL, hot buffer, segment lifecycle, HNSW-backed semantic query. Implements IntentStore.
tala-graph   Compute      Narrative graph: adjacency-list DAG, BFS/DFS traversal, edge formation, narrative extraction and diffing.
tala-intent  Pipeline     Intent extraction: command tokenization, classification, context assembly, embedding generation. Implements IntentExtractor.
tala-weave   Execution    Replay engine: DAG-ordered scheduling, environment adaptation, idempotency detection, failure recovery.
tala-kai     Analysis     Insight engine: failure clustering, pattern detection, predictive suggestions, narrative summarization.
tala-net     Network      Distributed mode: QUIC transport, SWIM membership, Raft consensus, partition-aware replication.
tala-daemon  Application  talad binary: Unix socket server, ingest pipeline orchestration, event hooks, metrics, health checks.
tala-cli     Application  tala binary: user-facing CLI (find, replay, diff, why, stitch, status), output formatting.
xtask        Tooling      Custom build tasks: formatting, linting, benchmarking, coverage, release packaging.

Async Runtime Boundaries

Not every crate uses async. The workspace draws a clear line:

  • Sync crates: tala-core, tala-wire, tala-embed, tala-graph. These perform pure, I/O-free computation. They use rayon for parallelism where applicable, but carry no Tokio dependency.
  • Async crates: tala-daemon, tala-cli, tala-net, tala-weave. These depend on Tokio for I/O multiplexing, network transport, and concurrent task management.
  • Bridging: tala-store is structurally sync but used within talad's async context. ONNX Runtime and CUDA calls in tala-embed are blocking and dispatched via spawn_blocking.

tala-core remains runtime-agnostic. Its trait definitions use async fn via async-trait where necessary, but the crate itself has no runtime dependency.

Feature Flags

Feature flags control optional capabilities. Default features cover the common single-node case.

Crate       Feature       Default  Controls
tala-wire   mmap          yes      Memory-mapped segment access
tala-wire   compression   yes      lz4 and zstd segment compression
tala-wire   encryption    no       AES-256-GCM encrypted segments
tala-embed  simd          yes      SIMD vector operations
tala-embed  cuda          no       NVIDIA GPU acceleration
tala-embed  vulkan        no       Vulkan compute acceleration
tala-embed  onnx          yes      Local ONNX model inference
tala-embed  remote-embed  no       Remote embedding API
tala-net    cluster       no       Distributed mode

Dependency Versions

All shared dependency versions are declared in the workspace [workspace.dependencies] table. Individual crates reference them with { workspace = true }. This prevents version skew across the workspace and centralizes upgrade decisions.
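For illustration, a minimal sketch of the pattern. The crate names and versions match the table of key dependencies; the exact feature lists are assumptions:

```toml
# Root Cargo.toml (sketch): versions are declared once for the whole workspace.
[workspace.dependencies]
tokio = { version = "1" }
thiserror = "2"

# In a member crate's Cargo.toml, the version is inherited, not repeated:
# [dependencies]
# tokio = { workspace = true }
```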

Key external dependencies:

Dependency  Version          Used By
tokio       1.x              tala-daemon, tala-cli, tala-net, tala-weave
uuid        1.x (v7, serde)  tala-core
thiserror   2.x              tala-core
memmap2     0.9.x            tala-wire
rayon       1.x              tala-embed
quinn       0.11.x           tala-net
criterion   0.5.x            Benchmarks across all crates
proptest    1.x              Property tests

Data Flow: Ingest to Narrative

This chapter walks through the three primary data paths in TALA: the ingest pipeline that converts raw commands into connected intent nodes, the query path that retrieves them, and the replay path that re-executes them.

Ingest Pipeline

A raw command enters the system and emerges as a node in the narrative graph with causal edges to related intents. The pipeline has six stages.

Raw Command
    │
    ▼
┌──────────────────┐
│ IntentPipeline    │
│   .extract()      │  tala-intent
│                    │
│  tokenize          │
│  classify          │
│  embed             │
│  enrich            │
└────────┬───────────┘
         │
         ▼
    Intent { id, timestamp, embedding, raw_command, context_hash, ... }
         │
         ▼
┌──────────────────┐
│ StorageEngine     │
│   .insert()       │  tala-store
│                    │
│  ┌─ WAL.append()  │─────────▶ disk (durability)
│  │                 │
│  ├─ HNSW.insert() │─────────▶ in-memory ANN index
│  │                 │
│  └─ HotBuffer     │
│      .push()       │─────────▶ in-memory accumulator
│                    │
│  [if buffer full]  │
│  └─ flush()        │─────────▶ TBF segment on disk
└────────┬───────────┘
         │
         ▼
┌──────────────────┐
│ Edge Formation    │  tala-store + tala-graph
│                    │
│  HNSW.search()     │  find k*4 approximate neighbors
│      │             │
│      ▼             │
│  cosine re-rank    │  exact similarity on candidates
│      │             │
│      ▼             │
│  NarrativeGraph    │
│   .form_edges()    │  top-k become weighted causal edges
└────────────────────┘

Stage 1: Intent Extraction

IntentPipeline.extract() receives the raw command string and the execution context (working directory, environment hash, session ID, shell, user). It performs four operations:

  1. Tokenization. The command is split into structural tokens: binary name, flags, arguments, pipes, redirections.
  2. Classification. A classifier assigns the intent to a category: Build, Deploy, Debug, Configure, Query, Navigate, or Other.
  3. Embedding. The command text is passed to an embedding model (ONNX Runtime for local inference, or a remote API) to produce a dense f32 vector of dimension 384. This vector encodes the semantic meaning of the command.
  4. Enrichment. Metadata is attached: resource references (file paths, URLs, container names), estimated complexity, and a context hash (xxHash3 of the environment state).

The output is a fully populated Intent struct.
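As a narrow illustration of the tokenization step, a sketch that only handles whitespace-separated words. This is not the tala-intent implementation; the real tokenizer also handles pipes, redirections, and quoting:

```rust
// Illustrative tokenizer: first word is the binary, words starting with '-'
// are flags, everything else is an argument.
#[derive(Debug, PartialEq)]
enum Token {
    Binary(String),
    Flag(String),
    Arg(String),
}

fn tokenize(cmd: &str) -> Vec<Token> {
    cmd.split_whitespace()
        .enumerate()
        .map(|(i, w)| {
            if i == 0 {
                Token::Binary(w.to_string())
            } else if w.starts_with('-') {
                Token::Flag(w.to_string())
            } else {
                Token::Arg(w.to_string())
            }
        })
        .collect()
}
```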

Stage 2: WAL Append

StorageEngine.insert() begins by writing the intent to the write-ahead log. The WAL entry format is:

[4B payload_len][16B id][8B timestamp][4B embed_len][embed bytes][4B cmd_len][cmd bytes]

The WAL is fsync'd after each append (batched in practice). This guarantees that the intent survives a crash even if subsequent steps fail.
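For illustration, the entry layout can be written as a straightforward serializer. Byte order is an assumption here (little-endian, matching the column reader described in the TBF chapter); the actual tala-store encoder may differ:

```rust
// Illustrative WAL entry encoder following the layout above:
// [4B payload_len][16B id][8B timestamp][4B embed_len][embed bytes][4B cmd_len][cmd bytes]
fn encode_wal_entry(id: [u8; 16], timestamp: u64, embedding: &[f32], cmd: &str) -> Vec<u8> {
    let mut payload = Vec::new();
    payload.extend_from_slice(&id);
    payload.extend_from_slice(&timestamp.to_le_bytes());
    payload.extend_from_slice(&(embedding.len() as u32).to_le_bytes());
    for v in embedding {
        payload.extend_from_slice(&v.to_le_bytes());
    }
    payload.extend_from_slice(&(cmd.len() as u32).to_le_bytes());
    payload.extend_from_slice(cmd.as_bytes());

    // Length prefix excludes itself, per the entry format table.
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&payload);
    out
}
```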

Stage 3: HNSW Insert

The intent's embedding vector is inserted into the in-memory HNSW index. This makes it immediately discoverable by semantic search. The HNSW index maintains a multi-layer navigable small-world graph with parameters M=16 (edges per node per layer) and ef_construction=200 (search width during insertion).

An index_map (a Vec<IntentId> indexed by HNSW position) maintains the mapping between HNSW node indices and IntentId values.

Stage 4: Hot Buffer Push

The intent's columnar fields (id, timestamp, context hash, confidence, status, embedding) are pushed into the HotBuffer. This is an in-memory accumulator that collects intents until it reaches its configured capacity.

Stage 5: Segment Flush

When the hot buffer reaches capacity (default: 64K nodes or 256MB), it flushes to a TBF segment:

  1. The SegmentWriter receives all buffered intents.
  2. Node fields are serialized as columnar arrays, each 64-byte aligned.
  3. Embeddings are written to the embedding region with 64-byte stride alignment.
  4. Edges are encoded as a CSR index (row pointer array + edge entry array).
  5. A bloom filter over node UUIDs is constructed (1% false positive rate).
  6. The 128-byte TBF header is written with region offsets, counts, and flags.
  7. The segment file is written to disk and fsync'd.
  8. The hot buffer is cleared.

The segment is now immutable. It will never be modified in place.

Stage 6: Edge Formation

After the intent is stored, the system identifies causal connections to existing intents. This is where the flat log becomes a graph.

  1. Approximate search. QueryEngine.find_edge_candidates() queries the HNSW index with the new intent's embedding, requesting k * 4 approximate nearest neighbors. HNSW returns candidates in O(log n) time, replacing a brute-force scan that would cost O(n) per insert -- O(n^2) across the life of the corpus.

  2. Exact re-ranking. Each candidate's stored embedding is retrieved, and exact cosine similarity is computed using SIMD-accelerated operations. The candidates are sorted by descending similarity.

  3. Edge creation. The top-k candidates (by exact cosine similarity) become edges in the NarrativeGraph. Each edge is typed as Causal and weighted by the similarity score. NarrativeGraph.form_edges() inserts these as directed edges from the existing node to the new node, maintaining both forward and backward adjacency lists.
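The re-ranking step reduces to cosine similarity over the candidate set followed by a sort. A self-contained sketch, with plain u64 ids standing in for IntentId (zero-norm vectors are not handled):

```rust
// Exact cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Score every candidate against the query, sort by descending similarity,
// and keep the top k -- the shape of the edge-formation re-rank.
fn top_k(query: &[f32], candidates: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```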

Query Path

Queries enter through tala-cli and are dispatched to StorageEngine methods based on query type.

Semantic Search (tala find)

Query embedding
    │
    ▼
StorageEngine.query_semantic(embedding, k)
    │
    ├─ HNSW.search(embedding, k, ef=50)
    │     └─ returns [(index, l2_distance)]
    │
    ├─ index_map lookup: index → IntentId
    │
    └─ cosine re-rank: exact similarity on HNSW candidates
         └─ returns [(IntentId, similarity)]

The two-phase approach (approximate HNSW search followed by exact re-ranking) balances speed and accuracy. HNSW search at ef=50 over a 10K corpus completes in approximately 139 microseconds. The re-ranking step is negligible because it only operates on k candidates.

Temporal Range Query (tala status, time-based filters)

StorageEngine.query_temporal(TimeRange { start, end })
    │
    └─ Full scan of in-memory intent HashMap
         └─ Filter by timestamp range
              └─ Sort by timestamp ascending

Graph Traversal (tala why)

NarrativeGraph.bfs_backward(intent_id, max_depth)
    │
    └─ BFS over backward adjacency lists
         └─ Returns causal predecessors up to max_depth hops

Backward traversal from a failure event surfaces its causal chain: what preceded it, what triggered it, and which earlier intents contributed to the state that caused it to fail.
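A sketch of that traversal, with u64 ids in place of IntentId (the real method returns intent nodes, not bare ids):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Breadth-first walk over backward adjacency lists: starting from a node,
// collect every causal predecessor reachable within max_depth hops.
fn bfs_backward(back: &HashMap<u64, Vec<u64>>, start: u64, max_depth: usize) -> Vec<u64> {
    let mut visited = HashSet::from([start]);
    let mut out = Vec::new();
    let mut frontier = VecDeque::from([(start, 0usize)]);
    while let Some((node, depth)) = frontier.pop_front() {
        if depth == max_depth {
            continue; // depth bound reached; do not expand further
        }
        for &pred in back.get(&node).into_iter().flatten() {
            if visited.insert(pred) {
                out.push(pred);
                frontier.push_back((pred, depth + 1));
            }
        }
    }
    out
}
```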

Narrative Extraction (tala diff, tala stitch)

NarrativeGraph.extract_narrative(root, max_depth)
    │
    └─ Bidirectional BFS from root
         └─ Returns (visited_nodes, edges) — the connected subgraph

This extracts a coherent narrative: the subgraph of all intents reachable from a root node in both forward and backward directions, bounded by depth.

Replay Path

Replay reads a narrative subgraph and re-executes it in a new context. This is the tala replay path through tala-weave.

Narrative subgraph (nodes + edges)
    │
    ▼
┌──────────────────┐
│ Planner           │
│                    │
│  Topological sort  │  Respect edge dependencies
│  of intent nodes   │
└────────┬───────────┘
         │
         ▼
┌──────────────────┐
│ Transform         │
│                    │
│  Adapt commands    │  Apply environment deltas
│  to current state  │  (different cwd, env vars, paths)
└────────┬───────────┘
         │
         ▼
┌──────────────────┐
│ Executor          │
│                    │
│  For each step:    │
│  ┌─ Idempotency   │  Skip if already satisfied
│  │   check         │
│  ├─ Execute        │  Sandboxed, with timeout
│  └─ Record outcome │  Attach to intent node
│                    │
│  On failure:       │
│  ├─ Retry          │  Configurable attempts
│  ├─ Skip           │  Mark and continue
│  └─ Abort          │  Halt replay
└────────────────────┘

The planner performs a topological sort of the narrative subgraph to determine execution order. Edges encode dependencies: if intent A has a causal edge to intent B, A must complete before B begins.
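A sketch of that ordering step using Kahn's algorithm over plain u64 ids (the real planner operates on IntentId nodes and typed edges):

```rust
use std::collections::{HashMap, VecDeque};

// Kahn's algorithm: repeatedly emit nodes with no unsatisfied dependencies.
// Each edge (a, b) means "a must complete before b". Returns None if the
// input contains a cycle -- which the DAG invariant rules out in TALA.
fn topo_order(nodes: &[u64], edges: &[(u64, u64)]) -> Option<Vec<u64>> {
    let mut indegree: HashMap<u64, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut succ: HashMap<u64, Vec<u64>> = HashMap::new();
    for &(a, b) in edges {
        *indegree.get_mut(&b)? += 1;
        succ.entry(a).or_default().push(b);
    }
    let mut ready: VecDeque<u64> =
        nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n);
        for &m in succ.get(&n).into_iter().flatten() {
            let d = indegree.get_mut(&m).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(m);
            }
        }
    }
    // A cycle leaves some nodes with nonzero in-degree, hence unordered.
    (order.len() == nodes.len()).then_some(order)
}
```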

The transform step adapts commands to the current environment. If the original narrative was recorded in /home/alice/project but is being replayed in /home/bob/project, path references are rewritten. Environment variable differences are detected via context hash comparison.

The executor runs each command in a sandboxed subprocess with a configurable timeout. Before execution, it checks whether the command's postcondition is already satisfied (idempotency detection). If so, the step is skipped. After execution, the outcome (status, latency, exit code) is attached back to the intent node via StorageEngine.attach_outcome().

Storage Engine

The storage engine manages the lifecycle of intent data from initial capture through durable persistence. It is built on three layers, each optimized for a different phase of the data lifecycle, unified by the StorageEngine struct that implements the IntentStore trait.

Three-Layer Architecture

┌────────────────────────────────────────────────────────┐
│                    StorageEngine                        │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  WAL (Write-Ahead Log)                           │   │
│  │  Sequential append, fsync per batch              │   │
│  │  Crash recovery: replay into new segment         │   │
│  └──────────────────┬───────────────────────────────┘   │
│                     │                                    │
│  ┌──────────────────▼───────────────────────────────┐   │
│  │  Hot Buffer (in-memory accumulator)              │   │
│  │  Columnar layout, capacity-triggered flush       │   │
│  │  Serves real-time queries from memory            │   │
│  └──────────────────┬───────────────────────────────┘   │
│                     │ flush at capacity                  │
│  ┌──────────────────▼───────────────────────────────┐   │
│  │  Segments (immutable TBF files)                  │   │
│  │  Columnar nodes, aligned embeddings, CSR edges   │   │
│  │  Zero-copy mmap access, append-only              │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  HNSW Index (in-memory ANN)                      │   │
│  │  Semantic search across all stored intents       │   │
│  │  Parallel to WAL/Hot/Segment — always up to date │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Write-Ahead Log

The WAL provides durability. Every intent is appended to the WAL before any other operation. If the process crashes mid-insert, the WAL can be replayed to recover all committed intents.

Entry Format

Each WAL entry is a length-prefixed binary record:

Field        Size                 Description
payload_len  4 bytes              Total payload size (excludes this field)
id           16 bytes             Intent UUID
timestamp    8 bytes              Nanosecond epoch
embed_len    4 bytes              Number of embedding dimensions
embedding    embed_len * 4 bytes  f32 vector
cmd_len      4 bytes              Command string length
raw_command  cmd_len bytes        UTF-8 command text

The WAL uses a BufWriter with a 64KB buffer to amortize syscall overhead. After each append, flush() ensures data reaches the kernel buffer. The fsync policy is configurable: per-entry for maximum durability, or per-batch for throughput.

Recovery

On startup, replay_wal() reads the WAL file sequentially, parsing entries until EOF or a truncated record (which indicates a crash mid-write). Truncated entries are discarded. All valid entries are re-ingested into the hot buffer and HNSW index.

Startup
    │
    ├─ WAL file exists?
    │     │
    │     ▼ yes
    │   replay_wal() → Vec<WalEntry>
    │     │
    │     ▼
    │   Re-insert each entry into HotBuffer + HNSW
    │     │
    │     ▼
    │   Truncate WAL (entries are now in hot buffer)
    │
    └─ no → fresh start
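The truncation check at the heart of replay_wal() can be sketched as a scan over length-prefixed records. This illustrative version returns the payload slices of all complete entries and stops at the first truncated one:

```rust
// Walk length-prefixed WAL records; a record whose declared length runs past
// the end of the buffer indicates a crash mid-write, so the tail is discarded.
fn scan_wal(buf: &[u8]) -> Vec<&[u8]> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + 4 <= buf.len() {
        let len = u32::from_le_bytes(buf[pos..pos + 4].try_into().unwrap()) as usize;
        pos += 4;
        if pos + len > buf.len() {
            break; // truncated record: keep everything before it
        }
        entries.push(&buf[pos..pos + len]);
        pos += len;
    }
    entries
}
```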

Hot Buffer

The hot buffer is an in-memory accumulator that collects intents in columnar layout until flushed to a persistent TBF segment. It serves two purposes: batching writes for efficient segment construction, and providing immediate query access to recently ingested intents.

Structure

Internally, the hot buffer stores each intent's fields decomposed into parallel vectors:

  • id: [u8; 16] -- UUID bytes
  • timestamp: u64 -- nanosecond epoch
  • context_hash: u64 -- xxHash3 of execution context
  • confidence: f32 -- extraction confidence score
  • status: u8 -- outcome status enum
  • embedding: Vec<f32> -- dense vector
  • parent_indices: Vec<usize> -- indices of parent nodes (for edge construction)

This columnar decomposition mirrors the on-disk TBF layout and enables direct serialization without restructuring.

Flush Cycle

The buffer has a configurable capacity (default: 64K nodes or 256MB). When a push() fills the buffer to capacity, it returns true, signaling the StorageEngine to initiate a flush.

The flush process:

  1. A SegmentWriter is constructed with the buffer's embedding dimensionality.
  2. Each buffered intent is pushed as a node, and parent edges are added.
  3. SegmentWriter.finish() serializes all regions (columnar nodes, aligned embeddings, CSR edges, bloom filter) into a contiguous byte buffer.
  4. The byte buffer is written to disk as a numbered .tbf file (e.g., segment_000001.tbf).
  5. The hot buffer is cleared.

After the flush, the segment is immutable. Subsequent reads of those intents come from the segment via mmap (in production) or from the in-memory intent store (in the current implementation).

Metrics

The storage engine tracks detailed per-operation metrics via atomic counters:

Metric                  Description
wal_append_ns           Cumulative WAL append time
hnsw_insert_ns          Cumulative HNSW insert time
hot_push_ns             Cumulative hot buffer push time
segment_flush_ns        Cumulative segment write time
total_bytes_flushed     Total bytes written to segments
segments_flushed_count  Number of segments produced

Lock contention is monitored per lock (WAL, HNSW, hot buffer, intents, index map) with acquisition count, wait time, hold time, and worst-case values.

Segments

Segments are immutable TBF binary files produced by flushing the hot buffer. See the TBF Binary Format chapter for the complete segment structure.

Key properties:

  • Immutable. Once written, a segment is never modified. This eliminates write-write conflicts, simplifies concurrency, and enables lock-free reads via mmap.
  • Self-contained. Each segment includes its own bloom filter (for fast negative lookups), B+ tree index (for point lookups), and CSR edge index (for graph traversal). No external index is required to read a segment.
  • Tiered. Segments progress through temperature tiers: hot (in-memory buffer), warm (mmap'd on NVMe), cold (compressed, possibly quantized, on bulk storage).

Compaction

Over time, many small segments accumulate. Compaction merges them:

  1. Input segments are opened via mmap.
  2. Nodes are merge-sorted by timestamp.
  3. The CSR edge index is rebuilt from the merged node set.
  4. The string table is deduplicated across segments.
  5. Bloom filter and B+ tree index are reconstructed.
  6. A new compacted segment is written with the is_compacted flag set.
  7. Segment references are atomically swapped.
  8. Input segments are unlinked.

Compaction also applies zstd compression to the string table. Embedding regions are never compressed because they require random access for SIMD similarity operations.

The StorageEngine Struct

StorageEngine unifies all three layers behind the IntentStore trait. It uses interior mutability (Mutex and RwLock) to satisfy Send + Sync requirements for concurrent access from talad's async runtime.

pub struct StorageEngine {
    dim: usize,
    intents: RwLock<HashMap<IntentId, Intent>>,  // concurrent reads
    hnsw: Mutex<HnswIndex>,                      // search mutates visit state
    index_map: RwLock<Vec<IntentId>>,            // HNSW index → IntentId
    wal: Option<Mutex<Wal>>,                     // durability (None = in-memory mode)
    hot: Mutex<HotBuffer>,                       // accumulator
    segment_dir: PathBuf,                        // segment output directory
    segment_seq: Mutex<u64>,                     // monotonic segment counter
    metrics: Arc<StorageMetrics>,                // lock and pipeline metrics
}

The insert() path acquires locks in a fixed order to prevent deadlock: WAL, then HNSW, then index map, then hot buffer, then intent store. The intent store is updated last because it represents the point at which the insert becomes visible to queries -- durability (WAL) and indexing (HNSW) must complete first.

The engine can operate in two modes:

  • Persistent mode (StorageEngine::open(dim, dir, capacity)): WAL and segments are written to the specified directory. Full crash recovery is available.
  • In-memory mode (StorageEngine::in_memory(dim, capacity)): No WAL, no segment persistence. Segment flushes are discarded. Suitable for testing and benchmarking.

Query Engine

The QueryEngine provides HNSW-backed semantic search with exact cosine re-ranking.

query_semantic(embedding, k) performs a two-phase search:

  1. HNSW approximate search. The query embedding is searched against the HNSW index with ef = max(k, 50). This returns k approximate nearest neighbors by L2 distance in O(log n) time.

  2. Cosine re-ranking. Each candidate's stored embedding is retrieved from the HNSW node, and exact cosine similarity is computed using SIMD-dispatched operations. Results are returned as (IntentId, similarity) pairs.

For a 10K corpus with ef=50 and k=10, end-to-end query latency is approximately 151 microseconds.

find_edge_candidates(embedding, k) is a specialized search used during edge formation. It requests k * 4 candidates from HNSW (to increase recall), computes exact cosine similarity on all candidates, sorts by descending similarity, and returns the top-k. This provides the high-quality similarity scores needed for edge weighting while avoiding a brute-force comparison of the new intent against every existing intent -- O(n) per insert, O(n^2) across the corpus.

Temporal Query

query_temporal(TimeRange) performs a full scan of the in-memory intent store, filtering by timestamp range and returning results sorted by time. This is acceptable for the current scale; at larger corpus sizes, a B+ tree index over timestamps within segments will replace the scan.

The TBF Binary Format

TBF (TALA Binary Format) is the on-disk segment format for intent data. It is purpose-built for TALA's hybrid graph-plus-vector data model, co-designed for zero-copy memory-mapped access, SIMD-aligned embedding vectors, and graph adjacency traversal without deserialization.

TBF replaces general-purpose formats (Parquet, Arrow IPC, FlatBuffers) because none of them provide graph-native layout alongside SIMD-aligned vector storage. The cost is a format that only TALA reads and writes. The benefit is that every access pattern TALA cares about -- columnar field scans, embedding similarity, edge traversal, membership tests -- is optimized at the byte level.

Segment Structure

A single .tbf segment file is laid out as follows:

Offset 0           ┌──────────────────────────────────┐
                    │        File Header (128B)        │
                    │                                  │
                    │  magic: "TALB"                   │
                    │  version, counts, region offsets │
Offset 128         ├──────────────────────────────────┤
                    │      Node Payload Region         │
                    │    (columnar, 64B-aligned)       │
                    │                                  │
                    │  [ids]  [timestamps]  [hashes]   │
                    │  [confidences]  [statuses]       │
                    ├──────────────────────────────────┤
                    │    Embedding Vector Region        │
                    │   (fixed-stride, 64B-aligned)    │
                    │                                  │
                    │  [vec_0] [pad] [vec_1] [pad] ... │
                    ├──────────────────────────────────┤
                    │        Edge Region (CSR)          │
                    │                                  │
                    │  [row_pointers]  [edge_entries]  │
                    ├──────────────────────────────────┤
                    │       Bloom Filter Block          │
                    │                                  │
                    │  [num_hashes] [num_bits] [bits]  │
                    └──────────────────────────────────┘

Every region boundary is padded to a 64-byte boundary. This guarantees that SIMD loads (_mm512_load_ps for AVX-512, _mm256_load_ps for AVX2) never straddle region or column boundaries.

File Header

The header occupies the first 128 bytes and is laid out for direct mmap casting via #[repr(C, packed)]:

Offset  Size  Field                Description
0       4     magic                0x54414C42 ("TALB")
4       2     version_major        Format major version
6       2     version_minor        Format minor version
8       8     segment_id           Monotonic segment identifier
16      8     created_at           Unix timestamp (nanoseconds)
24      8     node_count           Number of intent nodes
32      8     edge_count           Number of edges
40      4     embedding_dim        Vector dimensionality
44      1     embedding_dtype      0=f32, 1=f16, 2=bf16, 3=int8
45      1     compression          0=none, 1=lz4, 2=zstd
46      2     flags                Bit flags (see below)
48      8     node_region_offset   Byte offset to node payload region
56      8     embed_region_offset  Byte offset to embedding region
64      8     edge_region_offset   Byte offset to edge region
72      8     string_table_offset  Byte offset to string table
80      8     bloom_offset         Byte offset to bloom filter
88      8     index_offset         Byte offset to B+ tree index
96      8     footer_offset        Byte offset to footer
104     4     checksum             CRC32C of header bytes [0..104)
108     20    reserved             Zero-padded

Flags

Bit  Name            Meaning
0    has_outcomes    Outcome fields are populated
1    has_embeddings  Embedding region is present
2    is_compacted    Segment is the result of compaction
3    is_encrypted    Payload regions are AES-256-GCM encrypted
4    is_partial      Segment was not cleanly closed (crash indicator)

A reader encountering is_partial = 1 knows the segment is incomplete. Recovery discards it and replays the WAL.

Node Payload Region

Node fields are stored in columnar groups. Each column contains values for all N nodes, stored contiguously. This layout enables vectorized scans over a single field without pulling unrelated data into cache.

Column Layout

Column          Element Size  Content
id              16 bytes      UUID (128-bit, big-endian)
timestamp       8 bytes       u64 nanosecond epoch
context_hash    8 bytes       u64 xxHash3
confidence      4 bytes       f32 extraction confidence
outcome_status  1 byte        u8 enum: 0=pending, 1=success, 2=failure, 3=partial

Each column start is padded to the next 64-byte boundary. For a segment with 100K nodes, the timestamp column occupies 800KB of contiguous memory -- a single field scan touches only those 800KB, not the full node data.

The ColumnReader provides zero-copy typed access. Given a column offset and a node index, it computes the byte position and interprets the value directly from the underlying buffer:

// Read the timestamp of node i
let ts = reader.read_u64(timestamp_col_offset, i);
// Computes: offset + i * 8, reads 8 bytes as little-endian u64

No allocation. No deserialization struct. Pointer arithmetic and a byte reinterpretation.

Embedding Vector Region

The most performance-critical region. Embedding vectors are stored contiguously with fixed stride, aligned to 64-byte boundaries.

Why 64-Byte Alignment

SIMD instruction sets impose alignment requirements on memory operands:

ISA      Load Instruction  Required Alignment
SSE      _mm_load_ps       16 bytes
AVX2     _mm256_load_ps    32 bytes
AVX-512  _mm512_load_ps    64 bytes
NEON     vld1q_f32         16 bytes (recommended)

TALA standardizes on 64 bytes to satisfy all tiers. This means AVX-512 loads can use the aligned variant (_mm512_load_ps) instead of the unaligned variant (_mm512_loadu_ps), avoiding a potential penalty on some microarchitectures.

Stride Calculation

The stride is the vector size rounded up to the next 64-byte boundary:

stride = align_up(dim * sizeof(dtype), 64)

For the default configuration (dim=384, dtype=f32):

  • Vector size: 384 x 4 = 1,536 bytes
  • 1,536 / 64 = 24 (already aligned)
  • Stride: 1,536 bytes

For dim=384, dtype=f16:

  • Vector size: 384 x 2 = 768 bytes
  • 768 / 64 = 12 (already aligned)
  • Stride: 768 bytes
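The stride calculation, as a sketch:

```rust
// Round n up to the next multiple of align (align must be nonzero).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

// Stride of one embedding slot: vector byte size rounded up to 64 bytes.
fn stride(dim: usize, dtype_size: usize) -> usize {
    align_up(dim * dtype_size, 64)
}
```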

Access Pattern

Given node index i, the embedding is at byte offset:

embed_region_offset + (i * stride)

This is O(1) with zero deserialization. An mmap'd file pointer plus offset arithmetic is all that is needed. The EmbeddingReader wraps this:

pub fn get(&self, index: usize) -> &[f32] {
    let offset = index * self.stride;
    // SAFETY: offset is within bounds, alignment guaranteed by stride
    unsafe {
        std::slice::from_raw_parts(
            self.data[offset..].as_ptr() as *const f32,
            self.dim,
        )
    }
}

Quantization

The embedding_dtype header field determines how bytes in the embedding region are interpreted:

Value  Type  Bytes/Dim  Use Case
0      f32   4          Hot store, real-time queries
1      f16   2          Warm store, 2x compression
2      bf16  2          Training-compatible
3      int8  1          Cold store, 4x compression

For int8, each vector carries an 8-byte header (f32 scale, f32 zero_point) for dequantization. Quality loss at int8 is minimal: Spearman rank correlation of similarity rankings versus f32 baseline is 0.990-0.995.

Edge Region (CSR)

Edges use Compressed Sparse Row encoding, optimized for forward traversal: given a node, find all its outgoing edges.

Structure

The edge region contains two arrays:

Row pointer array: (N+1) entries of u64, where N is the node count. row_pointers[i] gives the index into the edge array where node i's edges begin. row_pointers[i+1] - row_pointers[i] gives node i's out-degree.

Edge entry array: E entries, where each entry is:

Field     Size     Description
target    4 bytes  u32 index of destination node
relation  1 byte   u8 enum: 0=causal, 1=temporal, 2=dependency, 3=retry, 4=branch
weight    4 bytes  f32 confidence
padding   3 bytes  Zero (alignment to 12 bytes)

To find all edges from node i:

let start = row_pointers[i] as usize;
let end = row_pointers[i + 1] as usize;
let edges = &edge_array[start..end];

This is a single index lookup and a slice operation. No linked-list traversal, no hash table probe.
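The row-pointer array itself is a prefix sum over per-node out-degrees. A writer could build it like this (illustrative, not the tala-wire code):

```rust
// Build CSR row pointers from out-degrees: row_pointers[i] is where node i's
// edges begin in the edge array, and the final entry is the total edge count.
fn build_row_pointers(out_degrees: &[u64]) -> Vec<u64> {
    let mut row_pointers = Vec::with_capacity(out_degrees.len() + 1);
    let mut total = 0u64;
    row_pointers.push(0);
    for &d in out_degrees {
        total += d;
        row_pointers.push(total);
    }
    row_pointers
}
```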

Reverse Index

For backward traversal (find all nodes pointing to a given node), a CSC (Compressed Sparse Column) block can be appended after the CSR block, using the same format but transposed. Its presence is indicated by a flag in the edge region header.

Bloom Filter Block

A bloom filter over node UUIDs provides fast negative lookups: "Is this intent definitely not in this segment?" This is critical for query routing across multiple segments. Before performing a full B+ tree lookup or column scan, the bloom filter rejects segments that cannot contain the target node.

Layout

Field       Size                         Description
num_hashes  4 bytes                      Number of hash functions
num_bits    8 bytes                      Total bit count
bits        ceil(num_bits/64) * 8 bytes  Filter data as u64 words

Target false positive rate: 1%. For 100K nodes, this requires approximately 120KB. The filter uses double hashing (FNV-1a based) with k hash functions derived from two base hashes.
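The probe positions follow the standard double-hashing recipe h_i = h1 + i * h2 mod num_bits. An illustrative sketch, with std's DefaultHasher standing in for the FNV-1a base hashes used by the real filter:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Two independent base hashes of the key; the second is salted.
fn base_hashes(key: &[u8]) -> (u64, u64) {
    let mut h1 = DefaultHasher::new();
    key.hash(&mut h1);
    let mut h2 = DefaultHasher::new();
    (key, 0xA5u8).hash(&mut h2);
    (h1.finish(), h2.finish())
}

// Derive k bit positions from the two base hashes.
fn bit_positions(key: &[u8], num_hashes: u32, num_bits: u64) -> Vec<u64> {
    let (h1, h2) = base_hashes(key);
    (0..num_hashes as u64)
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
        .collect()
}
```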

Why Binary-First

TALA stores intent data in a custom binary format rather than JSON, JSONL, or other text formats. The reasons are quantitative:

Density. A single intent with a 384-dimensional f32 embedding occupies 1,536 bytes for the embedding alone. In JSON, each float becomes 6-8 characters of text plus separators, inflating to approximately 4,000 bytes. Over millions of intents, this difference is the gap between fitting in memory and not.

Zero-copy access. An mmap'd TBF segment exposes node fields and embeddings directly as typed memory. A JSON file must be parsed, decoded, and allocated into structs before any field can be read.

SIMD compatibility. JSON floats are ASCII text. Computing cosine similarity over JSON-encoded vectors requires parsing every float before any math can happen. TBF embeddings are already in the format that SIMD instructions consume: contiguous f32 arrays at 64-byte alignment.

Graph traversal. The CSR edge index enables O(1) lookup of a node's neighbors. In JSON, edges would be embedded in node objects or stored in a separate array requiring linear scan.

Why Columnar

Row-oriented storage (one struct per intent, fields interleaved) is natural for single-record access but wasteful for analytical queries. When tala find needs to scan 100K timestamps to identify a time range, row-oriented layout forces it to pull every field of every intent into cache, even though only the 8-byte timestamp is needed.

Columnar layout stores each field as a contiguous array. A timestamp scan touches 800KB (100K x 8 bytes) instead of the full record size (potentially 2KB+ per intent including the embedding). This matches the access pattern that SIMD vectorization exploits: uniform arrays processed in bulk.

Comparison with Existing Formats

Property                   TBF                  Parquet     Arrow IPC  FlatBuffers
Graph-native layout (CSR)  Yes                  No          No         No
SIMD-aligned embeddings    Yes (64B)            No          Yes (64B)  No
Zero-copy mmap             Yes                  Partial     Yes        Yes
Columnar node fields       Yes                  Yes         Yes        No
Built-in bloom filter      Yes                  Optional    No         No
Built-in graph index       Yes (B+ tree + CSR)  No          No         No
Append-only segments       Yes                  Yes         No         No
Per-region compression     Yes                  Per-column  No         No

tala-core

Foundation types and traits for the intent-native narrative execution layer. Every crate in the TALA workspace depends on tala-core; it defines the vocabulary that all subsystems share. The crate carries no dependencies beyond uuid and thiserror, making it safe to link from any context -- SIMD kernels, storage engines, network codecs, and CLI tools alike.

Key Types

Type -- Description
IntentId -- UUID wrapper identifying a single intent node
Intent -- Full intent node: identity, timestamp, embedding, command, outcome
Edge -- Directed edge between two intent nodes
RelationType -- Edge classification enum (Causal, Temporal, Dependency, Retry, Branch)
Outcome -- Execution result: status, latency, exit code
Status -- Outcome status enum (Pending, Success, Failure, Partial)
Context -- Execution environment captured at intent time
TimeRange -- Inclusive-start, exclusive-end nanosecond time window
ReplayStep -- A single step in a replay plan
ReplayResult -- Result of executing a replay step
IntentCategory -- Classification of an intent's purpose
Insight -- An observation produced by the analysis engine
InsightKind -- Classification of insight types
TalaError -- Unified error taxonomy for the workspace
IntentStore -- Trait: storage abstraction for intents
IntentExtractor -- Trait: raw command to structured intent conversion

IntentId

A newtype over uuid::Uuid providing a unique, opaque handle for every intent node in the system.

#![allow(unused)]
fn main() {
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct IntentId(pub Uuid);
}

Methods

#![allow(unused)]
fn main() {
impl IntentId {
    /// Generate a random v4 UUID.
    pub fn random() -> Self;

    /// Access the raw 16-byte representation.
    pub fn as_bytes(&self) -> &[u8; 16];
}
}

IntentId implements Default by generating a random identifier. Two separately constructed IntentId values are virtually guaranteed to be distinct.

#![allow(unused)]
fn main() {
let id = IntentId::random();
assert_eq!(id.as_bytes().len(), 16);
}

Intent

The central data structure of the entire system. An Intent captures everything known about a single user action: its identity, when it occurred, the raw command text, a dense embedding vector for semantic search, and the optional execution outcome.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Intent {
    pub id: IntentId,
    pub timestamp: u64,
    pub raw_command: String,
    pub embedding: Vec<f32>,
    pub context_hash: u64,
    pub parent_ids: Vec<IntentId>,
    pub outcome: Option<Outcome>,
    pub confidence: f32,
}
}
Field -- Description
id -- Unique identifier assigned at creation
timestamp -- Nanosecond epoch when the intent was captured
raw_command -- The original shell command string
embedding -- Dense f32 vector (typically dim=384) for semantic similarity
context_hash -- FNV-1a hash of the execution context
parent_ids -- Predecessor intent IDs forming causal chains
outcome -- Execution result, attached asynchronously
confidence -- Extraction confidence score in [0.0, 1.0]

Edge

A directed, weighted edge connecting two intent nodes. Edges carry a RelationType that classifies the nature of the relationship and a floating-point weight representing connection strength.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Edge {
    pub from: IntentId,
    pub to: IntentId,
    pub relation: RelationType,
    pub weight: f32,
}
}

RelationType

Classifies how two intent nodes relate. Stored as a #[repr(u8)] enum so it can be serialized into a single byte in the TBF binary format and CSR edge entries.

#![allow(unused)]
fn main() {
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RelationType {
    Causal     = 0,
    Temporal   = 1,
    Dependency = 2,
    Retry      = 3,
    Branch     = 4,
}
}
Variant -- Meaning
Causal -- The source intent directly caused the target
Temporal -- The two intents are temporally adjacent
Dependency -- The target depends on the source's output
Retry -- The target is a retry of the source
Branch -- The target is an alternative path from the source

Outcome and Status

An Outcome records the result of executing an intent. It is attached to an Intent asynchronously after execution completes.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Outcome {
    pub status: Status,
    pub latency_ns: u64,
    pub exit_code: i32,
}
}
#![allow(unused)]
fn main() {
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pending = 0,
    Success = 1,
    Failure = 2,
    Partial = 3,
}
}

Context

Captures the execution environment at intent time. Used to compute the context_hash field on Intent, enabling queries that group intents by environment.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, Default)]
pub struct Context {
    pub cwd: String,
    pub env_hash: u64,
    pub session_id: u64,
    pub shell: String,
    pub user: String,
}
}
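The docs state that context_hash is an FNV-1a hash of the Context fields, but not how the fields are serialized before hashing. The sketch below is a minimal FNV-1a 64-bit implementation that hashes the fields in declaration order; the field-serialization scheme is an assumption, not TALA's actual one.

```rust
// Sketch: FNV-1a 64-bit hashing of a Context-like value.
// ASSUMPTION: fields are folded into the hash in declaration order;
// the real serialization used by tala-core may differ.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

fn context_hash(cwd: &str, env_hash: u64, session_id: u64, shell: &str, user: &str) -> u64 {
    let mut h = FNV_OFFSET;
    h = fnv1a(cwd.as_bytes(), h);
    h = fnv1a(&env_hash.to_le_bytes(), h);
    h = fnv1a(&session_id.to_le_bytes(), h);
    h = fnv1a(shell.as_bytes(), h);
    h = fnv1a(user.as_bytes(), h);
    h
}

fn main() {
    let a = context_hash("/srv/app", 1, 7, "bash", "deploy");
    let b = context_hash("/srv/app", 1, 7, "bash", "deploy");
    let c = context_hash("/tmp", 1, 7, "bash", "deploy");
    assert_eq!(a, b); // same environment, same hash
    assert_ne!(a, c); // any changed field changes the hash
}
```

Because the hash is deterministic over the environment, two intents captured in the same directory, shell, and session share a context_hash and can be grouped by a single equality comparison.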

TimeRange

A half-open time interval [start, end) expressed in nanosecond epoch timestamps. Used by IntentStore::query_temporal to retrieve intents within a window.

#![allow(unused)]
fn main() {
#[derive(Clone, Copy, Debug)]
pub struct TimeRange {
    pub start: u64,
    pub end: u64,
}
}

ReplayStep and ReplayResult

Types used by the replay engine (tala-weave) to represent plan steps and their outcomes.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct ReplayStep {
    pub intent_id: IntentId,
    pub command: String,
    pub deps: Vec<IntentId>,
}

#[derive(Clone, Debug)]
pub struct ReplayResult {
    pub step: ReplayStep,
    pub outcome: Outcome,
    pub skipped: bool,
}
}

ReplayStep is a single entry in a topologically sorted replay plan. deps lists the IDs of steps that must complete before this step may execute. ReplayResult wraps a step with its execution outcome and a flag indicating whether the step was skipped due to idempotency.
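A topologically sorted plan over deps lists can be produced with Kahn's algorithm. The sketch below models steps as plain indices rather than ReplayStep values and returns None on a cycle (the real planner in tala-weave reports TalaError::CycleDetected); it is an illustrative sketch, not the crate's implementation.

```rust
// Sketch: topological ordering of replay steps via Kahn's algorithm.
// deps[i] lists the steps that must complete before step i may run.
use std::collections::VecDeque;

fn replay_order(deps: &[Vec<usize>]) -> Option<Vec<usize>> {
    let n = deps.len();
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (step, ds) in deps.iter().enumerate() {
        indegree[step] = ds.len();
        for &d in ds {
            dependents[d].push(step);
        }
    }
    // Start from steps with no unmet dependencies.
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(step) = queue.pop_front() {
        order.push(step);
        for &next in &dependents[step] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                queue.push_back(next);
            }
        }
    }
    // If some step never reached indegree 0, the graph has a cycle.
    if order.len() == n { Some(order) } else { None }
}

fn main() {
    // step 2 depends on 0 and 1; step 1 depends on 0
    let deps = vec![vec![], vec![0], vec![0, 1]];
    assert_eq!(replay_order(&deps), Some(vec![0, 1, 2]));
    // 0 -> 1 -> 0 is a cycle: no valid replay order exists
    assert_eq!(replay_order(&[vec![1], vec![0]]), None);
}
```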


IntentCategory and Insight Types

Classification types for the intent pipeline and insight engine.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum IntentCategory {
    Build,
    Deploy,
    Debug,
    Configure,
    Query,
    Navigate,
    Other(String),
}
}
#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Insight {
    pub kind: InsightKind,
    pub description: String,
    pub intent_ids: Vec<IntentId>,
    pub confidence: f32,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum InsightKind {
    RecurringPattern,
    FailureCluster,
    Prediction,
    Summary,
}
}

TalaError

A unified error taxonomy for the entire workspace. Downstream crates convert their crate-specific errors into TalaError through thiserror-derived From implementations.

#![allow(unused)]
fn main() {
#[derive(Debug, thiserror::Error)]
pub enum TalaError {
    #[error("segment not found: {0:?}")]
    SegmentNotFound(SegmentId),

    #[error("segment corrupted: {0}")]
    SegmentCorrupted(String),

    #[error("node not found: {0:?}")]
    NodeNotFound(IntentId),

    #[error("dimension mismatch: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },

    #[error("extraction failed: {0}")]
    ExtractionFailed(String),

    #[error("cycle detected in narrative graph")]
    CycleDetected,

    #[error(transparent)]
    Io(#[from] std::io::Error),
}
}

The SegmentId type referenced by SegmentNotFound is a monotonic segment identifier:

#![allow(unused)]
fn main() {
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SegmentId(pub u64);
}

IntentStore Trait

The core storage abstraction. Implemented by tala_store::StorageEngine. All methods accept &self and use interior mutability (locks) so the store can be shared across threads.

#![allow(unused)]
fn main() {
pub trait IntentStore: Send + Sync {
    /// Insert a fully formed intent. Returns its ID.
    fn insert(&self, intent: Intent) -> Result<IntentId, TalaError>;

    /// Retrieve an intent by ID. Returns `None` if not found.
    fn get(&self, id: IntentId) -> Result<Option<Intent>, TalaError>;

    /// Semantic search: find the `k` nearest intents by embedding similarity.
    /// Returns (IntentId, cosine_similarity) pairs.
    fn query_semantic(
        &self,
        embedding: &[f32],
        k: usize,
    ) -> Result<Vec<(IntentId, f32)>, TalaError>;

    /// Temporal query: return all intents within a time range, sorted by timestamp.
    fn query_temporal(&self, range: TimeRange) -> Result<Vec<Intent>, TalaError>;

    /// Attach an outcome to an existing intent.
    fn attach_outcome(&self, id: IntentId, outcome: Outcome) -> Result<(), TalaError>;
}
}
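To make the shape of query_semantic concrete, here is a brute-force version over simplified local types (a flat Vec of embeddings in place of the real store). The actual tala_store::StorageEngine uses an index rather than a linear scan; this sketch only illustrates the contract: score every candidate, sort descending, keep k.

```rust
// Sketch: query_semantic as a brute-force scan over an in-memory corpus.
// Simplified stand-in for the IntentStore trait; not the real engine.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-12)
}

/// Return the k nearest (index, cosine_similarity) pairs, best first.
fn query_semantic(embedding: &[f32], store: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(embedding, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let store = vec![
        vec![1.0, 0.0], // identical to query
        vec![0.0, 1.0], // orthogonal
        vec![0.7, 0.7], // diagonal
    ];
    let top = query_semantic(&[1.0, 0.0], &store, 2);
    assert_eq!(top[0].0, 0); // exact match ranks first
    assert_eq!(top[1].0, 2); // diagonal vector second
}
```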

IntentExtractor Trait

Converts a raw command string and execution context into a structured Intent. Implemented by tala_intent::IntentPipeline.

#![allow(unused)]
fn main() {
pub trait IntentExtractor: Send + Sync {
    fn extract(&self, raw: &str, context: &Context) -> Result<Intent, TalaError>;
}
}

tala-wire

The TALA Binary Format (TBF) crate. Implements on-disk segment serialization with columnar node storage, CSR edge indexing, 64-byte aligned embedding regions, and bloom filter membership tests. Every data structure in this crate is designed for zero-copy read access and SIMD-friendly alignment.

Key Types

Type -- Description
BloomFilter -- Probabilistic membership test using FNV-1a double hashing
CsrBuilder -- Builder for constructing a CSR edge index
CsrIndex -- Compressed Sparse Row edge index (read-only)
CsrEdge -- A single edge entry: target node, relation type, weight
ColumnarBuffer -- In-memory columnar storage for intent node fields
ColumnReader -- Zero-copy typed access to serialized column data
EmbeddingWriter -- Writes 64-byte aligned embedding vectors
EmbeddingReader -- Zero-copy read access to aligned embedding vectors
SegmentWriter -- Orchestrates all regions into a complete TBF segment
SegmentReader -- Parses and validates a TBF segment from bytes
SegmentHeader -- Parsed header fields of a TBF segment

Constants

#![allow(unused)]
fn main() {
/// Magic bytes identifying a TBF segment: "TALB".
pub const MAGIC: [u8; 4] = *b"TALB";

/// Fixed header size in bytes.
pub const HEADER_SIZE: usize = 128;

/// Alignment boundary in bytes for all regions and columns.
pub const ALIGN: usize = 64;
}

Utility Functions

#![allow(unused)]
fn main() {
/// Round `value` up to the next multiple of `alignment`.
#[inline]
pub fn align_up(value: usize, alignment: usize) -> usize;
}
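The standard bit trick implements this for power-of-two alignments, which is all TBF uses (ALIGN = 64). A minimal sketch, reproducing the stride figure quoted later for dim=384 embeddings:

```rust
// Sketch of align_up for power-of-two alignments.
#[inline]
fn align_up(value: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (value + alignment - 1) & !(alignment - 1)
}

fn main() {
    assert_eq!(align_up(0, 64), 0);
    assert_eq!(align_up(1, 64), 64);
    assert_eq!(align_up(384 * 4, 64), 1536); // dim=384 embedding stride
    assert_eq!(align_up(1536, 64), 1536);    // already-aligned values pass through
}
```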

BloomFilter

A probabilistic set membership test backed by a bit vector of u64 words. Uses FNV-1a double hashing to produce num_hashes independent bit positions per key. The filter is sized automatically from expected item count and desired false positive rate.
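The "optimal formulas" for sizing are the standard ones: num_bits m = -n ln p / (ln 2)^2 and num_hashes k = (m/n) ln 2. Whether tala-wire rounds exactly this way is an assumption; the sketch below shows the textbook calculation.

```rust
// Sketch: standard bloom filter sizing from expected items and target
// false positive rate. Rounding behavior is assumed, not taken from tala-wire.
fn bloom_params(expected_items: usize, fp_rate: f64) -> (u64, u32) {
    let n = expected_items as f64;
    let ln2 = std::f64::consts::LN_2;
    // m = -n * ln(p) / (ln 2)^2
    let num_bits = (-(n * fp_rate.ln()) / (ln2 * ln2)).ceil() as u64;
    // k = (m / n) * ln 2, at least one hash function
    let num_hashes = ((num_bits as f64 / n) * ln2).round().max(1.0) as u32;
    (num_bits, num_hashes)
}

fn main() {
    let (bits, hashes) = bloom_params(1000, 0.01);
    // ~9.6 bits per item and 7 hash functions for a 1% false positive rate
    assert!(bits >= 9000 && bits <= 10000);
    assert_eq!(hashes, 7);
}
```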

#![allow(unused)]
fn main() {
pub struct BloomFilter {
    pub bits: Vec<u64>,
    pub num_hashes: u32,
    pub num_bits: u64,
}
}

Methods

#![allow(unused)]
fn main() {
impl BloomFilter {
    /// Create a new filter sized for `expected_items` with the target `fp_rate`.
    /// The number of bits and hash functions are computed from the optimal formulas.
    pub fn new(expected_items: usize, fp_rate: f64) -> Self;

    /// Insert a key into the filter.
    pub fn insert(&mut self, key: &[u8]);

    /// Test whether a key is possibly in the set. False positives are possible;
    /// false negatives are not.
    pub fn contains(&self, key: &[u8]) -> bool;

    /// Return the size of the bit vector in bytes.
    pub fn size_bytes(&self) -> usize;
}
}

Example

#![allow(unused)]
fn main() {
use tala_wire::BloomFilter;

let mut bloom = BloomFilter::new(1000, 0.01);
bloom.insert(b"intent-abc");
assert!(bloom.contains(b"intent-abc"));
}

CsrEdge

A single edge entry in the CSR index. Stored compactly with a 4-byte target node index, a 1-byte relation type tag, and a 4-byte weight. On disk each entry occupies 12 bytes (with 3 bytes padding).

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct CsrEdge {
    pub target: u32,
    pub relation: u8,
    pub weight: f32,
}
}

CsrBuilder and CsrIndex

The Compressed Sparse Row representation stores edges in a flat array with a row-offset array for O(1) adjacency lookup per node.

CsrBuilder

#![allow(unused)]
fn main() {
pub struct CsrBuilder { /* private */ }

impl CsrBuilder {
    /// Create a builder for a graph with `node_count` nodes.
    pub fn new(node_count: usize) -> Self;

    /// Add a directed edge from node `from` to node `to`.
    pub fn add_edge(&mut self, from: usize, to: usize, relation: u8, weight: f32);

    /// Consume the builder and produce a read-only `CsrIndex`.
    pub fn build(self) -> CsrIndex;
}
}

CsrIndex

#![allow(unused)]
fn main() {
pub struct CsrIndex { /* private */ }

impl CsrIndex {
    /// Return all edges originating from `node`. O(1) lookup via row offsets.
    pub fn edges_from(&self, node: usize) -> &[CsrEdge];

    /// Return the out-degree of `node`.
    pub fn degree(&self, node: usize) -> usize;

    /// Return the total number of edges in the index.
    pub fn edge_count(&self) -> usize;

    /// Return the number of nodes in the index.
    pub fn node_count(&self) -> usize;
}
}

Example

#![allow(unused)]
fn main() {
use tala_wire::{CsrBuilder, CsrEdge};

let mut builder = CsrBuilder::new(4);
builder.add_edge(0, 1, 0, 1.0); // node 0 -> node 1, Causal
builder.add_edge(0, 2, 1, 0.5); // node 0 -> node 2, Temporal
builder.add_edge(2, 3, 0, 0.8); // node 2 -> node 3, Causal
let index = builder.build();

assert_eq!(index.node_count(), 4);
assert_eq!(index.edge_count(), 3);
assert_eq!(index.degree(0), 2);
assert_eq!(index.edges_from(0)[0].target, 1);
}

ColumnarBuffer and ColumnReader

ColumnarBuffer

In-memory columnar storage for intent node fields. Each field is stored in a separate vector, enabling efficient serialization where each column is 64-byte aligned for SIMD-friendly access.

#![allow(unused)]
fn main() {
pub struct ColumnarBuffer {
    pub ids: Vec<[u8; 16]>,
    pub timestamps: Vec<u64>,
    pub context_hashes: Vec<u64>,
    pub confidences: Vec<f32>,
    pub outcome_statuses: Vec<u8>,
}
}

Methods

#![allow(unused)]
fn main() {
impl ColumnarBuffer {
    /// Create an empty buffer.
    pub fn new() -> Self;

    /// Create an empty buffer with pre-allocated capacity.
    pub fn with_capacity(cap: usize) -> Self;

    /// Append a row.
    pub fn push(
        &mut self,
        id: &[u8; 16],
        timestamp: u64,
        context_hash: u64,
        confidence: f32,
        status: u8,
    );

    /// Return the number of rows.
    pub fn len(&self) -> usize;

    /// Return true if the buffer is empty.
    pub fn is_empty(&self) -> bool;

    /// Serialize all columns into a flat byte buffer.
    /// Returns (buffer, column_offsets) where offsets index:
    /// [0]=ids, [1]=timestamps, [2]=context_hashes, [3]=confidences, [4]=statuses.
    /// Each column begins at a 64-byte aligned offset.
    pub fn serialize(&self) -> (Vec<u8>, Vec<usize>);
}
}

ColumnReader

Zero-copy typed access into a serialized column buffer. Each read method takes a column offset (from ColumnarBuffer::serialize) and a row index.

#![allow(unused)]
fn main() {
pub struct ColumnReader<'a> { /* private */ }

impl<'a> ColumnReader<'a> {
    /// Wrap a raw byte slice for typed column access.
    pub fn new(data: &'a [u8]) -> Self;

    /// Read a u64 value at the given column offset and row index.
    #[inline]
    pub fn read_u64(&self, col_offset: usize, index: usize) -> u64;

    /// Read an f32 value at the given column offset and row index.
    #[inline]
    pub fn read_f32(&self, col_offset: usize, index: usize) -> f32;

    /// Read a u8 value at the given column offset and row index.
    #[inline]
    pub fn read_u8(&self, col_offset: usize, index: usize) -> u8;

    /// Read a 16-byte ID at the given column offset and row index.
    #[inline]
    pub fn read_id(&self, col_offset: usize, index: usize) -> [u8; 16];
}
}
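A miniature serialize/read roundtrip makes the offset scheme concrete. This sketch uses just two columns (timestamps and confidences) rather than the full five-column ColumnarBuffer, but follows the same rule: every column begins at a 64-byte aligned offset, and readers index into it by (column offset, row, element size).

```rust
// Sketch: two-column serialize + typed read, 64-byte aligned columns.
// Simplified relative to ColumnarBuffer/ColumnReader (fewer columns).
fn align_up(v: usize, a: usize) -> usize { (v + a - 1) & !(a - 1) }

fn serialize(timestamps: &[u64], confidences: &[f32]) -> (Vec<u8>, Vec<usize>) {
    let mut buf = Vec::new();
    let mut offsets = Vec::new();
    // timestamps column at the first aligned offset
    buf.resize(align_up(buf.len(), 64), 0);
    offsets.push(buf.len());
    for t in timestamps { buf.extend_from_slice(&t.to_le_bytes()); }
    // confidences column, padded up to the next 64-byte boundary
    buf.resize(align_up(buf.len(), 64), 0);
    offsets.push(buf.len());
    for c in confidences { buf.extend_from_slice(&c.to_le_bytes()); }
    (buf, offsets)
}

fn read_u64(data: &[u8], col: usize, index: usize) -> u64 {
    let at = col + index * 8;
    u64::from_le_bytes(data[at..at + 8].try_into().unwrap())
}

fn read_f32(data: &[u8], col: usize, index: usize) -> f32 {
    let at = col + index * 4;
    f32::from_le_bytes(data[at..at + 4].try_into().unwrap())
}

fn main() {
    let (buf, offs) = serialize(&[100, 200, 300], &[0.9, 0.5]);
    assert_eq!(offs[0] % 64, 0);
    assert_eq!(offs[1] % 64, 0);
    assert_eq!(read_u64(&buf, offs[0], 2), 300);
    assert_eq!(read_f32(&buf, offs[1], 1), 0.5);
}
```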

EmbeddingWriter and EmbeddingReader

EmbeddingWriter

Writes embedding vectors into a contiguous buffer with each vector padded to a 64-byte aligned stride. For dim=384, each vector occupies align_up(384 * 4, 64) = 1536 bytes.

#![allow(unused)]
fn main() {
pub struct EmbeddingWriter { /* private */ }

impl EmbeddingWriter {
    /// Create a writer for embeddings of the given dimensionality.
    pub fn new(dim: usize) -> Self;

    /// Append an embedding vector. The slice length must equal `dim`.
    pub fn push(&mut self, embedding: &[f32]);

    /// Return the serialized byte buffer.
    pub fn as_bytes(&self) -> &[u8];

    /// Return the padded stride (bytes per embedding).
    pub fn stride(&self) -> usize;

    /// Return the number of embeddings written.
    pub fn count(&self) -> usize;
}
}

EmbeddingReader

Zero-copy read access to aligned embedding vectors. Returns &[f32] slices directly from the underlying byte buffer without copying.

#![allow(unused)]
fn main() {
pub struct EmbeddingReader<'a> { /* private */ }

impl<'a> EmbeddingReader<'a> {
    /// Wrap a byte slice for zero-copy embedding access.
    /// `dim` must match the dimensionality used during writing.
    pub fn new(data: &'a [u8], dim: usize) -> Self;

    /// Return the embedding at the given index as a float slice.
    #[inline]
    pub fn get(&self, index: usize) -> &[f32];

    /// Return the number of embeddings in the buffer.
    pub fn count(&self) -> usize;
}
}
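The padded-stride layout can be sketched with a Vec<f32> backing store, which lets get hand out slices without copying (the real reader does the same over an mmap'd, 64-byte aligned byte region). This is an illustrative simplification, not the crate's code.

```rust
// Sketch: padded-stride embedding storage with zero-copy reads.
// Backing store is Vec<f32> for simplicity; tala-wire reads raw bytes.
fn align_up(v: usize, a: usize) -> usize { (v + a - 1) & !(a - 1) }

struct Embeddings {
    data: Vec<f32>,
    dim: usize,
    stride_f32: usize, // padded stride, counted in f32 elements
}

impl Embeddings {
    fn new(dim: usize) -> Self {
        let stride_bytes = align_up(dim * 4, 64);
        Embeddings { data: Vec::new(), dim, stride_f32: stride_bytes / 4 }
    }
    fn push(&mut self, v: &[f32]) {
        assert_eq!(v.len(), self.dim);
        self.data.extend_from_slice(v);
        // zero-pad out to the aligned stride
        self.data.resize(self.data.len() + (self.stride_f32 - self.dim), 0.0);
    }
    fn get(&self, index: usize) -> &[f32] {
        let at = index * self.stride_f32;
        &self.data[at..at + self.dim]
    }
}

fn main() {
    // dim=6 pads 24 bytes up to a 64-byte stride...
    let mut e = Embeddings::new(6);
    assert_eq!(e.stride_f32 * 4, 64);
    e.push(&[0.5; 6]);
    e.push(&[1.5; 6]);
    assert_eq!(e.get(1)[0], 1.5);
    // ...while dim=384 (1536 bytes) is already 64-byte aligned.
    assert_eq!(Embeddings::new(384).stride_f32 * 4, 1536);
}
```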

SegmentWriter

Orchestrates the serialization of a complete TBF segment. Accepts nodes (with their columnar fields and embeddings) and edges, then produces a single contiguous byte buffer containing: header, node columns, embedding region, CSR edge index, and bloom filter.

#![allow(unused)]
fn main() {
pub struct SegmentWriter { /* private */ }

impl SegmentWriter {
    /// Create a writer for segments with the given embedding dimensionality.
    pub fn new(dim: usize) -> Self;

    /// Add a node with all its fields and embedding.
    pub fn push_node(
        &mut self,
        id: &[u8; 16],
        timestamp: u64,
        context_hash: u64,
        confidence: f32,
        status: u8,
        embedding: &[f32],
    );

    /// Add a directed edge between two nodes (by insertion-order index).
    pub fn add_edge(&mut self, from: usize, to: usize, relation: u8, weight: f32);

    /// Finalize the segment into a contiguous byte buffer with a valid TBF header.
    pub fn finish(self) -> Vec<u8>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_wire::SegmentWriter;

let dim = 8;
let mut writer = SegmentWriter::new(dim);

let id = [0u8; 16];
let embedding = vec![0.1f32; dim];
writer.push_node(&id, 1000, 42, 0.95, 1, &embedding);
writer.add_edge(0, 0, 0, 1.0); // self-edge for demonstration

let segment_bytes = writer.finish();
assert!(segment_bytes.len() > 128); // at least header + data
}

SegmentHeader

The parsed fields from a TBF segment's 128-byte header.

#![allow(unused)]
fn main() {
pub struct SegmentHeader {
    pub version_minor: u16,
    pub version_major: u16,
    pub node_count: u64,
    pub edge_count: u64,
    pub dim: u32,
    pub node_region_offset: u64,
    pub embed_region_offset: u64,
    pub edge_region_offset: u64,
    pub bloom_offset: u64,
}
}

SegmentReader

Parses and validates a TBF segment from a byte buffer. Provides typed access to every region: columnar node fields, embeddings, CSR edges, and bloom filter.

#![allow(unused)]
fn main() {
pub struct SegmentReader<'a> { /* private */ }

impl<'a> SegmentReader<'a> {
    /// Parse a TBF segment from raw bytes. Returns an error string on failure
    /// (invalid magic, truncated header).
    pub fn open(data: &'a [u8]) -> Result<Self, &'static str>;

    /// Access the parsed header.
    pub fn header(&self) -> &SegmentHeader;

    /// Number of intent nodes in the segment.
    pub fn node_count(&self) -> usize;

    /// Number of edges in the segment.
    pub fn edge_count(&self) -> usize;

    /// Embedding dimensionality.
    pub fn dim(&self) -> usize;

    /// Return a ColumnReader over the raw segment bytes.
    pub fn column_reader(&self) -> ColumnReader<'a>;

    /// Read a node ID by row index.
    #[inline]
    pub fn read_id(&self, index: usize) -> [u8; 16];

    /// Read a timestamp by row index.
    #[inline]
    pub fn read_timestamp(&self, index: usize) -> u64;

    /// Read a context hash by row index.
    #[inline]
    pub fn read_context_hash(&self, index: usize) -> u64;

    /// Read a confidence score by row index.
    #[inline]
    pub fn read_confidence(&self, index: usize) -> f32;

    /// Read an outcome status byte by row index.
    #[inline]
    pub fn read_status(&self, index: usize) -> u8;

    /// Zero-copy embedding reader for the embedding region.
    pub fn embedding_reader(&self) -> EmbeddingReader<'a>;

    /// Reconstruct the CSR edge index from the edge region.
    pub fn csr_index(&self) -> CsrIndex;

    /// Reconstruct the bloom filter from the bloom region.
    pub fn bloom_filter(&self) -> BloomFilter;

    /// Return the raw segment bytes.
    pub fn as_bytes(&self) -> &[u8];
}
}

Roundtrip Example

#![allow(unused)]
fn main() {
use tala_wire::{SegmentWriter, SegmentReader};

let dim = 8;
let mut writer = SegmentWriter::new(dim);

let id = [0xABu8; 16];
let embedding = vec![1.0f32; dim];
writer.push_node(&id, 42, 7, 0.9, 1, &embedding);

let bytes = writer.finish();
let reader = SegmentReader::open(&bytes).expect("valid segment");

assert_eq!(reader.node_count(), 1);
assert_eq!(reader.read_id(0), id);
assert_eq!(reader.read_timestamp(0), 42);

let emb = reader.embedding_reader();
assert_eq!(emb.get(0).len(), dim);
}

tala-embed

SIMD-accelerated vector operations, quantization, and the HNSW approximate nearest neighbor index. This crate is the computational engine behind TALA's semantic search: it computes cosine similarity, dot products, and L2 distances using runtime-dispatched AVX2+FMA intrinsics on x86_64, with a scalar fallback that compiles everywhere. It also provides INT8 and FP16 quantization for storage compression and an HNSW index that delivers sub-millisecond top-k search over tens of thousands of vectors.

Key Types

Type -- Description
AlignedVec -- 64-byte aligned f32 vector for SIMD operations
HnswIndex -- Hierarchical Navigable Small World approximate nearest neighbor index

Key Modules

Module -- Description
scalar -- Portable fallback: dot_product, cosine_similarity, l2_distance_sq, norm_sq
avx2 -- AVX2+FMA implementations (x86_64 only, #[cfg(target_arch = "x86_64")])
quantize -- Quantization: f32_to_int8, int8_to_f32, f32_to_f16, f16_to_f32

Dispatch Functions

Function -- Description
cosine_similarity(a, b) -- Cosine similarity with runtime ISA dispatch
dot_product(a, b) -- Dot product with runtime ISA dispatch
l2_distance_sq(a, b) -- L2 distance squared with runtime ISA dispatch
batch_cosine(query, corpus, dim, results) -- Single-threaded batch cosine similarity
batch_cosine_parallel(query, corpus, dim, results) -- Rayon-parallel batch cosine similarity

AlignedVec

A heap-allocated f32 vector guaranteed to begin at a 64-byte aligned address. This alignment is required for optimal AVX-512 loads and beneficial for AVX2. AlignedVec manages its own memory through the global allocator with Layout::from_size_align(len * 4, 64).

#![allow(unused)]
fn main() {
pub struct AlignedVec {
    ptr: *mut f32,
    len: usize,
    cap: usize,
}
}

AlignedVec implements Send, Sync, Clone, Deref<Target = [f32]>, DerefMut, From<Vec<f32>>, and From<&[f32]>. It can be used anywhere a &[f32] is expected.

Methods

#![allow(unused)]
fn main() {
impl AlignedVec {
    /// Allocate a zero-initialized vector of `len` elements at 64-byte alignment.
    pub fn new(len: usize) -> Self;

    /// View as a shared float slice.
    pub fn as_slice(&self) -> &[f32];

    /// View as a mutable float slice.
    pub fn as_mut_slice(&mut self) -> &mut [f32];

    /// Number of elements.
    pub fn len(&self) -> usize;

    /// True if the vector contains no elements.
    pub fn is_empty(&self) -> bool;
}
}

Example

#![allow(unused)]
fn main() {
use tala_embed::AlignedVec;

let mut v = AlignedVec::new(384);
assert_eq!(v.len(), 384);
assert!(v.as_slice().as_ptr() as usize % 64 == 0);

// Populate from a Vec<f32>:
let data: Vec<f32> = (0..384).map(|i| i as f32).collect();
let aligned = AlignedVec::from(data);
assert_eq!(aligned[0], 0.0);
assert_eq!(aligned[383], 383.0);
}

scalar Module

Portable implementations of the core vector operations. These are used on architectures without AVX2 support and serve as the reference implementation for correctness testing.

#![allow(unused)]
fn main() {
pub mod scalar {
    /// Dot product of two equal-length slices.
    #[inline]
    pub fn dot_product(a: &[f32], b: &[f32]) -> f32;

    /// Squared L2 norm of a vector: sum of x_i^2.
    #[inline]
    pub fn norm_sq(a: &[f32]) -> f32;

    /// Cosine similarity: dot(a,b) / (||a|| * ||b|| + epsilon).
    #[inline]
    pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32;

    /// Squared L2 distance: sum of (a_i - b_i)^2.
    #[inline]
    pub fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32;
}
}
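The documented formulas translate directly into portable code. This is a reference-style sketch following the doc comments above (including the epsilon guard in cosine_similarity); the crate's own bodies may differ in minor details such as the epsilon value.

```rust
// Sketch: portable reference implementations of the scalar kernels,
// matching the formulas documented above. Epsilon value is an assumption.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm_sq(a: &[f32]) -> f32 {
    a.iter().map(|x| x * x).sum()
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // dot(a,b) / (||a|| * ||b|| + epsilon)
    dot_product(a, b) / (norm_sq(a).sqrt() * norm_sq(b).sqrt() + 1e-12)
}

fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let a = [3.0, 4.0];
    assert_eq!(dot_product(&a, &a), 25.0);
    assert_eq!(norm_sq(&a), 25.0);
    assert!((cosine_similarity(&a, &a) - 1.0).abs() < 1e-5);
    assert_eq!(l2_distance_sq(&a, &[0.0, 0.0]), 25.0);
}
```

Because these are branch-free loops over slices, they also serve as the correctness oracle against which the AVX2 kernels are tested.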

avx2 Module

AVX2+FMA implementations available on x86_64. All functions are marked #[target_feature(enable = "avx2,fma")] and must be called through unsafe blocks. The dispatch functions handle this automatically.

#![allow(unused)]
fn main() {
#[cfg(target_arch = "x86_64")]
pub mod avx2 {
    /// Dot product -- 4-way unrolled (32 floats per iteration) to saturate
    /// FMA throughput. Four independent accumulator chains reduce the critical
    /// path 4x vs a single accumulator.
    #[target_feature(enable = "avx2,fma")]
    pub unsafe fn dot_product(a: &[f32], b: &[f32]) -> f32;

    /// Cosine similarity -- 2-way unrolled (6 accumulators: 2 dot + 2 norm_a
    /// + 2 norm_b). Fits within AVX2's 16 YMM register budget (10 live regs).
    #[target_feature(enable = "avx2,fma")]
    pub unsafe fn cosine_similarity(a: &[f32], b: &[f32]) -> f32;

    /// Squared L2 distance -- 4-way unrolled (sub then FMA).
    #[target_feature(enable = "avx2,fma")]
    pub unsafe fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32;
}
}

Dispatch Functions

These top-level functions detect AVX2+FMA at runtime via is_x86_feature_detected! and dispatch to the fastest available implementation. On non-x86_64 architectures, they fall through directly to the scalar path. Call these from application code; never call the avx2 module functions directly.

#![allow(unused)]
fn main() {
/// Cosine similarity with automatic SIMD dispatch.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32;

/// Dot product with automatic SIMD dispatch.
pub fn dot_product(a: &[f32], b: &[f32]) -> f32;

/// L2 distance squared with automatic SIMD dispatch.
pub fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32;
}
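The dispatch pattern itself can be sketched as follows. The structure (feature detection, unsafe call into a #[target_feature] function, scalar fallback) mirrors what the text describes; the AVX2 body here is deliberately a scalar stand-in rather than real intrinsics, so treat it as a shape sketch, not the crate's kernel.

```rust
// Sketch: runtime ISA dispatch for dot_product. On x86_64 with AVX2+FMA
// detected, route to the target_feature function; otherwise use scalar.
fn scalar_dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safe: we just verified the CPU supports the features this
            // function was compiled to require.
            return unsafe { avx2_dot(a, b) };
        }
    }
    scalar_dot(a, b)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn avx2_dot(a: &[f32], b: &[f32]) -> f32 {
    // A real kernel would use _mm256_fmadd_ps with unrolled accumulators;
    // the scalar body keeps this sketch correct everywhere.
    scalar_dot(a, b)
}

fn main() {
    assert_eq!(dot_product(&[1.0, 2.0], &[3.0, 4.0]), 11.0);
}
```

This is why application code should only ever call the top-level functions: the unsafe feature-gated paths are unsound to invoke on a CPU that lacks the features, and the dispatcher is the only place that check is guaranteed to happen.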

Batch Operations

#![allow(unused)]
fn main() {
/// Compute cosine similarity of `query` against every vector in `corpus`.
/// `corpus` is a flat buffer: vector i = corpus[i*dim .. (i+1)*dim].
/// Results are written into `results[0..n]`.
pub fn batch_cosine(query: &[f32], corpus: &[f32], dim: usize, results: &mut [f32]);

/// Parallel batch cosine using Rayon. Processes corpus in chunks of 256
/// vectors per thread.
pub fn batch_cosine_parallel(query: &[f32], corpus: &[f32], dim: usize, results: &mut [f32]);
}

Example

#![allow(unused)]
fn main() {
use tala_embed::{cosine_similarity, batch_cosine};

let a = vec![1.0, 0.0, 0.0, 0.0];
let b = vec![1.0, 0.0, 0.0, 0.0];
let sim = cosine_similarity(&a, &b);
assert!((sim - 1.0).abs() < 1e-5);

// Batch: query against 3 corpus vectors of dim 4
let query = vec![1.0, 0.0, 0.0, 0.0];
let corpus = vec![
    1.0, 0.0, 0.0, 0.0,  // identical to query
    0.0, 1.0, 0.0, 0.0,  // orthogonal
    0.5, 0.5, 0.0, 0.0,  // partially similar
];
let mut results = vec![0.0f32; 3];
batch_cosine(&query, &corpus, 4, &mut results);
assert!(results[0] > results[2]); // identical > partial
assert!(results[2] > results[1]); // partial > orthogonal
}

quantize Module

Quantization routines for compressing f32 embeddings to INT8 or FP16. Useful for reducing storage footprint (4x for INT8, 2x for FP16) at the cost of some precision.

#![allow(unused)]
fn main() {
pub mod quantize {
    /// Symmetric f32 -> INT8 quantization.
    /// Returns (quantized_bytes, scale_factor). Dequantize with: f32 = i8 * scale.
    pub fn f32_to_int8(src: &[f32]) -> (Vec<i8>, f32);

    /// INT8 -> f32 dequantization.
    pub fn int8_to_f32(src: &[i8], scale: f32) -> Vec<f32>;

    /// f32 -> IEEE 754 half-precision (FP16), stored as u16.
    pub fn f32_to_f16(src: &[f32]) -> Vec<u16>;

    /// FP16 (u16) -> f32 conversion.
    pub fn f16_to_f32(src: &[u16]) -> Vec<f32>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_embed::quantize::{f32_to_int8, int8_to_f32};

let original = vec![0.5, -0.3, 0.9, 0.0];
let (quantized, scale) = f32_to_int8(&original);
let recovered = int8_to_f32(&quantized, scale);

for (o, r) in original.iter().zip(recovered.iter()) {
    assert!((o - r).abs() < 0.01);
}
}
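For intuition, symmetric INT8 quantization reduces to one scale factor per vector: scale = max|x| / 127, quantized = round(x / scale). The sketch below follows the documented dequantize rule f32 = i8 * scale; the crate's exact rounding and zero-vector handling are assumptions.

```rust
// Sketch: symmetric f32 -> INT8 quantization with a single per-vector scale.
// Rounding mode and zero-vector handling are assumptions, not tala-embed's.
fn f32_to_int8(src: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = src.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    if max_abs == 0.0 {
        return (vec![0; src.len()], 1.0);
    }
    let scale = max_abs / 127.0;
    let q = src.iter().map(|x| (x / scale).round() as i8).collect();
    (q, scale)
}

fn int8_to_f32(src: &[i8], scale: f32) -> Vec<f32> {
    src.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let original = [0.5f32, -0.3, 0.9, 0.0];
    let (q, scale) = f32_to_int8(&original);
    assert_eq!(q[2], 127); // the max-magnitude element saturates the range
    let recovered = int8_to_f32(&q, scale);
    for (o, r) in original.iter().zip(&recovered) {
        assert!((o - r).abs() < 0.01);
    }
}
```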

HnswIndex

A Hierarchical Navigable Small World index for approximate nearest neighbor search. Uses L2 distance (via cached norms: ||a||^2 + ||b||^2 - 2*dot(a,b)) internally and returns results sorted by L2 distance. All stored vectors are kept in AlignedVec for SIMD-friendly access.

The implementation uses generation-based visited tracking (a Vec<u32> with a monotonic generation counter) to avoid per-search HashSet allocation. Visited state resets in O(1) by incrementing the generation.
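The generation trick is worth seeing in isolation: a slot counts as visited only when its stored generation matches the current one, so "clear everything" is a single counter increment. A minimal sketch of the idea (overflow handling omitted):

```rust
// Sketch: generation-based visited tracking. Resetting all visited state
// is O(1): bump the generation instead of zeroing the vector.
struct Visited {
    marks: Vec<u32>,
    generation: u32,
}

impl Visited {
    fn new(capacity: usize) -> Self {
        Visited { marks: vec![0; capacity], generation: 0 }
    }
    /// O(1) reset, called once per search.
    fn next_generation(&mut self) {
        self.generation += 1;
    }
    /// Mark `node`; returns true if it was not yet visited this generation.
    fn visit(&mut self, node: usize) -> bool {
        if self.marks[node] == self.generation {
            false
        } else {
            self.marks[node] = self.generation;
            true
        }
    }
}

fn main() {
    let mut v = Visited::new(8);
    v.next_generation();
    assert!(v.visit(3));  // first visit this search
    assert!(!v.visit(3)); // already seen
    v.next_generation();  // O(1) reset between searches
    assert!(v.visit(3));  // fresh again
}
```

This is why search takes &mut self: each query advances the generation counter even though the graph itself is not modified.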

#![allow(unused)]
fn main() {
pub struct HnswIndex { /* private */ }

impl HnswIndex {
    /// Create a new index with the given dimensionality, connectivity `m`,
    /// and construction beam width `ef_construction`. Uses seed 42 for the RNG.
    pub fn new(dim: usize, m: usize, ef_construction: usize) -> Self;

    /// Create an index with an explicit RNG seed for deterministic builds.
    pub fn with_seed(dim: usize, m: usize, ef_construction: usize, seed: u64) -> Self;

    /// Insert a vector into the index. Returns the vector's internal index.
    pub fn insert(&mut self, vector: Vec<f32>) -> usize;

    /// Search for the `k` nearest neighbors of `query`.
    /// `ef` controls the search beam width (higher = more accurate, slower).
    /// Returns (internal_index, L2_distance) pairs sorted by distance ascending.
    pub fn search(&mut self, query: &[f32], k: usize, ef: usize) -> Vec<(usize, f32)>;

    /// Access a stored vector by its internal index.
    #[inline]
    pub fn get_vector(&self, idx: usize) -> &[f32];

    /// Number of vectors in the index.
    pub fn len(&self) -> usize;

    /// True if the index contains no vectors.
    pub fn is_empty(&self) -> bool;
}
}

Parameters

Parameter -- Typical Value -- Effect
m -- 16 -- Maximum connections per node per layer. Higher values improve recall at the cost of memory and insert time.
ef_construction -- 200 -- Beam width during insertion. Higher values build a better graph but slow down construction.
ef (search) -- 50 -- Beam width during search. Must be >= k. Higher values improve recall at the cost of latency.

Example

#![allow(unused)]
fn main() {
use tala_embed::HnswIndex;

let dim = 4;
let mut index = HnswIndex::new(dim, 16, 200);

// Insert 100 vectors
for i in 0..100 {
    let v = vec![i as f32; dim];
    index.insert(v);
}

// Search for the 5 nearest neighbors of a query
let query = vec![50.0; dim];
let results = index.search(&query, 5, 50);

assert_eq!(results.len(), 5);
// Results are sorted by L2 distance ascending
for window in results.windows(2) {
    assert!(window[0].1 <= window[1].1);
}
}

Performance

The HNSW index delivers sub-millisecond search latency at 10K vectors with ef=50. Measured benchmarks:

Operation -- Corpus Size -- Time
Search (top-10, ef=50) -- 10K vectors, dim=384 -- 139 us
Semantic query (full pipeline) -- 10K vectors, dim=384 -- 151 us

tala-graph

In-memory narrative graph engine backed by dual adjacency lists. The graph stores intent nodes with their metadata and directed, typed, weighted edges. It supports forward and backward BFS traversal, automatic edge formation from similarity scores, and narrative extraction for replay planning.

The graph is a directed graph (not necessarily acyclic at this layer; cycle detection is enforced by the replay planner in tala-weave). Each edge carries a RelationType from tala-core and a floating-point weight.

Key Types

Type -- Description
NarrativeGraph -- In-memory graph with dual adjacency lists, BFS/DFS, and edge formation

NarrativeGraph

The core graph type. Internally maintains two HashMap-backed adjacency lists (forward and backward) and a node metadata map. All operations are O(1) amortized for node/edge insertion and adjacency lookup.

#![allow(unused)]
fn main() {
pub struct NarrativeGraph { /* private */ }
}

The internal representation stores:

  • forward: HashMap<IntentId, Vec<(IntentId, RelationType, f32)>> -- outgoing edges
  • backward: HashMap<IntentId, Vec<(IntentId, RelationType, f32)>> -- incoming edges
  • nodes: HashMap<IntentId, NodeData> -- per-node metadata (timestamp, confidence)

Construction

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// Create an empty graph.
    pub fn new() -> Self;
}

impl Default for NarrativeGraph {
    fn default() -> Self { Self::new() }
}
}

Node Operations

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// Insert a node with its timestamp and confidence score.
    /// Initializes empty adjacency entries in both forward and backward maps.
    pub fn insert_node(&mut self, id: IntentId, timestamp: u64, confidence: f32);

    /// Return the total number of nodes in the graph.
    pub fn node_count(&self) -> usize;

    /// Return true if the graph contains a node with the given ID.
    pub fn contains_node(&self, id: IntentId) -> bool;

    /// Return all node IDs in the graph. Order is not guaranteed.
    pub fn node_ids(&self) -> Vec<IntentId>;
}
}

Edge Operations

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// Add a directed edge from `from` to `to` with the given relation type and weight.
    /// Updates both forward and backward adjacency lists.
    pub fn add_edge(
        &mut self,
        from: IntentId,
        to: IntentId,
        relation: RelationType,
        weight: f32,
    );

    /// Return the total number of edges (sum of all forward adjacency list lengths).
    pub fn edge_count(&self) -> usize;
}
}

Edge Formation

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// Automatically form edges between `new_node` and existing nodes based
    /// on similarity scores.
    ///
    /// Sorts `similarities` by score descending, then connects the top `k`
    /// existing nodes to `new_node` with `RelationType::Causal` edges.
    /// The similarity score is used as the edge weight.
    pub fn form_edges(
        &mut self,
        new_node: IntentId,
        similarities: &mut [(IntentId, f32)],
        k: usize,
    );
}
}

Traversal

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// BFS forward from `start`, visiting up to `max_depth` hops along
    /// outgoing edges. Returns all visited node IDs in BFS order.
    pub fn bfs_forward(&self, start: IntentId, max_depth: usize) -> Vec<IntentId>;

    /// BFS backward from `start`, following incoming edges.
    /// Useful for root-cause analysis.
    pub fn bfs_backward(&self, start: IntentId, max_depth: usize) -> Vec<IntentId>;

    /// Extract a narrative subgraph: bidirectional BFS from `root` up to
    /// `max_depth` hops in both directions.
    /// Returns (visited_nodes, edges) where edges are (from, to) pairs.
    pub fn extract_narrative(
        &self,
        root: IntentId,
        max_depth: usize,
    ) -> (Vec<IntentId>, Vec<(IntentId, IntentId)>);
}
}

Adjacency Access

#![allow(unused)]
fn main() {
impl NarrativeGraph {
    /// Return the forward neighbors (successors) of a node.
    /// Each entry is (neighbor_id, relation_type, weight).
    /// Returns an empty slice if the node has no outgoing edges or does not exist.
    pub fn successors(&self, id: IntentId) -> &[(IntentId, RelationType, f32)];

    /// Return the backward neighbors (predecessors) of a node.
    /// Each entry is (neighbor_id, relation_type, weight).
    /// Returns an empty slice if the node has no incoming edges or does not exist.
    pub fn predecessors(&self, id: IntentId) -> &[(IntentId, RelationType, f32)];
}
}

Example

#![allow(unused)]
fn main() {
use tala_core::{IntentId, RelationType};
use tala_graph::NarrativeGraph;

let mut graph = NarrativeGraph::new();

let a = IntentId::random();
let b = IntentId::random();
let c = IntentId::random();

graph.insert_node(a, 1000, 0.95);
graph.insert_node(b, 2000, 0.90);
graph.insert_node(c, 3000, 0.85);

graph.add_edge(a, b, RelationType::Causal, 0.9);
graph.add_edge(b, c, RelationType::Temporal, 0.7);

assert_eq!(graph.node_count(), 3);
assert_eq!(graph.edge_count(), 2);

// Forward traversal from a: reaches a, b, c
let forward = graph.bfs_forward(a, 10);
assert_eq!(forward.len(), 3);

// Backward traversal from c: reaches c, b, a
let backward = graph.bfs_backward(c, 10);
assert_eq!(backward.len(), 3);

// Successors of a: [(b, Causal, 0.9)]
let succs = graph.successors(a);
assert_eq!(succs.len(), 1);
assert_eq!(succs[0].0, b);

// Predecessors of c: [(b, Temporal, 0.7)]
let preds = graph.predecessors(c);
assert_eq!(preds.len(), 1);
assert_eq!(preds[0].0, b);
}

Edge Formation Example

#![allow(unused)]
fn main() {
use tala_core::IntentId;
use tala_graph::NarrativeGraph;

let mut graph = NarrativeGraph::new();

// Existing nodes
let n1 = IntentId::random();
let n2 = IntentId::random();
let n3 = IntentId::random();
graph.insert_node(n1, 100, 1.0);
graph.insert_node(n2, 200, 1.0);
graph.insert_node(n3, 300, 1.0);

// New node arrives with similarity scores against existing nodes
let new = IntentId::random();
graph.insert_node(new, 400, 1.0);

let mut similarities = vec![
    (n1, 0.3),
    (n2, 0.9),
    (n3, 0.7),
];

// Connect to top-2 by similarity
graph.form_edges(new, &mut similarities, 2);

// n2 and n3 should now have edges to `new`
assert_eq!(graph.edge_count(), 2);
assert_eq!(graph.predecessors(new).len(), 2);
}

Narrative Extraction Example

#![allow(unused)]
fn main() {
use tala_core::{IntentId, RelationType};
use tala_graph::NarrativeGraph;

let mut graph = NarrativeGraph::new();
let a = IntentId::random();
let b = IntentId::random();
let c = IntentId::random();

graph.insert_node(a, 1, 1.0);
graph.insert_node(b, 2, 1.0);
graph.insert_node(c, 3, 1.0);
graph.add_edge(a, b, RelationType::Causal, 1.0);
graph.add_edge(b, c, RelationType::Causal, 1.0);

// Extract narrative from b: discovers a (backward) and c (forward)
let (nodes, edges) = graph.extract_narrative(b, 5);
assert_eq!(nodes.len(), 3);
assert_eq!(edges.len(), 2);
}

tala-store

The storage engine for TALA. Combines a write-ahead log (WAL) for durability, a hot buffer for batching writes into TBF segments, an HNSW index for semantic search, and a HashMap-backed intent store for point lookups. The StorageEngine type implements the IntentStore trait from tala-core, providing the unified storage abstraction used by the daemon and all higher-level subsystems.

All public types use interior mutability (Mutex, RwLock) so the engine is Send + Sync and can be shared across threads without external locking.

Key Types

| Type | Description |
|---|---|
| Wal | Write-ahead log for crash recovery |
| WalEntry | A deserialized WAL entry |
| HotBuffer | In-memory intent accumulator with flush-to-segment |
| QueryEngine | HNSW-backed semantic search with cosine re-ranking |
| StorageEngine | Unified IntentStore implementation |
| StorageMetrics | Comprehensive per-lock and per-operation timing metrics |
| LockStats | Per-lock statistics: acquisition count, contention, wait/hold times |

Wal

A file-backed write-ahead log. Every intent is appended to the WAL before it enters the hot buffer or HNSW index, ensuring durability across crashes. The WAL uses a simple binary format: [4B payload_len][16B id][8B timestamp][4B embed_dim][embed_data][4B cmd_len][cmd_bytes].

#![allow(unused)]
fn main() {
pub struct Wal { /* private */ }

impl Wal {
    /// Create a new WAL file at the given path.
    pub fn create(path: impl AsRef<Path>) -> io::Result<Self>;

    /// Append an intent entry. Does not flush to disk automatically.
    pub fn append(
        &mut self,
        id: &[u8; 16],
        timestamp: u64,
        embedding: &[f32],
        raw_command: &str,
    ) -> io::Result<()>;

    /// Flush the internal buffer to disk.
    pub fn sync(&mut self) -> io::Result<()>;

    /// Return the number of entries written.
    pub fn entry_count(&self) -> u64;

    /// Return the WAL file path.
    pub fn path(&self) -> &Path;
}
}

WalEntry and Replay

#![allow(unused)]
fn main() {
pub struct WalEntry {
    pub id: [u8; 16],
    pub timestamp: u64,
    pub embedding: Vec<f32>,
    pub raw_command: String,
}

/// Replay all entries from a WAL file.
pub fn replay_wal(path: impl AsRef<Path>) -> io::Result<Vec<WalEntry>>;
}
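The entry layout documented above can be exercised in isolation. Below is a minimal, self-contained sketch (plain std, independent of the Wal type) that round-trips one entry through the documented `[4B payload_len][16B id][8B timestamp][4B embed_dim][embed_data][4B cmd_len][cmd_bytes]` layout; the little-endian byte order is an assumption, since the spec above fixes only the field widths:

```rust
// Encode/decode one WAL entry per the documented layout:
// [4B payload_len][16B id][8B timestamp][4B embed_dim][embed_data][4B cmd_len][cmd_bytes]
// Field widths match the spec; little-endian is an assumption.
fn encode_entry(id: &[u8; 16], timestamp: u64, embedding: &[f32], cmd: &str) -> Vec<u8> {
    let mut payload = Vec::new();
    payload.extend_from_slice(id);
    payload.extend_from_slice(&timestamp.to_le_bytes());
    payload.extend_from_slice(&(embedding.len() as u32).to_le_bytes());
    for v in embedding {
        payload.extend_from_slice(&v.to_le_bytes());
    }
    payload.extend_from_slice(&(cmd.len() as u32).to_le_bytes());
    payload.extend_from_slice(cmd.as_bytes());

    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&payload);
    out
}

fn decode_entry(buf: &[u8]) -> ([u8; 16], u64, Vec<f32>, String) {
    let payload_len = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    let p = &buf[4..4 + payload_len];
    let id: [u8; 16] = p[0..16].try_into().unwrap();
    let timestamp = u64::from_le_bytes(p[16..24].try_into().unwrap());
    let dim = u32::from_le_bytes(p[24..28].try_into().unwrap()) as usize;
    let mut embedding = Vec::with_capacity(dim);
    for i in 0..dim {
        let off = 28 + i * 4;
        embedding.push(f32::from_le_bytes(p[off..off + 4].try_into().unwrap()));
    }
    let cmd_off = 28 + dim * 4;
    let cmd_len = u32::from_le_bytes(p[cmd_off..cmd_off + 4].try_into().unwrap()) as usize;
    let cmd = String::from_utf8(p[cmd_off + 4..cmd_off + 4 + cmd_len].to_vec()).unwrap();
    (id, timestamp, embedding, cmd)
}

fn main() {
    let buf = encode_entry(&[7u8; 16], 42, &[1.0, 2.0], "cargo build");
    let (id, ts, emb, cmd) = decode_entry(&buf);
    assert_eq!(id, [7u8; 16]);
    assert_eq!(ts, 42);
    assert_eq!(emb, vec![1.0, 2.0]);
    assert_eq!(cmd, "cargo build");
}
```

The length-prefixed payload means a truncated final record (e.g. after a crash mid-write) can be detected and discarded during replay.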

HotBuffer

An in-memory accumulator that collects intents until it reaches capacity, then flushes them into a TBF segment via tala_wire::SegmentWriter. The flush operation is atomic: the buffer is cleared and a complete segment byte buffer is returned.

#![allow(unused)]
fn main() {
pub struct HotBuffer { /* private */ }

impl HotBuffer {
    /// Create a buffer for embeddings of `dim` dimensions with the given capacity.
    pub fn new(dim: usize, capacity: usize) -> Self;

    /// Push an intent. Returns `true` if the buffer has reached capacity
    /// and should be flushed.
    pub fn push(
        &mut self,
        id: [u8; 16],
        timestamp: u64,
        context_hash: u64,
        confidence: f32,
        status: u8,
        embedding: Vec<f32>,
        parent_indices: Vec<usize>,
    ) -> bool;

    /// Number of intents currently buffered.
    pub fn len(&self) -> usize;

    /// True if the buffer is empty.
    pub fn is_empty(&self) -> bool;

    /// Flush all buffered intents into a TBF segment byte buffer.
    /// Clears the buffer.
    pub fn flush(&mut self) -> Vec<u8>;
}
}
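The capacity-triggered flush contract can be illustrated with a minimal stand-in. This toy type is not the real HotBuffer (which stores full intent records and serializes to a TBF segment via SegmentWriter); it only demonstrates the push-returns-true-at-capacity and flush-clears-the-buffer behavior documented above:

```rust
// Minimal stand-in for the HotBuffer push/flush contract: push returns true
// once the buffer reaches capacity, and flush atomically drains everything.
struct ToyBuffer {
    items: Vec<String>,
    capacity: usize,
}

impl ToyBuffer {
    fn new(capacity: usize) -> Self {
        Self { items: Vec::new(), capacity }
    }

    /// Returns true when the buffer has reached capacity and should be flushed.
    fn push(&mut self, item: String) -> bool {
        self.items.push(item);
        self.items.len() >= self.capacity
    }

    /// Drains the buffer; the real HotBuffer returns a TBF segment byte buffer here.
    fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.items)
    }
}

fn main() {
    let mut buf = ToyBuffer::new(3);
    assert!(!buf.push("echo a".into()));
    assert!(!buf.push("echo b".into()));
    assert!(buf.push("echo c".into())); // capacity reached: caller should flush
    let flushed = buf.flush();
    assert_eq!(flushed.len(), 3);
    assert!(buf.items.is_empty());
}
```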

QueryEngine

A standalone semantic search engine that pairs an HNSW index with a parallel intent metadata store. Used internally by StorageEngine but also available as a self-contained type.

#![allow(unused)]
fn main() {
pub struct QueryEngine { /* private */ }

impl QueryEngine {
    /// Create an engine for the given embedding dimensionality.
    /// Uses m=16, ef_construction=200 for the HNSW index.
    pub fn new(dim: usize) -> Self;

    /// Insert an intent's metadata and embedding into the search index.
    pub fn insert(
        &mut self,
        id: [u8; 16],
        timestamp: u64,
        raw_command: String,
        embedding: Vec<f32>,
    );

    /// Search for the `k` nearest intents by L2 distance.
    /// Returns (id, L2_distance) pairs.
    pub fn search(&mut self, query: &[f32], k: usize) -> Vec<([u8; 16], f32)>;

    /// Find edge candidates via HNSW approximate search + exact cosine re-rank.
    /// Returns top-k (id, cosine_similarity) pairs sorted descending.
    /// Searches 4*k candidates in HNSW, then re-ranks with exact cosine
    /// similarity for precision. Replaces O(n^2) brute-force edge formation.
    pub fn find_edge_candidates(
        &mut self,
        embedding: &[f32],
        k: usize,
    ) -> Vec<([u8; 16], f32)>;

    /// Number of intents in the index.
    pub fn len(&self) -> usize;

    /// True if the index is empty.
    pub fn is_empty(&self) -> bool;
}
}
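The exact-cosine re-rank step behind find_edge_candidates can be sketched in isolation: score an over-fetched candidate set with exact cosine similarity, sort descending, and keep the top k. This is a sketch of the technique, not the crate's internals; the integer IDs and candidate vectors are illustrative:

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Re-rank approximate candidates by exact cosine similarity, descending, top-k.
fn rerank(query: &[f32], candidates: &[(u32, Vec<f32>)], k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let query = vec![1.0, 0.0];
    let candidates = vec![
        (1, vec![1.0, 0.0]), // same direction: cosine 1.0
        (2, vec![0.0, 1.0]), // orthogonal: cosine 0.0
        (3, vec![1.0, 1.0]), // 45 degrees: cosine ~0.707
    ];
    let top = rerank(&query, &candidates, 2);
    assert_eq!(top[0].0, 1);
    assert_eq!(top[1].0, 3);
}
```

Over-fetching 4*k from the HNSW index before re-ranking trades a small amount of extra search work for precision that approximate L2 search alone does not guarantee.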

StorageEngine

The unified storage engine implementing IntentStore. Coordinates the WAL, hot buffer, HNSW index, and intent map. Every insert call follows this pipeline:

  1. Append to WAL (if persistent mode)
  2. Insert embedding into HNSW index
  3. Push into hot buffer; if full, flush to a TBF segment file
  4. Store the complete intent in the HashMap for point lookups

#![allow(unused)]
fn main() {
pub struct StorageEngine { /* private */ }

impl StorageEngine {
    /// Create a persistent storage engine backed by the given directory.
    /// Creates WAL and segment files in `dir`.
    pub fn open(
        dim: usize,
        dir: impl AsRef<Path>,
        hot_capacity: usize,
    ) -> Result<Self, TalaError>;

    /// Create an in-memory storage engine (no WAL, no segment persistence).
    pub fn in_memory(dim: usize, hot_capacity: usize) -> Self;

    /// Access the storage metrics.
    pub fn metrics(&self) -> &Arc<StorageMetrics>;
}
}

IntentStore Implementation

StorageEngine implements all five methods of the IntentStore trait:

#![allow(unused)]
fn main() {
impl IntentStore for StorageEngine {
    /// Insert an intent. Enforces dimension match. Pipeline: WAL -> HNSW -> hot buffer -> store.
    fn insert(&self, intent: Intent) -> Result<IntentId, TalaError>;

    /// Retrieve an intent by ID. Returns `None` if not found.
    fn get(&self, id: IntentId) -> Result<Option<Intent>, TalaError>;

    /// Semantic search: find the `k` nearest intents by cosine similarity.
    /// Returns (IntentId, cosine_similarity) pairs.
    fn query_semantic(
        &self,
        embedding: &[f32],
        k: usize,
    ) -> Result<Vec<(IntentId, f32)>, TalaError>;

    /// Temporal query: return all intents in the time range, sorted by timestamp.
    fn query_temporal(&self, range: TimeRange) -> Result<Vec<Intent>, TalaError>;

    /// Attach an outcome to an existing intent. Returns NodeNotFound if the ID
    /// does not exist.
    fn attach_outcome(&self, id: IntentId, outcome: Outcome) -> Result<(), TalaError>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_core::{Intent, IntentId, IntentStore, Outcome, Status, TimeRange};
use tala_store::StorageEngine;

let engine = StorageEngine::in_memory(8, 1000);

let intent = Intent {
    id: IntentId::random(),
    timestamp: 42,
    raw_command: "cargo build".to_string(),
    embedding: vec![0.1; 8],
    context_hash: 7,
    parent_ids: vec![],
    outcome: None,
    confidence: 0.95,
};

let id = engine.insert(intent).unwrap();
let retrieved = engine.get(id).unwrap().unwrap();
assert_eq!(retrieved.raw_command, "cargo build");

// Attach an outcome
engine.attach_outcome(id, Outcome {
    status: Status::Success,
    latency_ns: 1000,
    exit_code: 0,
}).unwrap();

// Temporal query
let results = engine.query_temporal(TimeRange { start: 0, end: 100 }).unwrap();
assert_eq!(results.len(), 1);
}

StorageMetrics

Comprehensive metrics for diagnosing storage engine performance. Contains per-lock statistics for all five internal locks and cumulative timing for each pipeline stage. All counters are AtomicU64 with Relaxed ordering.

#![allow(unused)]
fn main() {
pub struct StorageMetrics {
    // Per-lock stats
    pub intents_lock: LockStats,
    pub hnsw_lock: LockStats,
    pub index_map_lock: LockStats,
    pub wal_lock: LockStats,
    pub hot_lock: LockStats,

    // Pipeline sub-operation timing (cumulative nanoseconds)
    pub wal_append_ns: AtomicU64,
    pub wal_append_count: AtomicU64,
    pub hnsw_insert_ns: AtomicU64,
    pub hnsw_insert_count: AtomicU64,
    pub hot_push_ns: AtomicU64,
    pub hot_push_count: AtomicU64,
    pub segment_flush_ns: AtomicU64,
    pub segment_flush_count: AtomicU64,
    pub edge_search_ns: AtomicU64,
    pub edge_search_count: AtomicU64,

    // HNSW internals
    pub hnsw_search_visited: AtomicU64,
    pub hnsw_search_count: AtomicU64,

    // Store state
    pub hot_buffer_len: AtomicU64,
    pub hot_buffer_capacity: AtomicU64,
    pub wal_entry_count: AtomicU64,
    pub total_bytes_flushed: AtomicU64,
    pub segments_flushed_count: AtomicU64,
}
}

LockStats

Per-lock statistics tracked via atomics. Every lock acquisition and release in the storage engine records timing information here.

#![allow(unused)]
fn main() {
pub struct LockStats {
    /// Total number of lock acquisitions.
    pub acquisitions: AtomicU64,
    /// Number of acquisitions where wait exceeded 1 microsecond (contention proxy).
    pub contentions: AtomicU64,
    /// Cumulative time spent waiting to acquire the lock (nanoseconds).
    pub total_wait_ns: AtomicU64,
    /// Cumulative time spent holding the lock (nanoseconds).
    pub total_hold_ns: AtomicU64,
    /// Worst-case wait time observed (nanoseconds).
    pub max_wait_ns: AtomicU64,
    /// Worst-case hold time observed (nanoseconds).
    pub max_hold_ns: AtomicU64,
}

impl LockStats {
    pub fn new() -> Self;

    /// Record the wait time for a lock acquisition.
    /// Increments the contention counter if wait exceeds 1 microsecond.
    pub fn record_acquisition(&self, wait_ns: u64);

    /// Record the hold duration when a lock is released.
    pub fn record_release(&self, hold_ns: u64);
}
}
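The contention-proxy bookkeeping described above amounts to a few relaxed atomic operations. A sketch of how record_acquisition might work, using the documented 1 µs threshold (the field subset and exact update order here are illustrative, not the crate's implementation):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch of the per-lock counters described above (Relaxed ordering throughout).
struct ToyLockStats {
    acquisitions: AtomicU64,
    contentions: AtomicU64,
    total_wait_ns: AtomicU64,
    max_wait_ns: AtomicU64,
}

impl ToyLockStats {
    fn new() -> Self {
        Self {
            acquisitions: AtomicU64::new(0),
            contentions: AtomicU64::new(0),
            total_wait_ns: AtomicU64::new(0),
            max_wait_ns: AtomicU64::new(0),
        }
    }

    fn record_acquisition(&self, wait_ns: u64) {
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
        self.total_wait_ns.fetch_add(wait_ns, Ordering::Relaxed);
        if wait_ns > 1_000 {
            // waited more than 1 microsecond: count as contention
            self.contentions.fetch_add(1, Ordering::Relaxed);
        }
        self.max_wait_ns.fetch_max(wait_ns, Ordering::Relaxed);
    }
}

fn main() {
    let stats = ToyLockStats::new();
    stats.record_acquisition(500);   // fast path: not contended
    stats.record_acquisition(5_000); // slow path: counted as contention
    assert_eq!(stats.acquisitions.load(Ordering::Relaxed), 2);
    assert_eq!(stats.contentions.load(Ordering::Relaxed), 1);
    assert_eq!(stats.total_wait_ns.load(Ordering::Relaxed), 5_500);
    assert_eq!(stats.max_wait_ns.load(Ordering::Relaxed), 5_000);
}
```

Relaxed ordering is sufficient because the counters are monotonic telemetry: readers tolerate slightly stale values and no counter update synchronizes other memory.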

tala-intent

Converts raw shell command strings into structured Intent objects. The crate implements the IntentExtractor trait from tala-core through the IntentPipeline, which orchestrates four stages: tokenization, embedding generation, classification, and intent assembly.

Key Types

| Type | Description |
|---|---|
| Token | A parsed token from a raw shell command |
| IntentPipeline | The full extraction pipeline (implements IntentExtractor) |

Key Functions

| Function | Description |
|---|---|
| tokenize(raw) | Parse a raw command string into structured tokens |
| hash_context(ctx) | FNV-1a hash of a Context for the intent's context_hash field |

Token

An enum representing a single parsed element from a shell command. The tokenizer handles pipes, redirects, flags, quoted strings, and backslash escapes.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Token {
    /// The command name (first word, or first word after a pipe).
    Command(String),
    /// A positional argument.
    Arg(String),
    /// A flag (short `-x` or long `--foo`).
    Flag(String),
    /// Pipe operator `|`.
    Pipe,
    /// Input redirect `<`.
    RedirectIn,
    /// Output redirect `>` or append `>>`.
    RedirectOut { append: bool },
}
}

tokenize

#![allow(unused)]
fn main() {
/// Tokenize a raw shell command into structured tokens.
///
/// Handles:
/// - Pipes (`|`)
/// - Redirects (`<`, `>`, `>>`)
/// - Flags (`-x`, `--flag`)
/// - Quoted strings (single and double)
/// - Backslash escapes within double quotes
///
/// This is a simplified shell splitter, not a full POSIX parser.
pub fn tokenize(raw: &str) -> Vec<Token>;
}

Examples

#![allow(unused)]
fn main() {
use tala_intent::{tokenize, Token};

// Simple command with flags and arguments
let tokens = tokenize("ls -la /tmp");
assert_eq!(tokens, vec![
    Token::Command("ls".into()),
    Token::Flag("-la".into()),
    Token::Arg("/tmp".into()),
]);

// Pipeline
let tokens = tokenize("cat file.txt | grep error");
assert_eq!(tokens, vec![
    Token::Command("cat".into()),
    Token::Arg("file.txt".into()),
    Token::Pipe,
    Token::Command("grep".into()),
    Token::Arg("error".into()),
]);

// Redirects
let tokens = tokenize("sort < input.txt >> output.txt");
assert_eq!(tokens, vec![
    Token::Command("sort".into()),
    Token::RedirectIn,
    Token::Arg("input.txt".into()),
    Token::RedirectOut { append: true },
    Token::Arg("output.txt".into()),
]);

// Quoted strings
let tokens = tokenize("echo \"hello world\"");
assert_eq!(tokens, vec![
    Token::Command("echo".into()),
    Token::Arg("hello world".into()),
]);

// Empty input yields no tokens
assert!(tokenize("").is_empty());
}

IntentPipeline

The main extraction engine. Implements IntentExtractor from tala-core. Construction pre-computes exemplar embeddings for all intent categories, making classification a pure cosine-similarity lookup with no runtime model loading.

The pipeline is Send + Sync and can be shared across threads.

#![allow(unused)]
fn main() {
pub struct IntentPipeline { /* private */ }

impl IntentPipeline {
    /// Create a new pipeline. Pre-computes exemplar embeddings
    /// for intent classification.
    pub fn new() -> Self;

    /// Tokenize a raw command string.
    pub fn tokenize(&self, raw: &str) -> Vec<Token>;

    /// Generate a 384-dimensional embedding for a raw command.
    /// Uses a deterministic hash-based bag-of-characters approach
    /// with L2 normalization to unit length.
    pub fn embed(&self, raw: &str) -> Vec<f32>;

    /// Classify a command given its embedding.
    /// Compares against pre-computed exemplars using cosine similarity
    /// and returns the category with the highest average match.
    pub fn classify(&self, embedding: &[f32]) -> IntentCategory;
}

impl Default for IntentPipeline {
    fn default() -> Self { Self::new() }
}
}

IntentExtractor Implementation

#![allow(unused)]
fn main() {
impl IntentExtractor for IntentPipeline {
    /// Extract a structured Intent from a raw command string and context.
    ///
    /// Pipeline:
    /// 1. Validate and tokenize the raw command
    /// 2. Generate a 384-dim embedding
    /// 3. Classify the intent category
    /// 4. Hash the execution context
    /// 5. Assemble the Intent with a random ID and current timestamp
    ///
    /// Returns `TalaError::ExtractionFailed` if the command is empty
    /// or produces no tokens.
    fn extract(&self, raw: &str, context: &Context) -> Result<Intent, TalaError>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_core::{Context, IntentExtractor};
use tala_intent::IntentPipeline;

let pipeline = IntentPipeline::new();

let ctx = Context {
    cwd: "/home/user/project".to_string(),
    env_hash: 42,
    session_id: 1,
    shell: "zsh".to_string(),
    user: "ops".to_string(),
};

let intent = pipeline.extract("cargo build --release", &ctx).unwrap();

assert_eq!(intent.raw_command, "cargo build --release");
assert_eq!(intent.embedding.len(), 384);
assert!(intent.context_hash != 0);
assert!(intent.outcome.is_none());
assert!((intent.confidence - 1.0).abs() < f32::EPSILON);
}

IntentCategory

Classification of an intent's purpose. The classifier uses pre-computed exemplar embeddings for each category and finds the best match by average cosine similarity.

#![allow(unused)]
fn main() {
// Defined in tala-core, used by tala-intent
pub enum IntentCategory {
    Build,      // cargo build, make, gcc, npm run build
    Deploy,     // kubectl apply, docker push, terraform apply
    Debug,      // gdb, strace, perf record, valgrind
    Configure,  // vim ~/.bashrc, chmod, chown, git config
    Query,      // grep, find, ps aux, df -h, curl
    Navigate,   // cd, ls, pwd, tree
    Other(String),
}
}

The classifier falls back to Other if the best average similarity score is below 0.05.

hash_context

#![allow(unused)]
fn main() {
/// Hash a Context into a u64 for the Intent's context_hash field.
///
/// Uses FNV-1a over all context fields: cwd, env_hash, session_id,
/// shell, and user. The hash is deterministic for the same input.
pub fn hash_context(ctx: &Context) -> u64;
}

Example

#![allow(unused)]
fn main() {
use tala_core::Context;
use tala_intent::hash_context;

let ctx = Context {
    cwd: "/home/user".to_string(),
    env_hash: 0,
    session_id: 1,
    shell: "bash".to_string(),
    user: "ops".to_string(),
};

let h1 = hash_context(&ctx);
let h2 = hash_context(&ctx);
assert_eq!(h1, h2); // deterministic
}
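FNV-1a itself is small enough to show inline. A standalone sketch of the 64-bit variant follows; the exact order in which the crate feeds the Context fields into the hash is not shown here:

```rust
/// 64-bit FNV-1a over a byte slice.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64; // xor before multiply is what makes it 1a, not plain FNV-1
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

fn main() {
    // The empty input hashes to the offset basis by definition.
    assert_eq!(fnv1a_64(b""), 0xcbf29ce484222325);
    // Deterministic: same bytes, same hash.
    assert_eq!(fnv1a_64(b"/home/user"), fnv1a_64(b"/home/user"));
    // Different inputs diverge.
    assert_ne!(fnv1a_64(b"bash"), fnv1a_64(b"zsh"));
}
```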

Embedding Details

The embedding generator produces 384-dimensional unit-length vectors using a deterministic hash-based approach. Each byte of the input command contributes to three positions in the vector via multiplicative hashing (FNV-like), with sign determined by hash bits and amplitude decayed by position (earlier characters contribute more). The result is L2-normalized.

This is a placeholder for real ML embeddings but preserves the property that similar command strings produce similar vectors, enabling meaningful cosine similarity comparisons.

| Property | Value |
|---|---|
| Dimensionality | 384 |
| Normalization | L2 (unit length) |
| Determinism | Same input always produces same output |
| Similarity preservation | Similar strings yield similar vectors |
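A toy version of a hash-based, L2-normalized embedder shows how these properties arise. The decay constant, contributions-per-byte count, and hash mixing below are illustrative choices, not the crate's actual parameters:

```rust
/// Toy hash-based embedder: each byte contributes to a few positions,
/// amplitude decays with position, and the result is L2-normalized.
fn toy_embed(raw: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    for (pos, &b) in raw.as_bytes().iter().enumerate() {
        let mut h = (b as u64).wrapping_mul(0x100000001b3).wrapping_add(pos as u64);
        let amp = 1.0 / (1.0 + pos as f32 * 0.1); // earlier characters weigh more
        for _ in 0..3 {
            h = h.wrapping_mul(0x100000001b3) ^ (h >> 17);
            let idx = (h as usize) % dim;
            let sign = if h & 1 == 0 { 1.0 } else { -1.0 };
            v[idx] += sign * amp;
        }
    }
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

fn main() {
    let a = toy_embed("cargo build", 384);
    let b = toy_embed("cargo build", 384);
    assert_eq!(a, b); // deterministic
    let norm: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-5); // unit length
}
```

Because shared byte prefixes produce shared contributions, commands with overlapping text land near each other under cosine similarity, which is all the downstream HNSW search requires.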

tala-weave

The adaptive replay engine. Takes a narrative graph (or subgraph) and produces a topologically sorted replay plan, applies environment variable substitutions, checks idempotency against completed steps, and orchestrates execution through a caller-supplied executor closure.

Key Types

| Type | Description |
|---|---|
| ReplayEngine | Orchestrates the full replay pipeline: plan, transform, idempotency, execute |
| ReplayConfig | Configuration for a replay run: variables, completed set, dry-run flag |
| Executor | Type alias for the executor closure: Box<dyn FnMut(&str) -> Outcome> |

Key Functions

| Function | Description |
|---|---|
| build_plan(graph, intent_ids, commands) | Topological sort via Kahn's algorithm |
| substitute_vars(input, vars) | Replace ${VAR} patterns in a command string |
| filter_completed(steps, completed) | Remove already-completed steps from a plan |

build_plan

Produces a topologically sorted replay plan from a set of intent IDs and the edges between them in the narrative graph. Uses Kahn's algorithm. Returns TalaError::CycleDetected if the subgraph contains a cycle.

#![allow(unused)]
fn main() {
/// Build a topologically sorted replay plan.
///
/// Only considers edges between nodes in `intent_ids`. Each step's `deps`
/// field lists its predecessors within the subgraph.
pub fn build_plan(
    graph: &NarrativeGraph,
    intent_ids: &[IntentId],
    commands: &HashMap<IntentId, String>,
) -> Result<Vec<ReplayStep>, TalaError>;
}

The returned Vec<ReplayStep> is ordered such that no step appears before its dependencies. Each ReplayStep contains the intent ID, the command string, and a list of dependency IDs.
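The topological sort underneath build_plan is standard Kahn's algorithm: seed a queue with zero-in-degree nodes, pop, decrement each successor's in-degree, and flag a cycle if any node is never emitted. A self-contained sketch over plain integer IDs (standing in for IntentId):

```rust
use std::collections::VecDeque;

/// Kahn's algorithm over nodes 0..n with directed edges (from, to).
/// Returns None if the graph contains a cycle.
fn topo_sort(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut in_degree = vec![0usize; n];
    let mut successors = vec![Vec::new(); n];
    for &(from, to) in edges {
        in_degree[to] += 1;
        successors[from].push(to);
    }

    // Seed with all nodes that have no unmet dependencies.
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);

    while let Some(node) = queue.pop_front() {
        order.push(node);
        for &next in &successors[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                queue.push_back(next);
            }
        }
    }

    // Any node left unemitted is part of a cycle.
    if order.len() == n { Some(order) } else { None }
}

fn main() {
    // a -> b -> c yields a before b before c
    assert_eq!(topo_sort(3, &[(0, 1), (1, 2)]), Some(vec![0, 1, 2]));
    // a <-> b is a cycle
    assert_eq!(topo_sort(2, &[(0, 1), (1, 0)]), None);
}
```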

Example

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use tala_core::{IntentId, RelationType};
use tala_graph::NarrativeGraph;
use tala_weave::build_plan;

let mut graph = NarrativeGraph::new();
let a = IntentId::random();
let b = IntentId::random();
let c = IntentId::random();

graph.insert_node(a, 1, 1.0);
graph.insert_node(b, 2, 1.0);
graph.insert_node(c, 3, 1.0);
graph.add_edge(a, b, RelationType::Causal, 1.0);
graph.add_edge(b, c, RelationType::Causal, 1.0);

let mut commands = HashMap::new();
commands.insert(a, "echo A".to_string());
commands.insert(b, "echo B".to_string());
commands.insert(c, "echo C".to_string());

let plan = build_plan(&graph, &[a, b, c], &commands).unwrap();
assert_eq!(plan.len(), 3);
// A appears before B, B before C
let pos: HashMap<_, _> = plan.iter().enumerate()
    .map(|(i, s)| (s.intent_id, i)).collect();
assert!(pos[&a] < pos[&b]);
assert!(pos[&b] < pos[&c]);
}

Cycle Detection

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use tala_core::{IntentId, RelationType, TalaError};
use tala_graph::NarrativeGraph;
use tala_weave::build_plan;

let mut graph = NarrativeGraph::new();
let a = IntentId::random();
let b = IntentId::random();
graph.insert_node(a, 1, 1.0);
graph.insert_node(b, 2, 1.0);
graph.add_edge(a, b, RelationType::Causal, 1.0);
graph.add_edge(b, a, RelationType::Causal, 1.0); // cycle

let result = build_plan(&graph, &[a, b], &HashMap::new());
assert!(matches!(result, Err(TalaError::CycleDetected)));
}

substitute_vars

Replaces ${VAR} patterns in a string with values from a lookup map. Unknown variables are left as-is. Handles edge cases: adjacent variables, unclosed braces, empty variable names.

#![allow(unused)]
fn main() {
/// Replace `${VAR}` patterns in `input` with values from `vars`.
/// Unknown variables are left intact.
pub fn substitute_vars(input: &str, vars: &HashMap<String, String>) -> String;
}

Examples

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use tala_weave::substitute_vars;

let mut vars = HashMap::new();
vars.insert("HOME".into(), "/home/user".into());
vars.insert("ENV".into(), "prod".into());

assert_eq!(
    substitute_vars("cd ${HOME} && deploy --env=${ENV}", &vars),
    "cd /home/user && deploy --env=prod"
);

// Unknown variables are left as-is
assert_eq!(
    substitute_vars("echo ${UNKNOWN}", &HashMap::new()),
    "echo ${UNKNOWN}"
);

// Adjacent variables
vars.insert("A".into(), "foo".into());
vars.insert("B".into(), "bar".into());
assert_eq!(substitute_vars("${A}${B}", &vars), "foobar");
}

filter_completed

Removes steps whose intent_id appears in the completed set. Used for idempotent replay: if a step has already been executed successfully, it is skipped.

#![allow(unused)]
fn main() {
/// Filter out steps whose `intent_id` appears in `completed`.
pub fn filter_completed(
    steps: Vec<ReplayStep>,
    completed: &HashSet<IntentId>,
) -> Vec<ReplayStep>;
}
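The idempotency filter is essentially a retain over a HashSet. A sketch with string IDs and (id, command) pairs standing in for IntentId and ReplayStep:

```rust
use std::collections::HashSet;

/// Drop steps whose ID is already in the completed set.
fn filter_completed<'a>(
    steps: Vec<(&'a str, &'a str)>,
    completed: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    steps
        .into_iter()
        .filter(|(id, _)| !completed.contains(id))
        .collect()
}

fn main() {
    let steps = vec![("a", "echo A"), ("b", "echo B"), ("c", "echo C")];
    let completed: HashSet<&str> = ["b"].into_iter().collect();
    let remaining = filter_completed(steps, &completed);
    // Step b is skipped; relative order of the rest is preserved.
    assert_eq!(remaining, vec![("a", "echo A"), ("c", "echo C")]);
}
```

Because build_plan's output is already dependency-ordered, filtering preserves a valid execution order as long as completed steps' effects are durable.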

Executor

The type alias for executor closures. An executor receives a transformed command string and returns an Outcome.

#![allow(unused)]
fn main() {
pub type Executor = Box<dyn FnMut(&str) -> Outcome>;
}

ReplayConfig

Configuration for a replay run.

#![allow(unused)]
fn main() {
pub struct ReplayConfig {
    /// Environment variable substitutions to apply to commands.
    pub vars: HashMap<String, String>,
    /// Set of intent IDs already completed (for idempotency).
    pub completed: HashSet<IntentId>,
    /// If true, skip execution and return the plan with Pending outcomes.
    pub dry_run: bool,
}

impl Default for ReplayConfig {
    fn default() -> Self {
        Self {
            vars: HashMap::new(),
            completed: HashSet::new(),
            dry_run: true,
        }
    }
}
}

ReplayEngine

The replay orchestrator. Combines plan building, variable substitution, idempotency checking, and execution into a single engine.

#![allow(unused)]
fn main() {
pub struct ReplayEngine { /* private */ }

impl ReplayEngine {
    /// Create a new replay engine with the given configuration.
    pub fn new(config: ReplayConfig) -> Self;

    /// Set the executor closure for live replay. Consumes and returns self
    /// for builder-style chaining.
    pub fn with_executor(self, executor: Executor) -> Self;

    /// Produce a dry-run plan: ordered, transformed steps with completed
    /// steps removed. Does not execute anything.
    pub fn dry_run(
        &self,
        graph: &NarrativeGraph,
        intent_ids: &[IntentId],
        commands: &HashMap<IntentId, String>,
    ) -> Result<Vec<ReplayStep>, TalaError>;

    /// Execute the replay. Returns a `ReplayResult` for every step in the
    /// plan (including skipped ones).
    ///
    /// Behavior per step:
    /// - If the step's ID is in `completed`: skipped=true, Status::Success, latency=0
    /// - If `dry_run` is set: skipped=false, Status::Pending, latency=0
    /// - Otherwise: applies variable substitution, calls the executor, returns outcome
    pub fn execute(
        &mut self,
        graph: &NarrativeGraph,
        intent_ids: &[IntentId],
        commands: &HashMap<IntentId, String>,
    ) -> Result<Vec<ReplayResult>, TalaError>;
}
}

Example: Dry Run

#![allow(unused)]
fn main() {
use std::collections::{HashMap, HashSet};
use tala_core::{IntentId, RelationType, Status};
use tala_graph::NarrativeGraph;
use tala_weave::{ReplayConfig, ReplayEngine};

let mut graph = NarrativeGraph::new();
let a = IntentId::random();
graph.insert_node(a, 1, 1.0);

let mut commands = HashMap::new();
commands.insert(a, "deploy --env=${ENV}".to_string());

let mut vars = HashMap::new();
vars.insert("ENV".to_string(), "staging".to_string());

let config = ReplayConfig {
    vars,
    completed: HashSet::new(),
    dry_run: true,
};

let engine = ReplayEngine::new(config);
let plan = engine.dry_run(&graph, &[a], &commands).unwrap();

assert_eq!(plan.len(), 1);
assert_eq!(plan[0].command, "deploy --env=staging");
}

Example: Live Execution

#![allow(unused)]
fn main() {
use std::collections::{HashMap, HashSet};
use tala_core::{IntentId, Outcome, RelationType, Status};
use tala_graph::NarrativeGraph;
use tala_weave::{Executor, ReplayConfig, ReplayEngine};

let mut graph = NarrativeGraph::new();
let a = IntentId::random();
let b = IntentId::random();
graph.insert_node(a, 1, 1.0);
graph.insert_node(b, 2, 1.0);
graph.add_edge(a, b, RelationType::Causal, 1.0);

let mut commands = HashMap::new();
commands.insert(a, "echo A".to_string());
commands.insert(b, "echo B".to_string());

let config = ReplayConfig {
    vars: HashMap::new(),
    completed: HashSet::new(),
    dry_run: false,
};

let executor: Executor = Box::new(|_cmd| Outcome {
    status: Status::Success,
    latency_ns: 100,
    exit_code: 0,
});

let mut engine = ReplayEngine::new(config).with_executor(executor);
let results = engine.execute(&graph, &[a, b], &commands).unwrap();

assert_eq!(results.len(), 2);
assert!(results.iter().all(|r| !r.skipped));
assert!(results.iter().all(|r| r.outcome.status == Status::Success));
}

tala-kai

The insight engine. Provides analysis capabilities over intent histories: k-means clustering of intent embeddings, n-gram pattern detection in command sequences, frequency-based next-intent prediction, and narrative summarization. All analysis functions are stateless and operate on slices of data, making them safe for concurrent use. The InsightEngine orchestrator wraps these primitives and returns results as Insight values from tala-core.

Key Types

Type              Description
ClusterResult     Result of k-means clustering: assignments, centroids, convergence info
Pattern           A recurring command sequence with its frequency count
NarrativeSummary  Summary statistics for a set of intents
InsightEngine     Orchestrator that combines all analysis capabilities

Key Functions

Function                                    Description
kmeans(embeddings, dim, k, max_iter, seed)  Lloyd's k-means clustering
detect_patterns(intents, min_count)         N-gram frequency analysis (bigrams + trigrams)
predict_next(history, corpus)               Frequency-based next-command prediction
summarize(intents)                          Narrative summary with statistics and top commands

kmeans

Runs Lloyd's k-means algorithm on a flat buffer of embedding vectors. Initializes centroids by sampling k random data points (using the provided seed for determinism), then iterates assignment and update steps until convergence or max_iter iterations.

#![allow(unused)]
fn main() {
pub fn kmeans(
    embeddings: &[f32],
    dim: usize,
    k: usize,
    max_iter: usize,
    seed: u64,
) -> Result<ClusterResult, TalaError>;
}

Returns TalaError::DimensionMismatch if:

  • dim is zero
  • embeddings.len() is not evenly divisible by dim
  • k is zero or exceeds the number of data points

ClusterResult

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct ClusterResult {
    /// Cluster assignment for each input point (index into `centroids`).
    pub assignments: Vec<usize>,
    /// Final centroid vectors, shape: k x dim.
    pub centroids: Vec<Vec<f32>>,
    /// Number of iterations until convergence.
    pub iterations: usize,
    /// Whether the algorithm converged before hitting max_iter.
    pub converged: bool,
}
}

Example

#![allow(unused)]
fn main() {
use tala_kai::kmeans;

// Two well-separated clusters in 2D
let embeddings = vec![
    0.0, 0.0,   // cluster A
    0.1, 0.1,   // cluster A
    10.0, 10.0, // cluster B
    9.9, 9.9,   // cluster B
];

let result = kmeans(&embeddings, 2, 2, 50, 42).unwrap();
assert_eq!(result.assignments.len(), 4);
assert_eq!(result.assignments[0], result.assignments[1]); // same cluster
assert_eq!(result.assignments[2], result.assignments[3]); // same cluster
assert_ne!(result.assignments[0], result.assignments[2]); // different clusters
assert!(result.converged);
}
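For intuition, the assignment/update loop can be sketched without any dependencies. This is a simplified stand-in for what kmeans does (initial centroids are spread across the input rather than sampled with a seed, and k >= 1 is assumed):

```rust
#![allow(unused)]
// Dependency-free sketch of Lloyd's iteration; not the tala_kai implementation.
fn lloyd(data: &[f32], dim: usize, k: usize, max_iter: usize) -> Vec<usize> {
    let n = data.len() / dim;
    // Simplified init: spread initial centroids across the input
    // (the real kmeans samples k random points using the provided seed).
    let mut centroids: Vec<Vec<f32>> = (0..k)
        .map(|c| data[c * (n - 1) / (k - 1).max(1) * dim..][..dim].to_vec())
        .collect();
    let mut assignments = vec![0usize; n];
    for _ in 0..max_iter {
        // Assignment step: nearest centroid by squared Euclidean distance.
        let mut changed = false;
        for i in 0..n {
            let p = &data[i * dim..(i + 1) * dim];
            let dist =
                |c: &[f32]| -> f32 { p.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum() };
            let best = (0..k)
                .min_by(|&a, &b| dist(&centroids[a]).partial_cmp(&dist(&centroids[b])).unwrap())
                .unwrap();
            if assignments[i] != best {
                assignments[i] = best;
                changed = true;
            }
        }
        if !changed {
            break; // converged: no point changed cluster
        }
        // Update step: move each centroid to the mean of its members.
        for c in 0..k {
            let members: Vec<usize> = (0..n).filter(|&i| assignments[i] == c).collect();
            if members.is_empty() {
                continue;
            }
            for d in 0..dim {
                centroids[c][d] = members.iter().map(|&i| data[i * dim + d]).sum::<f32>()
                    / members.len() as f32;
            }
        }
    }
    assignments
}

fn main() {
    // Same two well-separated clusters as the example above.
    let data = [0.0f32, 0.0, 0.1, 0.1, 10.0, 10.0, 9.9, 9.9];
    let a = lloyd(&data, 2, 2, 50);
    assert_eq!(a[0], a[1]);
    assert_eq!(a[2], a[3]);
    assert_ne!(a[0], a[2]);
}
```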

detect_patterns

Detects recurring command sequences (bigrams and trigrams) from a time-sorted list of intents. Returns patterns whose frequency meets or exceeds min_count, sorted by count descending, then by sequence length descending.

#![allow(unused)]
fn main() {
pub fn detect_patterns(intents: &[Intent], min_count: usize) -> Vec<Pattern>;
}

Intents MUST be pre-sorted by timestamp for meaningful n-gram extraction.

Pattern

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct Pattern {
    /// The command sequence (bigram or trigram).
    pub commands: Vec<String>,
    /// Number of occurrences in the corpus.
    pub count: usize,
}
}

Example

#![allow(unused)]
fn main() {
use tala_kai::detect_patterns;

// Given intents with commands: [git status, git add, git commit] x 3
// detect_patterns finds the trigram with count=3 and bigrams with count>=3
let patterns = detect_patterns(&intents, 2);
assert!(!patterns.is_empty());

// Trigram "git status -> git add -> git commit" should have count 3
let trigram = patterns.iter()
    .find(|p| p.commands.len() == 3 && p.commands[0] == "git status")
    .unwrap();
assert_eq!(trigram.count, 3);
}
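The underlying n-gram counting can be sketched over plain command strings. This is a dependency-free illustration of the sliding-window count; the real function operates on Intent values and emits Pattern results:

```rust
#![allow(unused)]
use std::collections::HashMap;

// Count every length-n window of consecutive commands.
fn ngram_counts(commands: &[String], n: usize) -> HashMap<Vec<String>, usize> {
    let mut counts = HashMap::new();
    for window in commands.windows(n) {
        *counts.entry(window.to_vec()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // [git status, git add, git commit] repeated three times.
    let seq: Vec<String> = ["git status", "git add", "git commit"]
        .iter().cycle().take(9).map(|s| s.to_string()).collect();

    let trigrams = ngram_counts(&seq, 3);
    let key = vec!["git status".to_string(), "git add".to_string(), "git commit".to_string()];
    assert_eq!(trigrams[&key], 3);

    // Keep only patterns meeting a minimum count, mirroring min_count.
    let frequent: Vec<_> = trigrams.iter().filter(|&(_, &c)| c >= 3).collect();
    assert_eq!(frequent.len(), 1);
}
```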

predict_next

Predicts the most likely next command given recent history and a corpus of past intents. Uses a frequency model that checks trigram, bigram, and unigram contexts in order of preference. Longer context matches are preferred because they carry more signal.

#![allow(unused)]
fn main() {
/// Returns `None` if no match is found, history is empty,
/// or the corpus has fewer than 2 intents.
pub fn predict_next(history: &[String], corpus: &[Intent]) -> Option<String>;
}

Example

#![allow(unused)]
fn main() {
use tala_kai::predict_next;

// Corpus: [cd src, cargo build, cargo test] repeated
let history = vec!["cd src".to_string(), "cargo build".to_string()];
let prediction = predict_next(&history, &corpus);
assert_eq!(prediction, Some("cargo test".to_string()));
}
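The back-off strategy can be sketched over plain strings. This is a simplified model covering the two longest contexts (last two commands, then last one) against a flat command list; the real function works on Intent values and adds a unigram fallback:

```rust
#![allow(unused)]
use std::collections::HashMap;

// Try the longest context first; fall back to a shorter one if nothing matches.
fn predict(history: &[String], corpus: &[String]) -> Option<String> {
    for ctx_len in (1..=2.min(history.len())).rev() {
        let ctx = &history[history.len() - ctx_len..];
        // Count what followed this context everywhere it occurs in the corpus.
        let mut counts: HashMap<&String, usize> = HashMap::new();
        for i in 0..corpus.len().saturating_sub(ctx_len) {
            if &corpus[i..i + ctx_len] == ctx {
                *counts.entry(&corpus[i + ctx_len]).or_insert(0) += 1;
            }
        }
        if let Some((cmd, _)) = counts.into_iter().max_by_key(|&(_, c)| c) {
            return Some(cmd.clone());
        }
    }
    None
}

fn main() {
    // Corpus: [cd src, cargo build, cargo test] repeated three times.
    let corpus: Vec<String> = ["cd src", "cargo build", "cargo test"]
        .iter().cycle().take(9).map(|s| s.to_string()).collect();
    let history = vec!["cd src".to_string(), "cargo build".to_string()];
    assert_eq!(predict(&history, &corpus), Some("cargo test".to_string()));
}
```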

summarize

Produces summary statistics for a set of intents: counts of successes, failures, and pending outcomes, time span, most frequent commands, and a human-readable text summary.

#![allow(unused)]
fn main() {
pub fn summarize(intents: &[Intent]) -> NarrativeSummary;
}

NarrativeSummary

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct NarrativeSummary {
    /// Total number of intents.
    pub total: usize,
    /// Number with Success outcome.
    pub successes: usize,
    /// Number with Failure outcome.
    pub failures: usize,
    /// Number with no outcome attached (or Pending status).
    pub pending: usize,
    /// Earliest timestamp (nanosecond epoch).
    pub time_start: u64,
    /// Latest timestamp (nanosecond epoch).
    pub time_end: u64,
    /// Most common commands, sorted by frequency descending (up to 10).
    pub top_commands: Vec<(String, usize)>,
    /// Human-readable summary text.
    pub text: String,
}
}

Example

#![allow(unused)]
fn main() {
use tala_kai::summarize;

let summary = summarize(&intents);
assert_eq!(summary.total, intents.len());
assert!(summary.text.contains("intents"));
assert!(summary.text.contains("Success rate"));
}

InsightEngine

The orchestrator that wraps all analysis capabilities and returns results as Insight values from tala-core. Configurable thresholds allow tuning pattern detection and clustering behavior.

#![allow(unused)]
fn main() {
pub struct InsightEngine {
    /// Minimum n-gram count threshold for pattern detection.
    pub pattern_threshold: usize,
    /// K-means seed for deterministic clustering.
    pub seed: u64,
    /// Maximum k-means iterations.
    pub max_kmeans_iter: usize,
}

impl Default for InsightEngine {
    fn default() -> Self {
        Self {
            pattern_threshold: 2,
            seed: 42,
            max_kmeans_iter: 100,
        }
    }
}
}

Methods

#![allow(unused)]
fn main() {
impl InsightEngine {
    /// Create a new engine with default settings.
    pub fn new() -> Self;

    /// Cluster intent embeddings into `k` groups.
    /// Delegates to `kmeans()` with the engine's seed and max iterations.
    pub fn analyze_clusters(
        &self,
        embeddings: &[f32],
        dim: usize,
        k: usize,
    ) -> Result<ClusterResult, TalaError>;

    /// Detect recurring patterns in a sorted list of intents.
    /// Returns each pattern as an `Insight` with kind `RecurringPattern`.
    /// Confidence is the pattern frequency divided by corpus length.
    pub fn detect_patterns(&self, intents: &[Intent]) -> Vec<Insight>;

    /// Predict the next command given recent history and a corpus.
    /// Returns an `Insight` with kind `Prediction` if a match is found.
    pub fn predict_next(
        &self,
        history: &[String],
        corpus: &[Intent],
    ) -> Option<Insight>;

    /// Summarize a set of intents into an `Insight` with kind `Summary`.
    /// Always returns an insight with confidence 1.0.
    pub fn summarize(&self, intents: &[Intent]) -> Insight;
}
}

Example

#![allow(unused)]
fn main() {
use tala_core::InsightKind;
use tala_kai::InsightEngine;

let engine = InsightEngine::new();

// Pattern detection
let insights = engine.detect_patterns(&intents);
assert!(insights.iter().all(|i| i.kind == InsightKind::RecurringPattern));

// Prediction
if let Some(prediction) = engine.predict_next(&history, &corpus) {
    assert_eq!(prediction.kind, InsightKind::Prediction);
}

// Summary
let summary = engine.summarize(&intents);
assert_eq!(summary.kind, InsightKind::Summary);
assert_eq!(summary.confidence, 1.0);
}

tala-daemon

The top-level orchestrator tying all TALA subsystems together. Provides the Daemon facade with four primary operations -- ingest, query, replay, and insights -- backed by an IngestPipeline that coordinates intent extraction, storage, and graph edge formation. A DaemonBuilder configures and constructs Daemon instances for both in-memory and persistent modes.

This is a library crate. The binary wrapper, Unix socket listener, and TCP server are future work.

Key Types

Type           Description
Daemon         Top-level facade: ingest, query, replay, insights
DaemonBuilder  Builder pattern for configuring and constructing a Daemon
DaemonMetrics  Per-phase timing metrics for the daemon pipeline

Daemon

The unified interface to all TALA subsystems. Internally owns an IngestPipeline (which in turn owns the IntentPipeline, StorageEngine, and NarrativeGraph) and provides four public operations.

#![allow(unused)]
fn main() {
pub struct Daemon { /* private */ }
}

ingest

#![allow(unused)]
fn main() {
impl Daemon {
    /// Ingest a raw command string with execution context.
    ///
    /// Pipeline:
    /// 1. Extract intent via IntentPipeline (tokenize, embed, classify)
    /// 2. Insert into StorageEngine (WAL, HNSW index, hot buffer)
    /// 3. Semantic search for edge candidates (top-5 nearest neighbors)
    /// 4. Form edges in the NarrativeGraph
    ///
    /// Returns the IntentId assigned to the new intent.
    pub fn ingest(&self, raw: &str, context: &Context) -> Result<IntentId, TalaError>;
}
}

query

#![allow(unused)]
fn main() {
impl Daemon {
    /// Semantic search: find the `k` intents most similar to `embedding`.
    ///
    /// Delegates to StorageEngine::query_semantic. Returns
    /// (IntentId, cosine_similarity) pairs sorted by similarity descending.
    pub fn query(
        &self,
        embedding: &[f32],
        k: usize,
    ) -> Result<Vec<(IntentId, f32)>, TalaError>;
}
}

replay

#![allow(unused)]
fn main() {
impl Daemon {
    /// Build a replay plan rooted at `root`, traversing up to `depth` hops
    /// forward in the narrative graph.
    ///
    /// Steps:
    /// 1. BFS forward from root to discover the reachable subgraph
    /// 2. Fetch commands from the store for each reachable node
    /// 3. Topologically sort via tala_weave::build_plan
    ///
    /// Returns TalaError::NodeNotFound if root is not in the graph.
    pub fn replay(
        &self,
        root: IntentId,
        depth: usize,
    ) -> Result<Vec<ReplayStep>, TalaError>;
}
}
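The two graph steps above — depth-limited forward BFS, then topological ordering — can be sketched with a plain adjacency map. Hypothetical u32 node ids stand in for IntentId, and Kahn's algorithm stands in for the role of build_plan:

```rust
#![allow(unused)]
use std::collections::{HashMap, HashSet, VecDeque};

// Step 1: BFS forward from `root`, traversing up to `depth` hops.
fn reachable(adj: &HashMap<u32, Vec<u32>>, root: u32, depth: usize) -> HashSet<u32> {
    let mut seen = HashSet::from([root]);
    let mut frontier = VecDeque::from([(root, 0usize)]);
    while let Some((node, d)) = frontier.pop_front() {
        if d == depth {
            continue; // hop budget exhausted along this path
        }
        for &next in adj.get(&node).into_iter().flatten() {
            if seen.insert(next) {
                frontier.push_back((next, d + 1));
            }
        }
    }
    seen
}

// Step 2: Kahn's algorithm over the induced subgraph.
fn topo_order(adj: &HashMap<u32, Vec<u32>>, nodes: &HashSet<u32>) -> Vec<u32> {
    let mut indegree: HashMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for &n in nodes {
        for &next in adj.get(&n).into_iter().flatten() {
            // Edges leaving the subgraph are ignored (get_mut returns None).
            if let Some(d) = indegree.get_mut(&next) {
                *d += 1;
            }
        }
    }
    let mut ready: Vec<u32> =
        indegree.iter().filter(|&(_, &d)| d == 0).map(|(&n, _)| n).collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop() {
        order.push(n);
        for &next in adj.get(&n).into_iter().flatten() {
            if let Some(d) = indegree.get_mut(&next) {
                *d -= 1;
                if *d == 0 {
                    ready.push(next);
                }
            }
        }
    }
    order
}

fn main() {
    // 1 -> 2 -> 3, plus a shortcut edge 1 -> 3.
    let adj = HashMap::from([(1u32, vec![2, 3]), (2, vec![3])]);
    let sub = reachable(&adj, 1, 3);
    assert_eq!(sub.len(), 3);
    let order = topo_order(&adj, &sub);
    assert_eq!(order[0], 1); // root first
    assert_eq!(order[2], 3); // dependency-last
}
```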

insights

#![allow(unused)]
fn main() {
impl Daemon {
    /// Generate insights from the current intent corpus.
    ///
    /// Analysis:
    /// 1. Collect all intents from the store, sorted by timestamp
    /// 2. Run pattern detection (n-gram frequency via InsightEngine)
    /// 3. Produce a narrative summary
    /// 4. Run k-means clustering with `k_clusters` clusters
    ///
    /// Returns a mix of RecurringPattern, Summary, and FailureCluster insights.
    pub fn insights(
        &self,
        k_clusters: usize,
    ) -> Result<Vec<Insight>, TalaError>;
}
}

Accessors

#![allow(unused)]
fn main() {
impl Daemon {
    /// Access the underlying storage engine for direct queries.
    pub fn store(&self) -> &StorageEngine;

    /// Access the daemon-level metrics.
    pub fn daemon_metrics(&self) -> &Arc<DaemonMetrics>;
}
}

DaemonBuilder

Builder for configuring and constructing a Daemon. Supports both in-memory and persistent modes.

#![allow(unused)]
fn main() {
pub struct DaemonBuilder { /* private */ }

impl DaemonBuilder {
    /// Create a new builder with default settings:
    /// - dim: 384
    /// - hot_capacity: 10,000
    pub fn new() -> Self;

    /// Set the embedding dimension.
    pub fn dim(self, dim: usize) -> Self;

    /// Set the hot buffer capacity (number of intents before segment flush).
    pub fn hot_capacity(self, hot_capacity: usize) -> Self;

    /// Build an in-memory daemon (no WAL, no persistence).
    pub fn build_in_memory(self) -> Daemon;

    /// Build a persistent daemon backed by the given directory.
    /// Creates WAL and segment files in `dir`.
    pub fn build(self, dir: impl AsRef<Path>) -> Result<Daemon, TalaError>;
}

impl Default for DaemonBuilder {
    fn default() -> Self { Self::new() }
}
}

Example: In-Memory Daemon

#![allow(unused)]
fn main() {
use tala_core::Context;
use tala_daemon::DaemonBuilder;

let daemon = DaemonBuilder::new()
    .dim(384)
    .hot_capacity(1000)
    .build_in_memory();

let ctx = Context {
    cwd: "/home/user/project".to_string(),
    env_hash: 42,
    session_id: 1,
    shell: "zsh".to_string(),
    user: "ops".to_string(),
};

// Ingest
let id = daemon.ingest("cargo build --release", &ctx).unwrap();
assert!(daemon.store().get(id).unwrap().is_some());

// Query
let pipeline = tala_intent::IntentPipeline::new();
let query_emb = pipeline.embed("cargo build");
let results = daemon.query(&query_emb, 5).unwrap();
assert!(!results.is_empty());

// Replay
let plan = daemon.replay(id, 3).unwrap();
assert_eq!(plan[0].intent_id, id);

// Insights
let insights = daemon.insights(2).unwrap();
assert!(!insights.is_empty());
}

Example: Persistent Daemon

#![allow(unused)]
fn main() {
use tala_daemon::DaemonBuilder;

let daemon = DaemonBuilder::new()
    .dim(384)
    .hot_capacity(100)
    .build("/tmp/tala-data")
    .unwrap();
}

DaemonMetrics

Per-phase timing metrics for the daemon pipeline. Each phase is tracked by a pair of AtomicU64 fields: cumulative nanoseconds spent and the number of operations recorded.

#![allow(unused)]
fn main() {
pub struct DaemonMetrics {
    /// Intent extraction time (cumulative nanoseconds).
    pub extract_ns: AtomicU64,
    pub extract_count: AtomicU64,

    /// Storage engine insert time (cumulative nanoseconds).
    pub store_insert_ns: AtomicU64,
    pub store_insert_count: AtomicU64,

    /// Edge formation time: semantic search + graph mutation (cumulative nanoseconds).
    pub edge_formation_ns: AtomicU64,
    pub edge_formation_count: AtomicU64,

    /// Semantic query time (cumulative nanoseconds).
    pub query_ns: AtomicU64,
    pub query_count: AtomicU64,

    /// Replay plan building time (cumulative nanoseconds).
    pub replay_ns: AtomicU64,
    pub replay_count: AtomicU64,

    /// Insight generation time (cumulative nanoseconds).
    pub insight_ns: AtomicU64,
    pub insight_count: AtomicU64,
}

impl DaemonMetrics {
    pub fn new() -> Self;
}

impl Default for DaemonMetrics {
    fn default() -> Self { Self::new() }
}
}

The daemon records timing for each phase of every operation. Access metrics via daemon.daemon_metrics(). The underlying storage engine also provides its own detailed metrics via daemon.store().metrics().
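A per-operation average falls out of each cumulative-ns/count pair. A minimal sketch, assuming Relaxed loads are adequate for reporting purposes:

```rust
#![allow(unused)]
use std::sync::atomic::{AtomicU64, Ordering};

// Derive an average per-operation latency from a cumulative-ns/count field pair.
fn avg_ns(total_ns: &AtomicU64, count: &AtomicU64) -> Option<u64> {
    let c = count.load(Ordering::Relaxed);
    if c == 0 {
        return None; // no operations recorded yet
    }
    Some(total_ns.load(Ordering::Relaxed) / c)
}

fn main() {
    let ns = AtomicU64::new(0);
    let n = AtomicU64::new(0);
    assert_eq!(avg_ns(&ns, &n), None);

    // Record three operations taking 100, 200, and 300 ns.
    for t in [100u64, 200, 300] {
        ns.fetch_add(t, Ordering::Relaxed);
        n.fetch_add(1, Ordering::Relaxed);
    }
    assert_eq!(avg_ns(&ns, &n), Some(200));
}
```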

tala-net

Distributed networking layer for TALA. Provides core distributed types (node identity, partitioning, membership), a TLV message codec, and an in-process channel-based transport for testing without a real network. Real QUIC transport will be added in a future phase (see spec-04).

Key Types

Type                Description
NodeId              Unique identifier for a node in the cluster
PeerId              Type alias for NodeId in transport contexts
PartitionId         Identifies a partition of the intent graph
Message             Framed messages exchanged between nodes
PartitionAssignment Maps a partition to its owner and replicas
PartitionTable      Cluster-wide partition routing table
MemberState         Liveness state of a cluster member (SWIM model)
MembershipList      Versioned cluster membership list
InProcessNetwork    Simulated network of channel-connected transports
InProcessTransport  A single node's view of the in-process network

Key Functions

Function      Description
encode(msg)   Serialize a Message into a TLV byte buffer
decode(data)  Deserialize a Message from a TLV byte buffer

NodeId and PartitionId

#![allow(unused)]
fn main() {
/// Unique identifier for a node in the cluster.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u64);

/// Alias for NodeId used in transport contexts.
pub type PeerId = NodeId;

/// Identifies a partition of the intent graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PartitionId(pub u32);
}

Message

The set of messages exchanged between nodes. All variants are serializable via the TLV codec.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq)]
pub enum Message {
    /// Health check request.
    Ping { from: NodeId, seq: u64 },
    /// Health check response.
    Pong { from: NodeId, seq: u64 },
    /// Forward an intent to the partition owner.
    IntentForward { partition: PartitionId, payload: Vec<u8> },
    /// Replicate a segment to a replica node.
    SegmentSync { partition: PartitionId, segment_data: Vec<u8> },
    /// Broadcast membership changes.
    MembershipUpdate { members: Vec<NodeId>, version: u64 },
    /// Broadcast partition table changes.
    PartitionTableUpdate { assignments: Vec<PartitionAssignment>, version: u64 },
}
}

PartitionTable

Cluster-wide routing table mapping partitions to their owner and replica nodes.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct PartitionTable {
    pub version: u64,
    pub assignments: Vec<PartitionAssignment>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct PartitionAssignment {
    pub partition_id: PartitionId,
    pub owner: NodeId,
    pub replicas: Vec<NodeId>,
}
}

Methods

#![allow(unused)]
fn main() {
impl PartitionTable {
    /// Return the owner of a given partition, if assigned.
    pub fn owner_of(&self, partition: PartitionId) -> Option<NodeId>;

    /// Return all partitions owned by or replicated on a given node.
    pub fn partitions_for(&self, node: NodeId) -> Vec<PartitionId>;

    /// Consistent-hash an intent ID (16 raw UUID bytes) to a partition.
    /// Uses FNV-1a over the id bytes, then reduces modulo `num_partitions`.
    /// Returns `PartitionId(0)` if `num_partitions` is zero.
    pub fn partition_for_intent(id_bytes: &[u8; 16], num_partitions: u32) -> PartitionId;
}
}

Example

#![allow(unused)]
fn main() {
use tala_net::{NodeId, PartitionId, PartitionTable, PartitionAssignment};

let table = PartitionTable {
    version: 1,
    assignments: vec![
        PartitionAssignment {
            partition_id: PartitionId(0),
            owner: NodeId(10),
            replicas: vec![NodeId(11), NodeId(12)],
        },
        PartitionAssignment {
            partition_id: PartitionId(1),
            owner: NodeId(11),
            replicas: vec![NodeId(10)],
        },
    ],
};

assert_eq!(table.owner_of(PartitionId(0)), Some(NodeId(10)));
assert_eq!(table.owner_of(PartitionId(1)), Some(NodeId(11)));
assert_eq!(table.owner_of(PartitionId(99)), None);

// Node 10 owns partition 0 and is a replica on partition 1
let p10 = table.partitions_for(NodeId(10));
assert_eq!(p10.len(), 2);

// Consistent hashing
let id = [1u8; 16];
let p = PartitionTable::partition_for_intent(&id, 64);
assert!(p.0 < 64);
}
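The hash behind partition_for_intent can be sketched in a few lines. This sketch assumes the standard 64-bit FNV-1a offset basis and prime, which the description above does not spell out:

```rust
#![allow(unused)]
// FNV-1a over the 16 raw UUID bytes, reduced modulo num_partitions.
fn partition_for(id_bytes: &[u8; 16], num_partitions: u32) -> u32 {
    if num_partitions == 0 {
        return 0; // mirrors the documented PartitionId(0) fallback
    }
    let mut hash: u64 = 0xcbf29ce484222325; // 64-bit FNV offset basis
    for &b in id_bytes {
        hash ^= b as u64; // xor the byte first (the "1a" variant)
        hash = hash.wrapping_mul(0x100000001b3); // then multiply by the FNV prime
    }
    (hash % num_partitions as u64) as u32
}

fn main() {
    let id = [1u8; 16];
    let p = partition_for(&id, 64);
    assert!(p < 64);
    // Deterministic: the same id always maps to the same partition.
    assert_eq!(p, partition_for(&id, 64));
}
```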

MembershipList

Versioned cluster membership list following the SWIM protocol model. Each member has a liveness state: Alive, Suspect, or Dead. Version is bumped on every state transition.

#![allow(unused)]
fn main() {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemberState {
    Alive,
    Suspect,
    Dead,
}

#[derive(Clone, Debug)]
pub struct MembershipList {
    pub members: Vec<(NodeId, MemberState)>,
    pub version: u64,
}
}

Methods

#![allow(unused)]
fn main() {
impl MembershipList {
    /// Create an empty membership list (version 0).
    pub fn new() -> Self;

    /// Add a member as Alive. If the node already exists, its state is
    /// set back to Alive. Bumps the version.
    pub fn add_member(&mut self, node: NodeId);

    /// Transition a member to Suspect. Bumps the version.
    /// No-op if the node is not present.
    pub fn mark_suspect(&mut self, node: NodeId);

    /// Transition a member to Dead. Bumps the version.
    /// No-op if the node is not present.
    pub fn mark_dead(&mut self, node: NodeId);

    /// Return all members whose state is Alive.
    pub fn alive_members(&self) -> Vec<NodeId>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_net::{MembershipList, MemberState, NodeId};

let mut ml = MembershipList::new();
ml.add_member(NodeId(1));
ml.add_member(NodeId(2));
ml.add_member(NodeId(3));
assert_eq!(ml.alive_members().len(), 3);
assert_eq!(ml.version, 3);

ml.mark_suspect(NodeId(2));
assert_eq!(ml.alive_members().len(), 2);

ml.mark_dead(NodeId(2));
assert_eq!(ml.alive_members().len(), 2);

// Re-add a dead node -> back to Alive
ml.add_member(NodeId(2));
assert_eq!(ml.alive_members().len(), 3);
}

TLV Codec

Serializes and deserializes Message values using a Type-Length-Value wire format.

Wire layout: [tag:1 byte][length:4 bytes LE][payload:length bytes]

Tag   Message
0x01  Ping
0x02  Pong
0x03  IntentForward
0x04  SegmentSync
0x05  MembershipUpdate
0x06  PartitionTableUpdate
#![allow(unused)]
fn main() {
/// Encode a Message into a TLV byte buffer.
pub fn encode(msg: &Message) -> Vec<u8>;

/// Decode a Message from a TLV byte buffer.
/// Returns TalaError::SegmentCorrupted on invalid or truncated input.
pub fn decode(data: &[u8]) -> Result<Message, TalaError>;
}

Example

#![allow(unused)]
fn main() {
use tala_net::{encode, decode, Message, NodeId};

let msg = Message::Ping { from: NodeId(42), seq: 1 };
let bytes = encode(&msg);
let decoded = decode(&bytes).unwrap();
assert_eq!(decoded, msg);
}
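For intuition, a frame can be assembled and parsed by hand following the wire layout above. Only the [tag][length][payload] framing is specified; the payload encoding of the Ping fields (two u64 LE values) is a guess for illustration:

```rust
#![allow(unused)]
// Build a TLV frame: [tag:1 byte][length:4 bytes LE][payload:length bytes].
fn encode_frame(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = vec![tag];
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

// Parse a frame back into (tag, payload), rejecting truncated input.
fn decode_frame(data: &[u8]) -> Option<(u8, &[u8])> {
    if data.len() < 5 {
        return None; // truncated header
    }
    let len = u32::from_le_bytes(data[1..5].try_into().unwrap()) as usize;
    if data.len() < 5 + len {
        return None; // truncated payload
    }
    Some((data[0], &data[5..5 + len]))
}

fn main() {
    // Hypothetical Ping payload: from=42, seq=1 as two u64 LE values.
    let mut payload = Vec::new();
    payload.extend_from_slice(&42u64.to_le_bytes());
    payload.extend_from_slice(&1u64.to_le_bytes());

    let frame = encode_frame(0x01, &payload); // 0x01 = Ping tag
    assert_eq!(frame.len(), 1 + 4 + 16);

    let (tag, body) = decode_frame(&frame).unwrap();
    assert_eq!(tag, 0x01);
    assert_eq!(u64::from_le_bytes(body[0..8].try_into().unwrap()), 42);
    assert!(decode_frame(&frame[..3]).is_none()); // truncated input rejected
}
```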

InProcessNetwork and InProcessTransport

A simulated network for testing distributed protocols without real sockets. Nodes are connected via mpsc channels. Messages are delivered in FIFO order per sender-receiver pair.

InProcessNetwork

#![allow(unused)]
fn main() {
pub struct InProcessNetwork { /* private */ }

impl InProcessNetwork {
    /// Create a new empty in-process network.
    pub fn new() -> Self;

    /// Register a node and return its transport handle.
    pub fn add_node(&self, id: NodeId) -> InProcessTransport;
}
}

InProcessTransport

#![allow(unused)]
fn main() {
pub struct InProcessTransport { /* private */ }

impl InProcessTransport {
    /// Send a message to a specific peer. Silently drops if the peer is not
    /// registered (mirrors UDP-like fire-and-forget semantics).
    pub fn send(&self, to: NodeId, msg: Message);

    /// Try to receive the next message (non-blocking).
    /// Returns (sender_id, message) or None if no message is available.
    pub fn recv(&self) -> Option<(NodeId, Message)>;

    /// Broadcast a message to all registered peers except self.
    pub fn broadcast(&self, msg: Message);
}
}

Example

#![allow(unused)]
fn main() {
use tala_net::{InProcessNetwork, NodeId, Message};

let net = InProcessNetwork::new();
let t1 = net.add_node(NodeId(1));
let t2 = net.add_node(NodeId(2));
let t3 = net.add_node(NodeId(3));

// Point-to-point
t1.send(NodeId(2), Message::Ping { from: NodeId(1), seq: 1 });
let (from, msg) = t2.recv().unwrap();
assert_eq!(from, NodeId(1));

// Broadcast
t1.broadcast(Message::Pong { from: NodeId(1), seq: 42 });
assert!(t2.recv().is_some()); // t2 receives
assert!(t3.recv().is_some()); // t3 receives
assert!(t1.recv().is_none()); // t1 does NOT receive its own broadcast
}

tala-cli

The user-facing command-line interface for TALA. Provides command parsing, execution against a Daemon, and human-readable output formatting. This is a library crate; the binary wrapper (main.rs) is future work. The CLI supports five subcommands: ingest, find, replay, status, and insights.

Key Types

Type              Description
Command           Parsed CLI subcommand enum
CommandParser     Hand-written argument parser
CommandRunner     Executes parsed commands against a Daemon
Output            Structured output from command execution
SearchResult      A single semantic search result
ReplayStepOutput  A single replay plan step (display format)
StatusOutput      Daemon status information
InsightOutput     A single insight (display format)

Command

The parsed subcommand enum. Each variant carries the arguments extracted from the command line.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq)]
pub enum Command {
    /// Ingest a raw shell command into the narrative graph.
    Ingest { raw_command: String },

    /// Semantic search for intents matching a query string.
    Find { query: String, k: usize },

    /// Build a replay plan from a root intent.
    Replay {
        root_id: String,
        depth: usize,
        dry_run: bool,
    },

    /// Show daemon status.
    Status,

    /// Run insight analysis (clustering + pattern detection).
    Insights { clusters: usize },
}
}

CommandParser

A hand-written argument parser. Expects args[0] to be the binary name and args[1] to be the subcommand.

#![allow(unused)]
fn main() {
pub struct CommandParser;

impl CommandParser {
    /// Parse a slice of command-line arguments into a Command.
    ///
    /// Returns an error string for:
    /// - Missing subcommand
    /// - Unknown subcommand
    /// - Missing required arguments
    /// - Invalid flag values
    /// - Unknown flags
    pub fn parse(args: &[String]) -> Result<Command, String>;
}
}

Subcommand Syntax

Subcommand  Usage                                        Defaults
ingest      tala ingest <command>                        --
find        tala find <query> [--k N]                    k=10
replay      tala replay <uuid> [--depth N] [--dry-run]   depth=3, dry_run=false
status      tala status                                  --
insights    tala insights [--clusters N]                 clusters=5

Examples

#![allow(unused)]
fn main() {
use tala_cli::{CommandParser, Command};

fn args(strs: &[&str]) -> Vec<String> {
    strs.iter().map(|s| s.to_string()).collect()
}

// Ingest
let cmd = CommandParser::parse(&args(&["tala", "ingest", "kubectl apply -f deploy.yaml"])).unwrap();
assert_eq!(cmd, Command::Ingest { raw_command: "kubectl apply -f deploy.yaml".into() });

// Find with custom k
let cmd = CommandParser::parse(&args(&["tala", "find", "deploy", "--k", "20"])).unwrap();
assert_eq!(cmd, Command::Find { query: "deploy".into(), k: 20 });

// Replay with flags
let cmd = CommandParser::parse(&args(&[
    "tala", "replay", "550e8400-e29b-41d4-a716-446655440000",
    "--depth", "5", "--dry-run"
])).unwrap();
assert_eq!(cmd, Command::Replay {
    root_id: "550e8400-e29b-41d4-a716-446655440000".into(),
    depth: 5,
    dry_run: true,
});

// Status
let cmd = CommandParser::parse(&args(&["tala", "status"])).unwrap();
assert_eq!(cmd, Command::Status);

// Insights with custom cluster count
let cmd = CommandParser::parse(&args(&["tala", "insights", "--clusters", "8"])).unwrap();
assert_eq!(cmd, Command::Insights { clusters: 8 });
}

Error Cases

#![allow(unused)]
fn main() {
use tala_cli::CommandParser;

fn args(strs: &[&str]) -> Vec<String> {
    strs.iter().map(|s| s.to_string()).collect()
}

// Missing subcommand
assert!(CommandParser::parse(&args(&["tala"])).is_err());

// Unknown subcommand
let err = CommandParser::parse(&args(&["tala", "frobnicate"])).unwrap_err();
assert!(err.contains("unknown subcommand"));

// Missing required argument
assert!(CommandParser::parse(&args(&["tala", "ingest"])).is_err());

// Invalid flag value
let err = CommandParser::parse(&args(&["tala", "find", "query", "--k", "abc"])).unwrap_err();
assert!(err.contains("invalid value for --k"));

// Unknown flag
let err = CommandParser::parse(&args(&["tala", "find", "query", "--verbose"])).unwrap_err();
assert!(err.contains("unknown flag"));
}

CommandRunner

Executes parsed Command values against a Daemon instance, producing structured Output.

#![allow(unused)]
fn main() {
pub struct CommandRunner;

impl CommandRunner {
    /// Run a parsed command against the given daemon.
    ///
    /// - Ingest: extracts intent with default Context, returns Ingested
    /// - Find: embeds query via IntentPipeline, searches store, returns SearchResults
    /// - Replay: parses UUID, calls daemon.replay(), returns ReplayPlan
    /// - Status: returns StatusOutput (currently placeholder values)
    /// - Insights: calls daemon.insights(), returns Insights
    pub fn run(daemon: &Daemon, cmd: Command) -> Result<Output, TalaError>;
}
}

Example

#![allow(unused)]
fn main() {
use tala_cli::{Command, CommandRunner, Output};
use tala_daemon::DaemonBuilder;

let daemon = DaemonBuilder::new().dim(384).build_in_memory();

let output = CommandRunner::run(
    &daemon,
    Command::Ingest { raw_command: "ls -la".into() },
).unwrap();

match output {
    Output::Ingested { id } => {
        assert!(!id.is_empty());
        println!("Ingested intent: {id}");
    }
    _ => panic!("expected Ingested"),
}
}

Output

Structured output from command execution. Implements Display for human-readable formatting.

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub enum Output {
    /// Result of an ingest command.
    Ingested { id: String },
    /// Results of a semantic search.
    SearchResults(Vec<SearchResult>),
    /// Replay plan steps.
    ReplayPlan(Vec<ReplayStepOutput>),
    /// Daemon status.
    Status(StatusOutput),
    /// Insight analysis results.
    Insights(Vec<InsightOutput>),
}
}

Display Formatting

Each variant formats itself for terminal output:

// Ingested
Ingested: 550e8400-e29b-41d4-a716-446655440000

// SearchResults
Search results (3 found):
  [0] 550e8400-...  (similarity: 0.9500)
  [1] 661f9511-...  (similarity: 0.8200)
  [2] 772a0622-...  (similarity: 0.7100)

// ReplayPlan
Replay plan (2 steps):
  [0] echo hello  (id: 550e8400-..., deps: 0)
  [1] echo world  (id: 661f9511-..., deps: 1)

// Status
TALA daemon status:
  Nodes:    42
  Edges:    100
  Commands: 42
  Dim:      384

// Insights
Insights (2 found):
  [0] [pattern] Recurring sequence  (confidence: 0.85)
  [1] [summary] 42 intents over 3.0s...  (confidence: 1.00)

Supporting Output Types

SearchResult

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct SearchResult {
    pub id: String,
    pub similarity: f32,
}
}

ReplayStepOutput

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct ReplayStepOutput {
    pub id: String,
    pub command: String,
    pub dep_count: usize,
}
}

StatusOutput

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct StatusOutput {
    pub node_count: usize,
    pub edge_count: usize,
    pub command_count: usize,
    pub dim: usize,
}
}

InsightOutput

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub struct InsightOutput {
    pub kind: String,
    pub description: String,
    pub confidence: f32,
}
}

The kind field maps from InsightKind as follows:

InsightKind       String
RecurringPattern  "pattern"
FailureCluster    "failure"
Prediction        "prediction"
Summary           "summary"

Deployment

TALA ships a Docker Compose stack that runs four simulation verticals, a Prometheus metrics backend, Grafana dashboards, and the Observatory front-end. Everything starts from a single docker compose up.

Prerequisites

  • Docker Engine 24+ with Compose v2
  • 4 GB RAM minimum (each simulator uses ~200 MB, Prometheus and Grafana add ~500 MB)
  • Ports 3000, 8080, 9090, 9101-9104 available on the host

Quick Start

cd deploy
cp .env.example .env   # adjust values as needed
docker compose up -d

The stack builds the tala-sim binary from source using the workspace Dockerfile, then launches all services. First build takes a few minutes; subsequent starts use cached layers.

Services

Simulation Verticals

Four instances of tala-sim run independently, each modeling a different operational domain. Each exposes Prometheus metrics on its container's port 9090.

Service        Container      Domain                 Host Port  Default Rate
sim-incident   sim-incident   Incident Response      9101       400ms
sim-deploy     sim-deploy     Continuous Deployment  9102       300ms
sim-observe    sim-observe    Observability          9103       200ms
sim-provision  sim-provision  Provisioning           9104       350ms

Each simulator runs five concurrent threads: ingest (generates intents at the configured rate), query (semantic search every TALA_QUERY_INTERVAL_S), insight (pattern detection every TALA_INSIGHT_INTERVAL_S), replay (adaptive replay every TALA_REPLAY_INTERVAL_S), and chaos (fault injection every TALA_CHAOS_INTERVAL_S). A sixth gauge-updater thread samples internal metrics every 5 seconds.

Prometheus

prom/prometheus:v2.51.0 on port 9090

Scrapes all four simulators every 5 seconds via the tala-sim job. Retains 30 days of TSDB data. Lifecycle API is enabled for configuration reload without restart.

  • Config: deploy/prometheus/prometheus.yml (mounted read-only)
  • Data: prometheus-data named volume at /prometheus
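The mounted deploy/prometheus/prometheus.yml is the source of truth; an illustrative scrape configuration consistent with the description above might look like:

```yaml
global:
  scrape_interval: 5s        # matches the 5-second scrape cadence

scrape_configs:
  - job_name: tala-sim       # the job name referenced above
    static_configs:
      - targets:             # simulators addressed by container name
          - sim-incident:9090
          - sim-deploy:9090
          - sim-observe:9090
          - sim-provision:9090
```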

Grafana

grafana/grafana:10.4.0 on port 3000

Pre-provisioned with a Prometheus data source and the TALA Overview dashboard. Default credentials are admin / the value of GRAFANA_PASSWORD (defaults to tala-demo).

  • Provisioning: deploy/grafana/provisioning/ (datasources and dashboard loader, read-only)
  • Dashboards: deploy/grafana/dashboards/ (JSON dashboard definitions, read-only)
  • Data: grafana-data named volume at /var/lib/grafana

Observatory Dashboard (nginx)

nginx:1.25-alpine on port 8080

Serves the static Observatory front-end and proxies /api/ requests to Prometheus, allowing the browser to query metrics without CORS issues.

  • Static files: deploy/dashboard/ mounted at /usr/share/nginx/html
  • Config: deploy/dashboard/nginx.conf mounted at /etc/nginx/conf.d/default.conf

Port Summary

| Port | Service | Protocol |
|---|---|---|
| 3000 | Grafana | HTTP |
| 8080 | Observatory Dashboard | HTTP |
| 9090 | Prometheus | HTTP |
| 9101 | sim-incident metrics | HTTP |
| 9102 | sim-deploy metrics | HTTP |
| 9103 | sim-observe metrics | HTTP |
| 9104 | sim-provision metrics | HTTP |

Volume Mounts

The stack defines six named volumes:

| Volume | Used By | Mount Point | Purpose |
|---|---|---|---|
| incident-data | sim-incident | /data | WAL, segments, HNSW index |
| deploy-data | sim-deploy | /data | WAL, segments, HNSW index |
| observe-data | sim-observe | /data | WAL, segments, HNSW index |
| provision-data | sim-provision | /data | WAL, segments, HNSW index |
| prometheus-data | prometheus | /prometheus | TSDB storage |
| grafana-data | grafana | /var/lib/grafana | Grafana state and plugins |

All simulation data volumes are independent. Removing one does not affect others.

Networking

All services join a single bridge network (tala-net). Simulators are addressable by container name from Prometheus (e.g., sim-incident:9090). The nginx proxy reaches Prometheus at prometheus:9090.

Environment Configuration

Copy .env.example to .env in the deploy/ directory and adjust values before starting. See Configuration for the full variable reference.

cp .env.example .env
# Edit .env to set ingest rates, chaos parameters, Grafana password

Scaling

To adjust ingest throughput, modify the *_RATE_MS variables in .env. Lower values produce more intents per second. Each simulator runs independently, so you can also scale by adding additional verticals to docker-compose.yml following the existing pattern.

To add a fifth simulator:

  1. Add a new service block in docker-compose.yml using the *sim-defaults anchor
  2. Set TALA_VERTICAL to the domain name
  3. Map a new host port (e.g., 9105:9090)
  4. Add a corresponding volume
  5. Add the scrape target to prometheus/prometheus.yml
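
Following those steps, a fifth service block might look like this. Everything here is illustrative — the service name, domain, and volume name are hypothetical, and the block assumes the *sim-defaults anchor described above:

```yaml
# Hypothetical fifth vertical; adapt names to your docker-compose.yml.
sim-custom:
  <<: *sim-defaults            # reuse the shared simulator settings
  container_name: sim-custom
  environment:
    TALA_VERTICAL: custom      # hypothetical domain name
  ports:
    - "9105:9090"              # new host port -> container metrics port
  volumes:
    - custom-data:/data
```

Remember to declare custom-data under the top-level volumes: key and to add the sim-custom:9090 scrape target to prometheus/prometheus.yml.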

Restart and Recovery

All services use restart: unless-stopped. If a simulator crashes, Docker restarts it automatically. The WAL ensures durability across restarts when persistent storage is configured.

To restart a single service:

docker compose restart sim-incident

To restart the entire stack:

docker compose restart

Tear Down

Stop and remove containers, keeping volumes:

docker compose down

Stop, remove containers, and delete all data:

docker compose down -v

This destroys all WAL data, segment files, Prometheus TSDB, and Grafana state. Use only when you want a clean start.

Viewing Logs

# All services
docker compose logs -f

# Single service
docker compose logs -f sim-incident

# Last 100 lines
docker compose logs --tail=100 sim-deploy

Simulators log to stderr. Key log lines include the startup banner (configuration summary), periodic progress reports (every 100 intents), and chaos event announcements.

Configuration

TALA simulators are configured entirely through environment variables. In the Docker Compose deployment, these are set in the .env file (copied from deploy/.env.example). When running tala-sim directly, export them in the shell.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| TALA_VERTICAL | medical | Operational domain for the simulator. Supported values: incident, deploy, observe, provision. Legacy names (medical, financial, ecommerce, gaming) are mapped to these for backward compatibility. |
| TALA_RATE_MS | 500 | Milliseconds between intent ingestions. Lower values increase throughput. The Docker Compose file overrides this per-vertical via INCIDENT_RATE_MS, DEPLOY_RATE_MS, OBSERVE_RATE_MS, and PROVISION_RATE_MS. |
| TALA_CHAOS_INTERVAL_S | 60 | Seconds between chaos engine trigger checks. The engine sleeps for this duration between evaluations. |
| TALA_CHAOS_PROBABILITY | 0.15 | Probability (0.0 to 1.0) that a chaos event fires on each trigger check. At the default of 0.15 with a 60-second interval, expect roughly 9 events per hour. |
| TALA_DATA_DIR | /data | Directory for persistent storage (WAL, segments, HNSW index). Set to an empty string to run entirely in memory with no disk I/O. |
| TALA_METRICS_PORT | 9090 | TCP port for the Prometheus metrics HTTP server. Each simulator binds to 0.0.0.0:<port> and serves /metrics and /health endpoints. |
| TALA_DIM | 384 | Embedding vector dimensionality. Must match the model used for intent extraction. The default of 384 corresponds to a lightweight sentence transformer. |
| TALA_HOT_CAPACITY | 10000 | Maximum number of intents held in the hot buffer before flushing to a TBF segment. Larger values reduce flush frequency but increase memory usage. |
| TALA_QUERY_INTERVAL_S | 10 | Seconds between semantic query probes. The query thread selects a random probe command from the vertical's vocabulary, embeds it, and searches the HNSW index for the top 10 matches. |
| TALA_INSIGHT_INTERVAL_S | 30 | Seconds between insight generation runs. Each run performs k-means clustering over recent embeddings with k=3 to detect intent patterns. |
| TALA_REPLAY_INTERVAL_S | 45 | Seconds between adaptive replay attempts. The replay thread picks a random recent intent and generates a replay plan of depth 3, traversing causal edges in the narrative graph. |
| GRAFANA_PASSWORD | tala-demo | Admin password for Grafana. Only used in the Docker Compose deployment. Change this before exposing Grafana to a network. |

Docker Compose Overrides

The docker-compose.yml maps several top-level .env variables to per-service equivalents:

| .env Variable | Compose Mapping | Affected Services |
|---|---|---|
| INCIDENT_RATE_MS | TALA_RATE_MS | sim-incident |
| DEPLOY_RATE_MS | TALA_RATE_MS | sim-deploy |
| OBSERVE_RATE_MS | TALA_RATE_MS | sim-observe |
| PROVISION_RATE_MS | TALA_RATE_MS | sim-provision |
| CHAOS_INTERVAL_S | TALA_CHAOS_INTERVAL_S | all sim-* services |
| CHAOS_PROBABILITY | TALA_CHAOS_PROBABILITY | all sim-* services |
| GRAFANA_PASSWORD | GF_SECURITY_ADMIN_PASSWORD | grafana |

Tuning Guidelines

Ingest Rate

The ingest rate controls how fast intents flow into the system. At 200ms (the observe vertical default), the simulator produces 5 intents per second, or 300 per minute. At 400ms, it produces 2.5 per second.

The TALA ingest pipeline (extract, WAL append, HNSW insert, edge formation, hot buffer push) typically completes in 1-2ms, so rates down to about 5ms are sustainable on modern hardware without backpressure.
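
The rate-to-throughput conversion is just the reciprocal of the interval. A trivial helper, shown for reference; the function name is illustrative:

```rust
/// Intents per second for a given TALA_RATE_MS value.
fn intents_per_sec(rate_ms: u64) -> f64 {
    1000.0 / rate_ms as f64
}
```

At 200ms this gives 5 intents per second; at 400ms, 2.5 — matching the figures above.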

Hot Buffer Capacity

The hot buffer accumulates intents in memory and flushes to a TBF segment when full. A capacity of 10,000 with 384-dimensional embeddings uses approximately:

  • Intent metadata: ~200 bytes each = 2 MB
  • Embeddings: 384 * 4 bytes * 10,000 = 15.4 MB
  • Total: ~18 MB per simulator
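
That estimate can be written out directly. The ~200-byte metadata figure comes from the list above; the function name is illustrative:

```rust
/// Approximate hot buffer memory in bytes for a given capacity and
/// embedding dimensionality (f32 embeddings, ~200 bytes of metadata
/// per intent).
fn hot_buffer_bytes(capacity: usize, dim: usize) -> usize {
    let metadata = capacity * 200;   // intent metadata estimate
    let embeddings = capacity * dim * 4; // 4 bytes per f32 component
    metadata + embeddings
}
```

For the defaults (10,000 intents at dim 384) this gives roughly 17.4 MB, consistent with the ~18 MB figure above.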

Increase for higher throughput and fewer disk writes. Decrease for lower memory footprint and more frequent segment creation.

Chaos Parameters

To disable chaos entirely, set TALA_CHAOS_PROBABILITY=0.0. To make chaos frequent for stress testing, increase probability and decrease interval:

CHAOS_INTERVAL_S=10
CHAOS_PROBABILITY=0.5

This fires a chaos event roughly every 20 seconds.
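
The mean time between chaos events is simply the trigger interval divided by the firing probability. An illustrative helper:

```rust
/// Expected seconds between chaos events for a given trigger interval
/// and per-check firing probability.
fn mean_secs_between_events(interval_s: f64, probability: f64) -> f64 {
    interval_s / probability
}
```

At the defaults (60s, 0.15) this works out to one event every 400 seconds, i.e. roughly 9 per hour, matching the configuration reference.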

In-Memory Mode

Set TALA_DATA_DIR="" (empty string) to bypass all disk I/O. The WAL is not created, segments are never flushed, and all data lives in the hot buffer until the process exits. Useful for benchmarking pure compute performance without filesystem variability.

The Observatory Dashboard

The TALA Intent Observatory is a topology-first dashboard that visualizes four operational domains, their TALA subsystems, and the narrative graph connecting them. It runs as a static HTML/CSS/JS application served by nginx, with Prometheus as the metrics backend.

Architecture

sim-incident :9101    sim-deploy :9102    sim-observe :9103    sim-provision :9104
       |                    |                    |                     |
       +--------------------+--------------------+---------------------+
                                    |
                             Prometheus :9090
                                    |
                    +---------------+---------------+
                    |                               |
              Grafana :3000              Observatory (nginx) :8080

Each simulator exposes a /metrics endpoint in Prometheus exposition format. Prometheus scrapes all four every 5 seconds. The Observatory dashboard queries Prometheus through an nginx reverse proxy at /api/, which forwards requests to http://prometheus:9090/api/. This eliminates CORS issues and keeps the front-end purely static.
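
The proxy rule amounts to a single location block. A minimal sketch — the actual rules live in deploy/dashboard/nginx.conf and may differ:

```nginx
location /api/ {
    # Forward dashboard queries to Prometheus on the compose network,
    # so the browser never makes a cross-origin request.
    proxy_pass http://prometheus:9090/api/;
}
```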

Landing Page

The landing page presents the TALA concept: what it is, what problem it solves, why intent-native history matters. A live topology visualization in the background shows the four operational domains with animated edges, giving an immediate sense of the system's activity even before entering the dashboard.

Click Enter Observatory to transition to the full topology view.

Topology View

The main dashboard is a full-viewport topology graph with three layers of nodes.

Domain Nodes

Four outer nodes represent the operational verticals:

| Node | Domain | Metrics Shown |
|---|---|---|
| Incident Response | incident | Intent count, ingest rate, patterns detected |
| Continuous Deployment | deploy | Intent count, ingest rate, patterns detected |
| Observability | observe | Intent count, ingest rate, patterns detected |
| Provisioning | provision | Intent count, ingest rate, patterns detected |

Each domain node displays live counters pulled from tala_intents_ingested_total, tala_active_patterns, and rate calculations over tala_ingest_latency_us.

Capability Nodes

Four inner nodes represent the core TALA subsystems:

| Node | Subsystem | What It Shows |
|---|---|---|
| Extract | Intent extraction pipeline | Pipeline waterfall (extract, WAL, HNSW, edge, hot push latencies) |
| Remember | HNSW semantic index | Index size, average nodes visited per search, capacity gauge |
| Persist | WAL + segment storage | WAL entry count, segments flushed, bytes flushed, hot buffer fill ratio |
| Connect | Causal edge formation | Edge count, relation type breakdown (Causal, Temporal, Dependency, Retry, Branch) |

Narrative Layer Hub

The central node represents the aggregate narrative graph. It shows total edge count across all domains, combined intelligence metrics (total patterns, clusters, replays, insights), and lock contention health drawn from tala_lock_* metrics.

Animated Edges

Edges between nodes carry flowing particles that represent intent moving through the system. Particle speed and density reflect the current ingest rate. When chaos events fire, affected edges show visual disruption.

Detail Panels

Click any node to open a detail drawer on the right side of the viewport.

Domain Node Details

  • Narrative Structure: graph nodes, causal edges, connectivity ratio
  • What TALA Learned: patterns detected, clusters identified, replays generated, insights produced
  • Outcome Distribution: success/failure/partial breakdown from tala_intents_success_total, tala_intents_failure_total, tala_intents_partial_total

Capability Node Details

  • Extract: pipeline waterfall showing per-stage latency (extract, WAL append, HNSW insert, edge search, hot push, segment flush)
  • Remember: HNSW index size gauge, average search visited count, insert latency histogram
  • Persist: WAL entries total, segments flushed, bytes flushed, hot buffer fill ratio gauge
  • Connect: edge count, relation type breakdown, edge search latency

Narrative Hub Details

Aggregate metrics across all domains, plus lock contention breakdown showing acquisitions, contentions, wait time, and hold time for each lock (intents, hnsw, index_map, wal, hot).

Chaos Mode Indicator

When the chaos engine injects faults, a floating indicator appears at the bottom of the topology. It shows:

  • Current mode: Failure Injection, Latency Storm, Retry Cascade, Mixed Chaos, or Stampede
  • Event rate: chaos events per minute, calculated from tala_chaos_events_total
  • Affected domains: which verticals have non-zero chaos counters
  • Visual disruption: affected nodes pulse and edges distort

The chaos mode is inferred from which tala_chaos_* counters are incrementing. If only tala_chaos_failures_injected is rising, the mode is Failure Injection. If multiple counters are active simultaneously, the mode is Mixed Chaos.

See Chaos Engineering for details on what each chaos event does and how to tune it.

Prometheus Queries

The dashboard issues PromQL queries through the /api/ proxy. Key queries:

| Metric | Query | Purpose |
|---|---|---|
| Ingest rate | rate(tala_intents_ingested_total[1m]) | Intents per second per vertical |
| Query latency p99 | histogram_quantile(0.99, rate(tala_query_latency_us_bucket[5m])) | Tail query latency |
| Chaos event rate | rate(tala_chaos_events_total[5m]) | Chaos events per second |
| HNSW index size | tala_hnsw_index_size | Current vectors indexed |
| Hot buffer fill | tala_hot_buffer_fill_ratio / 1000 | Fill ratio (0.0 to 1.0) |

Accessing the Dashboard

After starting the Docker Compose stack:

# Observatory
open http://localhost:8080

# Grafana (for traditional time-series dashboards)
open http://localhost:3000

# Raw Prometheus UI
open http://localhost:9090

Chaos Engineering

TALA includes a built-in chaos engine that injects faults into running simulators. The purpose is to validate that the narrative graph, HNSW index, WAL, and storage engine handle degraded conditions gracefully, and to generate realistic failure data for the Observatory dashboard.

The ChaosEngine

The chaos engine runs in a dedicated thread per simulator. On each cycle it sleeps for TALA_CHAOS_INTERVAL_S seconds (default: 60), then rolls against TALA_CHAOS_PROBABILITY (default: 0.15). If the roll succeeds, a weighted random selection determines which fault to inject.

sleep(interval) -> roll < probability? -> select event -> execute
                         |
                         no -> loop

Event Types and Weights

| Event | Weight | Description |
|---|---|---|
| ForcedFailure | 30% | Ingests a valid command, then attaches a forced failure outcome (exit code 137, latency 999ms). Tests that the narrative graph correctly records and connects failure nodes. |
| LatencySpike | 25% | Sleeps for rate_ms * multiplier (multiplier randomly chosen from 2-9). Simulates a stalled upstream dependency. The chaos thread itself blocks, modeling backpressure. |
| RetryStorm | 20% | Ingests the same command 3-14 times in rapid succession. Tests HNSW behavior under duplicate vectors and validates that the graph forms Retry-type edges between repeated intents. |
| IngestFailure | 10% | Attempts to ingest an empty command string, which is expected to fail. Validates that the ingest pipeline rejects malformed input without corrupting state. |
| DegenerateQuery | 10% | Queries the HNSW index with a zero vector (all dimensions 0.0). Tests that similarity search handles degenerate inputs without panicking or returning nonsensical results. |
| InsightStress | 5% | Runs insight generation with k=50 clusters (far above the normal k=3). Stress-tests the k-means implementation with an oversized k relative to the current corpus. |

What Each Event Does

ForcedFailure

The chaos thread generates a command from the vertical's workload generator, ingests it through the normal pipeline (extract, WAL, HNSW, edge formation, hot buffer), then overwrites the outcome with:

Outcome {
    status: Status::Failure,
    latency_ns: 999_999_999,
    exit_code: 137,
}

This creates a real intent node in the narrative graph with a failure outcome. The edge formation system will create causal edges from this node, and subsequent insight runs will detect it as part of a failure cluster.

LatencySpike

The chaos thread sleeps for an extended duration, blocking itself. The multiplier is randomly chosen between 2 and 9, so with a default rate of 300ms, a spike lasts 600ms to 2.7 seconds. Since the chaos thread is separate from the ingest thread, normal ingestion continues unaffected. The spike is visible in the chaos latency counter (tala_chaos_latency_spikes).

RetryStorm

The same command and context are ingested between 3 and 14 times in a tight loop with no sleep between iterations. This floods the HNSW index with near-identical vectors (the embedding of the same command produces the same vector each time) and tests:

  • Hot buffer behavior under burst writes
  • WAL throughput under sequential append pressure
  • Edge formation's ability to detect and label retries

IngestFailure

An empty command is submitted with a synthetic context (cwd: /chaos, user: chaos). The ingest pipeline should reject this at the extraction stage. The chaos engine expects and silently handles the error. This validates that error paths do not corrupt the WAL, HNSW index, or hot buffer.

DegenerateQuery

A zero vector (all 0.0 values) is submitted as a semantic query for the top 10 results. A well-implemented cosine similarity function returns 0.0 for any comparison against the zero vector (since the norm is zero). HNSW should handle this without division-by-zero panics.
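
A scalar sketch of a zero-norm-safe cosine similarity. The real kernels are SIMD-dispatched; this is illustrative only:

```rust
/// Cosine similarity that returns 0.0 when either input has zero norm,
/// instead of dividing by zero.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0 // degenerate input: defined as zero similarity
    } else {
        dot / (norm_a * norm_b)
    }
}
```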

InsightStress

The insight system runs k-means clustering with k=50 instead of the normal k=3. When the corpus has fewer than 50 intents, some clusters will be empty. When the corpus is large, this exercises the full k-means convergence loop and memory allocation for 50 centroids of dimension 384.

Metrics

The chaos engine exposes the following Prometheus counters, all labeled by vertical:

| Metric | Description |
|---|---|
| tala_chaos_events_total | Total chaos events triggered (all types) |
| tala_chaos_failures_injected | ForcedFailure events |
| tala_chaos_latency_spikes | LatencySpike events |
| tala_chaos_retries_injected | RetryStorm events |

IngestFailure, DegenerateQuery, and InsightStress increment only tala_chaos_events_total.

Observatory Detection

The Observatory dashboard detects active chaos by watching the rate of tala_chaos_events_total per vertical. When the rate exceeds zero, it classifies the chaos mode based on which sub-counters are moving:

| Condition | Displayed Mode |
|---|---|
| Only failures_injected rising | Failure Injection |
| Only latency_spikes rising | Latency Storm |
| Only retries_injected rising | Retry Cascade |
| Multiple counters rising | Mixed Chaos |
| Very high total event rate | Stampede |

Affected domain nodes pulse visually and edges show distortion to make chaos immediately visible in the topology.

Tuning Chaos

Disable Entirely

TALA_CHAOS_PROBABILITY=0.0

The chaos thread still runs but never triggers. Negligible overhead.

High-Frequency Stress

TALA_CHAOS_INTERVAL_S=5
TALA_CHAOS_PROBABILITY=0.8

Fires a chaos event roughly every 6 seconds. Useful for stress-testing the storage engine and observing how the narrative graph handles sustained fault injection.

Failure-Heavy Profile

Chaos weights are compiled into the ChaosEngine. To change the distribution, modify the thresholds in crates/tala-sim/src/chaos.rs in the maybe_trigger method. The current boundaries:

0.00 - 0.30  ForcedFailure    (30%)
0.30 - 0.55  LatencySpike     (25%)
0.55 - 0.75  RetryStorm       (20%)
0.75 - 0.85  IngestFailure    (10%)
0.85 - 0.95  DegenerateQuery  (10%)
0.95 - 1.00  InsightStress    (5%)

Adjust the threshold constants and rebuild to shift the distribution.
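
The threshold walk corresponds to a selection function along these lines — a sketch using the documented boundaries, not the actual code in crates/tala-sim/src/chaos.rs:

```rust
/// Map a uniform roll in [0.0, 1.0) to a chaos event using the
/// documented weight boundaries.
fn select_event(roll: f64) -> &'static str {
    match roll {
        r if r < 0.30 => "ForcedFailure",    // 30%
        r if r < 0.55 => "LatencySpike",     // 25%
        r if r < 0.75 => "RetryStorm",       // 20%
        r if r < 0.85 => "IngestFailure",    // 10%
        r if r < 0.95 => "DegenerateQuery",  // 10%
        _ => "InsightStress",                // 5%
    }
}
```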

Benchmark Targets

TALA defines quantitative performance targets in spec-01 (binary format) and spec-02 (embedding acceleration). These are validated by Criterion benchmark suites in each crate. Targets that pass are guarded against regression; targets that do not yet pass have identified remediation paths.

Results Summary

All measurements taken on x86-64 with AVX2+FMA, 384-dimensional f32 embeddings unless otherwise noted.

| Operation | Spec Target | Measured | Status |
|---|---|---|---|
| Cosine similarity (dim=384, AVX2) | < 20ns | 39.6ns | Needs AVX-512 |
| Batch cosine (1K, single thread) | < 50us | 41.7us | PASS |
| Batch cosine (100K, parallel) | < 5ms | 3.41ms | PASS |
| HNSW search (10K, ef=50, top-10) | < 1ms | 139us | PASS |
| Columnar scan (100K timestamps) | -- | 48us | Baseline |
| CSR traverse (10K lookups) | -- | 21.7us | Baseline |
| Bloom lookup (1K queries) | -- | 27us | Baseline |
| Full segment write (1K nodes) | -- | 209us | Baseline |
| WAL append (1K entries, dim=384) | -- | 730us | Baseline |
| Semantic query (10K corpus, top-10) | < 50ms | 151us | PASS |
| Full ingest pipeline (1K) | -- | 1.10ms | Baseline |

Status Definitions

  • PASS: measured value meets or exceeds the spec target. Regressions on passing targets are merge blockers.
  • Baseline: no spec target defined. The measured value establishes a baseline for regression detection.
  • Needs AVX-512: the target was set assuming AVX-512 SIMD width. The current implementation uses AVX2 (256-bit), which processes 8 floats per instruction instead of 16. The 2x gap is expected.

Detailed Breakdown

Cosine Similarity (Single Pair)

Measures the wall-clock time to compute cosine similarity between two 384-dimensional f32 vectors using the AVX2+FMA kernel.

  • Target: < 20ns (spec-02, assuming AVX-512 at 512-bit width)
  • Measured: 39.6ns on AVX2 (256-bit width)
  • Analysis: The AVX2 inner loop processes 8 floats per iteration (48 iterations for dim=384). AVX-512 would process 16 floats per iteration (24 iterations), roughly halving cycle count. The 39.6ns / 20ns ratio aligns with the 2x throughput difference.

Batch Cosine (1K Vectors, Single Thread)

Computes cosine similarity between one query vector and 1,000 corpus vectors sequentially on a single core.

  • Target: < 50us
  • Measured: 41.7us
  • Analysis: 41.7ns per vector pair, consistent with single-pair measurements. The ~1.5 MB corpus fits in L2 cache, so throughput at this scale is bounded by compute and L2 bandwidth rather than main memory.

Batch Cosine (100K Vectors, Parallel)

Computes cosine similarity between one query vector and 100,000 corpus vectors using Rayon parallel iterators across all available cores.

  • Target: < 5ms
  • Measured: 3.41ms
  • Analysis: Linear scaling from the single-thread case would predict ~4.17ms on one core. The parallel measurement of 3.41ms is only modestly faster, indicating that the ~154 MB corpus saturates memory bandwidth rather than compute at this scale.

HNSW Search (10K Index, Top-10)

Searches a 10,000-vector HNSW index for the top 10 nearest neighbors with ef=50 (search beam width).

  • Target: < 1ms
  • Measured: 139us
  • Analysis: 7x headroom below target. The HNSW implementation uses M=16 (connections per layer) and ef_construction=200. Average nodes visited per search is tracked via tala_hnsw_avg_search_visited in the metrics.

Columnar Scan (100K Timestamps)

Scans the timestamp column of a 100K-node columnar buffer to filter by time range.

  • Measured: 48us
  • Analysis: Sequential scan over contiguous u64 values. Benefits from hardware prefetch and cache-line alignment. Establishes the baseline for time-range queries over TBF segments.

CSR Traverse (10K Lookups)

Performs 10,000 adjacency lookups in a Compressed Sparse Row index, retrieving the edge list for each source node.

  • Measured: 21.7us
  • Analysis: ~2.17ns per lookup. CSR provides O(1) access to the start of each node's edge list via the row pointer array, then sequential scan over edges.
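
The constant-time row lookup can be sketched with illustrative types (not TALA's actual structs):

```rust
/// Minimal CSR adjacency structure: row_ptr[i]..row_ptr[i + 1] bounds
/// node i's slice of the flat edge array.
struct Csr {
    row_ptr: Vec<usize>,
    edges: Vec<u32>,
}

impl Csr {
    fn neighbors(&self, node: usize) -> &[u32] {
        &self.edges[self.row_ptr[node]..self.row_ptr[node + 1]]
    }
}
```

Each lookup is two array indexes plus a slice borrow; the subsequent edge scan is sequential and prefetch-friendly.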

Bloom Lookup (1K Queries)

Tests 1,000 UUIDs against a Bloom filter for membership.

  • Measured: 27us
  • Analysis: ~27ns per query. Multiple hash functions (k=7) are computed per probe. False positive rate depends on filter sizing relative to the number of inserted elements.

Full Segment Write (1K Nodes)

Serializes 1,000 intent nodes with embeddings, edges, and metadata into a complete TBF segment via SegmentWriter.

  • Measured: 209us
  • Analysis: Includes columnar encoding of node payloads, 64-byte-aligned embedding packing, CSR edge construction, Bloom filter population, and CRC32C checksum computation.

WAL Append (1K Entries)

Appends 1,000 intent entries (each with a 384-dimensional embedding) to the write-ahead log.

  • Measured: 730us
  • Analysis: ~730ns per entry. Each append serializes the intent, writes to the log file with fsync semantics, and updates the entry counter.

Semantic Query (10K Corpus, Top-10)

End-to-end semantic search: given a query embedding, search the HNSW index over 10K stored intents and return the top 10 results with metadata.

  • Target: < 50ms
  • Measured: 151us
  • Analysis: 330x headroom. The query path is HNSW search (139us) plus metadata lookup for the 10 result IDs (~12us).

Full Ingest Pipeline (1K Intents)

End-to-end ingestion of 1,000 intents through the complete pipeline: extraction, WAL append, HNSW insert, edge formation, and hot buffer push.

  • Measured: 1.10ms
  • Analysis: ~1.1us per intent across all pipeline stages. Edge formation dominates at scale due to O(n^2) nearest-neighbor search for edge candidates.

Regression Policy

Performance regressions on passing targets are merge blockers. Run cargo bench before merging any change that touches hot-path code in tala-embed, tala-wire, tala-store, or tala-graph.

Baseline measurements should not regress by more than 20% without investigation. Criterion's built-in comparison detects statistically significant changes.

Running Benchmarks

TALA uses Criterion for all benchmarks. Each crate that contains performance-critical code has a benchmark suite in its benches/ directory. Criterion produces stable, statistically rigorous measurements with HTML reports.

Prerequisites

  • Rust toolchain as pinned in rust-toolchain.toml
  • A machine with AVX2 support for SIMD benchmarks (most x86-64 CPUs from 2013+)
  • Close other workloads to reduce measurement noise

Running All Benchmarks

From the workspace root:

cargo bench

This runs every benchmark suite across all crates. Expect 2-5 minutes depending on hardware. Results are printed to the terminal and saved as HTML reports.

Per-Crate Benchmarks

Run benchmarks for a single crate:

# Embedding engine (cosine, batch, HNSW)
cargo bench --bench embed_bench

# Wire format (columnar, CSR, Bloom, segment)
cargo bench --bench wire_bench

# Graph engine
cargo bench --bench graph_bench

# Storage engine (WAL, hot buffer, query, ingest pipeline)
cargo bench --bench store_bench

Filtering Benchmarks

Criterion accepts a filter argument to run a subset of benchmarks by name:

# Only cosine similarity benchmarks
cargo bench --bench embed_bench -- cosine

# Only HNSW benchmarks
cargo bench --bench embed_bench -- hnsw

# Only WAL benchmarks
cargo bench --bench store_bench -- wal

HTML Reports

Criterion generates HTML reports in target/criterion/. After running benchmarks:

# Open the report index
open target/criterion/report/index.html

Each benchmark group has its own page with:

  • Time distribution plot (violin or PDF)
  • Iteration time scatter plot
  • Comparison against the previous run (if available)
  • Statistical summary (mean, median, standard deviation, confidence interval)

Interpreting Results

Terminal Output

Criterion prints a summary for each benchmark:

cosine_similarity/avx2/384
                        time:   [39.2 ns 39.6 ns 40.1 ns]
                        change: [-0.5% +0.3% +1.1%] (p = 0.42 > 0.05)
                        No change in performance detected.

  • The three values in brackets are the lower bound, point estimate, and upper bound of the 95% confidence interval.
  • The change line compares against the previous run. If the change is statistically significant (p < 0.05), Criterion reports "Performance has regressed" or "Performance has improved."

Comparing Against Targets

After running benchmarks, compare measured values against the targets in Benchmark Targets:

| Operation | Target | What to Check |
|---|---|---|
| Cosine similarity | < 20ns | cosine_similarity/avx2/384 |
| Batch cosine 1K | < 50us | batch_cosine/1000 |
| Batch cosine 100K | < 5ms | batch_cosine_parallel/100000 |
| HNSW search | < 1ms | hnsw_search/10000 |
| Semantic query | < 50ms | semantic_query/10000 |

Any passing target that regresses beyond its threshold is a merge blocker.

Stable Measurements

For reliable results:

  1. Disable CPU frequency scaling: set the governor to performance mode.

    sudo cpupower frequency-set -g performance
    
  2. Pin to a single NUMA node (multi-socket systems):

    numactl --cpunodebind=0 --membind=0 cargo bench
    
  3. Warm up: Criterion automatically runs warmup iterations. The default configuration is sufficient for most benchmarks.

  4. Multiple runs: if a result is borderline, run the benchmark three times and take the median.

Adding New Benchmarks

Benchmarks live in crates/<name>/benches/<name>_bench.rs. Follow this structure:

use criterion::{criterion_group, criterion_main, Criterion};

fn bench_operation(c: &mut Criterion) {
    let mut group = c.benchmark_group("operation_name");
    // setup
    group.bench_function("variant", |b| {
        b.iter(|| {
            // code under measurement
        })
    });
    group.finish();
}

criterion_group!(benches, bench_operation);
criterion_main!(benches);

Register the benchmark in the crate's Cargo.toml:

[[bench]]
name = "<name>_bench"
harness = false

Use harness = false to let Criterion control the benchmark harness. Use Throughput annotations for operations measured in elements per second.

Design Decisions

These ten decisions are settled. They define the architectural boundaries of TALA and are not subject to re-evaluation.

1. Intent Is a First-Class Primitive

Decision: An intent is not a wrapper around a command string. It is a structured representation of a desired outcome: I = f(C, X, P), where C is the command, X is the execution context, and P is prior knowledge from historical embeddings.

Reasoning: Command strings are lossy. kubectl rollout restart deployment/api carries no information about why the restart was performed, what state the cluster was in, or what outcome was expected. By modeling intent as a composite of command, context, embedding, and outcome, TALA captures the full semantic surface. This enables semantic recall, pattern detection, and adaptive replay -- none of which are possible over raw strings.

Alternative rejected: Wrapping commands with metadata annotations (like shell plugins that tag history entries). This approach is incremental but does not change the data model. You still have a flat list with decorations. TALA requires a graph-native data model where intent is the node, not the string.
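
As a concrete, deliberately simplified sketch of what I = f(C, X, P) looks like as data — field and type names here are illustrative, not TALA's actual definitions (only the Outcome fields and Status variants appear elsewhere in these docs):

```rust
// Illustrative shape of an intent record; names are hypothetical.
enum Status { Success, Failure, Partial }

struct Context {
    cwd: String,        // where it ran
    user: String,       // who ran it
    timestamp_ns: u64,  // when it ran
}

struct Outcome {
    status: Status,
    latency_ns: u64,
    exit_code: i32,
}

struct Intent {
    command: String,       // C: what was typed
    context: Context,      // X: the execution context
    embedding: Vec<f32>,   // links to prior knowledge P (dim 384 by default)
    outcome: Option<Outcome>,
}
```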

2. DAG, Not Tree

Decision: The narrative graph is a directed acyclic graph with probabilistic edges, not a tree.

Reasoning: Real workflows are not hierarchical. A single intent can have multiple causes (a failed deploy and an alert firing together trigger an incident response). A single intent can feed multiple downstream actions (a successful build triggers both a deploy and a notification). Trees cannot represent shared causality or fan-out without duplication. DAGs can.

Alternative rejected: Tree-structured history (like process trees in an OS). Trees force a single-parent constraint that does not reflect how human workflows actually branch and merge. Undo trees (like vim's undo) capture branching but not convergence.

3. Binary-First Storage

Decision: TALA uses a custom binary format (TBF) for on-disk segments. No JSON, JSONL, CSV, or text-based serialization anywhere in the storage path.

Reasoning: See Why Not JSON for the full argument. The short version: JSON requires parsing on every read, cannot be memory-mapped for zero-copy access, wastes bytes on field names and delimiters, and cannot be aligned for SIMD operations. TBF is a columnar + CSR hybrid designed for TALA's specific access patterns.

Alternative rejected: JSON Lines for human readability, Parquet for columnar analytics, FlatBuffers/Cap'n Proto for zero-copy. Each of these serves a different primary use case. None provides the combination of columnar embedding access, CSR graph traversal, Bloom membership testing, and 64-byte alignment that TALA requires.

4. 64-Byte Alignment

Decision: All embedding storage is aligned to 64-byte boundaries, both in memory (AlignedVec) and on disk (TBF embedding region).

Reasoning: 64 bytes is the cache line size on x86-64 and the natural alignment for AVX-512 (512-bit = 64-byte vectors). Even when running AVX2 (256-bit = 32-byte), 64-byte alignment ensures that loads never split across cache lines and that the transition to AVX-512 requires no storage format changes.

Alternative rejected: 32-byte alignment (sufficient for AVX2) or no alignment (rely on unaligned loads). Unaligned loads incur a penalty on some microarchitectures and prevent the use of aligned load instructions. 32-byte alignment would require a format version bump when AVX-512 is adopted.
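The alignment guarantee can be illustrated with the standard allocator. This is a generic sketch, not TALA's `AlignedVec`: `Layout::from_size_align` encodes the 64-byte requirement, and the returned pointer is valid for aligned loads on both AVX2 (32-byte) and AVX-512 (64-byte) paths.

```rust
// Generic sketch of a 64-byte-aligned f32 buffer (not TALA's AlignedVec).
use std::alloc::{alloc_zeroed, dealloc, Layout};

fn main() {
    let dim = 384;
    let layout = Layout::from_size_align(dim * 4, 64).expect("valid layout");
    // SAFETY: the layout has non-zero size, and the pointer is null-checked
    // before use and freed with the same layout.
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null());
    assert_eq!(ptr as usize % 64, 0); // cache-line aligned
    unsafe { dealloc(ptr, layout) };
}
```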

5. SIMD with Runtime Dispatch

Decision: Compile all ISA variants (scalar, SSE4.1, AVX2+FMA, AVX-512, NEON, SVE) and select at startup based on cpuid / feature detection.

Reasoning: A single TALA binary must run on any x86-64 machine from 2008 (SSE4.1) to current (AVX-512). Compile-time selection via target-cpu=native produces binaries that crash on older hardware. Runtime dispatch via function pointers initialized once at startup adds zero per-call overhead (the pointer is resolved once, then called directly).

Alternative rejected: Compile-time CPU targeting (RUSTFLAGS="-C target-cpu=native"). This produces the fastest binary for one specific machine but requires separate builds per microarchitecture. Unacceptable for distributed deployment.
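The dispatch pattern looks roughly like the sketch below. Names are illustrative, and the scalar kernel stands in for the real AVX2 variant so the sketch runs on any architecture; the point is that detection happens once and later calls go through a plain function pointer.

```rust
// Sketch of once-resolved runtime dispatch (illustrative names, not TALA's API).
type SimFn = fn(&[f32], &[f32]) -> f32;

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn select_impl() -> SimFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // In real code this would return the AVX2 kernel; the scalar
            // fallback stands in here so the sketch runs anywhere.
            return dot_scalar;
        }
    }
    dot_scalar
}

fn main() {
    // Resolved once at startup; every later call is a direct indirect call.
    let dot: SimFn = select_impl();
    assert!((dot(&[1.0, 2.0], &[3.0, 4.0]) - 11.0).abs() < 1e-6);
}
```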

6. Trait Boundaries Between Crates

Decision: Inter-crate communication happens through traits defined in tala-core. No crate imports concrete types from a sibling crate in its public API.

Reasoning: Trait boundaries enforce the dependency graph at the type level. tala-store depends on tala-core::IntentStore (trait), not on tala-graph::NarrativeGraph (concrete type). This means tala-store can be compiled and tested without tala-graph. It means alternative implementations can be swapped in for testing (mock stores, deterministic clocks). It means the crate graph remains a DAG even as the system grows.

Alternative rejected: Direct struct imports between sibling crates. This creates tight coupling and eventually produces dependency cycles that Cargo rejects. Trait boundaries prevent this structurally.

7. Append-Only Segments

Decision: TBF segments are immutable once flushed from the hot buffer. No in-place mutation. The WAL provides durability for data not yet flushed.

Reasoning: Immutable segments enable lock-free reads via memory mapping. Multiple readers can mmap the same segment file without coordination. Compaction (merging small segments into larger ones) produces new segments and deletes old ones atomically. This is the same model used by LSM-tree storage engines (RocksDB, LevelDB, Cassandra) for good reason: it separates the write path (WAL + hot buffer) from the read path (immutable segments) and eliminates read-write contention.

Alternative rejected: Mutable segments with in-place updates. This requires page-level locking, complicates crash recovery (partial writes), and prevents safe concurrent mmap reads.

8. Tokio for Async

Decision: The daemon (tala-daemon) and network (tala-net) crates use the Tokio async runtime. Pure compute crates (tala-embed, tala-wire, tala-graph) are synchronous.

Reasoning: The daemon is I/O-bound (accepting connections, reading WAL, flushing segments, network replication). Tokio is the standard Rust async runtime for I/O-heavy workloads. Compute crates are CPU-bound (SIMD similarity, graph traversal, serialization). Making them async would add unnecessary overhead from task scheduling and .await points in tight loops. The boundary is clean: compute crates expose synchronous APIs, and the daemon wraps them in spawn_blocking or dedicated threads.

Alternative rejected: Fully synchronous design (thread-per-connection). This limits concurrency under high connection counts. Also rejected: fully async design (async everywhere). This forces compute-heavy code onto the async executor, starving I/O tasks.

9. Benchmark-Driven Development

Decision: Algorithms are proven in Criterion benchmarks before being integrated into the production code path.

Reasoning: TALA's value proposition depends on performance. If HNSW search takes 10ms instead of 139us, semantic recall becomes impractical for interactive use. By benchmarking first, we establish that an algorithm meets its target before building the system around it. Regressions on passing benchmarks are merge blockers, enforcing a performance floor that cannot silently degrade.

Alternative rejected: Implement first, optimize later. This approach accumulates performance debt and makes it difficult to identify which change caused a regression. Benchmark-first development catches problems at the point of introduction.

10. Spec-First

Decision: Specifications precede implementation. Every subsystem has a spec that defines its data structures, requirements (using RFC 2119 language), and performance targets before code is written.

Reasoning: Specs force design decisions to be made explicitly and documented permanently. They create a shared vocabulary between contributors. They make it possible to review a design without reading code. The spec dependency graph (spec-01 and spec-02 are standalone; spec-03 references both; spec-04 references spec-01 and spec-03) mirrors the crate dependency graph, ensuring architectural consistency.

Alternative rejected: Code-first development with documentation after the fact. This produces implementations that encode implicit design decisions, making it difficult for new contributors to understand why things are the way they are.

Why Not JSON

TALA uses a custom binary format (TBF) instead of JSON, JSONL, or any text-based serialization. This is not a preference; it is a performance and architectural requirement.

The Problem with JSON for TALA's Workload

TALA stores intent nodes that contain 384-dimensional f32 embedding vectors, graph edges with typed relations and weights, timestamps at nanosecond resolution, and outcome metadata. A typical intent node serialized as JSON looks like this:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp_ns": 1711234567890123456,
  "command": "kubectl rollout restart deployment/api",
  "embedding": [0.0234, -0.1567, 0.3891, ...],
  "outcome": {
    "status": "Success",
    "latency_ns": 234567,
    "exit_code": 0
  }
}

The embedding field alone -- 384 floats as decimal strings -- consumes approximately 3,500 bytes in JSON. The same data in binary is 1,536 bytes (384 * 4 bytes per f32). JSON spends about 2.3x as many bytes on the embedding field as the data itself occupies.

Multiply by 100,000 nodes in a segment, and the difference is 335 MB (JSON) vs. 147 MB (binary) for embeddings alone.
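The arithmetic behind those figures, as a checkable sketch (sizes in MiB; integer division lands a couple of MB below the rounded totals quoted above):

```rust
// Worked arithmetic for the per-segment embedding sizes quoted in the text.
fn main() {
    let dim = 384;
    let binary_per_embedding = dim * 4; // f32 = 4 bytes
    assert_eq!(binary_per_embedding, 1_536);

    let json_per_embedding = 3_500; // ~9 chars per float plus delimiters
    let nodes = 100_000;
    let json_mb = json_per_embedding * nodes / (1024 * 1024);
    let binary_mb = binary_per_embedding * nodes / (1024 * 1024);
    // The ~335 MB vs ~147 MB contrast above, in integer MiB.
    assert_eq!((json_mb, binary_mb), (333, 146));
}
```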

Access Pattern Mismatch

TALA's query path has three distinct access patterns, none of which JSON serves well.

Columnar Embedding Scan

Semantic search computes cosine similarity between a query vector and every embedding in a segment. This requires streaming through the embedding column contiguously. In TBF, the embedding region is a flat array of 64-byte-aligned f32 vectors, laid out consecutively:

[embedding_0][embedding_1][embedding_2]...

The CPU can prefetch the next cache line while processing the current one. SIMD instructions load 8 (AVX2) or 16 (AVX-512) floats at a time from aligned addresses.

In JSON, embeddings are interleaved with metadata fields, separated by field names, brackets, and commas. To scan embeddings, you must parse the entire document for every node, skipping fields you do not need. There is no way to jump to "the embedding of node N" without parsing everything before it.

Graph Traversal

Traversing the narrative graph requires looking up all edges from a given source node. TBF uses Compressed Sparse Row (CSR) format, which stores edges sorted by source and provides an O(1) lookup to the start of any node's edge list via a row pointer array.

In JSON, edges would be either nested within node objects (requiring full document parse to extract) or stored as a separate array (requiring linear scan to find edges for a given source).
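The CSR lookup can be sketched in a few lines. Field names here are illustrative, not TBF's actual layout: `row_ptr[n]` and `row_ptr[n + 1]` delimit node n's edges, so reaching any node's edge list is a single indexed jump followed by a contiguous slice.

```rust
// Hypothetical CSR sketch (illustrative field names, not TBF's layout).
struct Csr {
    row_ptr: Vec<usize>, // len = node_count + 1
    targets: Vec<u32>,   // edge destinations, grouped by source node
}

impl Csr {
    /// O(1) jump to the start of a node's edge list, then a contiguous slice.
    fn edges_from(&self, node: usize) -> &[u32] {
        &self.targets[self.row_ptr[node]..self.row_ptr[node + 1]]
    }
}

fn main() {
    // Node 0 -> {2}, node 1 -> {2, 3}, node 2 -> {} (no outgoing edges).
    let g = Csr { row_ptr: vec![0, 1, 3, 3], targets: vec![2, 2, 3] };
    assert_eq!(g.edges_from(1), &[2, 3]);
    assert!(g.edges_from(2).is_empty());
}
```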

Membership Testing

Before querying a segment, TALA checks whether a target intent ID exists in it using a Bloom filter. This is a constant-time probabilistic check that avoids scanning segments that cannot contain the target.

JSON has no equivalent. You would need to parse the entire file or maintain a separate index, which negates the simplicity argument for JSON.
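A toy Bloom filter makes the mechanism concrete. This two-probe sketch is illustrative only (TALA's actual filter parameters are not specified here): membership tests are constant-time, may return false positives, but never false negatives, which is exactly the property needed to skip segments safely.

```rust
// Toy Bloom filter sketch: two probes derived from one 64-bit hash.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn probes(&self, key: &str) -> [usize; 2] {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        let v = h.finish();
        // Double hashing: split one hash into two bit indexes.
        [(v as usize) % self.bits.len(), ((v >> 32) as usize) % self.bits.len()]
    }

    fn insert(&mut self, key: &str) {
        for i in self.probes(key) {
            self.bits[i] = true;
        }
    }

    /// "Definitely absent" when false; "possibly present" when true.
    fn may_contain(&self, key: &str) -> bool {
        self.probes(key).iter().all(|&i| self.bits[i])
    }
}

fn main() {
    let mut filter = Bloom { bits: vec![false; 1024] };
    filter.insert("550e8400-e29b-41d4-a716-446655440000");
    // No false negatives: an inserted key always reports "possibly present".
    assert!(filter.may_contain("550e8400-e29b-41d4-a716-446655440000"));
}
```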

Zero-Copy Reads

TBF segments are designed for memory-mapped access. The mmap system call maps the file directly into the process address space. The embedding region, with its 64-byte alignment, can be accessed as a native &[f32] slice without any copying or parsing:

// Zero-copy: pointer into mmap'd file
let embedding: &[f32] = embedding_reader.get(node_index);

JSON requires allocating memory, parsing the text, converting decimal strings to binary floats, and storing the result. For a 100K-node segment with dim=384, this parse step alone takes tens of milliseconds. The TBF equivalent takes zero time -- the data is already in the correct binary layout in the kernel page cache.

SIMD Alignment

AVX2 operates on 256-bit (32-byte) registers. AVX-512 operates on 512-bit (64-byte) registers. Aligned load instructions (_mm256_load_ps, _mm512_load_ps) require the source address to be aligned to the register width. Unaligned loads (_mm256_loadu_ps) work on any address but may incur a performance penalty when crossing cache line boundaries.

TBF guarantees 64-byte alignment for every embedding vector. This means aligned loads work for both AVX2 and AVX-512 without runtime checks.

JSON produces f32 arrays at whatever alignment Vec<f32>::new() provides (typically 8-byte on 64-bit systems). To use aligned SIMD loads, you must copy the data into an aligned buffer -- an extra allocation and memcpy for every access.

The Cost of Parsing

The fundamental issue is that JSON encodes numbers as decimal text. Converting the string "0.0234" to the IEEE 754 float 0x3CBFB15B requires:

  1. Scanning for the decimal point
  2. Parsing the integer part
  3. Parsing the fractional part
  4. Combining with appropriate scaling
  5. Rounding to nearest representable float

This is approximately 20-50ns per float. For a 384-dimensional vector, that is 7-19us per embedding. For 100K embeddings, that is 700ms-1.9s just for float parsing.

TBF cost for the same operation: zero. The bytes in the file are already IEEE 754 floats. Memory mapping makes them available as native values.
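The contrast in miniature: decimal text must be parsed, while stored bytes already are the float. In this sketch `f32::from_le_bytes` is a safe stand-in for what a memory-mapped binary read gets for free.

```rust
// Decimal-text parsing vs. reinterpreting stored IEEE 754 bytes.
fn main() {
    // JSON path: string -> f32 (scan, parse integer and fraction, scale, round).
    let parsed: f32 = "0.0234".parse().unwrap();

    // Binary path: the 4 stored bytes ARE the float; no parsing step exists.
    let stored = parsed.to_le_bytes();
    let reread = f32::from_le_bytes(stored);

    assert_eq!(parsed.to_bits(), reread.to_bits());
    assert_eq!(reread.to_bits(), 0x3CBFB15B); // nearest f32 to 0.0234
}
```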

What JSON Gets Right

JSON is human-readable, widely supported, and trivially debuggable. For configuration files, API responses, and interchange formats, it is an excellent choice.

TALA does not need human readability in its storage path. No human reads 100,000 embedding vectors. The debugging tools (tala-cli) decode TBF segments into human-readable output when inspection is needed.

Why Not Parquet / Arrow / FlatBuffers

  • Parquet: designed for analytics (column scans, predicate pushdown, compression). Does not support graph adjacency structures. Adds a heavyweight dependency.
  • Arrow: an in-memory columnar format with excellent SIMD support, but designed for tabular data. No native graph representation. The Arrow IPC format could carry embeddings but not CSR edges or Bloom filters.
  • FlatBuffers / Cap'n Proto: zero-copy serialization frameworks with strong cross-language support. Could serve as the embedding and metadata format, but do not provide CSR graph indexing, Bloom filters, or the specific columnar layout TALA needs. Adopting them would mean building TALA's graph and index structures on top, gaining little.

TBF is purpose-built for TALA's specific combination of columnar embeddings, CSR graph edges, Bloom membership, and B+ tree indexing. It trades generality for a format that is exactly right for this workload.

Intent vs. Command

Shell history records commands. TALA records intents. This distinction is not cosmetic -- it changes what the system can reason about.

What a Command Captures

A command is a string submitted to a shell:

kubectl rollout restart deployment/api

This is what traditional history records. The information content is:

  • The executable invoked (kubectl)
  • The arguments passed (rollout restart deployment/api)
  • The timestamp (if HISTTIMEFORMAT is set)

That is all. Everything else -- why the command was run, what state the system was in, what happened next, whether it worked -- is lost.

What an Intent Captures

TALA records the same action as a structured intent node:

Intent {
    id: IntentId("a3f1c9..."),
    timestamp_ns: 1711234567890123456,
    command: "kubectl rollout restart deployment/api",
    context: Context {
        cwd: "/home/ops/infra",
        shell: "zsh",
        user: "ops",
        env_hash: 0x4a2b...,
        session_id: 7,
    },
    embedding: [0.0234, -0.1567, 0.3891, ...],  // 384 dimensions
    outcome: Outcome {
        status: Success,
        latency_ns: 2_340_000_000,
        exit_code: 0,
    },
}

Plus edges connecting this intent to other intents:

Edge { from: "a3f1c9...", to: "7b2e4a...", relation: Causal, weight: 0.87 }
Edge { from: "a3f1c9...", to: "d4f1a2...", relation: Temporal, weight: 1.0 }
Edge { from: "a3f1c9...", to: "e8c3b1...", relation: Retry, weight: 0.92 }

The information content is fundamentally richer:

  • Semantic embedding: a vector representation of the command's meaning, enabling similarity search
  • Execution context: working directory, shell, user, environment state
  • Outcome: whether it succeeded, how long it took, what exit code it produced
  • Causal links: what caused this action and what it caused
  • Relation types: whether an edge represents causation, temporal sequence, dependency, retry, or branching

The Same Command, Different Intents

A critical insight: the same command string can represent completely different intents depending on context. Consider kubectl rollout restart deployment/api in three scenarios.

Scenario 1: Routine Deployment

The deployment pipeline runs on schedule. The restart is part of a normal rollout after a new image was pushed.

  • Context: cwd is /ci/deploy, user is ci-bot, session is a CI pipeline
  • Causal parent: docker push api:v2.3.1 (the image build that triggered deployment)
  • Outcome: Success, 12s latency
  • Embedding: clusters with other routine deployment commands

Scenario 2: Incident Response

An engineer restarts the API deployment at 3am because pods are OOMKilling.

  • Context: cwd is /home/ops, user is oncall-eng, session is an SSH session
  • Causal parent: kubectl get pods -n production (the diagnostic command that revealed the problem)
  • Outcome: Success, 45s latency (cluster under memory pressure)
  • Embedding: clusters with other incident response commands

Scenario 3: Failed Retry

The same restart is attempted after a first attempt timed out. The cluster's API server is unreachable.

  • Context: cwd is /home/ops, user is oncall-eng, same session as Scenario 2
  • Causal parent: the previous failed restart attempt
  • Edge type to parent: Retry (not Causal)
  • Outcome: Failure, 30s latency, exit code 1
  • Embedding: same vector as Scenario 2, but the intent graph structure differs

Traditional history records three identical lines. TALA records three distinct intent nodes with different contexts, different causal chains, different outcomes, and different positions in the narrative graph. The semantic recall system can distinguish between "routine deploys" and "emergency restarts" because the embeddings cluster differently based on surrounding context.

What This Enables

Semantic Recall

Search by meaning instead of string matching. "What did I do last time the API was OOMKilling?" retrieves the Scenario 2 intent cluster, not the Scenario 1 routine deploys, because the query embedding is closer to incident-response intents than to CI pipeline intents.

With history | grep kubectl, you get all three scenarios mixed together with no way to distinguish them.

Adaptive Replay

Given a past workflow (a narrative subgraph), replay it in the current context. The replay engine traverses the causal edges from a root intent, finds all downstream intents, and generates a plan adjusted for the current state. This is only possible because the graph encodes which actions caused which other actions.

With history, you would need to manually identify which commands were related, determine their order, and hope the context has not changed.

Pattern Detection

K-means clustering over intent embeddings reveals recurring patterns: "every Thursday at 2pm, there is a cluster of incident-response intents targeting the payment service." This detection operates over the semantic content (embeddings) and temporal structure (timestamps and causal edges), both of which are absent from traditional history.

Failure Correlation

When an intent has a failure outcome, TALA can traverse its causal ancestors to identify what led to the failure, and traverse its causal descendants to identify what was affected by it. This is graph traversal over typed edges -- a fundamentally different operation than searching a flat log.

The Representation Cost

Capturing intents instead of commands costs more storage per record. A command string is 50-200 bytes. An intent with a 384-dimensional embedding is approximately 1,700 bytes (1,536 for the embedding, plus metadata). This is a 10-30x increase per record.

The trade-off is justified for two reasons:

  1. Human interaction rates are low. A busy engineer generates perhaps 500 commands per day. At 1,700 bytes each, that is 850 KB/day, 310 MB/year. Storage is not a constraint.

  2. The value of structured data compounds. A flat history of 10,000 commands is marginally more useful than a history of 1,000 commands -- you can grep for more things. A narrative graph of 10,000 intents is qualitatively more useful than 1,000 -- the pattern detection, causal analysis, and replay capabilities improve as the graph grows and more connections emerge.

Summary

| Property          | Command (history)  | Intent (TALA)                                  |
|-------------------|--------------------|------------------------------------------------|
| Data model        | String             | Structured node in a DAG                       |
| Context           | None               | Working directory, shell, user, environment    |
| Meaning           | Literal text       | Embedding vector in semantic space             |
| Outcome           | Not recorded       | Status, latency, exit code                     |
| Relations         | Sequential order   | Causal, temporal, dependency, retry, branch    |
| Search            | Regex over strings | Nearest-neighbor in embedding space            |
| Replay            | Manual copy-paste  | Automated, context-aware                       |
| Pattern detection | Not possible       | Clustering over embeddings and graph structure |

Development Workflow

TALA follows a spec-first, benchmark-driven development process. Code changes are validated against both compilation and performance. Spec changes are validated against structural rules and cross-reference integrity.

Modifying Crate Code

Every code change follows this sequence:

  1. Read the relevant spec. The crate-to-spec mapping:

    | Crate      | Governing Spec           |
    |------------|--------------------------|
    | tala-core  | spec-03 (crate layout)   |
    | tala-wire  | spec-01 (binary format)  |
    | tala-embed | spec-02 (embedding/SIMD) |
    | tala-graph | spec-03 (crate layout)   |
    | tala-store | spec-03 (crate layout)   |
    | tala-net   | spec-04 (distributed)    |
  2. Make the change.

  3. Verify compilation across the workspace:

    cargo check --workspace
    

    This catches type errors, missing imports, and dependency issues across all crates. It is significantly faster than cargo build because it skips code generation.

  4. Run tests:

    cargo test --workspace
    
  5. Run the crate's benchmarks to check for regressions:

    cargo bench --bench <crate>_bench
    

    Criterion will report whether performance has changed relative to the previous run. Any statistically significant regression on a passing target (see Benchmark Targets) is a merge blocker.

  6. Run the full benchmark suite if the change touches hot-path code:

    cargo bench
    

Modifying a Spec

Specs have a dependency graph. Changes to a lower-numbered spec can break higher-numbered specs that reference it.

  1. Read the spec being modified and all specs that depend on it:

    spec-01 (Binary Format)     <- standalone
    spec-02 (Embedding/SIMD)    <- standalone
    spec-03 (Crate Layout)      <- depends on spec-01, spec-02
    spec-04 (Distributed)       <- depends on spec-01, spec-03
    

    Modifying spec-01 requires reading spec-03 and spec-04. Modifying spec-02 requires reading spec-03. Modifying spec-03 requires reading spec-04.

  2. Make the change following the spec writing rules:

    • RFC 2119 language for requirements (MUST, SHOULD, MAY)
    • Rust struct notation with #[repr(C)] where applicable
    • Quantitative performance targets with hardware assumptions
    • No TBD, TODO, or placeholders
    • Cross-references use spec-NN format
  3. Check downstream specs for broken references. Search for spec-NN (where NN is the modified spec number) in all higher-numbered specs. Verify that every reference still points to a valid section.

  4. Verify the invariant: lower-numbered specs MUST NOT reference higher-numbered specs. If you find yourself wanting spec-01 to reference spec-03, the dependency is inverted and the design needs rethinking.

Adding a New Crate

  1. Identify where the crate sits in the dependency graph:

                        tala-core
                       /    |    \
                 tala-wire  tala-embed  (no inter-dep)
                    |    \    /   |
                    |  tala-store |
                    |      |      |
               tala-graph  |  tala-intent
                 /    \    |    /
          tala-weave  tala-kai
                    \   |   /
                  tala-net
                      |
                tala-daemon
                      |
                 tala-cli
    
  2. Create the crate:

    cargo init --lib crates/tala-<name>
    
  3. Configure Cargo.toml:

    [package]
    name = "tala-<name>"
    version.workspace = true
    edition.workspace = true
    license.workspace = true
    rust-version.workspace = true
    
    [dependencies]
    tala-core = { path = "../tala-core" }
    # Add other dependencies using workspace versions:
    # uuid = { workspace = true }
    
  4. Add to the workspace root Cargo.toml:

    [workspace]
    members = [
        # ...existing crates...
        "crates/tala-<name>",
    ]
    
  5. Add a benchmark suite if the crate contains performance-critical code:

    mkdir crates/tala-<name>/benches
    

    Create crates/tala-<name>/benches/<name>_bench.rs with Criterion benchmarks and register it in Cargo.toml with harness = false.

  6. Verify the workspace builds:

    cargo check --workspace
    

Adding a New Spec

  1. Identify the gap: what concept is not yet covered by specs 01 through 04?
  2. Determine which specs the new one depends on. The new spec number MUST be higher than all its dependencies.
  3. Draft the spec following the structure: Overview, Data Structures, Requirements, Performance Targets.
  4. Verify no circular references: lower specs MUST NOT reference higher specs.
  5. Update CLAUDE.md with the new spec in the Spec Suite table and dependency graph.

Review Checklist

Before merging any change:

  • cargo check --workspace passes
  • cargo test --workspace passes
  • No performance regressions on passing benchmark targets
  • Spec cross-references are valid (if specs were modified)
  • No unwrap() or expect() in library code
  • unsafe blocks have // SAFETY: comments
  • New public types follow naming conventions (PascalCase types, snake_case fields)

Rust Conventions

These conventions apply to all Rust code in the crates/ directory. They are enforced by review and, where possible, by compiler checks.

Workspace Dependencies

All shared dependency versions are declared in the workspace root Cargo.toml under [workspace.dependencies]:

[workspace.dependencies]
uuid = { version = "1", features = ["v4"] }
thiserror = "2"
rand = { version = "0.8", features = ["small_rng"] }
rayon = "1"
criterion = { version = "0.5", features = ["html_reports"] }

Individual crates reference these with { workspace = true }:

[dependencies]
uuid = { workspace = true }

[dev-dependencies]
criterion = { workspace = true }

Never specify a version directly in a crate's Cargo.toml if the dependency exists in the workspace table. This prevents version skew across crates.

#[repr(C)] Rules

Apply #[repr(C)] to every struct that crosses an FFI boundary, is memory-mapped from disk, or is serialized by reinterpreting raw bytes:

#[repr(C)]
pub struct TbfHeader {
    pub magic: u32,
    pub version_major: u16,
    pub version_minor: u16,
    pub segment_id: u64,
    // ...
}

Use #[repr(C, packed)] only when padding must be eliminated (e.g., wire protocol headers where every byte position is specified). Packed structs require unaligned access, so reads must go through ptr::read_unaligned.
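The packed-read rule can be sketched as follows. `WireHeader` is an illustrative stand-in, not TALA's actual wire type: the `u64` field sits at offset 6 because `packed` removes the padding, so it is copied out with `ptr::read_unaligned` rather than referenced in place.

```rust
// Sketch of reading a packed header from a raw byte buffer.
use std::ptr;

#[repr(C, packed)]
struct WireHeader {
    magic: u32, // offset 0
    flags: u16, // offset 4
    seq: u64,   // offset 6 -- would start at offset 8 without `packed`
}

fn main() {
    assert_eq!(std::mem::size_of::<WireHeader>(), 14); // 4 + 2 + 8, no padding
    let bytes: Vec<u8> = vec![0xAB; std::mem::size_of::<WireHeader>()];
    // SAFETY: `bytes` holds at least size_of::<WireHeader>() initialized bytes,
    // and read_unaligned imposes no alignment requirement on the source pointer.
    let hdr = unsafe { ptr::read_unaligned(bytes.as_ptr() as *const WireHeader) };
    let seq = hdr.seq; // copy the field out; no unaligned reference is formed
    assert_eq!(seq, 0xABAB_ABAB_ABAB_ABAB);
}
```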

Do not use #[repr(C)] on purely internal structs that never leave Rust's type system. Let the compiler optimize their layout.

Unsafe Code and SAFETY Comments

Every unsafe block requires a // SAFETY: comment on the line immediately above it. The comment must explain the invariant that makes the code sound -- not what the code does, but why it is safe to do it:

// SAFETY: `AlignedVec` guarantees 64-byte alignment and `a.len() == b.len()`
// is checked by the caller. The pointer arithmetic stays within the allocation.
unsafe {
    let va = _mm256_load_ps(a.as_ptr().add(i));
    let vb = _mm256_load_ps(b.as_ptr().add(i));
    // ...
}

Bad SAFETY comments (do not do this):

// SAFETY: we need to call this intrinsic
unsafe { ... }

If you cannot articulate why an unsafe block is sound, it probably is not.

SIMD Architecture Gating

SIMD code lives in architecture-gated modules:

#[cfg(target_arch = "x86_64")]
pub mod avx2 {
    use std::arch::x86_64::*;

    #[target_feature(enable = "avx2,fma")]
    pub unsafe fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
        // ...
    }
}

#[cfg(target_arch = "aarch64")]
pub mod neon {
    // ...
}

pub mod scalar {
    pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
        // Always available, always correct
    }
}

A scalar fallback MUST exist for every SIMD function. The scalar implementation is the correctness reference -- SIMD variants are tested against it.
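This testing discipline can be sketched without intrinsics. Here a chunked, "SIMD-shaped" implementation (eight accumulator lanes plus a scalar tail, mirroring an AVX2 kernel's structure) stands in for a real vectorized variant so the comparison against the scalar reference runs on any architecture.

```rust
// Checking an optimized kernel against the scalar correctness reference.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Eight lanes at a time, mirroring how an AVX2 kernel accumulates.
fn dot_chunked(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];
    let chunks = a.len() / 8;
    for c in 0..chunks {
        for lane in 0..8 {
            let i = c * 8 + lane;
            acc[lane] += a[i] * b[i];
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i]; // scalar tail for the remainder
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..19).map(|i| i as f32 * 0.5).collect();
    let b: Vec<f32> = (0..19).map(|i| 1.0 - i as f32 * 0.25).collect();
    // Float reassociation means "close", not bit-identical.
    assert!((dot_scalar(&a, &b) - dot_chunked(&a, &b)).abs() < 1e-3);
}
```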

Use #[target_feature(enable = "avx2,fma")] on SIMD functions. This allows the compiler to emit AVX2+FMA instructions within the function body even when the global target does not include them. The function must be called either through a function pointer installed after feature detection or behind an is_x86_feature_detected! guard.

Use _mm256_loadu_ps (unaligned load) by default. Use _mm256_load_ps (aligned load) only when alignment is guaranteed by AlignedVec or TBF segment layout.

Inlining

Apply #[inline] to functions called in hot loops:

  • Cosine similarity and other vector operations
  • Columnar accessor functions (timestamp lookup, embedding read)
  • Hash functions in Bloom filter probes
  • CSR row pointer lookups

Do not apply #[inline] to cold paths (error handling, configuration parsing, initialization). Unnecessary inlining increases binary size and instruction cache pressure.

Use #[inline(always)] only when benchmarks prove it matters. In most cases, #[inline] (which is a hint, not a directive) is sufficient.

No unwrap() in Library Code

Library code (src/lib.rs and all modules it includes) must not use unwrap() or expect(). Use the ? operator to propagate errors, or return Result:

// Good
pub fn get_intent(&self, id: IntentId) -> Result<&Intent, TalaError> {
    self.index.get(&id).ok_or(TalaError::NotFound(id))
}

// Bad -- panics on missing intent
pub fn get_intent(&self, id: IntentId) -> &Intent {
    self.index.get(&id).unwrap()
}

unwrap() and expect() are acceptable in:

  • Benchmark code (benches/*.rs)
  • Test code (#[cfg(test)] modules, tests/*.rs)
  • Static initialization where failure means the program cannot run (e.g., regex compilation of a constant pattern)

Error Handling

All errors flow through the TalaError enum defined in tala-core. Crate-specific error types use #[from] with thiserror for automatic conversion:

#[derive(Debug, thiserror::Error)]
pub enum TalaError {
    #[error("intent not found: {0}")]
    NotFound(IntentId),

    #[error("WAL write failed: {0}")]
    WalError(#[from] std::io::Error),

    #[error("segment corrupt: {0}")]
    SegmentCorrupt(String),
    // ...
}

Do not define standalone error enums in individual crates. Add variants to TalaError or use #[from] to convert crate-local errors into TalaError.

Trait Definitions

Trait definitions live in tala-core under src/traits/. Implementations live in the crate that owns the concern:

tala-core/src/traits/store.rs    -> defines IntentStore trait
tala-store/src/lib.rs            -> implements IntentStore for StorageEngine

Public APIs between sibling crates use trait bounds, never concrete types from the sibling:

// Good: depends on trait from tala-core
pub fn query<S: IntentStore>(store: &S, embedding: &[f32]) -> Vec<IntentId> { ... }

// Bad: depends on concrete type from tala-store
pub fn query(store: &tala_store::StorageEngine, embedding: &[f32]) -> Vec<IntentId> { ... }

Benchmarks

Benchmark files live in crates/<name>/benches/, not in tests/. Use Criterion with harness = false:

[[bench]]
name = "embed_bench"
harness = false

Benchmark groups should be named descriptively (cosine_similarity, hnsw_search, wal_append) and use BenchmarkId for parameterized variants:

group.bench_with_input(BenchmarkId::new("avx2", dim), &dim, |bencher, _| {
    bencher.iter(|| unsafe { avx2::cosine_similarity(&a, &b) })
});

Feature Flags

Optional capabilities use feature flags:

| Feature    | Purpose                                    |
|------------|--------------------------------------------|
| cuda       | CUDA GPU acceleration for batch similarity |
| vulkan     | Vulkan compute shader acceleration         |
| cluster    | Distributed mode (Raft, SWIM, QUIC)        |
| encryption | AES-256-GCM encryption for TBF segments    |

Default features cover the common single-node case. GPU and cluster features are opt-in:

[features]
default = []
cuda = ["dep:cuda-runtime-sys"]
cluster = ["dep:tala-net"]

Naming Conventions

  • Type names: PascalCase (IntentId, BloomFilter, CsrIndex)
  • Field names: snake_case (node_count, embedding_dim, created_at)
  • Function names: snake_case (cosine_similarity, flush_segment)
  • Constants: SCREAMING_SNAKE_CASE (BUCKET_BOUNDS, TBF_MAGIC)
  • Module names: snake_case (src/traits/store.rs, src/scalar.rs)