The Live System
Cognition Engine

tailx reimagines tail: from "show me lines" to "what's happening, what matters, and why?"

47,000 log lines → 92 groups → 38 templates → 2 root causes → 1 diagnosis

zig build -Doptimize=ReleaseSafe && cp ./zig-out/bin/tailx ~/.local/bin/

Requires Zig 0.14.0 • 144 KB Binary • Zero Dependencies • MIT License

tailx -s -n app.log api.log db.log worker.log
{"level":"error","service":"payments","msg":"connection timeout to stripe","latency_ms":5200}
{"level":"info","service":"api","msg":"GET /checkout 200","duration_ms":42}
{"level":"error","service":"db","msg":"pool exhausted, 0 connections available"}
{"level":"error","service":"payments","msg":"connection timeout to stripe","latency_ms":4800}
{"level":"warn","service":"worker","msg":"retry queue depth at 847"}
{"level":"error","service":"payments","msg":"connection timeout to stripe","latency_ms":6100}
{"level":"info","service":"api","msg":"GET /health 200","duration_ms":2}
{"level":"fatal","service":"payments","msg":"circuit breaker opened for stripe"}
... 46,992 more lines ...
Pattern Summary • 47,283 events • 92 groups • 38 templates • 15,252 ev/s
DB pool exhausted ×8,241
connection pool exhausted, 0 connections available
Payment timeout ×6,102
connection timeout to stripe after 5200ms
Worker retry queue ×2,847
retry queue depth exceeding threshold
Circuit breaker ×312
circuit breaker opened for stripe
Root cause: DB pool exhaustion → payment timeouts → circuit breaker → retry storm
144 KB Stripped Binary • 69K Events / Second • 0 Dependencies • 219 Tests Passing • <1 ms Cold Start

The Pipeline


1. Ingest & Parse

Auto-detects JSON, logfmt, syslog, and unstructured formats. Extracts severity, service, trace IDs, and structured fields. No configuration is needed; each source's format is locked after its first 8 lines.
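As a rough illustration of per-source detection and locking, the sketch below classifies each line and takes a majority vote over the first 8 lines of a source. The names `detect_format`, `SourceDetector`, the regular expressions, and the majority-vote rule are all assumptions made for this example; only the four formats and the 8-line lock come from the description above.

```python
import json
import re
from collections import Counter

LOCK_AFTER = 8  # from the text above; everything else here is assumed

# Loose patterns for illustration only, not tailx's actual rules.
SYSLOG_RE = re.compile(r"^(<\d+>)?[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s")
LOGFMT_RE = re.compile(r"\b\w+=(\"[^\"]*\"|\S+)")


def detect_format(line: str) -> str:
    """Guess the format of a single log line."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            if isinstance(json.loads(stripped), dict):
                return "json"
        except ValueError:
            pass  # not valid JSON, fall through to the other checks
    if SYSLOG_RE.match(stripped):
        return "syslog"
    # Require at least two key=value pairs before calling it logfmt.
    if len(LOGFMT_RE.findall(stripped)) >= 2:
        return "logfmt"
    return "unstructured"


class SourceDetector:
    """Votes over the first LOCK_AFTER lines of one source, then locks."""

    def __init__(self) -> None:
        self.votes = Counter()
        self.seen = 0
        self.locked = None

    def observe(self, line: str) -> str:
        if self.locked is not None:
            return self.locked
        fmt = detect_format(line)
        self.votes[fmt] += 1
        self.seen += 1
        if self.seen >= LOCK_AFTER:
            # Assumed rule: lock onto the most common format seen so far.
            self.locked = self.votes.most_common(1)[0][0]
            return self.locked
        return fmt
```

One `SourceDetector` per input file keeps a noisy line in one source from affecting how the others are parsed.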


2. Group & Rank

The Drain algorithm fingerprints messages into structural templates, so 47,000 lines collapse into 38 templates. Groups are ranked by severity × frequency × trend, where the trend is one of rising, stable, falling, or gone.
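The ranking rule can be pictured as a simple product score. The sketch below is only an illustration: the weight tables, the `Group` class, and the `score` and `rank` functions are hypothetical, since the page names the three factors but not their values.

```python
from dataclasses import dataclass

# Assumed weights; the text names the factors but not the numbers.
SEVERITY_WEIGHT = {"info": 1.0, "warn": 2.0, "error": 4.0, "fatal": 8.0}
TREND_WEIGHT = {"gone": 0.25, "falling": 0.5, "stable": 1.0, "rising": 2.0}


@dataclass
class Group:
    exemplar: str
    severity: str
    count: int
    trend: str


def score(group: Group) -> float:
    """severity x frequency x trend, using the assumed weights above."""
    return SEVERITY_WEIGHT[group.severity] * group.count * TREND_WEIGHT[group.trend]


def rank(groups):
    """Return groups ordered from most to least important."""
    return sorted(groups, key=score, reverse=True)
```

With these weights, a rising error group outranks a much larger but stable info group, which matches the intent of surfacing what matters rather than what is merely frequent.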


3. Detect & Correlate

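Rate spikes are detected with EWMA baselines and z-scores; a CUSUM change-point detector catches sustained shifts. Temporal proximity correlation links signals that move together in time, for example: "DB latency spiked 2s before error rate. Likely related (82% confidence)."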
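One simple way to think about temporal proximity correlation is to ask how often one signal's anomalies are shortly preceded by another's. The function below is a hypothetical sketch of that idea, not tailx's scoring; `lead_lag`, the 5-second window, and the use of a plain fraction are assumptions, and the fraction is not a calibrated confidence.

```python
def lead_lag(cause_ts, effect_ts, window_s=5.0):
    """Return (fraction of effects preceded by a cause within window_s,
    mean lag in seconds), or (0.0, None) if nothing lines up."""
    lags = []
    for effect in effect_ts:
        # Gaps to every candidate cause that happened shortly before.
        prior = [effect - cause for cause in cause_ts
                 if 0 < effect - cause <= window_s]
        if prior:
            lags.append(min(prior))  # nearest preceding cause
    if not effect_ts or not lags:
        return 0.0, None
    return len(lags) / len(effect_ts), sum(lags) / len(lags)
```

For example, with DB latency spikes at 10 s, 40 s, and 70 s and error spikes at 12 s, 42 s, and 90 s, two of the three error spikes follow a DB spike by 2 s.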

tailx --json -s -n app.log db.log | tail -1
{
  "type": "triage_summary",
  "stats": {
    "events": 47283,
    "groups": 92,
    "templates": 38,
    "events_per_sec": 15252.0
  },
  "top_groups": [
    {
      "exemplar": "connection pool exhausted",
      "count": 8241,
      "severity": "ERROR",
      "trend": "rising",
      "service": "db"
    },
    {
      "exemplar": "connection timeout to stripe",
      "count": 6102,
      "severity": "ERROR",
      "trend": "rising",
      "service": "payments"
    }
  ],
  "anomalies": [
    { "kind": "rate_spike", "score": 0.87,
      "observed": 412.0, "expected": 85.0 }
  ],
  "hypotheses": [
    { "causes": [
        { "label": "DB pool exhaustion",
          "strength": 0.91, "lag_ms": 1500 }
      ], "confidence": 0.91 }
  ],
  "traces": [...]
}
AI-NATIVE INTERFACE

Built for AI Agents

--json emits structured JSONL. The last line is always a triage_summary: one object containing every insight the engine computed (groups, anomalies, correlations, traces).

  • Structured triage replaces raw log parsing
  • One subprocess call for complete system state
  • MCP tool / agent integration ready
  • Intent queries: "errors related to payments"
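A minimal sketch of how an agent might consume this output, assuming the JSONL text has already been captured from a `tailx --json` run. The helper names `read_triage_summary` and `top_rising` are hypothetical; the field names (`type`, `top_groups`, `exemplar`, `trend`) are taken from the sample output shown above.

```python
import json


def read_triage_summary(jsonl_text: str) -> dict:
    """Parse the last non-empty JSONL line as the triage summary."""
    lines = [ln for ln in jsonl_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output to parse")
    summary = json.loads(lines[-1])
    if summary.get("type") != "triage_summary":
        raise ValueError("last line is not a triage_summary")
    return summary


def top_rising(summary: dict) -> list:
    """Exemplars of the top groups whose trend is 'rising'."""
    return [
        group["exemplar"]
        for group in summary.get("top_groups", [])
        if group.get("trend") == "rising"
    ]
```

The input text could come from a single subprocess call running the command shown earlier (`tailx --json -s -n app.log db.log`) with its standard output captured.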

Five Modes

Every mode uses the same engine. Different lenses on the same intelligence.

--raw

Classic tail. Line by line. No grouping.

(default)

Pattern mode. Lines + ranked group summary at end.

--trace

Trace view. Events grouped by trace_id as request flow trees.

--incident

Only anomalies + top rising groups. Suppresses normal events.

--json

JSONL for AI agents. Triage summary as last line.

Under the Hood

Statistical-first. No LLM in the hot path. Math is fast, deterministic, and explainable.

Drain Templates

Fixed-depth tree parser collapses "Connection to 10.0.0.1 timed out" and "Connection to 10.0.0.2 timed out" into one template with parameter tracking.
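The collapsing step can be sketched in a few lines. This is a deliberately simplified version of the Drain idea, not the full fixed-depth tree: messages are bucketed by token count and first token, then merged into an existing template when enough tokens agree. `TemplateMiner`, the `<*>` wildcard, and the 0.5 similarity threshold are assumptions for this example, and parameter tracking is left out.

```python
WILDCARD = "<*>"


class TemplateMiner:
    """Simplified Drain-style template miner (illustrative only)."""

    def __init__(self, sim_threshold: float = 0.5) -> None:
        self.sim_threshold = sim_threshold
        # (token_count, first_token) -> list of templates (token lists)
        self.buckets = {}

    def add(self, message: str) -> str:
        """Fold a message into a template and return that template."""
        tokens = message.split()
        if not tokens:
            return ""
        templates = self.buckets.setdefault((len(tokens), tokens[0]), [])
        for template in templates:
            same = sum(
                1 for a, b in zip(template, tokens) if a == b or a == WILDCARD
            )
            if same / len(tokens) >= self.sim_threshold:
                # Generalise: positions that differ become wildcards.
                for i, (a, b) in enumerate(zip(template, tokens)):
                    if a != b:
                        template[i] = WILDCARD
                return " ".join(template)
        templates.append(list(tokens))
        return " ".join(tokens)
```

Feeding it the two messages from the description yields the single template `Connection to <*> timed out`.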

EWMA + CUSUM

Dual-rate exponential moving averages for rate baselines. Cumulative sum detector for sustained shifts that z-score alone would miss. 3σ minimum threshold.
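To make the two detectors concrete, here is a minimal sketch under stated assumptions. It uses a single-rate EWMA rather than the dual-rate scheme described above, and the class names, `alpha`, `warmup`, `slack`, and `limit` values are all hypothetical; only the 3σ threshold comes from the text.

```python
import math


class EwmaZScore:
    """EWMA mean/variance baseline with a z-score spike test."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha
        self.threshold = threshold  # 3 sigma, as stated in the text
        self.warmup = warmup        # assumed: ignore the first few samples
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, x: float) -> bool:
        """Return True if x is a spike, then fold x into the baseline."""
        self.seen += 1
        if self.mean is None:
            self.mean = x
            return False
        std = math.sqrt(self.var)
        z = (x - self.mean) / std if std > 0 else 0.0
        diff = x - self.mean
        # Standard incremental EWMA updates for mean and variance.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return self.seen > self.warmup and z >= self.threshold


class Cusum:
    """One-sided CUSUM for sustained upward shifts."""

    def __init__(self, target: float, slack: float, limit: float):
        self.target = target  # expected level
        self.slack = slack    # deviation tolerated without accumulating
        self.limit = limit    # alarm when the cumulative sum exceeds this
        self.s = 0.0

    def update(self, x: float) -> bool:
        self.s = max(0.0, self.s + (x - self.target - self.slack))
        return self.s > self.limit
```

The z-score test reacts to a single large jump, while CUSUM accumulates small excesses over the target, so it can flag a modest but persistent rise that never crosses 3σ on any one sample.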

Probabilistic Sketches

HyperLogLog for cardinality estimation. Count-Min Sketch for frequency. T-Digest for streaming percentiles. All O(1) memory, O(1) per event.
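As one example of these structures, a Count-Min Sketch fits in a few lines. This is an illustrative sketch, not tailx's implementation: the `CountMinSketch` class, the table dimensions, and the use of salted BLAKE2b as the hash family are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator; may overcount, never undercounts."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key: str):
        # One salted hash per row gives `depth` different hash functions.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._indexes(key):
            self.rows[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._indexes(key))
```

Memory stays at `width × depth` counters no matter how many distinct keys arrive, and each update touches exactly `depth` cells.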

Multi-Format Parser

JSON, logfmt/key=value, syslog BSD, and unstructured text. Auto-detected per source. Extracts severity, service, trace_id, structured fields.

Trace Reconstruction

Events with matching trace_id are grouped into request flow trees with duration and outcome detection (success, failure, timeout).
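The grouping step might look like the sketch below. It stops short of building a tree, since that would need parent or span identifiers the page does not describe. The function name, the `ts_ms` timestamp field, and the outcome rules are assumptions; `trace_id`, `level`, and `msg` follow the names used elsewhere on this page.

```python
from collections import defaultdict


def reconstruct_traces(events):
    """Group events by trace_id and summarise duration and outcome."""
    by_trace = defaultdict(list)
    for event in events:
        trace_id = event.get("trace_id")
        if trace_id is not None:
            by_trace[trace_id].append(event)

    traces = {}
    for trace_id, trace_events in by_trace.items():
        ordered = sorted(trace_events, key=lambda e: e["ts_ms"])
        # Assumed outcome rules: timeout wins over failure, else success.
        if any("timeout" in e.get("msg", "") for e in ordered):
            outcome = "timeout"
        elif any(e.get("level") in ("error", "fatal") for e in ordered):
            outcome = "failure"
        else:
            outcome = "success"
        traces[trace_id] = {
            "events": ordered,
            "duration_ms": ordered[-1]["ts_ms"] - ordered[0]["ts_ms"],
            "outcome": outcome,
        }
    return traces
```

Events without a `trace_id` are skipped here; a fuller version would still count them toward groups and anomalies.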

Arena Allocation

Generation-tagged arena allocators. Bulk free on window expiry. Zero per-event heap allocation in steady state. Zig stdlib only.

See the Signal.

47,000 lines of noise. 2 root causes. One command.

tailx • 8,347 lines of Zig • 219 tests • MIT License