System Overview

Palimpsest is a crawl kernel, not a crawler. The distinction matters: a crawler is a tool that fetches web pages. A crawl kernel is the deterministic execution engine that schedules fetches, seals execution contexts, captures artifacts, stores content-addressed blobs, indexes temporal state, and enables bit-identical replay.

The CLI, server, and UI are thin wrappers. The kernel is the product.

Layer Model

The system is organized into five layers, each with strict responsibilities:

┌─────────────────────────────────────────────────┐
│  Interface Layer                                 │
│  palimpsest-cli · palimpsest-server              │
├─────────────────────────────────────────────────┤
│  Orchestration Layer                             │
│  palimpsest-crawl · palimpsest-sim               │
├─────────────────────────────────────────────────┤
│  Capture Layer                                   │
│  palimpsest-fetch · palimpsest-artifact          │
│  palimpsest-extract · palimpsest-embed           │
├─────────────────────────────────────────────────┤
│  Persistence Layer                               │
│  palimpsest-storage · palimpsest-index           │
│  palimpsest-replay · palimpsest-shadow           │
├─────────────────────────────────────────────────┤
│  Foundation Layer                                │
│  palimpsest-core · palimpsest-envelope           │
│  palimpsest-frontier                             │
└─────────────────────────────────────────────────┘
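
The diagram can also be written down as data, which is handy when tooling needs to answer "which layer owns this component?". The sketch below only transcribes the boxes above; the names `LAYERS` and `layer_of` are illustrative and not part of Palimpsest.

```python
# Layers listed top to bottom, exactly as they appear in the diagram.
LAYERS = {
    "Interface": ["palimpsest-cli", "palimpsest-server"],
    "Orchestration": ["palimpsest-crawl", "palimpsest-sim"],
    "Capture": [
        "palimpsest-fetch",
        "palimpsest-artifact",
        "palimpsest-extract",
        "palimpsest-embed",
    ],
    "Persistence": [
        "palimpsest-storage",
        "palimpsest-index",
        "palimpsest-replay",
        "palimpsest-shadow",
    ],
    "Foundation": [
        "palimpsest-core",
        "palimpsest-envelope",
        "palimpsest-frontier",
    ],
}


def layer_of(component: str) -> str:
    """Return the name of the layer that lists the given component."""
    for layer, components in LAYERS.items():
        if component in components:
            return layer
    raise KeyError(f"unknown component: {component}")
```

For example, `layer_of("palimpsest-envelope")` returns `"Foundation"`.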

Design Principles

Zero shared mutable state. The core kernel has no global state. All state flows through explicit parameters — seeds, envelopes, configs.
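
A minimal sketch of what "state flows through explicit parameters" can look like in practice, written in Python for brevity. The function name `plan_fetch_order` and the shuffling policy are assumptions made for illustration; the point is only that the seed arrives as an argument and no module-level generator is read or written.

```python
import random


def plan_fetch_order(urls: list[str], seed: int) -> list[str]:
    """Return a fetch order that depends only on the arguments."""
    rng = random.Random(seed)  # private generator; the global RNG is untouched
    order = sorted(urls)       # normalise so the caller's ordering cannot leak in
    rng.shuffle(order)
    return order


urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]

# Same seed and same set of URLs give the same plan, whatever the input order.
first = plan_fetch_order(urls, seed=42)
second = plan_fetch_order(list(reversed(urls)), seed=42)
```

Because nothing outside the argument list influences the result, two calls with equal inputs can be compared directly, which is the property replay depends on.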

The ExecutionEnvelope is the critical abstraction. It seals the execution context (seed, timestamp, DNS snapshot, TLS fingerprint, browser config, headers) before any fetch occurs. Without the envelope, you cannot replay, verify, or prove anything.
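
One way to picture the sealing step is as an immutable record built before any network activity. The sketch below is an assumption-laden illustration, not the actual ExecutionEnvelope definition: the field names mirror the list above, while the field types, the `seal` helper, and the SHA-256 `digest` method are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ExecutionEnvelope:
    seed: int
    timestamp: str
    dns_snapshot: Mapping[str, str]
    tls_fingerprint: str
    browser_config: Mapping[str, str]
    headers: Mapping[str, str]

    def digest(self) -> str:
        """Stable fingerprint of the sealed context (hypothetical helper)."""
        payload = json.dumps(
            {
                "seed": self.seed,
                "timestamp": self.timestamp,
                "dns_snapshot": dict(self.dns_snapshot),
                "tls_fingerprint": self.tls_fingerprint,
                "browser_config": dict(self.browser_config),
                "headers": dict(self.headers),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(seed, timestamp, dns_snapshot, tls_fingerprint, browser_config, headers):
    """Copy every mapping and expose it read-only, so later edits cannot leak in."""
    return ExecutionEnvelope(
        seed=seed,
        timestamp=timestamp,
        dns_snapshot=MappingProxyType(dict(dns_snapshot)),
        tls_fingerprint=tls_fingerprint,
        browser_config=MappingProxyType(dict(browser_config)),
        headers=MappingProxyType(dict(headers)),
    )


caller_headers = {"User-Agent": "example-agent"}
envelope = seal(
    seed=7,
    timestamp="2024-01-01T00:00:00Z",
    dns_snapshot={"example.com": "203.0.113.10"},
    tls_fingerprint="example-fingerprint",
    browser_config={"viewport": "1280x720"},
    headers=caller_headers,
)
sealed_digest = envelope.digest()

# Changing the caller's dict after sealing does not affect the envelope.
caller_headers["User-Agent"] = "something-else"
```

Under these assumptions, comparing `digest()` values is enough to tell whether two runs started from the same sealed context.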

Errors are artifacts. Every failure is classified into one of seven categories and stored as part of the crawl record. Errors are not noise — they are history.
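
The idea can be sketched as a capture function that never lets a failure escape: it returns a record either way. Everything named here is hypothetical, including `CrawlRecord`, `capture`, and in particular the placeholder labels `"timeout"` and `"other"`, which stand in for whatever the seven real categories are.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CrawlRecord:
    url: str
    body: Optional[bytes] = None
    error_category: Optional[str] = None
    error_detail: Optional[str] = None


def capture(
    url: str,
    fetch: Callable[[str], bytes],
    classify: Callable[[Exception], str],
) -> CrawlRecord:
    """Run one fetch and return a record for it, whether it succeeded or failed."""
    try:
        return CrawlRecord(url=url, body=fetch(url))
    except Exception as exc:
        # The failure is classified and kept, not logged and dropped.
        return CrawlRecord(
            url=url,
            error_category=classify(exc),
            error_detail=repr(exc),
        )


def toy_classify(exc: Exception) -> str:
    # Placeholder labels only; not the project's actual taxonomy.
    return "timeout" if isinstance(exc, TimeoutError) else "other"


def flaky_fetch(url: str) -> bytes:
    # Stand-in for a real fetcher: one path always times out.
    if url.endswith("/slow"):
        raise TimeoutError("no response within deadline")
    return b"<html>ok</html>"


history = [
    capture(u, flaky_fetch, toy_classify)
    for u in ("https://example.com/", "https://example.com/slow")
]
```

Both entries end up in `history`, so a later replay or audit sees the failed fetch alongside the successful one.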

Content is addressed, not located. Every blob is stored and retrieved by its BLAKE3 hash. Deduplication is structural, not post-process.
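
A minimal in-memory sketch of content addressing shows why deduplication falls out of the design. Python's standard library does not ship BLAKE3, so this example substitutes BLAKE2b from `hashlib` as a stand-in; the class name `BlobStore` and its methods are likewise assumptions, not the palimpsest-storage API.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: the key is derived from the bytes themselves."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # BLAKE2b is a stand-in here; the text above specifies BLAKE3.
        key = hashlib.blake2b(data, digest_size=32).hexdigest()
        # Identical content maps to the same key, so a second put stores nothing new.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
key_a = store.put(b"<html>same page</html>")
key_b = store.put(b"<html>same page</html>")  # fetched again, identical bytes
key_c = store.put(b"<html>different page</html>")
```

Here `key_a` and `key_b` are equal and the store holds two blobs rather than three: no separate deduplication pass is needed because duplicates never get a distinct address in the first place.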