Introduction

Palimpsest

Palimpsest is a deterministic crawl kernel — not a crawler, not a Wayback clone, not a scraping framework. It is the foundational memory layer of the web: a system where the same input and the same seed produce an identical crawl, identical artifacts, and identical replay. Every design decision bends around this property.

What Makes This Different

Traditional web archiving tools (Heritrix, wget, Scrapy, Brozzler) treat crawling as an inherently non-deterministic process. Network jitter, DNS resolution timing, thread scheduling, and random retry backoff all introduce entropy. Two runs of the same crawl produce different results. This makes verification impossible, replay approximate, and auditing meaningless.

Palimpsest eliminates this. The system is governed by Six Laws — determinism, idempotence, content addressability, temporal integrity, replay fidelity, and observability as proof — that are enforced at every layer, from the frontier scheduler to the artifact serializer.
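To make the determinism law concrete, here is a minimal sketch, not Palimpsest's actual scheduler: the `Xorshift64` generator and `crawl_order` function are illustrative assumptions. The point it demonstrates is that all ordering entropy flows from the seed, never from wall clocks, network timing, or thread scheduling, so the same seed and the same URL set yield an identical crawl order on every run.

```rust
// Illustrative sketch only; Palimpsest's real frontier is not shown here.
// A hand-rolled xorshift64 PRNG stands in for whatever seeded source the
// kernel actually uses.
struct Xorshift64 {
    state: u64,
}

impl Xorshift64 {
    fn new(seed: u64) -> Self {
        Self { state: seed.max(1) } // xorshift state must be non-zero
    }
    fn next(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Assign each URL a priority derived only from the seed, then sort.
/// Same seed + same URL set => identical crawl order, run after run.
fn crawl_order(seed: u64, urls: &[&str]) -> Vec<String> {
    let mut rng = Xorshift64::new(seed);
    let mut keyed: Vec<(u64, String)> = urls
        .iter()
        .map(|u| (rng.next(), u.to_string()))
        .collect();
    keyed.sort(); // total order: priority first, URL as tiebreaker
    keyed.into_iter().map(|(_, u)| u).collect()
}

fn main() {
    let urls = ["https://a.example", "https://b.example", "https://c.example"];
    let first = crawl_order(42, &urls);
    let second = crawl_order(42, &urls);
    assert_eq!(first, second); // determinism: identical runs
}
```

Replayability falls out of the same property: replaying a crawl is just re-running the kernel with the recorded seed and inputs.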

The result: a crawl kernel that auditors can trust, AI systems can consume, historians can depend on, and adversaries cannot easily corrupt.

The System at a Glance

  • Crates: 15 Rust workspace members
  • Tests: 301 (zero failures)
  • Determinism proof: 10,000 pages, zero divergence
  • Storage: content-addressed (BLAKE3) with structural deduplication
  • Format: WARC++ (ISO 28500 extension)
  • Index: temporal graph of URL × time × hash × context
  • Capture: raw HTTP + headless Chrome (CDP)
  • Distribution: HTTP frontier server + N workers
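The content-addressed storage above can be sketched as follows. This is a hedged illustration, not Palimpsest's API: `Store` is a hypothetical name, and a toy FNV-1a hash stands in for BLAKE3. The dedup logic it shows is the same either way: the key is derived from the bytes themselves, so capturing identical content twice produces one artifact under one address.

```rust
use std::collections::HashMap;

// Toy FNV-1a hash standing in for BLAKE3 (assumption: the real store
// uses BLAKE3, per the table above; only the dedup mechanics matter here).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical content-addressed store: the address is a pure function
/// of the content, so identical payloads deduplicate automatically.
#[derive(Default)]
struct Store {
    blobs: HashMap<u64, Vec<u8>>,
}

impl Store {
    /// Returns the content address; re-inserting the same bytes is a no-op.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let addr = fnv1a(bytes);
        self.blobs.entry(addr).or_insert_with(|| bytes.to_vec());
        addr
    }
    fn get(&self, addr: u64) -> Option<&[u8]> {
        self.blobs.get(&addr).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = Store::default();
    let a = store.put(b"<html>same page</html>");
    let b = store.put(b"<html>same page</html>"); // duplicate capture
    assert_eq!(a, b);                 // same content => same address
    assert_eq!(store.blobs.len(), 1); // structural deduplication
    assert!(store.get(a).is_some());
}
```

Content addressing is also what makes the temporal index cheap: the index maps URL × time to a hash, and many (URL, time) pairs can point at the same stored artifact.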

How to Read This Documentation

  • Getting Started — Install, run your first crawl, configure the system.
  • Architecture — System design, the Six Laws, crate dependency graph, data flow.
  • Core Concepts — Deep dives into determinism, content addressability, the execution envelope, temporal indexing, and the WARC++ format.
  • Crate Reference — Complete API documentation for all 15 crates.
  • Operations — Docker deployment, distributed crawling, retrieval API, monitoring.
  • Security — Trust boundaries, fetch safety, browser sandboxing.
  • Testing — Testing philosophy, the simulation framework, adversarial universes.
  • Contributing — Development setup, code standards, commit conventions.
  • Appendix — Error taxonomy, API quick reference, glossary.