Retrieval API

The retrieval API serves captured content over HTTP for AI pipelines, RAG systems, and content auditing.

Start the Server

palimpsest api --port 8080 --data-dir ./output

Endpoints

GET /v1/content

Retrieve raw captured content for a URL.

curl "http://localhost:8080/v1/content?url=https://example.com/"

Returns the stored HTTP response body.
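
If the target URL contains its own query string or other reserved characters, it generally needs to be percent-encoded before being passed as the `url` parameter. The following is a minimal Python sketch, assuming the server runs on localhost:8080 as in the example above and decodes query parameters in the usual way. The helper names `build_request_url` and `fetch_content` are illustrative and not part of Palimpsest.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API server was started on port 8080 as shown earlier.
BASE_URL = "http://localhost:8080"


def build_request_url(endpoint: str, **params: str) -> str:
    """Return BASE_URL + endpoint with percent-encoded query parameters."""
    return f"{BASE_URL}{endpoint}?{urlencode(params)}"


def fetch_content(target_url: str) -> bytes:
    """Fetch the stored response body for target_url from /v1/content."""
    with urlopen(build_request_url("/v1/content", url=target_url)) as resp:
        return resp.read()
```

The same `build_request_url` helper can be reused for the other endpoints that take a `url` parameter, for example `build_request_url("/v1/chunks", url="https://example.com/")`.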

GET /v1/chunks

Retrieve RAG-ready chunks with full provenance.

curl "http://localhost:8080/v1/chunks?url=https://example.com/"

Response:

{
  "url": "https://example.com/",
  "chunks": [
    {
      "text": "Example Domain. This domain is for use in illustrative examples...",
      "chunk_index": 0,
      "total_chunks": 3,
      "char_offset": 0,
      "chunk_hash": "blake3:af13...",
      "source_hash": "blake3:c7d2...",
      "captured_at": "2026-04-12T10:30:00Z"
    }
  ]
}
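
To feed these chunks into an embedding step, a pipeline typically flattens each chunk into a record that keeps the text and its provenance together. The sketch below is one possible shape, not a prescribed one: the record layout (`id`, `text`, `metadata`) and the function name `to_records` are assumptions, and the sample payload is copied from the response above, including its truncated hashes.

```python
import json

# Sample payload copied from the response shown above (hashes are truncated there).
SAMPLE = """
{
  "url": "https://example.com/",
  "chunks": [
    {
      "text": "Example Domain. This domain is for use in illustrative examples...",
      "chunk_index": 0,
      "total_chunks": 3,
      "char_offset": 0,
      "chunk_hash": "blake3:af13...",
      "source_hash": "blake3:c7d2...",
      "captured_at": "2026-04-12T10:30:00Z"
    }
  ]
}
"""

# Provenance fields carried over unchanged from each chunk.
PROVENANCE_KEYS = (
    "chunk_index", "total_chunks", "char_offset",
    "chunk_hash", "source_hash", "captured_at",
)


def to_records(payload: dict) -> list[dict]:
    """Flatten a /v1/chunks response into one record per chunk."""
    records = []
    for chunk in payload["chunks"]:
        metadata = {key: chunk[key] for key in PROVENANCE_KEYS}
        metadata["url"] = payload["url"]
        records.append({
            # Hypothetical record id: source URL plus chunk index.
            "id": f'{payload["url"]}#{chunk["chunk_index"]}',
            "text": chunk["text"],
            "metadata": metadata,
        })
    return records


records = to_records(json.loads(SAMPLE))
print(records[0]["id"])  # https://example.com/#0
```

Keeping `chunk_hash`, `source_hash`, and `captured_at` in the metadata means a retrieved passage can later be traced back to the exact capture it came from.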

GET /v1/history

List all captures of a URL, with timestamps and content hashes.

curl "http://localhost:8080/v1/history?url=https://example.com/"

Response:

{
  "url": "https://example.com/",
  "captures": [
    {"captured_at": "2026-04-12T10:30:00Z", "content_hash": "blake3:af13...", "crawl_context": 1},
    {"captured_at": "2026-04-13T08:00:00Z", "content_hash": "blake3:b8e2...", "crawl_context": 2}
  ]
}
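
For content auditing, the history can be reduced to the points where the content hash differs between consecutive captures. The sketch below illustrates this on the sample response above. The function name `content_changes` is illustrative, and it assumes all `captured_at` values are UTC timestamps in the same ISO 8601 format, so that sorting them as strings gives chronological order.

```python
def content_changes(captures: list[dict]) -> list[tuple[str, str]]:
    """Return (previous_time, new_time) pairs where content_hash changed."""
    # Assumption: timestamps share one ISO 8601 UTC format, so string order
    # matches chronological order.
    ordered = sorted(captures, key=lambda c: c["captured_at"])
    changes = []
    for previous, current in zip(ordered, ordered[1:]):
        if current["content_hash"] != previous["content_hash"]:
            changes.append((previous["captured_at"], current["captured_at"]))
    return changes


# Captures copied from the sample response above.
captures = [
    {"captured_at": "2026-04-12T10:30:00Z", "content_hash": "blake3:af13...", "crawl_context": 1},
    {"captured_at": "2026-04-13T08:00:00Z", "content_hash": "blake3:b8e2...", "crawl_context": 2},
]

print(content_changes(captures))
# [('2026-04-12T10:30:00Z', '2026-04-13T08:00:00Z')]
```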

GET /v1/search

Search across captured content.

curl "http://localhost:8080/v1/search?q=example+domain"

GET /metrics

Exposes Prometheus-compatible metrics (see Monitoring).

GET /health

curl http://localhost:8080/health
# "ok"

Use Cases

RAG pipelines — /v1/chunks provides pre-chunked text with provenance for embedding
Content auditing — /v1/history shows exactly when content changed
AI training — /v1/content serves raw captured pages
Search systems — /v1/search provides full-text search across the archive
