Retrieval API
The retrieval API serves captured content over HTTP for AI pipelines, RAG systems, and content auditing.
Start the Server
palimpsest api --port 8080 --data-dir ./output
Endpoints
GET /v1/content
Retrieve raw captured content for a URL.
curl "http://localhost:8080/v1/content?url=https://example.com/"
Returns the stored HTTP response body.
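When the target URL carries its own query string, it has to be percent-encoded before it is passed as the url parameter. The following is a minimal Python sketch of a client for this endpoint, not an official client. The names BASE_URL, endpoint_url, and fetch_content are hypothetical, and the sketch assumes the server is running on localhost:8080 as started above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server was started with --port 8080 on this machine.
BASE_URL = "http://localhost:8080"


def endpoint_url(path: str, **params: str) -> str:
    """Build an API URL, percent-encoding every query parameter."""
    return f"{BASE_URL}{path}?{urlencode(params)}"


def fetch_content(page_url: str, timeout: float = 10.0) -> bytes:
    """Return the stored response body for page_url.

    urlopen raises urllib.error.HTTPError on non-2xx responses; how the
    server reports an unknown URL is not specified here.
    """
    with urlopen(endpoint_url("/v1/content", url=page_url), timeout=timeout) as resp:
        return resp.read()
```

Calling fetch_content("https://example.com/") issues the same request as the curl command above, with the url parameter encoded.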
GET /v1/chunks
Retrieve RAG-ready chunks with full provenance.
curl "http://localhost:8080/v1/chunks?url=https://example.com/"
Response:
{
  "url": "https://example.com/",
  "chunks": [
    {
      "text": "Example Domain. This domain is for use in illustrative examples...",
      "chunk_index": 0,
      "total_chunks": 3,
      "char_offset": 0,
      "chunk_hash": "blake3:af13...",
      "source_hash": "blake3:c7d2...",
      "captured_at": "2026-04-12T10:30:00Z"
    }
  ]
}
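One way to consume this response in a RAG pipeline is to turn each chunk into a record that keeps its provenance next to the text. The sketch below assumes only the fields shown in the response above. The Chunk class, parse_chunks, to_embedding_records, and the record layout (id, text, metadata) are hypothetical and would need adapting to whatever vector store is in use.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Chunk:
    # Field names mirror the /v1/chunks response shown above.
    text: str
    chunk_index: int
    total_chunks: int
    char_offset: int
    chunk_hash: str
    source_hash: str
    captured_at: str


def parse_chunks(payload: str) -> tuple[str, list[Chunk]]:
    """Parse a /v1/chunks response body into (url, chunks ordered by index)."""
    doc = json.loads(payload)
    names = [f.name for f in fields(Chunk)]
    # Pick only the known fields so extra keys in the response are ignored.
    chunks = [Chunk(**{n: c[n] for n in names}) for c in doc["chunks"]]
    chunks.sort(key=lambda c: c.chunk_index)
    return doc["url"], chunks


def to_embedding_records(url: str, chunks: list[Chunk]) -> list[dict]:
    """Pair each chunk's text with its provenance (hypothetical record layout)."""
    return [
        {
            "id": c.chunk_hash,
            "text": c.text,
            "metadata": {
                "url": url,
                "position": f"{c.chunk_index + 1}/{c.total_chunks}",
                "char_offset": c.char_offset,
                "source_hash": c.source_hash,
                "captured_at": c.captured_at,
            },
        }
        for c in chunks
    ]
```

Using chunk_hash as the record id is a design choice for this sketch: it lets a pipeline skip re-embedding text it has already seen, on the assumption that identical chunk text produces an identical hash.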
GET /v1/history
List all captures of a URL with their timestamps and content hashes.
curl "http://localhost:8080/v1/history?url=https://example.com/"
Response:
{
  "url": "https://example.com/",
  "captures": [
    {"captured_at": "2026-04-12T10:30:00Z", "content_hash": "blake3:af13...", "crawl_context": 1},
    {"captured_at": "2026-04-13T08:00:00Z", "content_hash": "blake3:b8e2...", "crawl_context": 2}
  ]
}
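For content auditing, the history response can be reduced to the captures at which the content hash changed. The sketch below takes an already-parsed response and compares each capture with the one before it. The function name content_changes is hypothetical, and the sort assumes that all captured_at values use the same UTC ISO 8601 format as the example, so that string order matches time order.

```python
def content_changes(history: dict) -> list[tuple[str, str, str]]:
    """Return (captured_at, old_hash, new_hash) for each capture whose
    content hash differs from the capture immediately before it."""
    # Assumption: timestamps share one UTC ISO 8601 format, so sorting the
    # strings sorts the captures chronologically.
    captures = sorted(history["captures"], key=lambda c: c["captured_at"])
    changes = []
    for prev, cur in zip(captures, captures[1:]):
        if prev["content_hash"] != cur["content_hash"]:
            changes.append(
                (cur["captured_at"], prev["content_hash"], cur["content_hash"])
            )
    return changes
```

Applied to the example response above, this reports a single change at 2026-04-13T08:00:00Z. A change is only known to have happened somewhere between two captures, so the resolution of the audit depends on how often the URL was crawled.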
GET /v1/search
Search across captured content.
curl "http://localhost:8080/v1/search?q=example+domain"
GET /metrics
Prometheus-compatible metrics (see Monitoring).
GET /health
Health check. Returns "ok".
curl http://localhost:8080/health
# "ok"
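A script that starts the server and queries it right away may want to wait until /health responds. The following is a rough sketch of such a readiness poll. The names probe_health and wait_until_healthy, the retry count, and the delay are all hypothetical, and the probe treats any HTTP 200 response as healthy rather than inspecting the body.

```python
import time
from typing import Callable
from urllib.request import urlopen


def probe_health(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeouts, and non-2xx responses all count as "not healthy yet".
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 10,
    delay: float = 0.5,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop testable without a running server.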
Use Cases
- RAG pipelines: /v1/chunks provides pre-chunked text with provenance for embedding
- Content auditing: /v1/history shows exactly when content changed
- AI training: /v1/content serves raw captured pages
- Search systems: /v1/search provides full-text search across the archive