palimpsest-embed

Embedding generation, SQLite vector search, and LCS-based change detection.

Embedding

pub struct Embedding { pub values: Vec<f32> }

impl Embedding {
    pub fn dimension(&self) -> usize;
    pub fn cosine_similarity(&self, other: &Embedding) -> f32;
}
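The listing above gives signatures only. A minimal sketch of how the two methods could be implemented follows; the handling of mismatched dimensions and zero-length vectors (returning 0.0) is an assumption, not documented behavior of the crate.

```rust
/// Mirrors the `Embedding` struct from the listing above.
pub struct Embedding {
    pub values: Vec<f32>,
}

impl Embedding {
    /// Number of components in the vector.
    pub fn dimension(&self) -> usize {
        self.values.len()
    }

    /// Cosine similarity: dot(a, b) / (|a| * |b|).
    /// Assumption: returns 0.0 for mismatched dimensions or zero-norm vectors.
    pub fn cosine_similarity(&self, other: &Embedding) -> f32 {
        if self.dimension() != other.dimension() {
            return 0.0;
        }
        let dot: f32 = self
            .values
            .iter()
            .zip(&other.values)
            .map(|(a, b)| a * b)
            .sum();
        let norm_a: f32 = self.values.iter().map(|a| a * a).sum::<f32>().sqrt();
        let norm_b: f32 = other.values.iter().map(|b| b * b).sum::<f32>().sqrt();
        if norm_a == 0.0 || norm_b == 0.0 {
            return 0.0;
        }
        dot / (norm_a * norm_b)
    }
}
```

Under this sketch, orthogonal vectors score 0.0 and vectors pointing in the same direction score 1.0 regardless of magnitude.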

EmbeddingProvider Trait

pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, text: &str) -> Result<Embedding, PalimpsestError>;
    async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Embedding>, PalimpsestError>;
    fn dimension(&self) -> usize;
    fn name(&self) -> &str;
}

HashEmbedder

Deterministic test embedder using BLAKE3:

impl HashEmbedder {
    pub fn new(dimension: usize) -> Self;
}

Generates pseudo-embeddings by hashing the input text with BLAKE3 and mapping the hash bytes to f32 values. The output is deterministic: the same text always yields the same embedding. The vectors are not semantically meaningful, but they are sufficient for testing the vector store pipeline.
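The hash-to-vector idea can be sketched as follows. To stay dependency-free, this sketch swaps BLAKE3 for the standard library's `DefaultHasher`; the type name `SketchHashEmbedder`, the per-index hashing scheme, and the mapping of a byte into [-1.0, 1.0] are all assumptions made for illustration and may differ from what `HashEmbedder` actually does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `HashEmbedder`. Uses std's `DefaultHasher`
/// instead of BLAKE3 so the sketch needs no external crates.
pub struct SketchHashEmbedder {
    dimension: usize,
}

impl SketchHashEmbedder {
    pub fn new(dimension: usize) -> Self {
        Self { dimension }
    }

    /// Derive one component per index by hashing (text, index) and
    /// mapping the low byte of the hash into [-1.0, 1.0].
    pub fn embed(&self, text: &str) -> Vec<f32> {
        (0..self.dimension)
            .map(|i| {
                // A fresh hasher per component keeps components independent.
                let mut hasher = DefaultHasher::new();
                text.hash(&mut hasher);
                i.hash(&mut hasher);
                let byte = (hasher.finish() & 0xff) as f32;
                byte / 127.5 - 1.0
            })
            .collect()
    }
}
```

Calling `embed` twice with the same text returns identical vectors, which is the property the tests for the vector store rely on.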

VectorStore

SQLite-backed embedding storage with brute-force cosine similarity search:

impl VectorStore {
    pub fn open(path: &Path) -> Result<Self, VectorStoreError>;
    pub fn in_memory() -> Result<Self, VectorStoreError>;
    pub fn insert(&self, chunk_hash: &str, source_url: &str, captured_at: &str,
                  text: &str, embedding: &Embedding, provider: &str) -> Result<bool, VectorStoreError>;
    pub fn search(&self, query_embedding: &Embedding, limit: usize)
                  -> Result<Vec<StoredEmbedding>, VectorStoreError>;
}
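Brute-force search means scoring every stored vector against the query and keeping the best matches. The sketch below shows that ranking step over an in-memory slice; the `Row` type and the `brute_force_search` function are hypothetical, and the SQLite layer that the real `VectorStore` reads from is left out.

```rust
/// Hypothetical in-memory row; the real store keeps these in SQLite.
pub struct Row {
    pub chunk_hash: String,
    pub values: Vec<f32>,
}

/// Cosine similarity over raw slices; 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Score every row against the query, sort by descending similarity,
/// and keep at most `limit` results.
pub fn brute_force_search(rows: &[Row], query: &[f32], limit: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = rows
        .iter()
        .map(|r| (r.chunk_hash.clone(), cosine(&r.values, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

The cost is linear in the number of stored rows per query, which is simple and exact but becomes slow for large collections.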

StoredEmbedding

pub struct StoredEmbedding {
    pub chunk_hash: String,
    pub source_url: String,
    pub captured_at: String,
    pub text: String,
    pub similarity: f32,
}

Change Detection

Line-level diff based on the Longest Common Subsequence (LCS):

pub struct ContentDiff {
    pub hunks: Vec<DiffHunk>,
    pub similarity: f32,      // 0.0 to 1.0
    pub added: usize,
    pub removed: usize,
    pub unchanged: usize,
}

pub enum DiffHunk {
    Added(String),
    Removed(String),
    Unchanged(String),
}

Compares two captures of the same URL to identify what changed between them.
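A line-level LCS diff can be sketched with the classic dynamic-programming table. The function name `diff_lines` is hypothetical, and the sketch only produces the hunks; how the crate derives `similarity` and the counters on `ContentDiff` is not specified here, so those are left out.

```rust
#[derive(Debug, PartialEq)]
pub enum DiffHunk {
    Added(String),
    Removed(String),
    Unchanged(String),
}

/// Line-level diff via an O(n*m) LCS table (hypothetical helper).
pub fn diff_lines(old: &str, new: &str) -> Vec<DiffHunk> {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();

    // lcs[i][j] = length of the LCS of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the top-left, emitting hunks in document order.
    let (mut i, mut j) = (0, 0);
    let mut hunks = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            hunks.push(DiffHunk::Unchanged(a[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            hunks.push(DiffHunk::Removed(a[i].to_string()));
            i += 1;
        } else {
            hunks.push(DiffHunk::Added(b[j].to_string()));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure removal or addition.
    hunks.extend(a[i..].iter().map(|l| DiffHunk::Removed(l.to_string())));
    hunks.extend(b[j..].iter().map(|l| DiffHunk::Added(l.to_string())));
    hunks
}
```

For two captures that differ in a single middle line, this yields an `Unchanged` hunk, a `Removed` hunk, an `Added` hunk, and a final `Unchanged` hunk, in that order.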