palimpsest-embed
Embedding generation, SQLite vector search, and LCS-based change detection.
Embedding
pub struct Embedding { pub values: Vec<f32> }

impl Embedding {
    pub fn dimension(&self) -> usize;
    pub fn cosine_similarity(&self, other: &Embedding) -> f32;
}
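The signatures above leave the method bodies unstated. As a minimal sketch of what they might look like, the block below computes cosine similarity as the dot product divided by the product of the two vector norms. Returning 0.0 for mismatched dimensions or zero-length vectors is an assumption made for this illustration, not documented behavior of the crate.

```rust
/// Stand-in for the `Embedding` struct documented above.
#[derive(Debug, Clone, PartialEq)]
pub struct Embedding {
    pub values: Vec<f32>,
}

impl Embedding {
    /// Number of components in the vector.
    pub fn dimension(&self) -> usize {
        self.values.len()
    }

    /// Cosine similarity: dot(a, b) / (|a| * |b|).
    /// Assumption: mismatched dimensions and zero vectors yield 0.0.
    pub fn cosine_similarity(&self, other: &Embedding) -> f32 {
        if self.dimension() != other.dimension() {
            return 0.0;
        }
        let dot: f32 = self
            .values
            .iter()
            .zip(&other.values)
            .map(|(a, b)| a * b)
            .sum();
        let norm_a: f32 = self.values.iter().map(|a| a * a).sum::<f32>().sqrt();
        let norm_b: f32 = other.values.iter().map(|b| b * b).sum::<f32>().sqrt();
        if norm_a == 0.0 || norm_b == 0.0 {
            return 0.0;
        }
        dot / (norm_a * norm_b)
    }
}
```

With this sketch, identical vectors score 1.0 and orthogonal vectors score 0.0.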
EmbeddingProvider Trait
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, text: &str) -> Result<Embedding, PalimpsestError>;
    async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Embedding>, PalimpsestError>;
    fn dimension(&self) -> usize;
    fn name(&self) -> &str;
}
HashEmbedder
Deterministic test embedder using BLAKE3:
impl HashEmbedder {
    pub fn new(dimension: usize) -> Self;
}
Generates pseudo-embeddings by hashing the input text with BLAKE3 and mapping hash bytes to f32 values. The output is deterministic: the same text always yields the same embedding. The vectors carry no semantic meaning, but they are sufficient for testing the vector store pipeline.
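To illustrate the hash-to-vector idea without pulling in an external crate, the sketch below swaps BLAKE3 for a 64-bit FNV-1a hash from scratch. The type `SketchHashEmbedder`, its synchronous `embed` method, the per-component hashing scheme, and the [-1.0, 1.0] output range are all hypothetical choices for this example. The real `HashEmbedder` may map bytes to floats differently.

```rust
/// Hypothetical stand-in for `HashEmbedder`. It uses FNV-1a instead of
/// BLAKE3 so that the sketch needs no external crates.
pub struct SketchHashEmbedder {
    dimension: usize,
}

impl SketchHashEmbedder {
    pub fn new(dimension: usize) -> Self {
        Self { dimension }
    }

    /// 64-bit FNV-1a over the text bytes followed by the component index.
    fn fnv1a(text: &str, index: u64) -> u64 {
        const OFFSET: u64 = 0xcbf29ce484222325;
        const PRIME: u64 = 0x100000001b3;
        let mut hash = OFFSET;
        for byte in text.bytes().chain(index.to_le_bytes()) {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
        hash
    }

    /// Produces one hash per component and scales it into [-1.0, 1.0].
    pub fn embed(&self, text: &str) -> Vec<f32> {
        (0..self.dimension as u64)
            .map(|i| {
                // Keep the top 24 bits, which an f32 represents exactly.
                let top = (Self::fnv1a(text, i) >> 40) as f32;
                top / ((1u64 << 24) - 1) as f32 * 2.0 - 1.0
            })
            .collect()
    }
}
```

Because the hash depends only on the text and the component index, calling `embed` twice with the same input returns the same vector, which is the property the test pipeline relies on.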
VectorStore
SQLite-backed embedding storage with brute-force cosine similarity search:
impl VectorStore {
    pub fn open(path: &Path) -> Result<Self, VectorStoreError>;
    pub fn in_memory() -> Result<Self, VectorStoreError>;
    pub fn insert(&self, chunk_hash: &str, source_url: &str, captured_at: &str,
                  text: &str, embedding: &Embedding, provider: &str) -> Result<bool, VectorStoreError>;
    pub fn search(&self, query_embedding: &Embedding, limit: usize)
        -> Result<Vec<StoredEmbedding>, VectorStoreError>;
}
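Brute-force search means scoring every stored vector against the query and keeping the best matches. The sketch below shows that idea over an in-memory slice rather than SQLite. The `Row` type and the `brute_force_search` function are hypothetical names for this illustration, and the descending sort by similarity is an assumption about result ordering.

```rust
/// Hypothetical in-memory row. The real store keeps its rows in SQLite.
pub struct Row {
    pub chunk_hash: String,
    pub values: Vec<f32>,
}

/// Cosine similarity of two slices; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores every row against the query, sorts by descending similarity,
/// and keeps the top `limit` results as (chunk_hash, similarity) pairs.
pub fn brute_force_search(rows: &[Row], query: &[f32], limit: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = rows
        .iter()
        .map(|row| (row.chunk_hash.clone(), cosine(&row.values, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

The cost is linear in the number of stored rows per query, which is usually acceptable for small archives and avoids maintaining a separate index.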
StoredEmbedding
pub struct StoredEmbedding {
    pub chunk_hash: String,
    pub source_url: String,
    pub captured_at: String,
    pub text: String,
    pub similarity: f32,
}
Change Detection
Line-level diff based on the Longest Common Subsequence (LCS):
pub struct ContentDiff {
    pub hunks: Vec<DiffHunk>,
    pub similarity: f32, // 0.0 to 1.0
    pub added: usize,
    pub removed: usize,
    pub unchanged: usize,
}

pub enum DiffHunk {
    Added(String),
    Removed(String),
    Unchanged(String),
}
Compares two captures of the same URL to identify what changed between them.
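A line-level LCS diff can be sketched with the classic dynamic-programming table. The function name `lcs_diff` is hypothetical, and the crate's actual implementation, including how it derives `similarity` and the counts in `ContentDiff`, is not shown in this section. The sketch only produces the sequence of hunks.

```rust
#[derive(Debug, PartialEq)]
pub enum DiffHunk {
    Added(String),
    Removed(String),
    Unchanged(String),
}

/// Line-level diff built from an LCS dynamic-programming table.
pub fn lcs_diff(old: &str, new: &str) -> Vec<DiffHunk> {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();

    // table[i][j] holds the LCS length of a[i..] and b[j..].
    let mut table = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            table[i][j] = if a[i] == b[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting one hunk per line.
    let (mut i, mut j) = (0, 0);
    let mut hunks = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            hunks.push(DiffHunk::Unchanged(a[i].to_string()));
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            hunks.push(DiffHunk::Removed(a[i].to_string()));
            i += 1;
        } else {
            hunks.push(DiffHunk::Added(b[j].to_string()));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure removal or addition.
    hunks.extend(a[i..].iter().map(|line| DiffHunk::Removed(line.to_string())));
    hunks.extend(b[j..].iter().map(|line| DiffHunk::Added(line.to_string())));
    hunks
}
```

The table takes time and memory proportional to the product of the two line counts, so very large captures may call for a more economical algorithm.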