MemTable

The MemTable is the in-memory write buffer. Every write is applied here immediately after the WAL fsync. Reads check the MemTable first, then fall back to segment files.
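
The read order described above can be sketched as follows. This is only an illustration under stated assumptions, not the engine's actual code: `Store`, `SegmentStub`, the `name` field, and the `u64` ids are hypothetical stand-ins, segments are modelled as in-memory maps, and searching them newest first is an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real types; only the lookup order matters here.
type EntityId = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: EntityId,
    pub name: String,
}

// Assumed shape of an immutable segment, modelled as an in-memory map.
pub struct SegmentStub {
    pub entities: BTreeMap<EntityId, Entity>,
}

pub struct Store {
    pub memtable: BTreeMap<EntityId, Entity>,
    pub segments: Vec<SegmentStub>, // assumed ordering: oldest first, newest last
}

impl Store {
    // Check the MemTable first, then fall back to segments (newest first, by assumption).
    pub fn get_entity(&self, id: EntityId) -> Option<&Entity> {
        self.memtable.get(&id).or_else(|| {
            self.segments
                .iter()
                .rev()
                .find_map(|seg| seg.entities.get(&id))
        })
    }
}
```

Because the MemTable is consulted first, a version that was just written shadows any older copy of the same entity that still sits in a segment.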

Structure

use std::collections::{BTreeMap, HashMap};
use compact_str::CompactString;

pub struct MemTable {
    // Primary storage
    entities:      BTreeMap<EntityId, Entity>,
    relationships: BTreeMap<RelationshipId, Relationship>,

    // Secondary indices (maintained in sync with primary)
    type_index:    HashMap<EntityType, Vec<EntityId>>,
    class_index:   HashMap<EntityClass, Vec<EntityId>>,
    source_index:  HashMap<CompactString, Vec<EntityId>>,   // connector_id → entities
    adjacency:     HashMap<EntityId, (Vec<RelationshipId>,  // outgoing
                                      Vec<RelationshipId>)>, // incoming
}

Operations

Upsert

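pub fn upsert_entity(&mut self, entity: Entity) {
    let id = entity.id;
    let entity_type = entity._type.clone();
    let entity_class = entity._class.clone();
    let connector_id = entity.source.connector_id.clone();

    // Update primary store; `insert` returns the previous version, if any
    if let Some(old) = self.entities.insert(id, entity) {
        // Re-upsert: remove stale index entries so the id is never listed twice
        // and does not linger under an old type, class, or connector
        if let Some(ids) = self.type_index.get_mut(&old._type) {
            ids.retain(|e| *e != id);
        }
        if let Some(ids) = self.class_index.get_mut(&old._class) {
            ids.retain(|e| *e != id);
        }
        if let Some(ids) = self.source_index.get_mut(&old.source.connector_id) {
            ids.retain(|e| *e != id);
        }
    }

    // Update all secondary indices
    self.type_index.entry(entity_type).or_default().push(id);
    self.class_index.entry(entity_class).or_default().push(id);
    self.source_index.entry(connector_id).or_default().push(id);
    // adjacency is updated by upsert_relationship
}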

Tombstone (Soft Delete)

When the sync protocol determines that an entity was removed from a source:

pub fn delete_entity(&mut self, id: EntityId) {
    if let Some(entity) = self.entities.get_mut(&id) {
        entity._deleted = true;  // soft delete: remains for snapshot visibility
    }
}

Soft-deleted entities are invisible to queries (INV-S08) but remain in memory until the next compaction cycle removes them.
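
A minimal sketch of how a tombstone can hide an entity from queries while the entry stays resident. The accessor names `get_visible` and `resident_len`, the reduced `Entity` struct, and the `u64` id are assumptions made for this example and are not part of the documented API.

```rust
use std::collections::BTreeMap;

type EntityId = u64; // stand-in for the real id type

pub struct Entity {
    pub id: EntityId,
    pub _deleted: bool,
}

#[derive(Default)]
pub struct MemTable {
    entities: BTreeMap<EntityId, Entity>,
}

impl MemTable {
    pub fn upsert_entity(&mut self, entity: Entity) {
        self.entities.insert(entity.id, entity);
    }

    // Same shape as the documented delete: flag the entry, do not remove it.
    pub fn delete_entity(&mut self, id: EntityId) {
        if let Some(entity) = self.entities.get_mut(&id) {
            entity._deleted = true;
        }
    }

    // Hypothetical query-facing accessor: tombstoned entities are filtered out.
    pub fn get_visible(&self, id: EntityId) -> Option<&Entity> {
        self.entities.get(&id).filter(|e| !e._deleted)
    }

    // Number of entries still held in memory, tombstones included.
    pub fn resident_len(&self) -> usize {
        self.entities.len()
    }
}
```

After `delete_entity`, `get_visible` returns `None` for that id while `resident_len` is unchanged, which mirrors the behavior described above: hidden from queries, still occupying memory.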

Adjacency Index

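The adjacency index gives O(1) average-case access to an entity's outgoing and incoming relationship lists; walking the neighbors themselves is O(degree):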

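pub fn upsert_relationship(&mut self, rel: Relationship) {
    let rel_id = rel.id;
    let from_id = rel.from_id;
    let to_id = rel.to_id;

    // `insert` returns the previous version when this id is already present
    if let Some(old) = self.relationships.insert(rel_id, rel) {
        // Re-upsert: unlink the old endpoints so the id is not indexed twice
        if let Some(adj) = self.adjacency.get_mut(&old.from_id) {
            adj.0.retain(|r| *r != rel_id);
        }
        if let Some(adj) = self.adjacency.get_mut(&old.to_id) {
            adj.1.retain(|r| *r != rel_id);
        }
    }

    // Both endpoints track this relationship
    self.adjacency.entry(from_id).or_default().0.push(rel_id); // outgoing
    self.adjacency.entry(to_id).or_default().1.push(rel_id);   // incoming
}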

This is the index that makes graph traversal fast. Without it, every hop would require a full scan of all relationships.
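
To illustrate why the index matters for traversal, here is a sketch of a bounded breadth-first walk that does one hash lookup per visited entity instead of scanning every relationship. The `Graph` struct, the `reachable` method, and the `u64` ids are hypothetical names chosen for this example; the documented MemTable may expose traversal differently.

```rust
use std::collections::{BTreeMap, HashMap, HashSet, VecDeque};

type EntityId = u64;
type RelationshipId = u64;

pub struct Relationship {
    pub id: RelationshipId,
    pub from_id: EntityId,
    pub to_id: EntityId,
}

#[derive(Default)]
pub struct Graph {
    relationships: BTreeMap<RelationshipId, Relationship>,
    // entity -> (outgoing relationship ids, incoming relationship ids)
    adjacency: HashMap<EntityId, (Vec<RelationshipId>, Vec<RelationshipId>)>,
}

impl Graph {
    pub fn upsert_relationship(&mut self, rel: Relationship) {
        let (rel_id, from_id, to_id) = (rel.id, rel.from_id, rel.to_id);
        // Index only on first insert so a repeated upsert does not duplicate ids.
        // Simplification: endpoints are assumed not to change between upserts.
        if self.relationships.insert(rel_id, rel).is_none() {
            self.adjacency.entry(from_id).or_default().0.push(rel_id);
            self.adjacency.entry(to_id).or_default().1.push(rel_id);
        }
    }

    // Breadth-first walk over outgoing edges, at most `max_hops` hops from `start`.
    // Returns the reachable entity ids (excluding `start`) in visit order.
    pub fn reachable(&self, start: EntityId, max_hops: usize) -> Vec<EntityId> {
        let mut seen = HashSet::from([start]);
        let mut queue = VecDeque::from([(start, 0usize)]);
        let mut out = Vec::new();
        while let Some((node, depth)) = queue.pop_front() {
            if depth == max_hops {
                continue;
            }
            // One hash lookup per visited entity, no scan over all relationships.
            let Some((outgoing, _incoming)) = self.adjacency.get(&node) else {
                continue;
            };
            for rel_id in outgoing {
                let next = self.relationships[rel_id].to_id;
                if seen.insert(next) {
                    out.push(next);
                    queue.push_back((next, depth + 1));
                }
            }
        }
        out
    }
}
```

The total work is proportional to the number of entities and relationships actually visited, not to the size of the whole relationship table.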

Flush to Segment

StorageEngine::maybe_flush() flushes the MemTable once memtable.approx_bytes() exceeds config.memtable_flush_size (default: 64 MB); below that threshold it returns immediately:

fn maybe_flush(&mut self) -> Result<(), StoreError> {
    if self.memtable.approx_bytes() <= self.config.memtable_flush_size {
        return Ok(());
    }

    // Write current MemTable contents to a new .pxs segment
    let segment_path = self.next_segment_path();
    Segment::write(&segment_path, &self.memtable)?;

    // Open the new segment before draining, so a failure leaves the MemTable intact
    let new_segment = SegmentRef::open(segment_path)?;

    // Drain the MemTable: clears entity/rel data, preserves adjacency index.
    // The drained data is already persisted in the segment, so it is dropped here.
    let _drained = self.memtable.drain_to_flush();
    self.segments.push(new_segment);

    // Publish new snapshot pointing to empty MemTable + new segment
    self.publish_snapshot();
    Ok(())
}

drain_to_flush() treats each part of the MemTable differently:

Entity and relationship data is cleared from memory; it now lives in the segment file
The adjacency index is preserved in memory across the flush (after a restart it is rebuilt from segments during recovery)
Secondary indices are cleared (rebuilt from segment scans as needed)
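
The three behaviors listed above could be implemented along these lines. This is a sketch under assumptions: the `Drained` return type is hypothetical (the documented signature of drain_to_flush() is not shown), only one secondary index is modelled, and `String` and `u64` stand in for the real key types.

```rust
use std::collections::{BTreeMap, HashMap};
use std::mem;

type EntityId = u64;
type RelationshipId = u64;

pub struct Entity {
    pub id: EntityId,
    pub _type: String,
}

pub struct Relationship {
    pub id: RelationshipId,
    pub from_id: EntityId,
    pub to_id: EntityId,
}

// Hypothetical return type: the data that was moved out of the MemTable.
pub struct Drained {
    pub entities: BTreeMap<EntityId, Entity>,
    pub relationships: BTreeMap<RelationshipId, Relationship>,
}

#[derive(Default)]
pub struct MemTable {
    pub entities: BTreeMap<EntityId, Entity>,
    pub relationships: BTreeMap<RelationshipId, Relationship>,
    pub type_index: HashMap<String, Vec<EntityId>>,
    pub adjacency: HashMap<EntityId, (Vec<RelationshipId>, Vec<RelationshipId>)>,
}

impl MemTable {
    pub fn drain_to_flush(&mut self) -> Drained {
        // Secondary indices only describe MemTable-resident entities, so clear them.
        self.type_index.clear();
        // `mem::take` moves each map out and leaves an empty map in its place.
        // `adjacency` is deliberately left untouched.
        Drained {
            entities: mem::take(&mut self.entities),
            relationships: mem::take(&mut self.relationships),
        }
    }
}
```

Using `mem::take` avoids copying the maps: ownership of the old contents moves to the caller, and the MemTable is immediately ready for new writes.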

Memory Accounting

pub fn approx_bytes(&self) -> usize {
    // Rough estimate: sum of entity and relationship sizes
    self.entities.values().map(|e| std::mem::size_of_val(e)).sum::<usize>()
        + self.relationships.values().map(|r| std::mem::size_of_val(r)).sum::<usize>()
}

This is an approximation: size_of_val reports only the fixed, inline size of each struct, not the heap allocations it owns, and the secondary indices are not counted at all. For memory budgeting, assume 2-4× the struct size per entity due to CompactString heap allocations for long strings.
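
The gap between the shallow and the real footprint can be seen in a small sketch. The `Entity` below is a hypothetical reduced struct that uses `String` and `Vec` rather than the real field set, and `deep_bytes` is an illustrative helper, not a documented method.

```rust
use std::mem::size_of;

// Hypothetical entity with heap-owning fields. The real Entity has more fields
// and uses CompactString, which keeps short strings inline.
pub struct Entity {
    pub id: u64,
    pub name: String,
    pub tags: Vec<String>,
}

// Shallow size: what `size_of_val` reports, identical for every Entity.
pub fn shallow_bytes(_e: &Entity) -> usize {
    size_of::<Entity>()
}

// Deep estimate: shallow size plus the heap capacity owned by the entity.
pub fn deep_bytes(e: &Entity) -> usize {
    let tag_slots = e.tags.capacity() * size_of::<String>();
    let tag_text: usize = e.tags.iter().map(|t| t.capacity()).sum();
    size_of::<Entity>() + e.name.capacity() + tag_slots + tag_text
}
```

For an entity with empty strings the two numbers agree; as names and tags grow, only the deep estimate moves, which is the effect the 2-4× budgeting rule compensates for.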

Query Methods

The MemTable exposes index-accelerated query methods used by Snapshot:

// O(log n) lookup (primary storage is a BTreeMap)
pub fn get_entity(&self, id: EntityId) -> Option<&Entity>;
pub fn get_relationship(&self, id: RelationshipId) -> Option<&Relationship>;

// Index-accelerated scans
pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity>;
pub fn entities_by_class(&self, c: &EntityClass) -> Vec<&Entity>;
pub fn entities_by_source(&self, connector_id: &str) -> Vec<&Entity>;
pub fn all_entities(&self) -> impl Iterator<Item = &Entity>;

// Adjacency (O(1) average for the lookup, O(degree) for iteration)
pub fn outgoing_relationships(&self, id: EntityId) -> Vec<&Relationship>;
pub fn incoming_relationships(&self, id: EntityId) -> Vec<&Relationship>;
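
As an illustration of what "index-accelerated" means here, the following sketch implements one of these scans on top of a type index. It assumes `String` as a stand-in for `EntityType`, `u64` ids, and that tombstoned entities are filtered at this layer; the real implementation may filter elsewhere.

```rust
use std::collections::{BTreeMap, HashMap};

type EntityId = u64;
type EntityType = String; // stand-in for the real EntityType

pub struct Entity {
    pub id: EntityId,
    pub _type: EntityType,
    pub _deleted: bool,
}

#[derive(Default)]
pub struct MemTable {
    entities: BTreeMap<EntityId, Entity>,
    type_index: HashMap<EntityType, Vec<EntityId>>,
}

impl MemTable {
    pub fn upsert_entity(&mut self, entity: Entity) {
        let id = entity.id;
        let entity_type = entity._type.clone();
        // Drop the stale index entry when an existing entity is replaced.
        if let Some(old) = self.entities.insert(id, entity) {
            if let Some(ids) = self.type_index.get_mut(&old._type) {
                ids.retain(|e| *e != id);
            }
        }
        self.type_index.entry(entity_type).or_default().push(id);
    }

    // Visit only the ids listed under `t` instead of scanning every entity.
    pub fn entities_by_type(&self, t: &EntityType) -> Vec<&Entity> {
        self.type_index
            .get(t)
            .into_iter()
            .flatten()
            .filter_map(|id| self.entities.get(id))
            .filter(|e| !e._deleted) // assumed: tombstones are hidden here
            .collect()
    }
}
```

The cost is one hash lookup plus one BTreeMap lookup per matching id, so a query for a rare type stays cheap even when the MemTable holds many entities of other types.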
