ruvector represents a fundamental shift in how we think about vector databases. Traditional systems treat the index as passive storage - you insert vectors, query them, get results. ruvector eliminates this separation entirely. The index itself becomes a neural network. Every query is a forward pass. Every insertion reshapes the learned topology. The database doesn’t just store embeddings - it reasons over them.
This convergence emerges from a simple observation: the HNSW algorithm, which powers most modern vector search, already constructs a navigable small-world graph. That graph structure defines exactly the sparsity pattern that sparse attention operates over. By adding learnable edge weights and message-passing layers, we transform a static index into a living neural architecture that improves with use.
The Index Is The Model
In conventional pipelines, you train a model, generate embeddings, store them in a vector database, then query. Four separate systems, four points of failure, four maintenance burdens. ruvector collapses this into one. The graph topology encodes similarity relationships learned during construction. The edge weights capture attention patterns refined through queries. The node embeddings evolve through gradient descent. Your vector database becomes a trainable neural network.
Persistent Differentiability
Through memory-mapped file backing, ruvector maintains a differentiable structure that survives process restarts, scales beyond RAM limits, and enables distributed training across machines. The operating system’s page cache becomes your neural network’s memory hierarchy. Hot embeddings stay resident. Cold embeddings page in on demand. Gradients accumulate directly to disk. This isn’t a database with ML bolted on - it’s a neural architecture with persistence built into its foundations.
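As a usage sketch, resuming after a restart looks like opening a file rather than loading a database. This assumes the `Ruvector`, `RuvectorQuery`, and `QueryMode` types specified later in this document, plus a hypothetical `Ruvector::open` constructor:

```rust
use std::path::Path;

// Hedged sketch: `Ruvector::open` is an assumed constructor over the mmap file.
// Nothing is deserialized up front; the OS pages embeddings in as queries touch them.
fn resume_after_restart(path: &Path, query_vec: Vec<f32>) {
    let index = Ruvector::open(path).expect("map existing index file");

    // The first query faults in only the pages along its HNSW traversal path.
    let result = index.query(RuvectorQuery {
        vector: Some(query_vec.into()),
        mode: QueryMode::VectorSearch,
        k: 10,
        ..Default::default()
    });

    println!("top-{} hits in {:?}", result.nodes.len(), result.latency);
}
```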
Adaptive Tensor Compression
The tensor-compress layer introduces access-aware precision tiering. Frequently queried vectors maintain full 32-bit precision. Warm vectors compress to 16-bit with minimal recall loss. Cold vectors use 4-bit product quantization. Archival vectors collapse to binary representations. The system automatically promotes and demotes based on query patterns, achieving 83-93% memory savings while preserving accuracy where it matters.
Semantic Search at Scale - Search engines, recommendation systems, and retrieval-augmented generation pipelines benefit from ruvector’s sub-millisecond queries across billion-vector datasets. The GNN layers enable query refinement that understands context, not just cosine similarity.
Knowledge Graphs - The dual graph/vector representation naturally supports knowledge bases where entities have both relational structure and semantic embeddings. Query “similar to X but connected to Y” becomes a single neural forward pass.
Genomics and Drug Discovery - Protein sequence similarity, molecular property prediction, and variant analysis require searching massive embedding spaces with domain-specific similarity notions. ruvector’s learnable metric adapts to biological distance functions that cosine similarity misses.
Real-Time Personalization - User behavior embeddings evolve continuously. ruvector’s online learning capability updates the index topology as new interactions arrive, without full reindexing. Your recommendation engine improves with every click.
Autonomous Agents - AI agents using ruvector can query their own embedding neighborhood for self-reflection, retrieving similar past states to inform current decisions. The strange-loop architecture enables metacognition through vector similarity.
Yes. Here’s why.
Clinical Decision Support
Doctors query similar cases repeatedly. Specialties have locality. A cardiologist’s queries cluster. An oncologist’s queries cluster differently.
ruvector learns these patterns. The index becomes specialized to how your hospital thinks about disease, not how a generic embedding model was trained.
Patient Longitudinal Memory
A patient’s history spans decades. Recent visits matter more than childhood records.
- Hot tier: last 2 years, full precision
- Warm tier: 2-10 years, fp16
- Cold tier: 10+ years, PQ4
- Archive: childhood, binary
The system automatically does what medical records should do but don’t: prioritize recent, preserve old, compress intelligently.
Drug Discovery
Molecular embeddings for 100M+ compounds. You’re searching for similar structures to a lead candidate. Your queries aren’t random - they cluster around scaffolds, targets, mechanisms.
ruvector learns your search patterns. Edge weights encode “similar for your target” not just “similar in embedding space.”
Diagnostic Agents
An AI reading radiology scans can query its own past interpretations. “What did I think when I saw something like this before?” Self-reflection through vector neighborhood.
The agent improves by examining its own history. That’s not possible with static retrieval.
Genomics
Variant databases with billions of sequences. Researchers query similar variants repeatedly. The compression tiers mean you can keep population-scale data on a single machine.
Predictive Maintenance
Sensor embeddings from turbines, motors, pumps. Failure patterns have locality - a bearing failure looks like other bearing failures.
ruvector learns: “when queries cluster here, something’s about to break.” The index becomes a failure detector, not just a lookup table.
Robotics
Robots need persistent memory that:
- Survives power loss (mmap)
- Fits on embedded hardware (compression)
- Improves with operation (learning)
- Enables self-reflection (neighborhood queries)
This is exactly ruvector’s design center.
Digital Twins
A factory’s digital twin has millions of state embeddings. Operators query similar past states. The system learns what “similar” means for this factory.
Temporal decay means old states fade. Recent states stay sharp. The twin ages like the real system.
Quality Control
Defect embeddings. When inspectors repeatedly flag certain regions of embedding space, those regions stay hot, precise. Rare defects compress but remain searchable.
Supply Chain
Millions of SKUs, each with embeddings (descriptions, images, specs). Query patterns follow demand. Hot products stay precise. Long-tail compresses 32x.
| Domain | Current State | With ruvector |
|---|---|---|
| Medical records | Static, uncompressed, no learning | Adaptive precision, usage-aware, self-improving |
| Drug discovery | Separate index/model, full retraining | Continuous learning, domain adaptation |
| Predictive maintenance | Threshold alerts, no memory | Learned failure patterns, persistent state |
| Robotics | Volatile memory, restart = forget | Persistent cognitive substrate |
| Quality control | Fixed thresholds | Evolving defect understanding |
These domains have:
- Massive embedding spaces (billions of records)
- Strong locality (queries cluster by specialty, failure mode, product line)
- Temporal dynamics (recent matters more, old fades)
- Memory constraints (edge devices, embedded systems)
- Persistence requirements (can’t lose patient data, can’t lose machine state)
ruvector was designed for exactly these properties. Not by accident - because agent memory has the same structure.
A cognitive substrate that works for:
- AI agents (original target)
- Medical AI (same structure)
- Industrial AI (same structure)
- Any system where memory should learn, compress, persist, and evolve
That’s not three markets. That’s one architectural insight applied to every domain where embeddings meet reality.
Yes. It’s crazy.
ruvector targets production-grade latency and throughput:
- Query latency: Sub-millisecond p50, under 10ms p99
- Insert throughput: 50,000 vectors per second sustained
- Memory efficiency: 0.3x raw vector size with tensor-compress
- Cold start: Under 100ms via mmap (no deserialization)
- GNN inference: Under 5ms for multi-hop reasoning
At 100 million vectors with 1024 dimensions, raw storage requires 400GB. With adaptive compression at typical access distributions, ruvector reduces this to 28-68GB while maintaining 97%+ recall on hot queries.
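The raw figure follows directly from count and dimensionality (f32 storage, ignoring graph and metadata overhead):

100,000,000 vectors × 1024 dims × 4 bytes ≈ 409.6 GB ≈ 400 GB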
Temporal Consciousness Modeling - Edge weights encode causal relationships while node embeddings capture state. Combined with decay functions, ruvector can model how concepts evolve and fade in cognitive architectures.
Neuromorphic Computing Bridge - The sparse graph topology maps efficiently to neuromorphic hardware. ruvector serves as the memory substrate for spike-timing-dependent systems, bridging conventional deep learning with brain-inspired computing.
Quantum-Classical Hybrid Search - Binary-compressed archival tiers provide Hamming-distance search compatible with quantum annealing. Hybrid pipelines can use quantum processors for coarse search, classical GNN layers for refinement.
Self-Modifying Code Repositories - Embed code snippets, let the GNN learn which functions relate semantically (not just syntactically), and enable codebase-aware retrieval that understands architectural patterns.
| Feature | Benefit |
|---|---|
| Unified index/model | Eliminate pipeline complexity, single system to maintain |
| mmap persistence | Instant recovery, RAM-independent scaling |
| Learnable edges | Accuracy improves with use, adapts to your data |
| Tensor-compress | 83-93% memory savings, automatic tiering |
| Differentiable search | End-to-end training through retrieval |
| Hierarchical attention | Multi-resolution queries from coarse to fine |
| Online learning | Continuous adaptation without reindexing |
| Neuro-symbolic bridge | Ground symbols to vectors, lift patterns to logic |
ruvector isn’t an incremental improvement to vector databases. It’s a reconceptualization: the index as neural network, persistence as first-class primitive, compression as learned behavior. For applications demanding both scale and intelligence, this convergence unlocks capabilities that separate systems cannot match.
The places where this design is the correct and unfairly strong choice.
This is the strongest fit.
ruvector gives agents a living, self-improving memory where:
- Embeddings evolve
- Neighborhoods refine over time
- Cold memories compress
- Hot memories stay precise
- The whole structure persists like a file
Agents can query their own past states, update their internal neighborhood, and use the graph as a cognitive substrate instead of a static store.
This is exactly what most agent stacks are missing.
If your queries are not random but have real locality, ruvector shines.
It becomes better when:
- People repeatedly search similar areas
- Agents revisit certain state clusters
- Systems have working sets and long tails
- The same patterns appear across days and weeks
Because the compression and edge weights adapt automatically, your memory footprint and recall both improve over time.
No static index can do this.
This is the ReasoningBank bridge.
ruvector works perfectly when a system needs:
- Symbols grounded in vectors
- Vectors lifted into structured reasoning
- Neighborhood search feeding pattern recognition
- Pattern recognition feeding symbolic abstractions
It is the exact missing membrane between structured logic and raw embeddings.
Any workload that demands:
- Instant cold starts
- Zero deserialization
- Memory that survives crashes
- Scaling past RAM
- Direct OS paging
- Container-friendly state
fits ruvector perfectly. It behaves more like a file system primitive than a database.
If you want an AI system that can:
- Evaluate its own prior states
- Inspect similar embeddings
- Refine its own memory graph
- Self-tune
- Store thought vectors as first-class objects
ruvector is the natural substrate. It is effectively a differentiable short and long-term memory module.
Anywhere that memory matters more than CPU or GPU:
- IoT devices
- On-device agents
- Edge inference
- Robotics
- Offline autonomous systems
- Anything that must survive restarts
The access-aware compression and mmap backing turn ruvector into a compact and intelligent memory layer.
Better RAG happens when:
- New documents come in
- Context shifts
- Patterns change
- Repeated queries hint at importance
ruvector’s dynamic topology and adaptive tiering mean your RAG memory does not stay frozen. It evolves with usage, improving relevance without full reindex.
Any system that needs:
- Fast sub-millisecond retrieval
- Billion-scale embeddings
- High recall with adaptive accuracy
- Lower memory footprint than raw vectors
benefits from ruvector’s structure.
FAISS is fast. But ruvector is fast, persistent, compressible, and learnable. You get the whole package.
ruvector’s structure matches:
- Sparse graph traversal
- Neighbor aggregation
- Systolic patterns
- Quantized vector operations
This makes it ideal for future accelerators. You can run it on CPU today and drop to custom hardware tomorrow.
When your knowledge changes over time:
- Edge weights decay
- Neighborhoods drift
- Embeddings evolve
- Usage reshapes memory
ruvector behaves like a cognitive graph instead of a static store.
Every breakthrough has edges. These are the places where ruvector is not the right tool, or where the design intentionally trades something away.
If you need:
- Thousands of transactional writes per second
- Strict multi-record transactions
- Perfect serializability
- Row-level locking
this is not the right structure. ruvector is a neural index, not a transactional database. It is eventually consistent by design, because learning and strict ACID do not mix well.
Use AgentDB or Postgres for OLTP. Use ruvector as the cognitive layer.
GNN over HNSW is inherently sparse.
If your workload is:
- Dense matrix multiplication
- Sequence-to-sequence LLM training
- Full batch SGD
- Anything that wants big contiguous tensors
ruvector will not help. It will bottleneck before a GPU-based dense model will. This is a sparse architecture on purpose.
If you go above:
- 8k dimensional embeddings
- Or embeddings whose meaning only emerges from full feature interaction
the sparse HNSW receptive field becomes too small. You lose signal. ruvector is optimized for typical 256 to 2048 dimensional spaces where local neighborhoods have meaning.
If you want:
- Exact nearest neighbor search
- 100 percent deterministic recall
- No learning updates
- No topology drift
- Full bit reproducibility across runs
ruvector is not ideal. It learns. It moves. It reshapes. It will never be byte-identical to an older snapshot unless you freeze it.
ruvector becomes strongest when usage patterns shape compression and edge weights.
But if your workload is:
- Fully stateless
- Random, uncorrelated access patterns
- No locality
- No repeated queries
- No benefit from learning
then you are paying complexity for no gain. A static FAISS index is simpler in that world.
The mmap file layout is designed for CPU-driven sparse traversal where the OS handles paging.
If your hardware stack expects:
- Large, contiguous batches
- Coalesced memory
- Minimal page faults
- Full GPU acceleration
then ruvector’s model-and-index union is less ideal. It is CPU-friendly and future ASIC-friendly, not a dense GPU monster.
HNSW inserts plus learning create a topology that evolves.
If you need:
- Fixed connectivity
- Reproducible path lengths
- Guaranteed maximum hops
- Perfectly stable neighbor sets across time
you will not get it. The topology shifts as the system learns and recompresses.
ruvector is persistent, but not audit-deterministic.
If you need:
- Strict tamper-proof logs
- Reproducible model state at every step
- Deterministic replay of training
- Bit-exact inference
ruvector is not the primary tool. It is a living memory, not a forensic ledger.
If your embeddings:
- Are rarely queried
- Are mostly offline
- Need only deep compression
- Do not benefit from neighborhood coherence
then simpler PQ-based stores outperform ruvector’s adaptive learning system.
In one sentence:
ruvector is good for any system that needs a persistent, self-improving, sparse neural memory that agents can read from, write to, learn on top of, and survive restarts with.
In your stack:
It is the brain stem between ReasoningBank and AgentDB.
Not good for:
Workloads that demand strict determinism, dense computation, heavy transactional semantics, or scenarios where learning provides no advantage.
Core Thesis: The index is a sparse neural network whose topology encodes learned similarity, and mmap makes it a persistent, queryable, differentiable structure.
ruvector := (V, E, θ, M)
where:
V ∈ ℝ^{n×d} : Node embedding matrix (n nodes, d dimensions)
E ⊆ V × V : Sparse edge set (HNSW navigable small-world graph)
θ : Learnable parameters (edge weights, layer norms, projections)
M : mmap descriptor (file handle, page table, dirty bitmap)
| Database View | Neural View | Unified Operation |
|---|---|---|
| INSERT vector | Add node to graph | embed_and_link() |
| QUERY k-NN | Forward pass (1-hop aggregation) | propagate() |
| UPDATE vector | Gradient descent on node | backprop_node() |
| DELETE vector | Prune node + relink | sparse_dropout() |
| PERSIST | Checkpoint | mmap_sync() |
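Read as an interface, the unified-operation column suggests something like the trait below. The signatures are illustrative assumptions for exposition, not the finalized API:

```rust
type NodeId = u64;

/// Illustrative sketch of the unified surface implied by the table above.
trait UnifiedIndex {
    /// INSERT: embed the vector and wire it into the HNSW topology.
    fn embed_and_link(&mut self, vector: &[f32]) -> NodeId;
    /// QUERY: one hop of neighbor aggregation doubles as k-NN retrieval.
    fn propagate(&self, query: &[f32], k: usize) -> Vec<(NodeId, f32)>;
    /// UPDATE: nudge a stored embedding by gradient descent rather than overwrite it.
    fn backprop_node(&mut self, node: NodeId, grad: &[f32], lr: f32);
    /// DELETE: prune the node and relink its neighbors, a structured dropout.
    fn sparse_dropout(&mut self, node: NodeId);
    /// PERSIST: flush dirty pages; the mmap file itself is the checkpoint.
    fn mmap_sync(&mut self) -> std::io::Result<()>;
}
```

The point of the single trait is that one structure plays both roles: the same calls that serve storage semantics also drive the neural forward and backward passes.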
struct HNSWGraph {
layers: Vec<Layer>, // Hierarchical skip-list structure
entry_point: NodeId, // Top-level entry for traversal
// Neural extensions
edge_weights: SparseMatrix, // Learnable attention coefficients
layer_norms: Vec<LayerNorm>, // Per-layer normalization
}
struct Layer {
level: usize,
nodes: BitSet, // Which nodes exist at this level
adjacency: CSRMatrix, // Compressed sparse row for neighbors
// mmap backing
mmap_region: MmapRegion,
page_resident: AtomicBitmap, // Track which pages are in RAM
}
struct Node {
id: u64,
embedding: *const f32, // Pointer into mmap'd embedding matrix
neighbors: SmallVec<[NodeId; M]>, // M = max connections per layer
// Neural state
hidden_state: Option<*mut f32>, // Activation cache for backprop
gradient: Option<*mut f32>, // Accumulated gradient
}
E = E_structural ∪ E_learned
E_structural: HNSW construction edges (fixed post-insert)
E_learned: Attention-weighted edges (differentiable)
Edge weight function:
w(u, v) = σ(α · sim(v_u, v_v) + β · structural_score(u, v))
where:
sim() : Cosine or learned metric
structural_score() : HNSW level co-occurrence, path centrality
α, β : Learnable scalars
σ : Sigmoid activation
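A direct transcription of this weight function as a standalone sketch; the structural term is passed in precomputed, since its exact form (level co-occurrence, path centrality) is described only in prose:

```rust
/// Sketch of the learned edge weight w(u, v). Cosine similarity and the
/// sigmoid follow the formula above; `structural_score` is a placeholder
/// for the HNSW level co-occurrence / centrality term.
fn edge_weight(
    v_u: &[f32],
    v_v: &[f32],
    alpha: f32,            // learnable scalar α
    beta: f32,             // learnable scalar β
    structural_score: f32, // precomputed structural term for (u, v)
) -> f32 {
    let dot: f32 = v_u.iter().zip(v_v).map(|(a, b)| a * b).sum();
    let norm = |x: &[f32]| x.iter().map(|a| a * a).sum::<f32>().sqrt();
    let sim = dot / (norm(v_u) * norm(v_v) + 1e-12);
    let z = alpha * sim + beta * structural_score;
    1.0 / (1.0 + (-z).exp()) // σ(z)
}
```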
/// Single GNN layer operating on HNSW topology
struct RuvectorLayer {
W_msg: Linear, // Message projection: d → d_hidden
W_agg: Linear, // Aggregation projection: d_hidden → d
W_update: GRUCell, // Node update: (d, d) → d
attention: MultiHeadAttention,
norm: LayerNorm,
dropout: f32,
}
impl RuvectorLayer {
/// Forward pass using HNSW neighbors as receptive field
fn forward(&self, graph: &HNSWGraph, node_id: NodeId) -> Tensor {
let h = graph.get_embedding(node_id);
let neighbors = graph.get_neighbors(node_id);
// Message computation
let messages: Vec<Tensor> = neighbors.iter()
.map(|&n_id| {
let h_n = graph.get_embedding(n_id);
let edge_weight = graph.edge_weights.get(node_id, n_id);
self.W_msg.forward(h_n) * edge_weight
})
.collect();
// Attention-weighted aggregation
let (agg, attn_weights) = self.attention.forward(
&h,         // query
&messages,  // keys
&messages,  // values
);
// Node update with residual
let h_new = self.W_update.forward(h, self.W_agg.forward(agg));
self.norm.forward(h + h_new) // Residual connection
}
}
HNSW layers map to attention scales:
/// Multi-scale propagation across HNSW hierarchy
fn hierarchical_forward(
graph: &HNSWGraph,
query: &Tensor,
layers: &[RuvectorLayer],
) -> Tensor {
let mut h = query.clone();
// Coarse-to-fine traversal (top HNSW layer → bottom)
for (hnsw_level, gnn_layer) in graph.layers.iter().rev().zip(layers) {
// Find entry points at this level
let entry_nodes = graph.search_layer(&h, hnsw_level, /* ef */ 16);
// Aggregate from this resolution
let level_repr = entry_nodes.iter()
.map(|&node| gnn_layer.forward(graph, node))
.reduce(|a, b| a + b)
.unwrap() / entry_nodes.len() as f32;
// Cross-attention between query and level representation
h = cross_attention(h, level_repr);
}
h
}
Transform k-NN search into differentiable operation:
/// Soft k-NN with gradient flow
fn differentiable_search(
graph: &HNSWGraph,
query: &Tensor,
k: usize,
temperature: f32,
) -> (Vec<NodeId>, Tensor) {
// Hard search for candidate set (non-differentiable)
let candidates = graph.search(query, /* ef */ k * 4);
// Soft attention over candidates (differentiable)
let similarities: Vec<f32> = candidates.iter()
.map(|&n| cosine_sim(query, graph.get_embedding(n)))
.collect();
let soft_weights = softmax(similarities / temperature);
// Weighted combination for gradient flow
let soft_result = candidates.iter()
.zip(soft_weights.iter())
.map(|(&n, &w)| graph.get_embedding(n) * w)
.sum();
// Return hard indices + soft representation
(candidates[..k].to_vec(), soft_result)
}
┌─────────────────────────────────────────────────────────────┐
│ ruvector.mmap file │
├─────────────────────────────────────────────────────────────┤
│ Header (4KB page-aligned) │
│ magic: u64 │
│ version: u32 │
│ n_nodes: u64 │
│ d_embed: u32 │
│ n_layers: u32 │
│ compression_flags: u32 │
│ checksum: [u8; 32] │
├─────────────────────────────────────────────────────────────┤
│ Hot Tier: Full Precision (page-aligned) │
│ [f32; n_hot * d_embed] │
│ Access freq > 0.8, no decompression overhead │
├─────────────────────────────────────────────────────────────┤
│ Warm Tier: fp16 Compressed │
│ [f16; n_warm * d_embed] + scale_factors │
│ Access freq 0.4-0.8, 2x compression │
├─────────────────────────────────────────────────────────────┤
│ Cool Tier: PQ8 Compressed │
│ [u8; n_cool * n_subvectors] + codebook │
│ Access freq 0.1-0.4, 4x compression │
├─────────────────────────────────────────────────────────────┤
│ Cold Tier: PQ4 Compressed │
│ [u4; n_cold * n_subvectors] + outliers │
│ Access freq 0.01-0.1, 8x compression │
├─────────────────────────────────────────────────────────────┤
│ Archive Tier: Binary │
│ [bit; n_archive * d_embed] │
│ Access freq < 0.01, 32x compression │
├─────────────────────────────────────────────────────────────┤
│ Codebook Store (shared across tiers) │
│ pq_centroids: [f32; n_subvectors * n_centroids * d_sub] │
│ entropy_tables: [ANSTable; n_contexts] │
├─────────────────────────────────────────────────────────────┤
│ HNSW Graph Structure │
│ Layer 0: adjacency CSR + node bitmap │
│ Layer 1: adjacency CSR + node bitmap │
│ ... │
│ Layer L: adjacency CSR + node bitmap │
├─────────────────────────────────────────────────────────────┤
│ Tier Metadata │
│ node_tier_map: [TierId; n_nodes] │
│ access_counters: [u16; n_nodes] │
│ last_access: [u32; n_nodes] // epoch timestamp │
├─────────────────────────────────────────────────────────────┤
│ Learned Parameters θ │
│ edge_weights: SparseMatrix │
│ layer_norms: [LayerNorm; n_gnn_layers] │
│ projections: [Linear; n_gnn_layers * 3] │
├─────────────────────────────────────────────────────────────┤
│ Gradient Accumulator (optional, for training) │
│ grad_V: [f32; n_nodes * d_embed] │
│ grad_θ: [f32; n_params] │
├─────────────────────────────────────────────────────────────┤
│ Transaction Log (append-only, for ACID) │
│ [Operation; variable] │
└─────────────────────────────────────────────────────────────┘
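For concreteness, the header region of this layout might map onto a struct like the one below; field order, packing, and the reserved padding are assumptions rather than the normative format:

```rust
/// Sketch of the 4KB page-aligned header from the layout above.
/// Field order, padding, and `#[repr(C)]` are illustrative assumptions.
#[repr(C)]
struct RuvectorHeader {
    magic: u64,             // file format magic number
    version: u32,           // on-disk format version
    n_nodes: u64,           // total vectors across all tiers
    d_embed: u32,           // embedding dimensionality
    n_layers: u32,          // HNSW layer count
    compression_flags: u32, // which tiers / codecs are enabled
    checksum: [u8; 32],     // integrity check over the header
    // Remaining bytes up to 4096 are reserved so the first tier
    // starts on a page boundary.
}
```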
struct MmapManager {
file: File,
mmap: MmapMut,
page_size: usize,
// Page tracking
access_bitmap: AtomicBitmap, // Which pages accessed this epoch
dirty_bitmap: AtomicBitmap, // Which pages modified
pin_count: Vec<AtomicU32>, // Pinned pages (in active computation)
}
impl MmapManager {
/// Get embedding with page fault tracking
fn get_embedding(&self, node_id: NodeId) -> &[f32] {
let offset = self.embedding_offset(node_id);
let page_idx = offset / self.page_size;
// Mark page accessed (for LRU tracking)
self.access_bitmap.set(page_idx);
// Return slice into mmap'd region
unsafe {
std::slice::from_raw_parts(
self.mmap.as_ptr().add(offset) as *const f32,
self.d_embed,
)
}
}
/// Write embedding with CoW semantics
fn set_embedding(&mut self, node_id: NodeId, data: &[f32]) {
let offset = self.embedding_offset(node_id);
let page_idx = offset / self.page_size;
// Mark dirty for async flush
self.dirty_bitmap.set(page_idx);
unsafe {
let dst = self.mmap.as_mut_ptr().add(offset) as *mut f32;
std::ptr::copy_nonoverlapping(data.as_ptr(), dst, data.len());
}
}
/// Background sync of dirty pages
async fn flush_dirty(&self) -> io::Result<()> {
for page_idx in self.dirty_bitmap.iter_set() {
let offset = page_idx * self.page_size;
self.mmap.flush_range(offset, self.page_size)?;
}
self.dirty_bitmap.clear();
Ok(())
}
/// Prefetch pages for upcoming traversal
fn prefetch(&self, node_ids: &[NodeId]) {
for &node_id in node_ids {
let offset = self.embedding_offset(node_id);
unsafe {
libc::madvise(
self.mmap.as_ptr().add(offset) as *mut _,
self.d_embed * 4,
libc::MADV_WILLNEED,
);
}
}
}
}
/// Persistent gradient buffer for distributed/async training
struct MmapGradientAccumulator {
grad_mmap: MmapMut,
lock_granularity: usize, // Nodes per lock
locks: Vec<RwLock<()>>,
}
impl MmapGradientAccumulator {
/// Atomic gradient accumulation
fn accumulate(&self, node_id: NodeId, grad: &[f32]) {
let lock_idx = node_id as usize / self.lock_granularity;
let _guard = self.locks[lock_idx].write();
let offset = self.grad_offset(node_id);
unsafe {
let dst = self.grad_mmap.as_mut_ptr().add(offset) as *mut f32;
for i in 0..grad.len() {
*dst.add(i) += grad[i];
}
}
}
/// Apply accumulated gradients with optimizer
fn apply(&mut self, optimizer: &mut Optimizer, embeddings: &mut MmapManager) {
for node_id in 0..self.n_nodes {
let grad = self.get_grad(node_id);
let embedding = embeddings.get_embedding_mut(node_id);
optimizer.step(embedding, grad);
}
self.zero_grad();
}
}
/// Self-supervised training using graph structure
fn train_epoch(
graph: &mut HNSWGraph,
gnn: &mut RuvectorGNN,
optimizer: &mut Optimizer,
config: &TrainConfig,
) -> f32 {
let mut total_loss = 0.0;
// Sample anchor nodes
let anchors = graph.sample_nodes(config.batch_size);
for anchor in anchors {
// Positive: HNSW neighbors (structurally similar)
let positives = graph.get_neighbors(anchor);
// Negatives: random non-neighbors
let negatives = graph.sample_non_neighbors(anchor, config.n_negatives);
// Forward pass
let h_anchor = gnn.forward(graph, anchor);
let h_pos: Vec<_> = positives.iter()
.map(|&p| gnn.forward(graph, p))
.collect();
let h_neg: Vec<_> = negatives.iter()
.map(|&n| gnn.forward(graph, n))
.collect();
// InfoNCE loss
let loss = info_nce_loss(&h_anchor, &h_pos, &h_neg, config.temperature);
// Backward pass
loss.backward();
total_loss += loss.item();
// Gradient step
optimizer.step();
optimizer.zero_grad();
// Async flush if needed
if graph.mmap.dirty_page_count() > config.flush_threshold {
graph.mmap.flush_dirty_async();
}
}
total_loss / config.batch_size as f32
}
/// InfoNCE contrastive loss
fn info_nce_loss(
anchor: &Tensor,
positives: &[Tensor],
negatives: &[Tensor],
temperature: f32,
) -> Tensor {
let pos_sims: Vec<f32> = positives.iter()
.map(|p| cosine_sim(anchor, p) / temperature)
.collect();
let neg_sims: Vec<f32> = negatives.iter()
.map(|n| cosine_sim(anchor, n) / temperature)
.collect();
// Log-sum-exp over negatives
let neg_logsumexp = neg_sims.iter()
.map(|&s| s.exp())
.sum::<f32>()
.ln();
// Average over positives
-pos_sims.iter()
.map(|&s| s - neg_logsumexp)
.sum::<f32>() / positives.len() as f32
}
/// Update graph and model with new data
async fn online_update(
graph: &mut HNSWGraph,
gnn: &mut RuvectorGNN,
new_vectors: &[Vector],
config: &OnlineConfig,
) {
for vector in new_vectors {
// 1. Insert into HNSW (creates structural edges)
let node_id = graph.insert(vector);
// 2. Initialize edge weights from similarity
for &neighbor in graph.get_neighbors(node_id) {
let sim = cosine_sim(vector, graph.get_embedding(neighbor));
graph.edge_weights.set(node_id, neighbor, sim);
}
// 3. Local fine-tuning (few gradient steps)
for _ in 0..config.local_steps {
let loss = local_contrastive_loss(graph, gnn, node_id);
loss.backward();
gnn.optimizer.step();
}
// 4. Propagate updates to neighbors (optional)
if config.propagate_updates {
for &neighbor in graph.get_neighbors(node_id) {
let loss = local_contrastive_loss(graph, gnn, neighbor);
loss.backward();
gnn.optimizer.step();
}
}
}
// Checkpoint
graph.mmap.flush_dirty().await?;
}
pub struct RuvectorQuery {
/// Raw vector for similarity search
vector: Option<Tensor>,
/// Text (requires encoder)
text: Option<String>,
/// Node ID for graph traversal
node_id: Option<NodeId>,
/// Query mode
mode: QueryMode,
/// Parameters
k: usize,
ef: usize, // HNSW search breadth
gnn_depth: usize, // GNN propagation hops
temperature: f32, // Softmax temperature
return_attention: bool, // Include attention weights
}
pub enum QueryMode {
/// Pure HNSW search (no GNN)
VectorSearch,
/// GNN-enhanced search (propagate then search)
NeuralSearch,
/// Return induced subgraph around results
SubgraphExtraction,
/// Differentiable search for training
DifferentiableSearch,
}
pub struct QueryResult {
nodes: Vec<NodeId>,
scores: Vec<f32>,
embeddings: Option<Vec<Tensor>>,
attention_weights: Option<SparseMatrix>,
subgraph: Option<SubGraph>,
latency: Duration,
}
impl Ruvector {
pub fn query(&self, q: RuvectorQuery) -> QueryResult {
let start = Instant::now();
// Encode query if needed
let query_vec = match (&q.vector, &q.text) {
(Some(v), _) => v.clone(),
(_, Some(t)) => self.encoder.encode(t),
_ => panic!("Must provide vector or text"),
};
let result = match q.mode {
QueryMode::VectorSearch => {
// Pure HNSW
let (nodes, scores) = self.graph.search(&query_vec, q.k, q.ef);
QueryResult { nodes, scores, ..Default::default() }
}
QueryMode::NeuralSearch => {
// GNN propagation then search
let enhanced = self.gnn.forward_query(&query_vec, q.gnn_depth);
let (nodes, scores) = self.graph.search(&enhanced, q.k, q.ef);
QueryResult { nodes, scores, ..Default::default() }
}
QueryMode::SubgraphExtraction => {
// Search + expand to subgraph
let (nodes, scores) = self.graph.search(&query_vec, q.k, q.ef);
let subgraph = self.graph.extract_subgraph(&nodes, /* hops */ 1);
QueryResult { nodes, scores, subgraph: Some(subgraph), ..Default::default() }
}
QueryMode::DifferentiableSearch => {
// Soft search with gradient flow
let (nodes, soft_result) = differentiable_search(
&self.graph, &query_vec, q.k, q.temperature
);
QueryResult {
nodes,
embeddings: Some(vec![soft_result]),
..Default::default()
}
}
};
QueryResult { latency: start.elapsed(), ..result }
}
}
/// Bridge ruvector to AgentDB's cognitive layer
impl CognitiveSubstrate for Ruvector {
/// Store thought embedding with relational context
fn store_thought(&mut self, thought: Thought) -> ThoughtId {
let embedding = self.encoder.encode(&thought.content);
let node_id = self.graph.insert(&embedding);
// Link to related thoughts via GNN similarity
let related = self.query(RuvectorQuery {
vector: Some(embedding.clone()),
mode: QueryMode::NeuralSearch,
k: 10,
..Default::default()
});
for &related_id in &related.nodes {
self.graph.add_semantic_edge(node_id, related_id);
}
ThoughtId(node_id)
}
/// Retrieve contextually relevant thoughts
fn retrieve_context(&self, query: &str, depth: usize) -> Vec<Thought> {
let result = self.query(RuvectorQuery {
text: Some(query.to_string()),
mode: QueryMode::SubgraphExtraction,
k: 20,
gnn_depth: depth,
..Default::default()
});
result.nodes.iter()
.map(|&n| self.load_thought(n))
.collect()
}
}
/// Symbolic ↔ Neural interface
impl NeuroSymbolicBridge for Ruvector {
/// Ground symbolic expression to neural embedding
fn ground(&self, expr: &SymbolicExpr) -> Tensor {
// Encode symbolic structure
let structural_emb = self.symbol_encoder.encode(expr);
// Retrieve similar neural concepts
let similar = self.query(RuvectorQuery {
vector: Some(structural_emb),
mode: QueryMode::NeuralSearch,
k: 5,
..Default::default()
});
// Blend structural and retrieved
let retrieved_emb = similar.nodes.iter()
.zip(similar.scores.iter())
.map(|(&n, &s)| self.graph.get_embedding(n) * s)
.sum();
structural_emb * 0.5 + retrieved_emb * 0.5
}
/// Lift neural embedding to symbolic candidates
fn lift(&self, embedding: &Tensor, beam_width: usize) -> Vec<SymbolicExpr> {
let subgraph = self.query(RuvectorQuery {
vector: Some(embedding.clone()),
mode: QueryMode::SubgraphExtraction,
k: beam_width,
..Default::default()
}).subgraph.unwrap();
// Extract symbolic patterns from subgraph structure
self.pattern_extractor.extract(&subgraph)
}
}
Tensor-compress provides hierarchical compression for neural representations, enabling:
- Adaptive precision based on access patterns
- Lossy/lossless hybrid encoding
- Streaming decompression aligned with mmap page faults
/// Tensor compression codec with learned quantization
pub struct TensorCompress {
/// Compression levels (higher = more aggressive)
levels: Vec<CompressionLevel>,
/// Learned codebook for vector quantization
codebook: ProductQuantizer,
/// Entropy coder for residuals
entropy_coder: ANSCoder,
/// Page-aligned compression blocks
block_size: usize,
}
#[derive(Clone, Copy)]
pub enum CompressionLevel {
/// Full precision (f32) - hot vectors
None,
/// fp16 with learned scaling
Half { scale: f16 },
/// 8-bit product quantization
PQ8 { subvectors: u8, centroids: u8 },
/// 4-bit with outlier preservation
PQ4 { subvectors: u8, outlier_threshold: f32 },
/// Binary with Hamming distance
Binary { threshold: f32 },
}
impl TensorCompress {
/// Compress embedding with adaptive level selection
pub fn compress(&self, embedding: &[f32], access_freq: f32) -> CompressedTensor {
let level = self.select_level(access_freq);
match level {
CompressionLevel::None => {
CompressedTensor::Full(embedding.to_vec())
}
CompressionLevel::Half { scale } => {
let quantized: Vec<f16> = embedding.iter()
.map(|&x| f16::from_f32(x * scale))
.collect();
CompressedTensor::Half { data: quantized, scale }
}
CompressionLevel::PQ8 { subvectors, centroids } => {
let codes = self.codebook.encode(embedding, subvectors, centroids);
let residual = self.compute_residual(embedding, &codes);
let compressed_residual = self.entropy_coder.encode(&residual);
CompressedTensor::PQ8 { codes, residual: compressed_residual }
}
CompressionLevel::PQ4 { subvectors, outlier_threshold } => {
let (codes, outliers) = self.codebook.encode_with_outliers(
embedding, subvectors, outlier_threshold
);
CompressedTensor::PQ4 { codes, outliers }
}
CompressionLevel::Binary { threshold } => {
let bits: BitVec = embedding.iter()
.map(|&x| x > threshold)
.collect();
CompressedTensor::Binary(bits)
}
}
}
/// Decompress with optional precision upgrade
pub fn decompress(&self, compressed: &CompressedTensor) -> Vec<f32> {
match compressed {
CompressedTensor::Full(data) => data.clone(),
CompressedTensor::Half { data, scale } => {
data.iter().map(|&x| x.to_f32() / scale).collect()
}
CompressedTensor::PQ8 { codes, residual } => {
let base = self.codebook.decode(codes);
let res = self.entropy_coder.decode(residual);
base.iter().zip(res.iter()).map(|(&b, &r)| b + r).collect()
}
CompressedTensor::PQ4 { codes, outliers } => {
self.codebook.decode_with_outliers(codes, outliers)
}
CompressedTensor::Binary(bits) => {
bits.iter().map(|b| if b { 1.0 } else { -1.0 }).collect()
}
}
}
/// Adaptive level selection based on access frequency
fn select_level(&self, access_freq: f32) -> CompressionLevel {
// Hot vectors: keep full precision
// Warm vectors: fp16 or PQ8
// Cold vectors: PQ4 or binary
match access_freq {
f if f > 0.8 => CompressionLevel::None,
f if f > 0.4 => CompressionLevel::Half { scale: f16::from_f32(1.0) },
f if f > 0.1 => CompressionLevel::PQ8 { subvectors: 8, centroids: 256 },
f if f > 0.01 => CompressionLevel::PQ4 { subvectors: 8, outlier_threshold: 2.0 },
_ => CompressionLevel::Binary { threshold: 0.0 },
}
}
}
/// Integration with ruvector mmap layer
impl Ruvector {
/// Store with compression, decompress on access
pub fn insert_compressed(&mut self, embedding: &[f32]) -> NodeId {
let compressed = self.tensor_compress.compress(embedding, 0.0);
let node_id = self.allocate_node();
// Store compressed in mmap
self.compressed_store.write(node_id, &compressed);
// Maintain HNSW with full precision (temporary)
self.graph.insert_with_id(embedding, node_id);
node_id
}
/// Lazy decompression on page fault
pub fn get_embedding_compressed(&self, node_id: NodeId) -> Vec<f32> {
let compressed = self.compressed_store.read(node_id);
// Track access for adaptive recompression
self.access_tracker.record(node_id);
self.tensor_compress.decompress(&compressed)
}
/// Background recompression based on access patterns
pub async fn recompress_cold_vectors(&mut self) {
let access_stats = self.access_tracker.get_frequencies();
for (node_id, freq) in access_stats {
let current_level = self.compressed_store.get_level(node_id);
let target_level = self.tensor_compress.select_level(freq);
if target_level.compression_ratio() > current_level.compression_ratio() {
// Recompress to smaller representation
let embedding = self.get_embedding_compressed(node_id);
let recompressed = self.tensor_compress.compress(&embedding, freq);
self.compressed_store.write(node_id, &recompressed);
}
}
}
}
/// Streaming decompression aligned with GNN inference
impl RuvectorLayer {
/// Forward pass with on-the-fly decompression
fn forward_compressed(
&self,
graph: &Ruvector,
node_id: NodeId,
decompress_cache: &mut LruCache<NodeId, Vec<f32>>,
) -> Tensor {
// Check cache first
let h = decompress_cache.get(&node_id).cloned().unwrap_or_else(|| {
let decompressed = graph.get_embedding_compressed(node_id);
decompress_cache.put(node_id, decompressed.clone());
decompressed
});
let neighbors = graph.graph.get_neighbors(node_id);
// Prefetch and decompress neighbors in parallel
let messages: Vec<Tensor> = neighbors.par_iter()
.map(|&n_id| {
let h_n = decompress_cache.get(&n_id).cloned().unwrap_or_else(|| {
graph.get_embedding_compressed(n_id)
});
let edge_weight = graph.graph.edge_weights.get(node_id, n_id);
self.W_msg.forward(&h_n) * edge_weight
})
.collect();
// Standard aggregation
let (agg, _) = self.attention.forward(&h, &messages, &messages);
let h_new = self.W_update.forward(&h, &self.W_agg.forward(&agg));
self.norm.forward(&(h + h_new))
}
}
| Operation | Time | Space | I/O (mmap) |
|---|---|---|---|
| Insert | O(log n) | O(M·L) | ~L page faults |
| k-NN Search | O(log n + k) | O(ef) | ~ef page faults |
| GNN Forward (1 hop) | O(M) | O(M·d) | ~M page faults |
| GNN Forward (L hops) | O(M^L) | O(M^L·d) | Cached after first |
| Gradient Update | O(d) | O(d) | 1 page dirty |
| Flush | O(dirty pages) | O(1) | Sequential write |
| Compress (PQ8) | O(d · k) | O(d) | 0 (in-memory) |
| Decompress (PQ8) | O(d) | O(d) | 1 codebook lookup |
| Tier promotion | O(d) | O(d) | 2 page writes |
Where:
- n = number of nodes
- d = embedding dimension
- M = max neighbors per node
- L = HNSW layers / GNN depth
- ef = search beam width
- k = number of centroids
| Tier | Precision | Compression | Recall@10 | Decompress Latency |
|---|---|---|---|---|
| Hot | f32 | 1x | 100% | 0 ns |
| Warm | f16 | 2x | 99.9% | ~10 ns |
| Cool | PQ8 | 4x | 99.2% | ~50 ns |
| Cold | PQ4 + outliers | 8x | 97.5% | ~100 ns |
| Archive | Binary | 32x | 85% | ~20 ns |
Memory savings at scale (100M vectors, d=1024):
| Tier Distribution | Raw Size | Compressed | Savings |
|---|---|---|---|
| 100% Hot | 400 GB | 400 GB | 0% |
| 10/30/40/15/5% | 400 GB | 68 GB | 83% |
| 5/15/30/30/20% | 400 GB | 42 GB | 89% |
| 1/5/20/40/34% | 400 GB | 28 GB | 93% |
| Metric | Target | Rationale |
|---|---|---|
| Insert throughput | 50K vec/s | Streaming ingest |
| Query latency (p50) | < 1ms | Real-time retrieval |
| Query latency (p99) | < 10ms | Tail latency bound |
| GNN inference | < 5ms | Interactive applications |
| Memory efficiency | 0.3x vector size | With tensor-compress |
| Cold start | < 100ms | mmap advantage |
| Recompression throughput | 10K vec/s | Background tier migration |
| Codebook update | < 1s | Online adaptation |
[ruvector]
# Embedding configuration
embedding_dim = 1024
max_nodes = 100_000_000 # 100M vectors
# HNSW parameters
hnsw_m = 32 # Max neighbors per layer
hnsw_ef_construction = 200
hnsw_ef_search = 64
hnsw_ml = 0.29 # Level multiplier (1/ln(M) ≈ 0.29 for M = 32)
# GNN parameters
gnn_layers = 3
gnn_hidden_dim = 512
gnn_heads = 8
gnn_dropout = 0.1
# mmap configuration
mmap_path = "/data/ruvector.mmap"
page_size = 4096
flush_interval_ms = 1000
prefetch_depth = 2 # Prefetch neighbor pages
# Training
learning_rate = 0.001
temperature = 0.07
batch_size = 256
negative_samples = 64
[tensor_compress]
# Tier thresholds (access frequency boundaries)
hot_threshold = 0.8 # > 0.8 → full precision
warm_threshold = 0.4 # 0.4-0.8 → fp16
cool_threshold = 0.1 # 0.1-0.4 → PQ8
cold_threshold = 0.01 # 0.01-0.1 → PQ4
# < 0.01 → binary
# Product quantization settings
pq_subvectors = 8 # Number of subvector partitions
pq8_centroids = 256 # Centroids for 8-bit PQ
pq4_centroids = 16 # Centroids for 4-bit PQ
outlier_threshold = 2.5 # Std devs for outlier preservation
# Entropy coding
entropy_coder = "ans" # "ans" | "huffman" | "none"
residual_bits = 4 # Bits for residual encoding
# Tier management
recompress_interval_s = 300 # Background recompression period
access_decay_halflife_s = 3600 # Exponential decay for access freq
promotion_threshold = 1.5 # Freq multiplier to trigger promotion
demotion_threshold = 0.5 # Freq multiplier to trigger demotion
# Decompression cache
decompress_cache_size_mb = 512 # LRU cache for hot decompressed vectors
prefetch_neighbors = true # Decompress neighbor vectors ahead
# Codebook training
codebook_sample_size = 100000 # Vectors to sample for codebook init
codebook_iterations = 20 # k-means iterations
codebook_update_freq = 10000 # Inserts between codebook refresh
Temporal Dynamics
- Edge weights decay over time (recency bias)
- Temporal attention: w(u,v,t) = w(u,v) · exp(-λ(t_now - t_edge))
- Enables modeling of evolving knowledge graphs
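A minimal sketch of the decayed weight, assuming timestamps in seconds and a per-graph decay constant λ:

```rust
/// w(u, v, t) = w(u, v) · exp(-λ (t_now - t_edge)); units and where λ is stored
/// are assumptions for illustration.
fn temporal_edge_weight(base_weight: f32, t_now_s: f64, t_edge_s: f64, lambda: f64) -> f32 {
    let age_s = (t_now_s - t_edge_s).max(0.0);
    (base_weight as f64 * (-lambda * age_s).exp()) as f32
}
```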
Multi-Modal Graphs
- Separate HNSW graphs per modality (text, image, audio)
- Cross-modal GNN layers for alignment
- Unified query across modalities
Distributed Sharding
- Partition by HNSW layer (hierarchical sharding)
- Or partition by embedding cluster (locality-preserving)
- Gossip protocol for cross-shard edge updates
Hardware Acceleration
- HNSW traversal → Systolic array pattern
- GNN aggregation → Vector MAC units
- Tensor-compress → Dedicated decompression units
- mmap → NVMe direct access (bypass CPU)
Learned Codebooks
- End-to-end differentiable quantization
- Task-specific codebook optimization (retrieval vs classification)
- Hierarchical codebooks for progressive refinement
Neural Compression
- Replace PQ with learned neural codec
- Autoencoder-based compression with latent bottleneck
- Asymmetric encode/decode (heavy encode, fast decode)
Semantic-Aware Tiering
- Compress based on semantic importance, not just access frequency
- Preserve precision for cluster centroids and graph hubs
- Lossy compression for peripheral nodes
Streaming Decompression Pipeline
- SIMD-optimized batch decompression
- GPU-accelerated codebook lookups
- Speculative decompression based on query trajectory
Compression-Aware Training
- Quantization-aware GNN training
- Gradient scaling for compressed representations
- Straight-through estimator for discrete codebook indices
Standard attention:
Attention(Q, K, V) = softmax(QK^T / √d) V
HNSW-induced sparse attention:
SparseAttention(q, V, E) = Σ_{v ∈ N_E(q)} softmax(q · v / √d) · v
where N_E(q) = HNSW neighbors of q under edge set E
This is attention with sparsity pattern determined by graph structure rather than learned or fixed masks.
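The same statement written out for a single query vector and its HNSW neighborhood, as a sketch rather than the production kernel (the RuvectorLayer above adds projections, gating, and normalization on top of this core):

```rust
/// Sparse attention restricted to the HNSW neighborhood N_E(q):
/// softmax over q·v/√d for v in N_E(q), then a weighted sum of those v.
fn sparse_attention(q: &[f32], neighbors: &[Vec<f32>]) -> Vec<f32> {
    if neighbors.is_empty() {
        return vec![0.0; q.len()];
    }
    let scale = (q.len() as f32).sqrt();
    // Scaled dot-product scores over the neighbor set only.
    let scores: Vec<f32> = neighbors
        .iter()
        .map(|v| q.iter().zip(v).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect();
    // Softmax over the sparse support.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let z: f32 = exps.iter().sum();
    // Weighted combination of neighbor embeddings.
    let mut out = vec![0.0f32; q.len()];
    for (v, w) in neighbors.iter().zip(&exps) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += x * (w / z);
        }
    }
    out
}
```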
HNSW search is non-differentiable (discrete neighbor selection). We approximate via:
- Straight-Through Estimator: Forward uses hard selection, backward uses soft attention
- Gumbel-Softmax: Sample neighbors with temperature annealing
- Score Function Estimator: REINFORCE-style gradient for discrete choices
Recommended: Straight-through for simplicity, Gumbel for end-to-end training.
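A sketch of the straight-through option on top of the soft search above, using the hypothetical autograd `Tensor` from earlier sections (a `detach` method is assumed):

```rust
/// Straight-through estimator: the forward value is the hard selection,
/// but the detached correction term carries no gradient, so backprop
/// flows only through the soft attention-weighted mixture.
fn straight_through_select(hard_top1: &Tensor, soft_mixture: &Tensor) -> Tensor {
    // y = soft + stop_gradient(hard - soft)  =>  forward: hard, backward: d(soft)
    soft_mixture + (hard_top1 - soft_mixture).detach()
}
```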
ruvector provides:
- Read consistency: Reads see writes from same thread
- Eventual consistency: Cross-thread writes visible after flush
- Crash consistency: Transaction log enables recovery to last checkpoint
Not provided:
- Strict serializability (use external locking if needed)
- Multi-file atomic transactions
Specification version: 0.1.0
Author: ruvector architecture team
License: MIT