This guide explains four powerful mathematical techniques that will differentiate ruvector from every other vector database on the market. Each solves a real problem that current databases can’t handle well.
Imagine you have two piles of sand and need to reshape one pile to match the other. Optimal transport calculates the minimum effort required to move the sand—how much mass moves, how far it travels.
For vectors, this means: how much “work” does it take to transform one distribution into another?
Traditional vector similarity (cosine, Euclidean) compares point-to-point. But what if you’re comparing:
- A document to another document (bags of word embeddings)
- An image region to another region (sets of feature vectors)
- A user’s behavior pattern to another user’s pattern
You need to compare sets of vectors, not individual vectors. Optimal transport does this naturally.
| Feature | What It Does |
|---|---|
| Wasserstein Distance | Measures “cost” to transform distribution A into B |
| Sinkhorn Algorithm | Fast entropic approximation (~1000x faster than exact EMD) |
| Sliced Wasserstein | Ultra-fast 1D projections, scales to millions |
| Gromov-Wasserstein | Compares structures without shared space |
Cross-Lingual Search
“Find French documents similar to this English document”
Problem: French and English embeddings live in different spaces. Solution: Gromov-Wasserstein compares structure, not coordinates.
Image Retrieval
“Find product images similar to this photo”
Each image = set of regional features. OT finds minimum-cost matching between regions.
Time Series Matching
“Find similar user behavior patterns”
User sessions = sequences of action embeddings. OT handles variable-length, unordered comparisons.
Document Similarity
“Compare these two legal contracts”
Documents as bags of sentence embeddings. OT captures semantic overlap better than averaging.
- Pinecone: Point vectors only
- Weaviate: Point vectors only
- Milvus: Point vectors only
- ruvector: Set-to-set similarity via optimal transport ✓
- Exact EMD: O(n³) → 1M points = hours
- Sinkhorn: O(n²) → 1M points = minutes
- Sliced Wasserstein: O(n log n) → 1M points = seconds ✓ (sketched below)
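To make the scaling argument concrete, the sliced estimator is short enough to show in full. A minimal sketch, assuming the rand and rand_distr crates, equal-size point sets, and illustrative names (this is not ruvector's API):

```rust
use rand::Rng;
use rand_distr::StandardNormal;

// Sliced Wasserstein-1 between two equal-size point sets: project onto
// random directions, sort, and average |x_(i) - y_(i)|, the closed-form
// optimal transport on a line.
fn sliced_w1(a: &[Vec<f32>], b: &[Vec<f32>], n_proj: usize) -> f32 {
    fn project(set: &[Vec<f32>], theta: &[f32]) -> Vec<f32> {
        let mut p: Vec<f32> = set.iter()
            .map(|v| v.iter().zip(theta).map(|(x, t)| x * t).sum())
            .collect();
        p.sort_by(|x, y| x.partial_cmp(y).unwrap());
        p
    }
    assert_eq!(a.len(), b.len(), "this sketch assumes equal-size sets");
    let dim = a[0].len();
    let mut rng = rand::thread_rng();
    let mut total = 0.0;
    for _ in 0..n_proj {
        // Uniform random direction = normalized Gaussian sample.
        let mut theta: Vec<f32> = (0..dim).map(|_| rng.sample(StandardNormal)).collect();
        let norm = theta.iter().map(|x| x * x).sum::<f32>().sqrt();
        theta.iter_mut().for_each(|x| *x /= norm);
        // Project, sort, pair order statistics: O(n log n) per slice.
        let (pa, pb) = (project(a, &theta), project(b, &theta));
        total += pa.iter().zip(&pb).map(|(x, y)| (x - y).abs()).sum::<f32>() / pa.len() as f32;
    }
    total / n_proj as f32
}
```

Each projection costs one sort; squaring the pairwise differences instead of taking absolute values gives sliced W₂ at the same complexity.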
The world isn’t flat. Neither is your data.
- Euclidean space (flat): Good for grid-like data, images, general embeddings
- Hyperbolic space (saddle-shaped): Perfect for hierarchies, trees, taxonomies
- Spherical space (ball-shaped): Ideal for cyclical data, periodic patterns
Mixed-curvature combines all three: H^n × S^m × E^p
Real data has mixed structure:
- Organizational charts (hierarchical) + skills (flat) + quarterly cycles (periodic)
- Product categories (tree) + features (flat) + seasonal trends (cyclical)
- Knowledge graphs (hierarchical) + entity types (flat) + temporal patterns (cyclical)
Forcing everything into Euclidean space wastes dimensions and distorts relationships.
| Feature | What It Does |
|---|---|
| Poincaré Ball | Hyperbolic space for tree structures |
| Lorentz Model | Numerically stable hyperbolic (better for deep hierarchies) |
| Spherical Embeddings | Cyclical/periodic pattern representation |
| Product Manifolds | Combine spaces: 10D hyperbolic + 5D spherical + 100D Euclidean |
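For intuition, the Poincaré-ball distance is a single closed-form expression: d(x, y) = arcosh(1 + 2‖x−y‖² / ((1−‖x‖²)(1−‖y‖²))). A minimal sketch over plain slices (illustrative; production kernels clamp more carefully near the boundary):

```rust
// Poincaré ball distance (unit ball, curvature -1). Assumes both points
// lie strictly inside the ball, so the arcosh argument is >= 1.
fn poincare_distance(x: &[f64], y: &[f64]) -> f64 {
    let sq = |v: &[f64]| v.iter().map(|a| a * a).sum::<f64>();
    let diff: Vec<f64> = x.iter().zip(y).map(|(a, b)| a - b).collect();
    let num = 2.0 * sq(&diff);
    let den = (1.0 - sq(x)) * (1.0 - sq(y));
    (1.0 + num / den.max(1e-15)).acosh()
}
```

The shrinking denominator near the boundary is exactly what gives trees exponential room: distances blow up as points approach the rim, so leaves can fan out without crowding.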
E-commerce Product Search
Product taxonomy (categories → subcategories → items) is a tree. Hyperbolic embeddings need 5 dimensions where Euclidean needs 100+ for the same quality. Result: 20x memory reduction and better semantic accuracy.
Organizational Knowledge Graphs
Company hierarchy + skills database + project timelines. Mixed-curvature captures all three structures in a unified space.
Biomedical Ontologies
Gene ontology, disease taxonomies, drug hierarchies. Hyperbolic space preserves parent-child relationships with 32% less distortion.
Social Network Analysis
Follower hierarchies (hyperbolic) + interest communities (Euclidean) + activity cycles (spherical)
ruvector v0.1.96 already has:
- ✅ Poincaré ball operations
- ✅ Lorentz model
- ✅ Hyperbolic attention mechanisms
- ✅ Dual-space attention (Euclidean + Hyperbolic)
Gap to fill: Product manifolds (H^n × S^m × E^p) for heterogeneous data.
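Most of that gap is bookkeeping once per-component distances exist: slice each embedding into its hyperbolic, spherical, and Euclidean blocks and combine the per-block geodesic distances. A hedged sketch (block layout and names are assumptions; `poincare_distance` is the function from the earlier sketch):

```rust
// Distance on H^h × S^s × E^e: slice each embedding into blocks, compute
// the geodesic distance per block, combine with an L2 norm.
fn spherical_distance(x: &[f64], y: &[f64]) -> f64 {
    // Geodesic on the unit sphere = angle between unit vectors.
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    dot.clamp(-1.0, 1.0).acos()
}

fn euclidean_distance(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum::<f64>().sqrt()
}

fn product_distance(x: &[f64], y: &[f64], h: usize, s: usize) -> f64 {
    let dh = poincare_distance(&x[..h], &y[..h]);   // hyperbolic block
    let ds = spherical_distance(&x[h..h + s], &y[h..h + s]); // spherical block
    let de = euclidean_distance(&x[h + s..], &y[h + s..]);   // Euclidean block
    (dh * dh + ds * ds + de * de).sqrt()
}
```

Learnable curvature would add a per-block scale factor; the combination rule stays the same.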
| Data Type | Euclidean Dims | Hyperbolic Dims | Memory Savings |
|---|---|---|---|
| WordNet hierarchy | 200 | 10 | 20x |
| Organizational tree | 128 | 8 | 16x |
| Product taxonomy | 256 | 16 | 16x |
TDA answers: What is the “shape” of my data?
- How many clusters are there? (0-dimensional holes)
- Are there loops or cycles? (1-dimensional holes)
- Are there voids or cavities? (2-dimensional holes)
Unlike clustering algorithms that require you to specify k, TDA discovers structure automatically.
Traditional clustering:
- Requires specifying number of clusters upfront
- Sensitive to noise and outliers
- Misses complex shapes (rings, spirals, Swiss rolls)
TDA provides parameter-free shape analysis that’s robust to noise.
| Feature | What It Does |
|---|---|
| Persistent Homology | Finds features that “persist” across scales |
| Betti Numbers | Counts holes in each dimension (β₀=clusters, β₁=loops, β₂=voids) |
| Persistence Diagrams | Visualizes feature birth/death across scales |
| Mapper Algorithm | Creates interpretable network summary of high-D data |
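Zero-dimensional persistence is compact enough to sketch outright: every point is born as its own component at scale 0, and a component dies when a distance-sorted edge merges it into another (essentially Kruskal's MST algorithm). Illustrative only, not phlite's or Ripser's API:

```rust
// 0-dimensional persistence of a point cloud: each merge event in a
// union-find over distance-sorted edges is the "death" of a component.
// Long-lived bars suggest real clusters; short bars are noise.
fn persistence_0d(points: &[Vec<f32>]) -> Vec<f32> {
    fn find(parent: &mut Vec<usize>, i: usize) -> usize {
        if parent[i] != i {
            let p = parent[i];
            let root = find(parent, p);
            parent[i] = root;
        }
        parent[i]
    }
    let n = points.len();
    let mut edges: Vec<(f32, usize, usize)> = Vec::new();
    for i in 0..n {
        for j in (i + 1)..n {
            let d = points[i].iter().zip(&points[j])
                .map(|(a, b)| (a - b) * (a - b)).sum::<f32>().sqrt();
            edges.push((d, i, j));
        }
    }
    edges.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());

    let mut parent: Vec<usize> = (0..n).collect();
    let mut deaths = Vec::new(); // every component is born at scale 0
    for (d, i, j) in edges {
        let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
        if ri != rj {
            parent[ri] = rj;
            deaths.push(d); // one component dies at this scale
        }
    }
    deaths
}
```

β₀ at scale t is then n minus the number of deaths ≤ t, and a wide gap between long-lived and short-lived bars is what separates "real" clusters from threshold artifacts.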
Embedding Quality Validation
“Are my embeddings actually forming meaningful clusters?”
TDA reveals if your embedding space has the right topology—before you deploy.
Anomaly Detection
Normal data forms specific shapes. Anomalies break the shape. TDA detects structural anomalies invisible to distance-based methods.
Coverage Analysis
“Does my training data cover the problem space?”
TDA identifies holes in coverage—regions where you have no training examples.
Semantic Cluster Discovery
Without specifying k, discover natural groupings in document embeddings. Persistence tells you which clusters are “real” vs artifacts of threshold choice.
Vectors → Filter Function → Covering → Clustering → Network Graph
↓ ↓ ↓ ↓ ↓
Your data PCA/density Overlapping Per-bin Interpretable
projection intervals clusters visualization
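A toy end-to-end version of that pipeline fits in one function. Everything here is deliberately simplified and hypothetical: the filter is the first coordinate (standing in for a PCA or density projection), and per-bin clustering is single-linkage at a fixed threshold:

```rust
use std::collections::HashMap;

// Toy Mapper: 1D filter -> overlapping intervals -> per-bin single-linkage
// clusters -> graph nodes, with edges wherever two clusters share a point.
fn mapper_graph(
    points: &[Vec<f32>],
    n_bins: usize,
    overlap: f32, // fraction of bin width added on each side, e.g. 0.25
    eps: f32,     // single-linkage distance threshold inside a bin
) -> (Vec<Vec<usize>>, Vec<(usize, usize)>) {
    fn find(parent: &mut Vec<usize>, i: usize) -> usize {
        if parent[i] != i {
            let p = parent[i];
            let root = find(parent, p);
            parent[i] = root;
        }
        parent[i]
    }
    let dist = |i: usize, j: usize| -> f32 {
        points[i].iter().zip(&points[j])
            .map(|(a, b)| (a - b) * (a - b)).sum::<f32>().sqrt()
    };
    let filter: Vec<f32> = points.iter().map(|p| p[0]).collect();
    let lo = filter.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = filter.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let w = (hi - lo) / n_bins as f32;

    let mut nodes: Vec<Vec<usize>> = Vec::new(); // node = member point indices
    for b in 0..n_bins {
        let start = lo + (b as f32 - overlap) * w;
        let end = lo + (b as f32 + 1.0 + overlap) * w;
        let members: Vec<usize> =
            (0..points.len()).filter(|&i| filter[i] >= start && filter[i] <= end).collect();
        // Union-find single linkage inside the bin.
        let mut parent: Vec<usize> = (0..members.len()).collect();
        for i in 0..members.len() {
            for j in (i + 1)..members.len() {
                if dist(members[i], members[j]) <= eps {
                    let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
                    parent[ri] = rj;
                }
            }
        }
        // Materialize one graph node per within-bin cluster.
        let mut clusters: HashMap<usize, Vec<usize>> = HashMap::new();
        for i in 0..members.len() {
            let r = find(&mut parent, i);
            clusters.entry(r).or_default().push(members[i]);
        }
        nodes.extend(clusters.into_values());
    }
    // Edge between nodes that share a point (possible because bins overlap).
    let mut edges = Vec::new();
    for a in 0..nodes.len() {
        for b in (a + 1)..nodes.len() {
            if nodes[a].iter().any(|i| nodes[b].contains(i)) {
                edges.push((a, b));
            }
        }
    }
    (nodes, edges)
}
```

Production Mapper implementations swap in better filters, adaptive covers, and real clustering, but the skeleton (cover, cluster, glue on overlaps) is exactly this.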
Zilliz (Milvus creators) published research showing TDA improves embedding model selection:
- Identifies which embedding model best preserves data structure
- Reveals training gaps before production deployment
- Quantifies embedding quality beyond simple benchmarks
No vector database offers built-in TDA. This enables:
- `npx ruvector tda analyze ./my-vectors.db` → Automatic shape report
- `npx ruvector tda validate` → Embedding quality score
- `npx ruvector tda coverage` → Training gap identification
Information geometry treats probability distributions as points on a curved surface.
The key insight: when optimizing ML models, the “natural” direction isn’t always the steepest descent—it’s the direction that accounts for the curvature of the probability landscape.
Standard gradient descent treats all parameter directions equally. But parameters aren’t equal:
- Some directions are “steep” (small changes = big effects)
- Some directions are “flat” (large changes = small effects)
Natural gradient rescales updates based on information geometry, converging faster and more stably.
| Feature | What It Does |
|---|---|
| Fisher Information Matrix | Measures “curvature” of probability space |
| Natural Gradient Descent | Gradient rescaled by inverse Fisher matrix |
| K-FAC | Efficient approximation (makes natural gradient practical) |
| α-Divergences | Generalized family including KL divergence |
Faster Index Training
HNSW and GNN-enhanced indexes require optimization. Natural gradient converges in 3-5x fewer iterations than Adam.
Embedding Fine-Tuning
Fine-tuning embedding models with natural gradient:
- Faster convergence
- Better generalization
- More stable training
Online Learning
As your vector DB learns from queries, natural gradient provides:
- Efficient incremental updates
- Stable continual learning
- Better catastrophic forgetting resistance
Probabilistic Embeddings
Instead of point vectors, represent uncertainty:
- Each embedding = Gaussian distribution
- Fisher-Rao distance measures distributional similarity (closed form sketched below)
- Captures “I’m not sure about this entity” naturally
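For univariate Gaussian embeddings this is fully concrete: under the Fisher metric, (μ, σ) space is a hyperbolic half-plane, and the Fisher-Rao distance has a closed form (conventions differ by constant factors across references):

```rust
// Fisher-Rao distance between N(mu1, s1²) and N(mu2, s2²). Under the
// Fisher metric, (μ/√2, σ) lives in the hyperbolic half-plane, so the
// distance is a scaled half-plane geodesic length.
fn fisher_rao_gaussian(mu1: f64, s1: f64, mu2: f64, s2: f64) -> f64 {
    let num = (mu1 - mu2).powi(2) / 2.0 + (s1 - s2).powi(2);
    (2.0_f64).sqrt() * (1.0 + num / (2.0 * s1 * s2)).acosh()
}
```

Two entities with the same mean but very different variances land far apart; uncertainty becomes part of the geometry rather than a separate score.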
Information geometry is mathematically sophisticated but practically powerful. No Rust implementation exists. Building it into ruvector means:
- Faster self-learning index optimization
- Better embedding fine-tuning for RAG
- Unique capability no competitor offers
| Capability | Pinecone | Weaviate | Milvus | Chroma | ruvector |
|---|---|---|---|---|---|
| Euclidean/Cosine | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hyperbolic Space | ❌ | ❌ | ❌ | ❌ | ✅ |
| Mixed-Curvature | ❌ | ❌ | ❌ | ❌ | 🔜 |
| Optimal Transport | ❌ | ❌ | ❌ | ❌ | 🔜 |
| Persistent Homology | ❌ | ❌ | ❌ | ❌ | 🔜 |
| Natural Gradient | ❌ | ❌ | ❌ | ❌ | 🔜 |
| GNN-Enhanced Index | ❌ | ❌ | ❌ | ❌ | ✅ |
| WASM Browser | ❌ | ❌ | ❌ | ✅ | ✅ |
- Sliced Wasserstein for fast set-to-set similarity
- Log-stabilized Sinkhorn for accuracy when needed
- CLI: `npx ruvector transport distance`
- Extend existing hyperbolic to full product spaces
- Learnable curvature parameters
- CLI: `npx ruvector create --geometry "hyperbolic:10,spherical:5,euclidean:100"`
- Persistent homology with apparent pairs optimization
- Mapper algorithm for exploration
- CLI: `npx ruvector tda analyze`, `npx ruvector tda validate`
- Fisher information computation via autodiff
- Natural gradient optimizer for index training
- CLI: `npx ruvector train --optimizer natural-gradient`
These four algorithms transform ruvector from “another vector database” to “the vector database that understands data shape”:
| Algorithm | One-Line Value Prop |
|---|---|
| Optimal Transport | Compare sets of vectors, not just points |
| Mixed-Curvature | 10-20x memory savings on hierarchical data |
| TDA | Parameter-free shape discovery and validation |
| Information Geometry | 3-5x faster index optimization |
Together, they enable use cases no other vector database can touch: cross-lingual retrieval, topology-aware clustering, hierarchical knowledge graphs, and self-improving indexes.
The Rust/WASM implementation means these capabilities run everywhere—server, edge, browser—with native performance.
The Rust ecosystem for advanced mathematical algorithms required by modern vector databases reveals a fragmented but maturing landscape.
Optimal transport, persistent homology, and information geometry implementations are nascent, while non-Euclidean geometry shows the most promise through production-ready implementations like ruvector. Custom development will be required for most production deployments, but WASM-compatible foundations exist across all domains.
| Domain | Maturity | Production-Ready? | WASM-Compatible | Recommended Path |
|---|---|---|---|---|
| Optimal Transport | ★☆☆☆☆ | No | Blocked (BLAS deps) | Custom implementation |
| Manifold/Hyperbolic | ★★★☆☆ | Partial (ruvector) | Yes | Extend ruvector |
| TDA/Persistent Homology | ★★☆☆☆ | No | Possible (phlite) | Port Ripser or extend phlite |
| Information Geometry | ★☆☆☆☆ | No | Yes (via burn) | Build new crate |
The critical insight: no single Rust crate provides production-grade implementations for these algorithms with WASM compatibility. However, foundational linear algebra (nalgebra, ndarray) and autodiff (burn) are mature and WASM-ready, enabling custom implementations.
The Rust optimal transport ecosystem centers on `rust-optimal-transport`, a stale crate (last updated 3+ years ago) with fundamental WASM blockers. Only basic Sinkhorn and EMD are implemented—**no Gromov-Wasserstein, Sliced Wasserstein, or differentiable variants exist**.
| Crate | Algorithms | WASM? | Status |
|---|---|---|---|
| `rust-optimal-transport` | EMD, Sinkhorn, Unbalanced | ❌ Blocked | Inactive (3+ years) |
| `optimal-transport-rs` | Minimal | Unknown | Not recommended |
| `rust-emd` | EMD only | ❌ C++ wrapper | Limited |
The primary blocker for WASM compilation is dependency on ndarray-linalg and C++ FFI for the network simplex algorithm. The EMD solver wraps C++ code via cxx, which cannot compile to wasm32-unknown-unknown.
Python’s POT library provides 40+ algorithms absent from Rust: Gromov-Wasserstein (critical for cross-domain similarity), Sliced Wasserstein (near-linear scaling), Low-rank Sinkhorn (memory-efficient), Wasserstein barycenters, and partial optimal transport. For vector database integration, Sliced Wasserstein is particularly important—it provides O(n log n) complexity versus O(n³) for exact EMD.
A pure-Rust implementation should use nalgebra (fully WASM-compatible) instead of ndarray-linalg, with this module structure:
```rust
pub mod optimal_transport {
    pub mod sinkhorn {
        /// Log-domain Sinkhorn with ε-regularization (stable for small ε).
        pub struct LogStabilizedSinkhorn { pub reg: f32, pub max_iter: usize }
        /// Greedy coordinate-update variant (Greenkhorn-style).
        pub struct GreedySinkhorn;
    }
    pub mod sliced {
        /// Monte Carlo sliced Wasserstein over `n_proj` random projections.
        pub fn sliced_wasserstein(a: &[f32], b: &[f32], n_proj: usize) -> f32 { todo!() }
    }
    pub mod gromov {
        use ndarray::Array2;
        /// Entropic Gromov-Wasserstein between intra-domain cost matrices.
        pub fn entropic_gw(c_a: &Array2<f32>, c_b: &Array2<f32>, reg: f32) -> Array2<f32> { todo!() }
    }
}
```

GPU acceleration via wgpu is viable for browser deployment—Sinkhorn’s matrix-vector operations map well to WebGPU compute shaders.
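The log-stabilized Sinkhorn named in the module sketch above is small enough to write out in full. A hedged reference implementation over nalgebra, assuming uniform marginals and a fixed iteration count (illustrative, not the proposed crate's API):

```rust
use nalgebra::DMatrix;

// Log-domain Sinkhorn: iterate the dual potentials f, g with log-sum-exp
// so exp((g_j - C_ij)/ε) never overflows, even for small ε. Returns the
// entropic OT cost <P, C> for uniform marginals.
fn sinkhorn_log(cost: &DMatrix<f64>, eps: f64, iters: usize) -> f64 {
    fn lse(xs: impl Iterator<Item = f64>) -> f64 {
        let v: Vec<f64> = xs.collect();
        let max = v.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        max + v.iter().map(|x| (x - max).exp()).sum::<f64>().ln()
    }
    let (n, m) = cost.shape();
    let (log_a, log_b) = (-(n as f64).ln(), -(m as f64).ln()); // ln(1/n), ln(1/m)
    let (mut f, mut g) = (vec![0.0; n], vec![0.0; m]);
    for _ in 0..iters {
        for i in 0..n {
            let s = lse((0..m).map(|j| (g[j] - cost[(i, j)]) / eps));
            f[i] = eps * (log_a - s);
        }
        for j in 0..m {
            let s = lse((0..n).map(|i| (f[i] - cost[(i, j)]) / eps));
            g[j] = eps * (log_b - s);
        }
    }
    // P_ij = exp((f_i + g_j - C_ij)/ε); accumulate <P, C>.
    let mut total = 0.0;
    for i in 0..n {
        for j in 0..m {
            total += ((f[i] + g[j] - cost[(i, j)]) / eps).exp() * cost[(i, j)];
        }
    }
    total
}
```

Vanilla Sinkhorn stores u = exp(f/ε), which underflows as ε shrinks; keeping the dual potentials f and g themselves is the entire stabilization trick.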
The ruvector project provides the most comprehensive Rust implementation for non-Euclidean geometry, including Poincaré ball and Lorentz/hyperboloid models with full WASM support. Performance benchmarks show 143ns for cosine distance on 1536-dimensional vectors and 61µs for HNSW k=10 search.
| Crate | Models | Operations | WASM |
|---|---|---|---|
| ruvector-core | Poincaré, Lorentz | expMap, logMap, Möbius addition | ✅ |
| ruvector-attention | Hyperbolic attention | 39 attention mechanisms | ✅ |
| manopt-rs | Generic Riemannian | RiemannianAdam/SGD | Likely |
No Rust equivalent exists for geoopt’s ProductManifold, which enables mixed-curvature representations (H^n × S^m × E^p). Research shows mixed-curvature embeddings achieve 32.55% distortion reduction on social network data versus single-space embeddings. This capability is essential for heterogeneous data combining hierarchical (hyperbolic), cyclical (spherical), and grid-like (Euclidean) structures.
The Poincaré ball model has a hard numerical boundary at radius ~38 in float64—points beyond this collapse to the boundary. The Lorentz/hyperboloid model provides better numerical properties for deep hierarchies. Recommendation: train in Lorentz, project to Poincaré for visualization.
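Both halves of that recommendation are nearly one-liners. A minimal sketch over plain slices, assuming points stored as (x₀, spatial…) on the hyperboloid with ⟨x,x⟩_L = −1 (real kernels batch and clamp more aggressively):

```rust
// Minkowski inner product on the Lorentz/hyperboloid model.
fn lorentz_inner(x: &[f64], y: &[f64]) -> f64 {
    -x[0] * y[0] + x[1..].iter().zip(&y[1..]).map(|(a, b)| a * b).sum::<f64>()
}

// Geodesic distance: d(x, y) = arcosh(-<x, y>_L). The clamp guards
// against inner products drifting slightly above -1 from rounding.
fn lorentz_distance(x: &[f64], y: &[f64]) -> f64 {
    (-lorentz_inner(x, y)).max(1.0).acosh()
}

// Projection to the Poincaré ball for visualization:
// (x0, spatial) on the hyperboloid maps to spatial / (1 + x0).
fn to_poincare(x: &[f64]) -> Vec<f64> {
    x[1..].iter().map(|xi| xi / (1.0 + x[0])).collect()
}
```

ruvector-core exposes richer versions of these primitives; unifying them behind crate-level traits could look like the following.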
```rust
use nalgebra::DVector;

pub trait Manifold: Send + Sync {
    fn dimension(&self) -> usize;
    fn curvature(&self) -> f64;
}

pub trait ExpLogMaps: Manifold {
    fn exp_map(&self, base: &DVector<f64>, tangent: &DVector<f64>) -> DVector<f64>;
    fn log_map(&self, base: &DVector<f64>, point: &DVector<f64>) -> DVector<f64>;
}

pub trait RiemannianOptimizable: ExpLogMaps {
    fn egrad_to_rgrad(&self, point: &DVector<f64>, egrad: &DVector<f64>) -> DVector<f64>;
    fn parallel_transport(&self, from: &DVector<f64>, to: &DVector<f64>, v: &DVector<f64>) -> DVector<f64>;
}

pub struct ProductManifold {
    components: Vec<Box<dyn RiemannianOptimizable>>,
    signature: Vec<usize>, // dimensions per component block
}
```

Rust TDA crates exist but none match Ripser’s 40x speedup over competitors. The phlite crate is most promising, implementing Ripser’s R=DV decomposition and clearing optimization, but lacks apparent pairs and emergent pairs shortcuts that provide Ripser’s dramatic performance gains.
Computing persistence up to dimension 2 on 192 points sampled from S² (roughly 56M simplices):
| Implementation | Time | Memory |
|---|---|---|
| Dionysus (C++) | 533s | 3.4 GB |
| GUDHI (C++) | 75s | 2.9 GB |
| Ripser (C++) | 1.2s | 152 MB |
| phlite (Rust) | ~10-20s* | ~500 MB* |
*Estimated based on algorithm description; no published benchmarks
| Crate | Key Features | Gaps |
|---|---|---|
| phlite | VR filtration, R=DV decomposition, clearing | No apparent/emergent pairs |
| tda | Rips, Betti numbers, Mapper, Bottleneck distance | Limited optimization |
| lophat | Parallel computation (rayon) | Beta status, no VR-specific |
| teia | VR, lower-star complex | Basic implementation |
None implement Alpha complexes, Čech complexes, Wasserstein distance for persistence diagrams, or Z/pZ coefficient fields beyond Z/2Z.
Ripser compiles to WASM via Emscripten and runs in browsers at live.ripser.org. This proves TDA is computationally feasible in WASM. For Rust, phlite should compile to wasm32-unknown-unknown given its pure-Rust implementation with nalgebra.
TDA enables topological clustering for vector spaces—Betti numbers validate cluster counts, persistence diagrams reveal shape structure, and Mapper provides interpretable dimension reduction. Integration architecture:
Vector DB → Point Cloud → Distance Matrix → Rips Complex → Persistence
↓ ↓
Nearest Neighbors (ANN) Topological Features
↓
Betti Numbers | Persistence Diagrams
↓ ↓
Cluster Validation Shape Analysis
No dedicated information geometry library exists in Rust. This is a significant gap—Fisher-Rao metrics, natural gradient descent, and α-divergences are absent from the ecosystem. Building blocks exist but are fragmented across multiple crates.
| Component | Crate | Status |
|---|---|---|
| KL divergence | ndarray-stats | ✅ Production-ready |
| Cross-entropy | ndarray-stats | ✅ Production-ready |
| Shannon entropy | ndarray-stats | ✅ Production-ready |
| Automatic differentiation | burn, candle | ✅ Excellent |
| Second-order optimization | argmin (BFGS, L-BFGS) | ✅ Mature |
| Fisher information | — | ❌ Missing |
| Natural gradient | — | ❌ Missing |
| K-FAC | — | ❌ Missing |
| α-divergences | — | ❌ Missing |
Natural Gradient Descent (NGD) uses the Fisher information matrix to precondition gradients, providing curvature-aware updates. K-FAC (Kronecker-Factored Approximate Curvature) approximates Fisher information as Kronecker products, dramatically reducing computational cost from O(n²) to O(n).
Neither exists in Rust. Implementation requires:
- Fisher information computation via autodiff (burn provides Hessians)
- Natural gradient update: θ_{t+1} = θ_t - η F^{-1} ∇L(θ_t) (see the sketch after this list)
- K-FAC block-diagonal approximation for scalability
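Before any K-FAC machinery, the core update fits in a few lines of nalgebra: estimate the empirical Fisher from score outer products, add damping, and solve F·Δ = ∇L rather than forming F⁻¹. A hedged sketch with illustrative names:

```rust
use nalgebra::{DMatrix, DVector};

// One damped natural-gradient step. `scores` are per-sample score vectors
// ∇_θ log p(x_i; θ) supplied by the caller; everything here is illustrative.
fn natural_gradient_step(
    theta: &DVector<f64>,
    grad: &DVector<f64>,     // ∇L(θ)
    scores: &[DVector<f64>], // for the empirical Fisher estimate
    lr: f64,
    damping: f64,
) -> DVector<f64> {
    let n = theta.len();
    // Empirical Fisher F ≈ (1/N) Σ s sᵀ, damped so the solve is well-posed.
    let mut fisher = DMatrix::<f64>::identity(n, n) * damping;
    for s in scores {
        fisher += s * s.transpose() / scores.len() as f64;
    }
    // Solve F Δ = ∇L instead of inverting F explicitly.
    let delta = fisher
        .cholesky()
        .expect("damped Fisher is symmetric positive definite")
        .solve(grad);
    theta - lr * delta
}
```

A crate-level API would wrap this behind traits rather than a free function; the sketch below is one possible surface.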
```rust
use burn::tensor::{backend::Backend, Tensor};
use ndarray::Array2;

pub trait Divergence<T> {
    fn divergence(&self, p: &T, q: &T) -> f64;
    fn is_symmetric(&self) -> bool { false }
}

pub trait AlphaDivergence<T>: Divergence<T> {
    fn alpha(&self) -> f64; // α=1 → KL(p||q), α=-1 → KL(q||p)
}

pub trait FisherRaoMetric<P> {
    fn fisher_information(&self, p: &P) -> Array2<f64>;
    fn geodesic_distance(&self, p: &P, q: &P) -> f64;
}

pub trait NaturalGradient<B: Backend> {
    fn compute_fisher(&self, params: &Tensor<B, 1>) -> Tensor<B, 2>;
    fn natural_step(&mut self, params: &Tensor<B, 1>, grad: &Tensor<B, 1>, lr: f64) -> Tensor<B, 1>;
}
```

WASM deployment faces consistent challenges across all domains: BLAS/LAPACK dependencies block compilation, C++ FFI is incompatible, and memory is constrained to 4GB. The solution is pure-Rust implementations using WASM-compatible linear algebra.
| Crate | WASM Support | Use Case |
|---|---|---|
| nalgebra | ✅ Full | Statically-sized matrices, decompositions |
| ndarray (no blas feature) | ✅ Full | Dynamic arrays, no decompositions |
| linfa-linalg | ✅ Full | Pure-Rust LAPACK replacement |
| burn (wgpu backend) | ✅ Full | GPU via WebGPU |
The burn framework with wgpu backend enables GPU-accelerated tensor operations in browsers via WebGPU. This is viable for:
- Sinkhorn iterations (matrix-vector products)
- Hyperbolic distance computations (batch operations)
- Fisher information estimation (gradient outer products) (see the sketch below)
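As a sketch of how little ceremony that involves, assuming a recent burn with the burn-wgpu backend (API details shift between burn releases, so treat names as approximate):

```rust
use burn::tensor::Tensor;
use burn_wgpu::Wgpu;

// Backend selection is a type parameter; the same code compiles natively
// and to wasm32, where the Wgpu backend dispatches WebGPU compute shaders.
type B = Wgpu;

// One Sinkhorn-style kernel: K v, the matrix-vector product at the heart
// of each Sinkhorn sweep, batched on the GPU.
fn scaled_coupling(kernel: Tensor<B, 2>, v: Tensor<B, 1>) -> Tensor<B, 1> {
    kernel.matmul(v.unsqueeze_dim(1)).squeeze(1)
}
```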
| Consideration | Impact | Mitigation |
|---|---|---|
| f64 vs f32 | Larger WASM, better precision | f64 for hyperbolic, f32 elsewhere |
| Threading | SharedArrayBuffer restrictions | Web Workers for parallelism |
| SIMD | WebAssembly SIMD now available | Use 128-bit SIMD intrinsics |
| Memory growth | Large filtrations may exceed limits | Streaming/chunked APIs |
┌─────────────────────────────────────────────────────────────────┐
│ SIMILARITY CONTEXT │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Euclidean │ │ Hyperbolic │ │ Optimal Transport │ │
│ │ Distance │ │ Distance │ │ (Wasserstein/Sinkhorn) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ TOPOLOGY CONTEXT │
│ ┌─────────────────────┐ ┌────────────────────────────────┐ │
│ │ Persistent Homology │ │ Topological Clustering (Mapper)│ │
│ └─────────────────────┘ └────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ GEOMETRY CONTEXT │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Poincaré │ │ Lorentz │ │ Product Manifold │ │
│ │ Ball Model │ │ Model │ │ (H^n × S^m × E^p) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ OPTIMIZATION CONTEXT │
│ ┌─────────────────────┐ ┌────────────────────────────────┐ │
│ │ Natural Gradient │ │ Riemannian Optimization │ │
│ └─────────────────────┘ └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
| Algorithm | Priority | Complexity | Dependencies | Timeline |
|---|---|---|---|---|
| Sliced Wasserstein | 🔴 Critical | Medium | nalgebra | 2-3 weeks |
| Log-stabilized Sinkhorn | 🔴 Critical | Low | nalgebra | 1 week |
| Product Manifold | 🔴 Critical | Medium | ruvector-core | 3-4 weeks |
| Apparent pairs (TDA) | 🟡 High | High | phlite fork | 4-6 weeks |
| Natural Gradient | 🟡 High | Medium | burn | 3-4 weeks |
| Gromov-Wasserstein | 🟢 Medium | High | nalgebra | 6-8 weeks |
| K-FAC optimizer | 🟢 Medium | High | burn | 6-8 weeks |
ruvector-math/
├── Cargo.toml
├── src/
│ ├── lib.rs
│ ├── manifold/
│ │ ├── mod.rs
│ │ ├── poincare.rs # From ruvector-core
│ │ ├── lorentz.rs # From ruvector-core
│ │ ├── spherical.rs # New implementation
│ │ └── product.rs # New: H^n × S^m × E^p
│ ├── transport/
│ │ ├── mod.rs
│ │ ├── sinkhorn.rs # Log-stabilized, greedy
│ │ ├── sliced.rs # Sliced Wasserstein
│ │ └── gromov.rs # Gromov-Wasserstein
│ ├── topology/
│ │ ├── mod.rs
│ │ ├── rips.rs # VR complex construction
│ │ ├── persistence.rs # R=DV with apparent pairs
│ │ └── mapper.rs # Topological clustering
│ └── info_geom/
│ ├── mod.rs
│ ├── divergence.rs # KL, α-divergences
│ ├── fisher.rs # Fisher information
│ └── natural_grad.rs # NGD optimizer
The Rust ecosystem provides strong foundations (nalgebra, burn, ndarray) but lacks algorithm-specific implementations for advanced mathematical methods. The recommended strategy:
- Extend ruvector for hyperbolic/manifold operations—it’s the most mature implementation
- Fork phlite and implement apparent/emergent pairs for competitive TDA performance
- Build new optimal transport crate using nalgebra (pure Rust, WASM-compatible)
- Create information geometry crate leveraging burn’s autodiff for Fisher computation
For WASM deployment, avoid any crate with BLAS/LAPACK or C++ FFI dependencies. The pure-Rust path (nalgebra + burn + custom algorithms) enables browser deployment with WebGPU acceleration via wgpu.
The 40x performance gap between Rust TDA and Ripser, and the complete absence of natural gradient optimizers, represent the most significant technical debt. Addressing these gaps positions ruvector as the first production-grade vector database with native support for non-Euclidean geometry and topological analysis.