
Benchmarks

Last updated: 2026-03-03 23:02

Generated by benchmarks/run_benchmarks.py.

Activation Engine

Compares three activation modes on synthetic graphs with overlapping fiber pathways:

  • Classic: BFS spreading activation with distance-based decay

  • Reflex: Trail-based activation through fiber pathways only

  • Hybrid: Reflex primary + limited classic BFS for discovery (default in v0.6.0+)
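The three modes can be illustrated with a minimal sketch. It assumes a few things: synapses are a plain adjacency dict, fibers are lists of neuron IDs, activation decays by 0.5 per hop, and every neuron on a matching trail gets a flat reflex score of 1.0. All function names and defaults here are hypothetical, not NeuralMemory's API. The hybrid function follows the strategy given under Methodology: BFS limited to `max_hops // 2`, with its scores dampened by 0.6.

```python
from collections import deque

# Hypothetical data shapes, not NeuralMemory's real API:
#   graph:  neuron id -> list of neighbouring neuron ids (synapses)
#   fibers: list of neuron-id pathways laid down by encoded memories


def classic_bfs(graph, seeds, max_hops=4, decay=0.5):
    """BFS spreading activation: a neuron first reached in h hops gets decay ** h."""
    activation = {s: 1.0 for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue  # stop expanding past the hop limit
        for nbr in graph.get(node, []):
            if nbr not in activation:  # first (shortest) visit wins
                activation[nbr] = decay ** (hops + 1)
                queue.append((nbr, hops + 1))
    return activation


def reflex(fibers, seeds):
    """Trail-based activation: only neurons on fibers that contain a seed."""
    seed_set = set(seeds)
    activation = {}
    for fiber in fibers:
        if seed_set.intersection(fiber):
            for neuron in fiber:
                activation[neuron] = 1.0  # assumed flat score along the trail
    return activation


def hybrid(graph, fibers, seeds, max_hops=4, decay=0.5, dampen=0.6):
    """Reflex first, then a shallower, dampened classic BFS for discovery."""
    activation = reflex(fibers, seeds)
    discovered = classic_bfs(graph, seeds, max_hops=max_hops // 2, decay=decay)
    for neuron, level in discovered.items():
        # Keep trail scores; only add neurons the fibers did not reach.
        activation.setdefault(neuron, level * dampen)
    return activation


graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
fibers = [["a", "c"]]
print(hybrid(graph, fibers, ["a"]))  # {'a': 1.0, 'c': 1.0, 'b': 0.3}
```

In this toy graph, Reflex alone reaches only `a` and `c`. The shallow BFS adds `b` at a dampened score, and `d` (three hops away) stays outside the hybrid's discovery radius.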

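| Neurons | Fibers | Classic (ms) | Reflex (ms) | Hybrid (ms) | Classic # | Reflex # | Hybrid # | Reflex Recall | Hybrid Recall |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 10 | 2.35 | 0.03 | 0.75 | 85 | 16 | 66 | 16.5% | 75.3% |
| 500 | 50 | 6.13 | 0.05 | 0.91 | 231 | 38 | 155 | 8.7% | 59.3% |
| 1000 | 100 | 4.57 | 0.03 | 0.74 | 190 | 29 | 126 | 3.7% | 54.7% |
| 3000 | 300 | 7.81 | 0.06 | 0.88 | 242 | 52 | 166 | 2.5% | 49.6% |
| 5000 | 500 | 4.53 | 0.13 | 0.67 | 171 | 151 | 232 | 3.5% | 50.9% |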

Speedup

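| Graph Size (neurons) | Hybrid Speedup (Classic / Hybrid) | Reflex Speedup (Classic / Reflex) |
|---|---|---|
| 100 | 3.1x | 78.3x |
| 500 | 6.7x | 122.6x |
| 1000 | 6.2x | 152.3x |
| 3000 | 8.9x | 130.2x |
| 5000 | 6.8x | 34.8x |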

Average recall across the five graph sizes: Reflex only 7.0%, Hybrid 58.0%.
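The speedup columns are ratios of median times (Classic divided by the candidate mode), and the average recall is the plain mean over the five graph sizes. A short check using the rounded values from the tables:

```python
from statistics import mean

# Rounded medians (ms) and recall (%) from the Activation Engine table above.
# neurons: (classic_ms, reflex_ms, hybrid_ms, reflex_recall, hybrid_recall)
rows = {
    100: (2.35, 0.03, 0.75, 16.5, 75.3),
    500: (6.13, 0.05, 0.91, 8.7, 59.3),
    1000: (4.57, 0.03, 0.74, 3.7, 54.7),
    3000: (7.81, 0.06, 0.88, 2.5, 49.6),
    5000: (4.53, 0.13, 0.67, 3.5, 50.9),
}


def speedup(baseline_ms, candidate_ms):
    """Baseline time divided by candidate time; above 1.0 means the candidate is faster."""
    return baseline_ms / candidate_ms


for neurons, (classic, reflex_ms, hybrid_ms, _, _) in rows.items():
    print(f"{neurons}: hybrid {speedup(classic, hybrid_ms):.1f}x, "
          f"reflex {speedup(classic, reflex_ms):.1f}x")

reflex_avg = mean(r[3] for r in rows.values())
hybrid_avg = mean(r[4] for r in rows.values())
print(f"average recall: reflex {reflex_avg:.1f}%, hybrid {hybrid_avg:.1f}%")
```

The inputs are already-rounded medians, so a recomputed ratio can differ from the table in the last digit. Applied to the Full Pipeline totals, `speedup(8.02, 16.37)` is about 0.49. A ratio below 1.0 means Hybrid was the slower mode end to end.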

Full Pipeline

End-to-end benchmark: 15 encoded memories, 5 queries, 10 runs each.

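| Query | Depth | Classic (ms) | Hybrid (ms) | Speedup (Classic / Hybrid) | Classic Neurons | Hybrid Neurons | Classic Confidence | Hybrid Confidence |
|---|---|---|---|---|---|---|---|---|
| What did Alice suggest? | INSTANT | 1.3 | 5.09 | 0.3x | 16 | 13 | 1.0 | 1.0 |
| What was the auth bug fix? | INSTANT | 1.05 | 2.95 | 0.4x | 15 | 12 | 1.0 | 1.0 |
| What happened on Thursday? | CONTEXT | 1.33 | 1.7 | 0.8x | 8 | 8 | 1.0 | 1.0 |
| Why did we choose PostgreSQL? | DEEP | 2.24 | 3.18 | 0.7x | 10 | 10 | 1.0 | 1.0 |
| What is Bob working on? | CONTEXT | 2.1 | 3.45 | 0.6x | 10 | 10 | 1.0 | 1.0 |
| Total | | 8.02 | 16.37 | 0.5x | | | | |

A speedup below 1.0x means Hybrid was slower than Classic on that query.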

Ground-Truth Evaluation

30 curated memories, 25 queries, K=5.

Overall (NeuralMemory vs Naive Baseline)

| Metric | NeuralMemory | Naive Baseline | Winner |
|---|---|---|---|
| Precision@5 | 0.168 | 0.248 | Baseline |
| Recall@5 | 0.380 | 0.466 | Baseline |
| MRR | 0.563 | 0.637 | Baseline |
| NDCG@5 | 0.350 | 0.464 | Baseline |
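For reference, the four metrics can be computed per query as below and then averaged over the 25 queries (MRR is the mean of the per-query reciprocal ranks). This is a minimal sketch that assumes binary relevance, meaning a memory is either in a query's ground-truth set or not, and log2 discounting for NDCG. The benchmark's exact relevance grading and tie handling are not stated here.

```python
import math


def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=5):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance NDCG: DCG of the ranking over DCG of a perfect ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


ranked = ["m3", "m7", "m1", "m9", "m4"]  # hypothetical retrieval order
relevant = {"m7", "m4", "m8"}            # hypothetical ground truth
print(precision_at_k(ranked, relevant))  # 0.4
print(reciprocal_rank(ranked, relevant)) # 0.5
```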

Per-Category Recall@5

| Category | NeuralMemory | Baseline | Queries |
|---|---|---|---|
| causal | 0.375 | 0.500 | 4 |
| coherence | 0.244 | 0.378 | 3 |
| factual | 0.556 | 0.819 | 8 |
| pattern | 0.237 | 0.304 | 4 |
| temporal | 0.312 | 0.125 | 6 |
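The overall Recall@5 figures equal the query-weighted mean of the per-category values (4 + 3 + 8 + 4 + 6 = 25 queries). A quick arithmetic check with the numbers from the table:

```python
# Per-category Recall@5 and query counts, copied from the table above.
# category: (neuralmemory, baseline, queries)
per_category = {
    "causal": (0.375, 0.500, 4),
    "coherence": (0.244, 0.378, 3),
    "factual": (0.556, 0.819, 8),
    "pattern": (0.237, 0.304, 4),
    "temporal": (0.312, 0.125, 6),
}


def query_weighted_mean(column):
    """Weight each category's recall by its number of queries."""
    total_queries = sum(row[2] for row in per_category.values())
    weighted = sum(row[column] * row[2] for row in per_category.values())
    return weighted / total_queries


print(round(query_weighted_mean(0), 3))  # 0.38
print(round(query_weighted_mean(1), 3))  # 0.466
```

Temporal is the only category where NeuralMemory leads the baseline.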

Methodology

  • Platform: InMemoryStorage (NetworkX), single-threaded async
  • Runs: 10 per measurement (median reported)
  • Warmup: 1 run, excluded from timing
  • Hybrid strategy: Reflex trail activation (primary) + classic BFS with max_hops // 2 (discovery, dampened 0.6x)
  • Seed: random.seed(42) for reproducibility
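The timing procedure above can be sketched as follows: one untimed warmup run, then the median of 10 timed runs in a single-threaded asyncio loop. `fake_query` is a placeholder for whatever activation or recall call is being measured. It is not part of NeuralMemory, and the harness itself is an assumption rather than the code in `benchmarks/run_benchmarks.py`.

```python
import asyncio
import random
import statistics
import time


async def time_median_ms(make_call, runs=10, warmup=1):
    """Median wall-clock time (ms) of `runs` awaited calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        await make_call()  # warm caches; not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        await make_call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


async def fake_query():
    # Placeholder workload standing in for a real recall/activation call.
    await asyncio.sleep(0)
    return sum(random.random() for _ in range(1000))


random.seed(42)  # same seed as the methodology, for repeatable inputs
median_ms = asyncio.run(time_median_ms(fake_query))
print(f"median: {median_ms:.3f} ms")
```

The median is used instead of the mean so that a single slow run, such as one hit by a GC pause, does not skew the reported figure.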

Regenerate

python benchmarks/run_benchmarks.py

Results are written to docs/benchmarks.md.

SQLite at Scale

Last updated: 2026-03-04 02:24

Benchmarks of the real SQLiteStorage backend with diverse memory types, run on Windows 11.

Encode Throughput

| Memories | Total (s) | Mean (ms) | Median (ms) | P95 (ms) | P99 (ms) | Throughput (mem/s) | Errors |
|---|---|---|---|---|---|---|---|
| 1,000 | 26.5 | 26.52 | 22.59 | 51.33 | 66.75 | 37.7 | 0 |
| 5,000 | 190.8 | 38.16 | 34.65 | 76.36 | 99.64 | 26.2 | 0 |
| 10,000 | 536.1 | 53.61 | 47.9 | 102.27 | 131.85 | 18.7 | 0 |
| 50,000 | 10954.6 | 219.09 | 191.25 | 509.01 | 656.49 | 4.6 | 0 |
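The encode columns relate simply when encodes run back to back. Mean ≈ Total / Memories (26.5 s / 1,000 ≈ 26.5 ms), and Throughput = Memories / Total (1,000 / 26.5 s ≈ 37.7 mem/s). Below is a minimal sketch of summarising per-memory latencies this way. The percentile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the benchmark's own percentile definition is not stated.

```python
import statistics


def encode_stats(latencies_ms, total_s, errors=0):
    """Summarise per-memory encode latencies (ms) and total wall time (s)."""
    # 99 cut points: index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "total_s": total_s,
        "mean_ms": statistics.fmean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "throughput_per_s": len(latencies_ms) / total_s,
        "errors": errors,
    }


# Synthetic example: 100 encodes taking 1..100 ms each, run back to back.
latencies = [float(ms) for ms in range(1, 101)]
stats = encode_stats(latencies, total_s=sum(latencies) / 1000)
print(stats)
```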

Database Size

| Memories | After Encode (MB) | After Consolidation (MB) | Neurons | Synapses | Fibers |
|---|---|---|---|---|---|
| 1,000 | 11.2 | 12.6 | 3,534 | 7,784 | 1,000 |
| 5,000 | 46.55 | 48.67 | 13,734 | 34,238 | 5,000 |
| 10,000 | 88.29 | 93.07 | 25,033 | 65,789 | 10,000 |
| 50,000 | 411.48 | 419.0 | 108,913 | 311,777 | 50,000 |
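The store runs in WAL mode (see Methodology below), so recently committed pages can sit in the `-wal` sidecar file rather than the main database file. An on-disk size figure should therefore count both. The sketch below uses Python's standard `sqlite3` module rather than aiosqlite, and the table name is hypothetical. The document does not say whether "MB" means 10^6 or 2^20 bytes, so 2^20 is assumed here.

```python
import os
import sqlite3
import tempfile

MB = 1024 * 1024  # assumption: the report's "MB" may be 10**6 bytes instead


def db_size_mb(path):
    """Size of the main database file plus its -wal/-shm sidecars, in MB."""
    total = 0
    for suffix in ("", "-wal", "-shm"):
        sidecar = path + suffix
        if os.path.exists(sidecar):
            total += os.path.getsize(sidecar)
    return total / MB


def demo(rows=1000):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        conn = sqlite3.connect(path)
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
            # Hypothetical table; not NeuralMemory's schema.
            conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
            conn.executemany(
                "INSERT INTO memories (content) VALUES (?)",
                [("x" * 200,) for _ in range(rows)],
            )
            conn.commit()
            # Database size as SQLite itself reports it.
            pages = conn.execute("PRAGMA page_count").fetchone()[0]
            page_size = conn.execute("PRAGMA page_size").fetchone()[0]
            on_disk = db_size_mb(path)
        finally:
            conn.close()  # closing the last connection checkpoints the WAL
    return mode, pages * page_size / MB, on_disk


mode, logical_mb, on_disk_mb = demo()
print(mode, round(logical_mb, 3), round(on_disk_mb, 3))
```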

Recall Latency (Post-Consolidation)

10 queries, 5 runs each (median reported).

1,000 memories

| Query | Depth | Median (ms) | P95 (ms) | Neurons | Confidence | Found |
|---|---|---|---|---|---|---|
| Python concurrency | INSTANT | 145.12 | 154.46 | 15 | 1.0 | yes |
| What database did we choose? | CONTEXT | 2.08 | 2.36 | 0 | 0.0 | no |
| connection error Redis | INSTANT | 109.15 | 121.76 | 21 | 1.0 | yes |
| deployment workflow | CONTEXT | 112.8 | 136.68 | 23 | 1.0 | yes |
| Why did we choose PostgreSQL? | DEEP | 38.02 | 40.82 | 7 | 1.0 | yes |
| authentication JWT | INSTANT | 70.4 | 94.68 | 15 | 1.0 | yes |
| What patterns were discovered? | CONTEXT | 18.83 | 21.54 | 8 | 1.0 | yes |
| machine learning integration | DEEP | 132.76 | 164.06 | 20 | 1.0 | yes |
| rate limiting implementation | INSTANT | 125.62 | 153.83 | 24 | 1.0 | yes |
| TODO before release | CONTEXT | 137.43 | 181.73 | 15 | 1.0 | yes |
| Average | | 89.22 | 107.19 | 14.8 | | |

5,000 memories

| Query | Depth | Median (ms) | P95 (ms) | Neurons | Confidence | Found |
|---|---|---|---|---|---|---|
| Python concurrency | INSTANT | 117.43 | 160.7 | 19 | 1.0 | yes |
| What database did we choose? | CONTEXT | 1.73 | 2.15 | 0 | 0.0 | no |
| connection error Redis | INSTANT | 169.82 | 170.16 | 23 | 1.0 | yes |
| deployment workflow | CONTEXT | 169.55 | 198.36 | 23 | 1.0 | yes |
| Why did we choose PostgreSQL? | DEEP | 77.99 | 106.03 | 7 | 1.0 | yes |
| authentication JWT | INSTANT | 109.31 | 191.21 | 19 | 1.0 | yes |
| What patterns were discovered? | CONTEXT | 43.49 | 50.35 | 8 | 1.0 | yes |
| machine learning integration | DEEP | 83.03 | 124.42 | 22 | 1.0 | yes |
| rate limiting implementation | INSTANT | 126.62 | 166.48 | 26 | 1.0 | yes |
| TODO before release | CONTEXT | 199.36 | 211.71 | 19 | 1.0 | yes |
| Average | | 109.83 | 138.16 | 16.6 | | |

10,000 memories

| Query | Depth | Median (ms) | P95 (ms) | Neurons | Confidence | Found |
|---|---|---|---|---|---|---|
| Python concurrency | INSTANT | 96.55 | 144.66 | 21 | 1.0 | yes |
| What database did we choose? | CONTEXT | 1.99 | 2.35 | 0 | 0.0 | no |
| connection error Redis | INSTANT | 156.88 | 174.88 | 26 | 1.0 | yes |
| deployment workflow | CONTEXT | 169.16 | 209.46 | 22 | 1.0 | yes |
| Why did we choose PostgreSQL? | DEEP | 75.2 | 89.14 | 7 | 1.0 | yes |
| authentication JWT | INSTANT | 116.5 | 143.92 | 19 | 1.0 | yes |
| What patterns were discovered? | CONTEXT | 49.92 | 58.23 | 8 | 1.0 | yes |
| machine learning integration | DEEP | 91.53 | 126.03 | 21 | 1.0 | yes |
| rate limiting implementation | INSTANT | 162.43 | 168.47 | 27 | 1.0 | yes |
| TODO before release | CONTEXT | 217.67 | 237.86 | 19 | 1.0 | yes |
| Average | | 113.78 | 135.5 | 17 | | |

50,000 memories

| Query | Depth | Median (ms) | P95 (ms) | Neurons | Confidence | Found |
|---|---|---|---|---|---|---|
| Python concurrency | INSTANT | 190.35 | 207.35 | 21 | 1.0 | yes |
| What database did we choose? | CONTEXT | 2.36 | 3.44 | 0 | 0.0 | no |
| connection error Redis | INSTANT | 224.34 | 252.6 | 26 | 1.0 | yes |
| deployment workflow | CONTEXT | 207.73 | 235.62 | 23 | 1.0 | yes |
| Why did we choose PostgreSQL? | DEEP | 172.12 | 211.13 | 10 | 1.0 | yes |
| authentication JWT | INSTANT | 183.04 | 213.52 | 19 | 1.0 | yes |
| What patterns were discovered? | CONTEXT | 118.83 | 147.36 | 8 | 1.0 | yes |
| machine learning integration | DEEP | 168.41 | 174.37 | 21 | 1.0 | yes |
| rate limiting implementation | INSTANT | 227.81 | 286.0 | 27 | 1.0 | yes |
| TODO before release | CONTEXT | 297.5 | 331.74 | 19 | 1.0 | yes |
| Average | | 179.25 | 206.31 | 17.4 | | |
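Each Average row is the arithmetic mean of the per-query values. That mean includes the "What database did we choose?" query, which found nothing (0 neurons, about 2 ms). The snippet below reproduces the 1,000-memory row:

```python
import statistics

# Median (ms) and Neurons columns of the 1,000-memory table, in query order.
medians_1k = [145.12, 2.08, 109.15, 112.8, 38.02, 70.4, 18.83, 132.76, 125.62, 137.43]
neurons_1k = [15, 0, 21, 23, 7, 15, 8, 20, 24, 15]

print(round(statistics.fmean(medians_1k), 2))  # 89.22
print(statistics.fmean(neurons_1k))            # 14.8

# Same average without the query that found nothing (index 1).
found_only = medians_1k[:1] + medians_1k[2:]
print(round(statistics.fmean(found_only), 1))  # 98.9
```

Because the empty query is fast, it pulls the reported average latency down. For the nine queries that did return results, the average median latency at 1,000 memories is about 98.9 ms.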

Consolidation Performance

| Memories | Duration (s) | Synapses Pruned | Neurons Pruned | Fibers Merged | Synapses Enriched |
|---|---|---|---|---|---|
| 1,000 | 2.4 | 0 | 0 | 0 | 3 |
| 5,000 | 3.8 | 0 | 0 | 0 | 6 |
| 10,000 | 7.8 | 0 | 0 | 0 | 5 |
| 50,000 | 8.9 | 0 | 0 | 0 | 4 |

Health Diagnostics

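| Memories | Phase | Grade | Purity | Connectivity | Diversity | Freshness | Orphan Rate | Warnings | Diagnostics (ms) |
|---|---|---|---|---|---|---|---|---|---|
| 1,000 | Pre | D | 42.9 | 0.232 | 0.493 | 1.0 | 0.0 | 2 | 329.8 |
| 1,000 | Post | D | 44.6 | 0.268 | 0.531 | 1.0 | 0.0 | 2 | 261.5 |
| 5,000 | Pre | F | 36.6 | 0.319 | 0.409 | 1.0 | 0.672 | 3 | 449.7 |
| 5,000 | Post | F | 38.4 | 0.354 | 0.455 | 1.0 | 0.674 | 3 | 404.9 |
| 10,000 | Pre | F | 35.5 | 0.364 | 0.373 | 1.0 | 0.82 | 3 | 487.0 |
| 10,000 | Post | F | 38.4 | 0.434 | 0.437 | 1.0 | 0.821 | 3 | 488.4 |
| 50,000 | Pre | F | 34.8 | 0.449 | 0.305 | 1.0 | 0.959 | 3 | 650.9 |
| 50,000 | Post | F | 36.3 | 0.479 | 0.346 | 1.0 | 0.959 | 3 | 629.4 |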

Methodology

  • Storage: Real SQLiteStorage (aiosqlite, WAL mode)
  • Platform: Windows 11, single-threaded async
  • Memory types: 7 types (fact, decision, error, insight, todo, workflow, context)
  • Content: Diverse generated content from 50 topics × 16 actions × 26 features
  • Recall runs: 5 per query (median reported)
  • Seed: random.seed(42) for reproducibility
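The generated content can be approximated with a seeded generator like the one below. The word lists are placeholders sized to match 50 topics × 16 actions × 26 features (20,800 combinations). The real vocabularies and sentence templates are not shown in this document, so every name here is hypothetical. The sketch uses a dedicated `random.Random(42)` instance instead of the global `random.seed(42)`, so other code that consumes randomness cannot shift the sequence.

```python
import random

MEMORY_TYPES = ["fact", "decision", "error", "insight", "todo", "workflow", "context"]
# Hypothetical placeholder vocabularies, sized like the methodology's 50 x 16 x 26.
TOPICS = [f"topic{i}" for i in range(50)]
ACTIONS = [f"action{i}" for i in range(16)]
FEATURES = [f"feature{i}" for i in range(26)]


def generate_memories(n, seed=42):
    """Return n (memory_type, content) pairs; the same seed gives the same list."""
    rng = random.Random(seed)  # isolated RNG, unaffected by other random calls
    memories = []
    for _ in range(n):
        mem_type = rng.choice(MEMORY_TYPES)
        content = f"{rng.choice(ACTIONS)} {rng.choice(FEATURES)} for {rng.choice(TOPICS)}"
        memories.append((mem_type, content))
    return memories


sample = generate_memories(3)
print(sample)
```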