Embedding Setup Guide¶

NeuralMemory works without embeddings — its core retrieval uses spreading activation on a neural graph. Embeddings are an optional enhancement that adds cross-language recall and semantic discovery between unrelated memories.

When Do You Need Embeddings?¶

Scenario	Without Embeddings	With Embeddings
Recall "auth bug" when you stored "auth bug"	Works perfectly	Works perfectly
Recall "lỗi xác thực" when you stored "auth bug"	Won't match	Matches via semantic similarity
Discover links between "JWT expired" and "token refresh"	Only if explicitly connected	Auto-discovered via cosine similarity
Store/recall in one language only	Full functionality	No benefit

Rule of thumb: If you work in a single language and your recall queries use similar wording to what you stored, you don't need embeddings.

Quick Start: Auto-Detection¶

The easiest way to enable embeddings — let NeuralMemory detect what's available:

# ~/.neuralmemory/config.toml
[embedding]
enabled = true
provider = "auto"

Auto-detection checks (in order):

Ollama running locally → uses bge-m3 or best available model
sentence-transformers installed → uses paraphrase-multilingual-MiniLM-L12-v2
GEMINI_API_KEY set → uses Google's free-tier embedding API
OPENAI_API_KEY set → uses OpenAI's embedding API

If none are available, embedding stays disabled and recall falls back to graph-only (which works great for single-language use).

Providers¶

1. Sentence Transformer (Free, Local)¶

Runs entirely on your machine. No API key, no cost, no data leaves your device.

pip install neural-memory[embeddings]

Config (~/.neuralmemory/config.toml):

[embedding]
enabled = true
provider = "sentence_transformer"
model = "all-MiniLM-L6-v2"           # English-only, 384D, ~80MB
similarity_threshold = 0.7

Multilingual models (recommended for non-English or mixed-language use):

Model	Languages	Dimensions	Size	Speed (CPU)
`all-MiniLM-L6-v2`	English only	384	~80MB	~15ms
`paraphrase-multilingual-MiniLM-L12-v2`	50+ languages	384	~440MB	~25ms
`multilingual-e5-small`	100+ languages	384	~500MB	~30ms
`multilingual-e5-large`	100+ languages	1024	~2.2GB	~150ms

For Vietnamese, Chinese, Japanese, or any non-English language, use paraphrase-multilingual-MiniLM-L12-v2:

[embedding]
enabled = true
provider = "sentence_transformer"
model = "paraphrase-multilingual-MiniLM-L12-v2"
similarity_threshold = 0.65

The model downloads automatically on first use (~440MB). Subsequent runs use the cached version.

2. Gemini (Google API)¶

Uses Google's gemini-embedding-001 (3072D) or text-embedding-004 (768D).

pip install neural-memory[embeddings-gemini]

Set your API key:

export GEMINI_API_KEY="your-key-here"

Config:

[embedding]
enabled = true
provider = "gemini"
model = "text-embedding-004"         # 768D, lower cost
# model = "gemini-embedding-001"     # 3072D, higher quality
similarity_threshold = 0.7

Get a free API key at ai.google.dev. Free tier includes generous embedding quotas.

3. Ollama (Free, Local)¶

Uses any Ollama model for embeddings. Runs entirely on your machine via the Ollama API.

pip install neural-memory[embeddings]
# Ensure Ollama is running: ollama serve
# Pull an embedding model: ollama pull nomic-embed-text

Config:

[embedding]
enabled = true
provider = "ollama"
model = "nomic-embed-text"              # 768D, fast
# model = "mxbai-embed-large"           # 1024D, higher quality
similarity_threshold = 0.7
# base_url = "http://localhost:11434"    # Default Ollama URL

Requires Ollama running locally. See ollama.com for installation.

4. OpenAI (API)¶

Uses OpenAI's text-embedding-3-small or text-embedding-3-large.

pip install neural-memory[embeddings-openai]

Set your API key:

export OPENAI_API_KEY="your-key-here"

Config:

[embedding]
enabled = true
provider = "openai"
model = "text-embedding-3-small"     # 1536D, $0.02/1M tokens
# model = "text-embedding-3-large"   # 3072D, $0.13/1M tokens
similarity_threshold = 0.7

Provider Comparison¶

	Sentence Transformer	Ollama	Gemini	OpenAI
Cost	Free	Free	Free tier / pay-per-use	Pay-per-use
Privacy	100% local	100% local	Data sent to Google	Data sent to OpenAI
Speed	15-150ms (CPU)	10-50ms (GPU)	~200ms (network)	~200ms (network)
Quality	Good	Good-Excellent	Excellent	Excellent
Multilingual	With multilingual model	Model-dependent	Built-in	Built-in
Offline	Yes	Yes	No	No
Setup	`pip install` only	Ollama + model pull	API key required	API key required
GPU Accel	Optional	Yes (native)	N/A	N/A

Recommendation: Start with sentence_transformer + paraphrase-multilingual-MiniLM-L12-v2 for simplicity. Use ollama if you have a GPU and want fast local inference. Switch to Gemini or OpenAI only if you need higher quality for production workloads.

How It Works¶

When embeddings are enabled, two things happen:

1. Embedding Anchors (during recall)¶

When you recall a memory, NeuralMemory runs keyword search (FTS5) and embedding similarity search in parallel. Results are merged — this means you can find memories even when the query uses completely different words than what was stored.

2. Semantic Discovery (during consolidation)¶

When you run nmem consolidate, NeuralMemory: 1. Embeds all CONCEPT and ENTITY neurons 2. Computes pairwise cosine similarity 3. Creates SIMILAR_TO synapses between semantically related neurons above the threshold

These synapses allow spreading activation to traverse semantic connections during future recalls.

Tuning¶

Similarity Threshold¶

0.7 (default): Conservative — only strong matches. Good for precision.
0.6: Moderate — catches more cross-language matches. Good starting point for multilingual.
0.5: Aggressive — more noise but catches loose associations.

Changing Provider Mid-Session¶

You can change providers at any time. However, existing SIMILAR_TO synapses created by semantic discovery were computed with the old provider's embeddings. Run nmem consolidate after switching to recompute with the new provider.

Warning: Different providers produce different embedding dimensions. Stored _embedding metadata from the old provider will be incompatible. This only affects semantic discovery — core recall is unaffected.

Troubleshooting¶

"Embedding provider unavailable — skipping": The required package isn't installed. Check pip list | grep sentence-transformers (or google-genai / openai).

Slow first recall: Sentence Transformer downloads the model on first use. Subsequent runs use cache at ~/.cache/huggingface/.

No cross-language matches: Check that you're using a multilingual model. all-MiniLM-L6-v2 is English-only.

API key errors: Ensure the environment variable is set in the shell that runs the MCP server. For Claude Code, add it to your MCP config:

{
  "mcpServers": {
    "neural-memory": {
      "command": "nmem-mcp",
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}