Building Persistent Memory for AI Agents with ChromaDB and OpenClaw
February 5, 2026
One of the fundamental limitations of LLM-based assistants is their lack of persistent memory. Each conversation starts fresh—your AI has no recollection of previous interactions, decisions, or lessons learned. This article explores how to solve this problem by implementing a semantic vector memory system using ChromaDB, specifically designed for AI agents running on platforms like OpenClaw.
The Memory Problem
Large Language Models like Claude or GPT have context windows—a limited amount of text they can process at once. While these windows have grown significantly (200K+ tokens for Claude), they're still fundamentally session-based. Your AI assistant can't:
- Remember what you discussed last week
- Learn from corrections you've made
- Build up knowledge about your preferences over time
- Recall decisions and their reasoning
Traditional solutions involve dumping conversation logs into the context, but this is inefficient and doesn't scale. What we need is semantic memory—the ability to store and retrieve information based on meaning, not just keywords.
Why Vector Memory?
Vector databases store text as high-dimensional embeddings—numerical representations that capture semantic meaning. A search for "my coffee preferences" can surface an entry like "I like dark roast from Ethiopia" even though the two share almost no keywords.
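To make this concrete, here is a minimal sketch using ChromaDB's in-memory client and its default local embedding model; the stored documents are invented purely for illustration:

# Minimal sketch: semantic recall with an in-memory ChromaDB collection.
import chromadb

client = chromadb.Client()  # ephemeral client; nothing is persisted to disk
collection = client.create_collection(name="demo")

collection.add(
    documents=[
        "I like dark roast coffee from Ethiopia",
        "The standup meeting moved to 10am",
    ],
    ids=["pref-1", "note-1"],
)

# No keyword overlap with the stored text, yet the coffee note should rank first.
results = collection.query(query_texts=["my coffee preferences"], n_results=1)
print(results["documents"][0][0])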
This is fundamentally different from traditional databases:
| Traditional DB | Vector DB |
|---|---|
| Exact keyword matching | Semantic similarity |
| Rigid schema | Flexible documents |
| SQL queries | Natural language queries |
| Returns exact matches | Returns related concepts |
For AI agents, vector memory enables:
- Semantic recall: Find relevant memories by meaning
- Efficient storage: Only store what matters, retrieve what's relevant
- Learning over time: Accumulate knowledge without bloating context
- Correction memory: Remember mistakes to avoid repeating them
The Architecture
Our solution uses ChromaDB, an open-source embedding database that runs locally without external API calls. The architecture is simple:
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  AI Agent   │────▶│  Memory CLI  │────▶│  ChromaDB   │
│ (OpenClaw)  │     │   (Python)   │     │   (Local)   │
└─────────────┘     └──────────────┘     └─────────────┘
                                                │
                                         ┌──────┴───────┐
                                         │ Memory Types │
                                         ├──────────────┤
                                         │ • Learnings  │
                                         │ • Decisions  │
                                         │ • Corrections│
                                         │ • Files      │
                                         │ • Notes      │
                                         └──────────────┘
The AI agent interacts with memory through a CLI wrapper, which handles embedding generation, storage, and retrieval. All data stays local—no external API calls for embeddings.
Implementation
Step 1: Set Up the Directory Structure
mkdir -p ~/.openclaw/vector-memory
mkdir -p ~/.openclaw/bin
Step 2: Create the Memory Service
Create ~/.openclaw/vector-memory/memory_service.py:
#!/usr/bin/env python3
"""
Long-Term Memory Service
Semantic memory storage and retrieval using ChromaDB
"""
import chromadb
from chromadb.config import Settings
import json
import hashlib
from datetime import datetime
from pathlib import Path
import argparse

# Configuration
MEMORY_DIR = Path.home() / ".openclaw" / "vector-memory"
CHROMA_DIR = MEMORY_DIR / "chroma_db"
COLLECTION_NAME = "agent_memory"

# Initialize ChromaDB client (local, no external APIs)
client = chromadb.PersistentClient(
    path=str(CHROMA_DIR),
    settings=Settings(anonymized_telemetry=False)
)

collection = client.get_or_create_collection(
    name=COLLECTION_NAME,
    metadata={"description": "Agent long-term semantic memory"}
)

def generate_id(text: str, source: str = "") -> str:
    """Generate a unique ID for a memory chunk."""
    content = f"{text}{source}{datetime.now().isoformat()}"
    return hashlib.sha256(content.encode()).hexdigest()[:16]

def add_memory(text: str, source: str = "manual",
               memory_type: str = "note", metadata: dict = None) -> str:
    """Add a new memory to the store."""
    if not text.strip():
        return "Error: Empty text"
    doc_id = generate_id(text, source)
    meta = {
        "source": source,
        "type": memory_type,
        "timestamp": datetime.now().isoformat(),
        "date": datetime.now().strftime("%Y-%m-%d"),
    }
    if metadata:
        meta.update(metadata)
    collection.add(
        documents=[text],
        ids=[doc_id],
        metadatas=[meta]
    )
    return f"Memory added: {doc_id}"

def search_memory(query: str, top_k: int = 5,
                  memory_type: str = None) -> list:
    """Search memories semantically."""
    where_filter = {"type": memory_type} if memory_type else None
    results = collection.query(
        query_texts=[query],
        n_results=top_k,
        where=where_filter
    )
    memories = []
    if results and results['documents'] and results['documents'][0]:
        for i, doc in enumerate(results['documents'][0]):
            memories.append({
                "text": doc,
                "metadata": results['metadatas'][0][i],
                "distance": results['distances'][0][i],
                "id": results['ids'][0][i]
            })
    return memories

def add_learning(text: str, context: str = "") -> str:
    """Add a learning/insight to memory."""
    full_text = f"Context: {context}\n\nLearning: {text}" if context else text
    return add_memory(full_text, "learning", "learning", {"context": context})

def add_decision(decision: str, reasoning: str = "") -> str:
    """Record a decision and its reasoning."""
    full_text = f"Decision: {decision}"
    if reasoning:
        full_text += f"\n\nReasoning: {reasoning}"
    return add_memory(full_text, "decision", "decision", {"reasoning": reasoning})

def add_correction(mistake: str, correction: str, context: str = "") -> str:
    """Record a correction for learning from mistakes."""
    full_text = f"Mistake: {mistake}\n\nCorrection: {correction}"
    if context:
        full_text = f"Context: {context}\n\n{full_text}"
    return add_memory(full_text, "correction", "correction", {"context": context})

def index_markdown_file(filepath: str, chunk_size: int = 500) -> int:
    """Index a markdown file, splitting into semantic chunks."""
    path = Path(filepath)
    if not path.exists():
        return 0
    content = path.read_text()
    chunks = []
    current_chunk = ""
    for line in content.split('\n'):
        if line.startswith('#') and current_chunk:
            if len(current_chunk.strip()) > 50:
                chunks.append(current_chunk.strip())
            current_chunk = line + "\n"
        else:
            current_chunk += line + "\n"
            if len(current_chunk) > chunk_size:
                chunks.append(current_chunk.strip())
                current_chunk = ""
    if current_chunk.strip() and len(current_chunk.strip()) > 50:
        chunks.append(current_chunk.strip())
    indexed = 0
    for i, chunk in enumerate(chunks):
        doc_id = f"{path.name}_{i}_{hashlib.md5(chunk.encode()).hexdigest()[:8]}"
        existing = collection.get(ids=[doc_id])
        if existing and existing['ids']:
            continue
        collection.add(
            documents=[chunk],
            ids=[doc_id],
            metadatas=[{
                "source": str(filepath),
                "type": "file",
                "filename": path.name,
                "chunk_index": i,
                "timestamp": datetime.now().isoformat(),
            }]
        )
        indexed += 1
    return indexed

def get_stats() -> dict:
    """Get memory statistics."""
    return {
        "total_memories": collection.count(),
        "collection": COLLECTION_NAME,
        "storage_path": str(CHROMA_DIR)
    }
Step 3: Create the CLI Wrapper
Create ~/.openclaw/bin/memory:
#!/bin/bash
python3 ~/.openclaw/vector-memory/memory_service.py "$@"
Make it executable:
chmod +x ~/.openclaw/bin/memory
Step 4: Set Up the Python Environment
cd ~/.openclaw/vector-memory
python3 -m venv venv
source venv/bin/activate
pip install chromadb
Update the CLI to use the virtual environment:
#!/bin/bash
~/.openclaw/vector-memory/venv/bin/python3 \
~/.openclaw/vector-memory/memory_service.py "$@"
Using the Memory System
Basic Commands
# Search memories semantically
~/.openclaw/bin/memory search "user preferences for coffee" -n 5
# Add a learning
~/.openclaw/bin/memory learn "User prefers dark roast coffee" \
-c "Mentioned during morning routine discussion"
# Record a decision
~/.openclaw/bin/memory decide "Use PostgreSQL for the project" \
-r "Better JSON support and team familiarity"
# Record a correction (when you correct the AI)
~/.openclaw/bin/memory correct \
"Assumed user was in EST timezone" \
"User is in CET (Europe/Rome)" \
-c "Scheduling meeting"
# Index a markdown file
~/.openclaw/bin/memory index-file ~/notes/project-requirements.md
# Check stats
~/.openclaw/bin/memory stats
Integrating with Your AI Agent
Add instructions to your agent's configuration to use memory:
## Memory System
Before answering questions about past work, preferences, or decisions:
```bash
~/.openclaw/bin/memory search "relevant query" -n 5
```
When learning something important:
```bash
~/.openclaw/bin/memory learn "what I learned" -c "context"
```
When corrected by user:
```bash
~/.openclaw/bin/memory correct "mistake" "correction" -c "context"
```
The Memory Workflow
Having the system is one thing—using it consistently is another. After running this in production, here's the workflow that actually sticks:
At Session Start
- Read context files: MEMORY.md and recent daily logs give immediate context
- Search before answering: If the user asks about past work, run memory search first:
# User asks: "What did we decide about the database?"
~/.openclaw/bin/memory search "database decision" -n 5
This prevents the AI from making things up or forgetting prior decisions.
During Work
| Trigger | Action |
|---|---|
| Completed a significant feature | memory learn "Built auth system with JWT" -c "Project X" |
| Made a non-obvious decision | memory decide "Using Redis for sessions" -r "Need sub-ms latency" |
| User corrected a mistake | memory correct "Assumed UTC" "User is in CET" -c "scheduling" |
| Created important documentation | memory index-file ./docs/architecture.md |
At Session End
- Update the daily log (memory/YYYY-MM-DD.md) with key events
- Consider what's worth adding to long-term MEMORY.md
- Index any new important files
The Key Principle
If you might need to remember it next session, write it down NOW.
Mental notes don't survive context window resets. Files and vector memories do.
Memory Types and Their Uses
| Type | Purpose | Example |
|---|---|---|
| learning | Insights and lessons | "User prefers concise responses" |
| decision | Recorded choices with reasoning | "Chose React over Vue for better ecosystem" |
| correction | Mistakes and their fixes | "Don't assume timezone; always ask" |
| file | Indexed document chunks | Project docs, meeting notes |
| note | General information | Contact details, preferences |
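Because every memory carries its type in the stored metadata, retrieval can be scoped to a single category via the where filter used in search_memory() above. For example:

# Only corrections are considered; maps to where={"type": "correction"} internally.
# Assumes the helpers from memory_service.py are in scope (or imported).
corrections = search_memory("timezone assumptions", top_k=3, memory_type="correction")
for m in corrections:
    print(m["metadata"]["date"], "-", m["text"])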
Why ChromaDB?
Several vector databases could work for this use case, but ChromaDB offers specific advantages for local AI agents:
- Local-first: Runs entirely on your machine, no API keys or external calls
- Built-in embeddings: Uses Sentence Transformers by default—no OpenAI API needed (see the sketch after this list for pinning the model explicitly)
- Lightweight: Single Python package, minimal dependencies
- Persistent: Data survives restarts without explicit save calls
- Simple API: Add, query, and manage with minimal code
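ChromaDB's default embedding function runs a small local MiniLM model, which is enough for most agent-memory workloads. If you would rather pin the model explicitly, ChromaDB lets you pass an embedding function when creating the collection. A sketch, assuming the sentence-transformers package is installed in the same virtual environment:

from pathlib import Path
import chromadb
from chromadb.utils import embedding_functions

# Explicitly pin the local embedding model instead of relying on ChromaDB's default.
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

client = chromadb.PersistentClient(
    path=str(Path.home() / ".openclaw" / "vector-memory" / "chroma_db")
)
collection = client.get_or_create_collection(
    name="agent_memory",
    embedding_function=ef
)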
Results in Practice
After implementing this system, the AI agent can:
- Recall context from weeks ago: "What did we decide about the database schema?"
- Learn from corrections: the same mistake doesn't recur once it has been corrected
- Build institutional knowledge: Preferences, patterns, and decisions accumulate
- Search semantically: Find related memories even with different phrasing
The difference is transformative. The assistant goes from a stateless tool to something that feels like it actually knows you.
Conclusion
Persistent memory transforms AI assistants from stateless responders into genuine collaborators that learn and grow. With ChromaDB and a simple CLI wrapper, you can add this capability to any LLM-based agent in under an hour.
The key insight is that memory doesn't need to be complex. A vector database, some careful categorization of memory types, and clear instructions for when to store and retrieve—that's all it takes to give your AI assistant a functioning long-term memory.