Your agent forgets past decisions and burns tokens re-reading the same context. Memory Stack runs 5 search engines plus built-in code search — all locally — returns only what matters, and never loses a fact. Works across all your tools, cleans up after itself, and updates automatically.
Every wasted token is money burned. Memory Stack eliminates the waste.
Native memory dumps full text every time. Memory Stack gives you three tiers: compact auto-recall at ~100 tokens, summaries at ~800, full content on demand. Up to 90% fewer tokens per search — your agent gets exactly what it needs, nothing more.
One search fires 5 engines in parallel — full-text, semantic, markdown, fact store, and compressed history. Results merge with rank fusion and diversity reranking. Right answer on the first try, no wasted tokens chasing wrong context.
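The exact fusion weights aren't spelled out here, but the merging step works along the lines of weighted reciprocal rank fusion. The sketch below is illustrative only: the engine names, weights, and the `k` constant are assumptions, not Memory Stack's actual values.

```python
# Illustrative sketch of weighted reciprocal rank fusion (RRF).
# Engine names and weights are assumptions, not Memory Stack's actual configuration.
from collections import defaultdict

ENGINE_WEIGHTS = {
    "fulltext": 1.0,
    "semantic": 1.0,
    "markdown": 0.8,
    "facts": 1.2,
    "history": 0.6,
}

def fuse(results_per_engine: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each engine votes 1/(k + rank), scaled by its weight."""
    scores: dict[str, float] = defaultdict(float)
    for engine, ranked_ids in results_per_engine.items():
        weight = ENGINE_WEIGHTS.get(engine, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that several engines rank near the top rises above one that a single engine ranked first, which is what keeps one noisy engine from dominating the merged result.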
Find function names, variable names, or any pattern across your entire memory — instantly. Memory Stack has its own search engine that narrows thousands of entries down to a handful of matches before even opening a file. No extra tools needed, works offline, gets faster the more you use it.
Long conversations get compressed and old messages disappear. Memory Stack captures 8 types of facts — decisions, deadlines, requirements, preferences, and more — with full context attached. When you say "We will NOT use MongoDB", it remembers the negation correctly. Your agent recalls key facts instantly instead of you re-explaining.
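To make that concrete, here is a rough sketch of what one stored fact might look like. The field names and the type list are assumptions for illustration, not Memory Stack's actual schema; the point is that polarity and source context travel with the fact, so a negation survives recall.

```python
# Illustrative shape of an extracted fact. Field names and types are assumptions,
# not Memory Stack's actual schema.
from dataclasses import dataclass

FACT_TYPES = {"decision", "deadline", "requirement", "preference"}  # 4 of the 8 types named above

@dataclass
class Fact:
    fact_type: str            # e.g. "decision"
    statement: str            # "use MongoDB"
    negated: bool             # True for "We will NOT use MongoDB"
    source_conversation: str  # where it was said, so full context can be reloaded on demand
    created_at: str           # ISO timestamp, useful for recency ranking and supersede checks

mongo = Fact("decision", "use MongoDB", negated=True,
             source_conversation="sprint-planning", created_at="2024-05-02T10:15:00Z")
```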
Memory Stack automatically tracks entities and their relationships — who changed what, what depends on what, how things evolved over time. Queryable on demand, not buried in old conversations.
Duplicates and outdated info cost real money every time your agent reads them. Memory Stack catches duplicates at 3 levels before they're even saved, and when a fact changes — say "use PostgreSQL" becomes "use MySQL" — the old version is automatically archived and the new one takes over. Health score tells you exactly what's wasting tokens.
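A minimal sketch of the supersede step, assuming facts are keyed by subject so only one version stays live at a time (the keying rule is an assumption for illustration):

```python
# Rough sketch of supersede behaviour: when a fact about the same subject changes,
# the old version is archived rather than deleted, and only the new one is recalled.
live: dict[str, str] = {}               # subject -> current statement
archived: list[tuple[str, str]] = []    # audit trail of (subject, old statement)

def remember(subject: str, statement: str) -> None:
    old = live.get(subject)
    if old is not None and old != statement:
        archived.append((subject, old))  # kept for history, never re-read at recall time
    live[subject] = statement

remember("database", "use PostgreSQL")
remember("database", "use MySQL")        # PostgreSQL fact is archived, MySQL is now live
```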
Memory Stack isn't locked to one editor. Share memory between Claude Code, Cursor, Windsurf, or any MCP-compatible tool. Drop a markdown file into the memory folder and it's instantly searchable. Use the command line to query or add facts from scripts, pipelines, or other agents.
Install once and forget about it. Memory Stack sets up its own search index, rebuilds it if anything goes wrong, cleans out stale entries after 90 days, and updates itself in the background. If part of the system isn't available, it keeps working with what it has. Add any LLM API key to unlock smarter fact extraction — your key, your choice of provider.
One curl command installs, registers, and restarts OpenClaw. Updates happen automatically in the background.
Same conversations. One remembers everything at 10% of the token cost.
| | Native Memory | Memory Stack |
|---|---|---|
| Remembers longer | ||
| What happens when the conversation gets too long? | Old messages get compressed. Decisions disappear. | Key facts are saved before compression and brought back automatically. |
| Can it remember things from last week? | Only if it's still in the conversation window. | Yes. Recent memories rank higher, but older ones are still searchable. |
| Does it understand how things connect? | No. It searches text, not relationships. | Yes. Entity tracking links people, tools, and decisions — queryable on demand. |
| Can it trace how a decision evolved? | No. | Yes. Evolution chains link past decisions to current ones. |
| Saves you money | ||
| How much does each memory search cost? | Loads full text every time. More tokens, higher API bill. | Loads a summary first. Only fetches full text when needed. Uses up to 90% fewer tokens. |
| Does it waste money on duplicate results? | Can feed the same info to your AI twice. You pay for both. | Removes duplicates before sending anything to the AI. You only pay once. |
| Does the cost grow over time? | Memory piles up. More junk = more tokens = higher cost. | Auto-cleanup merges similar memories. Stays lean, cost stays flat. |
| Finds things faster | ||
| How many search methods run per query? | 2 (keyword + vector) | 5 engines, merged with rank fusion and per-engine weights. |
| Does it understand what you meant, not just what you typed? | Basic keyword matching. | Query expansion rewrites your question locally before searching — no API call needed. |
| Can it search across past conversations? | Limited. | Dedicated fact store and entity tracking — finds facts across all conversations instantly. |
| Can you check if your memory is healthy? | No. | Quality score 0-100. Shows duplicates, stale entries, noise. |
| How much context does each recall use? | Full text every time. No control over token usage. | Tiered output: L0 auto-recall uses ~100 tokens. L1 summaries ~800. Full text only on demand. |
| Does it need API keys or cloud services? | Vector search needs an embedding provider. | Core search runs offline. Bring your own LLM key (OpenAI, Anthropic, Ollama, MLX — any provider) to unlock structured fact extraction from every conversation. Full experience with your key. |
| Works everywhere, maintains itself | ||
| Can other tools access the same memory? | Locked to one client. No external access. | CLI commands let any tool read/write facts. Drop a file into the memory folder — instantly searchable from Claude Code, Cursor, Windsurf, or any MCP client. |
| Does memory follow you across projects? | Separate memory per workspace. | Unified global memory under one directory. Your facts, entities, and history follow you everywhere. |
| What happens when something breaks? | Manual troubleshooting. | Self-healing. Rebuilds its own index if corrupted, archives stale entries after 90 days, and falls back to keyword search if vector isn't available. Zero maintenance. |
| When a fact changes, does the old version disappear? | Old and new versions pile up. You pay tokens for both. | Automatic supersede — old fact archived, new one takes over. Full audit trail, zero token waste. |
Most memory skills do one thing. You end up installing 3-4 and hoping they work together.
| What you need | Other skills | Memory Stack |
|---|---|---|
| Find a function name | Vector search misses exact names | Full-text keyword search finds it instantly |
| Find "how does auth work" | Vector search works | Semantic search with query expansion |
| Search across 5 conversations | Limited to current context | Fact store + entity tracking |
| Control token spend | Full text every time | 3 tiers: ~100 / ~800 / full |
| Remove duplicates | Manual cleanup | 4-level auto-dedup |
| Track decision evolution | No history | Evolution tracking across conversations |
| Check memory quality | No tooling | Health score 0-100 |
| Work offline | Needs OpenAI key | Core search runs offline |
Hermes Agent and Google Always-On both store memory. Neither gives you control over how.
| | Hermes Agent | Google Always-On | Memory Stack |
|---|---|---|---|
| Control & Transparency | |||
| How is memory triggered? | Agent decides via explicit tool call | Automatic — Google decides what to save | Agent calls router — 13 deterministic rules, debuggable |
| Can you inspect routing decisions? | No — agent reasoning is opaque | No — black box | Yes — rule table is plain text, every decision is logged |
| What happens when recall quality is poor? | No fallback — one backend, one shot | Google retries internally, no user control | Sequential fallback chain — relevance < 0.4 triggers next backend automatically |
| Privacy & Data | |||
| Where is memory stored? | Developer-managed external DB | Google cloud — you cannot opt out | Your machine — git branch + local SQLite. Nothing leaves your device |
| Who owns the data? | You (if you host the DB) | | You — MIT licensed, fully local |
| Works offline? | Depends on your DB setup | Requires Google account and connection | Core search runs fully offline. No API key required |
| Token Cost & Efficiency | |||
| Token budget control? | None — returns full content | None — Google injects what it wants | 3 tiers: L0 ~100 tokens / L1 ~800 / L2 full — on demand |
| Deduplication before recall? | No — developer responsibility | Unknown — no visibility | 4-level auto-dedup — you never pay twice for the same fact |
| Setup & Maintenance | |||
| Time to get running? | Hours — wire up DB, tool schemas, prompts | Minutes — but only works inside Google ecosystem | One curl command — registered and running in under 5 minutes |
| Works with your existing tools? | Whatever you build | Gemini only | Claude Code, Cursor, Windsurf, any MCP client |
| Open source? | Model weights open, memory tooling varies | No | MIT — full source on GitHub |
A drop-in OpenClaw plugin that replaces built-in memory. 5 search engines with rank fusion, entity tracking, and 3-tier token control. Your agent recalls more while using up to 90% fewer tokens. Core search and memory run locally — add your own LLM key for enhanced fact extraction.
5 engines fire in parallel with automatic fallback. Results merge with rank fusion and diversity reranking. Entities and relationships are tracked automatically and queryable on demand. Tiered output controls exactly how many tokens each recall costs. Add your own LLM key (OpenAI, Anthropic, Ollama, MLX, or any compatible provider) for structured fact extraction — the complete Memory Stack experience.
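As a rough illustration of the fallback side, here is what a sequential chain might look like, using the relevance-below-0.4 trigger from the comparison above; the backend interface and names are assumptions, not Memory Stack's actual API.

```python
# Minimal sketch of a sequential fallback chain: if the best hit from one backend
# scores below the relevance threshold, try the next backend.
from typing import Callable

Hit = tuple[str, float]                  # (doc_id, relevance score in [0, 1])
Backend = Callable[[str], list[Hit]]

def search_with_fallback(query: str, backends: list[Backend], threshold: float = 0.4) -> list[Hit]:
    results: list[Hit] = []
    for backend in backends:
        results = backend(query)
        if results and results[0][1] >= threshold:
            return results               # good enough, stop here
    return results                       # every backend was weak; return the last attempt
```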
Yes. Memory Stack plugs into OpenClaw as a native memory provider. It works with Telegram, CLI, and any other OpenClaw channel. No extra configuration needed — one command and it's live.
Core search, rank fusion, deduplication, and entity tracking all run locally. No data leaves your machine. For enhanced fact extraction, add your own LLM key — supports OpenAI, Anthropic, Ollama, MLX, and any compatible endpoint. Auto-detected at startup. Without a key, core search still works fully offline. Update checks run in the background and fail silently.
Every token your AI reads costs money. Memory Stack cuts that three ways: (1) Tiered output — auto-recall uses ~100 tokens, on-demand search uses ~800 tokens, full text only loads when requested. Up to 90% fewer tokens per search. (2) Duplicate removal so you don't pay for the same information twice. (3) Compressed history — your agent drills down only when it needs detail.
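A back-of-the-envelope example, assuming a typical full-text recall of about 1,000 tokens and a workload of 100 searches (both numbers are assumptions for illustration):

```python
# Token cost under the tier sizes quoted above; workload and full-text size are assumed.
FULL_TEXT_TOKENS = 1_000
TIER_TOKENS = {"L0_auto_recall": 100, "L1_summary": 800}
searches = 100

native_cost = searches * FULL_TEXT_TOKENS                # 100,000 tokens
tiered_cost = searches * TIER_TOKENS["L0_auto_recall"]   # 10,000 tokens
print(f"savings: {1 - tiered_cost / native_cost:.0%}")   # -> savings: 90%
```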
When OpenClaw conversations get long, the system compresses old messages to fit the context window. Important decisions can get lost in this process. Memory Stack extracts key facts (decisions, deadlines, architecture choices) into a dedicated store before compression happens, and retrieves them instantly when relevant — zero wasted tokens re-explaining things.
No. Memory Stack is free and open source under the MIT license. You get every feature — search, code search, structured facts, cross-agent sharing, self-healing, everything. No fees, no data collection. Just files that live on your machine.
Yes. Memory Stack checks for new versions automatically when it starts up. New features and bug fixes ship via GitHub releases. Run one command when prompted and you're done — or watch the GitHub releases page directly.