Memory Optimization Guide
QMD tuning, thinking modes, LCM configuration, and practical memory stack optimization.
Last updated: March 21, 2026
Prepared by Mateo Vargas, BrightBlitz Marketing
Table of Contents
- Overview — What We Changed and Why
- Change 1: Thinking Mode → HIGH
- Change 2: QMD Memory Backend
- Change 3: QMD Memory Search Tuning
- Change 4: Session Transcript Indexing
- Change 5: Memory Citations
- Change 6: Extra Index Paths
- Change 7: LCM v0.4.0 — Session Scoping
- Change 8: QMD 2.0 Update
- Full Config Block (Copy-Paste Ready)
- Installation Steps
- Gotchas & Lessons Learned
- Verification Checklist
📋 Overview — What We Changed and Why
We made eight changes to our OpenClaw configuration across two machines (a Mac Mini and a Mac Studio running 17 agents total). The goal: make agents reason better and remember better.
These are two separate problems that were both set to their lowest/default settings:
- Reasoning (Thinking Mode) — was set to `low` by default. Agents were running in "power-saving mode" for all complex analysis.
- Memory (Search Quality) — was using basic OpenAI embeddings with no deduplication, no temporal awareness, and no keyword matching. Old notes outranked recent ones, and similar notes returned duplicate results.
| Change | What | Impact | Risk |
|---|---|---|---|
| Thinking → HIGH | Enables extended reasoning for all agents + subagents | Critical | Low |
| QMD Backend | Local hybrid search replacing OpenAI embeddings | High | Low |
| MMR Re-ranking | Diversity in search results — eliminates duplicate snippets | High | Low |
| Temporal Decay | Recent notes rank higher than old ones | Medium | Low |
| Session Indexing | Past conversations become searchable via memory_search | High | Low |
| Citations | Memory search results include source file + line numbers | Medium | Low |
🧠 Change 1: Thinking Mode → HIGH
The Problem: OpenClaw's thinking/reasoning mode defaults to low. This means every agent — including your main orchestrator making all the important decisions — was running with minimal internal reasoning. Same model, same data, but the agent wasn't actually thinking before responding.
What Thinking Mode Does
Anthropic's Claude models support "extended thinking" — an internal reasoning step before generating a response. At low, the model gives quick answers. At high, it reasons through the problem step by step before responding. The difference is dramatic for complex tasks.
Real-World Example
After switching to HIGH, our agent caught its own mistakes from 10 minutes earlier in the same conversation. Same model, same context — just actually reasoning now.
Available Levels
| Level | Description | Use Case |
|---|---|---|
| off | No extended thinking | Simple acknowledgments |
| minimal | Brief reasoning | Quick Q&A |
| low | Light reasoning (DEFAULT) | Routine tasks |
| medium | Moderate reasoning | Multi-step analysis |
| high | Deep reasoning ✅ RECOMMENDED | Complex decisions, debugging, strategy |
| xhigh | Maximum reasoning | Research-grade analysis |
Config
{
"agents": {
"defaults": {
"thinkingDefault": "high",
"subagents": {
"thinking": "high"
}
}
}
}
Expected Gain: Agents catch their own errors, produce more thorough analysis, and make better decisions on complex multi-step tasks. If you're on subscription plans (Max/Pro), you're already paying for this capacity — not using it is leaving performance on the table.
⚠️ Note: Higher thinking = slightly longer response times. The agent thinks longer before responding. This is a feature, not a bug. You can also override per-task with sessions_spawn(thinking="high") for dynamic control.
💾 Change 2: QMD Memory Backend
The Problem: OpenClaw's default memory search uses OpenAI embeddings — semantic (meaning-based) search only. This means if you search for a specific IP address, error message, or plugin version, it might not find it because those are exact terms, not concepts. Semantic search finds meaning; it misses keywords.
What QMD Is
QMD (Query Markup Documents) is a local hybrid search engine by Tobi Lütke (Shopify CEO). It replaces OpenClaw's default search with three layers:
- BM25 Keyword Matching — finds exact terms (IP addresses, error codes, names, plugin versions)
- Vector Semantic Search — finds related content even when different words are used
- LLM Re-ranking — a small local model confirms which results are actually relevant
Everything runs locally. No API keys needed for the search itself. ~2GB of GGUF models auto-download on first use.
Default semantic search: "PHP 7.4 drillpointtopoint"
Result: ❌ Nothing found (semantic mismatch)

QMD hybrid search: "PHP 7.4 drillpointtopoint"
Result: ✅ Found in memory/2026-02-03.md (BM25 keyword hit)
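QMD 2.0's `--explain` output reports RRF contributions, so a reasonable mental model of how the BM25 and vector layers combine is reciprocal rank fusion. A minimal sketch of the idea (the function and the conventional `k=60` constant are illustrative, not QMD internals):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    for every document it returned; scores are summed across lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["memory/2026-02-03.md", "memory/2026-01-10.md"]   # exact-term matches
vector_hits = ["memory/2026-01-10.md", "memory/2025-12-01.md"]   # semantic matches
print(rrf_fuse([bm25_hits, vector_hits]))
# The note that both retrievers found ranks first.
```

A note that scores moderately on both keyword and semantic signals outranks one that scores highly on only one, which is exactly the behavior you want for queries mixing exact terms with concepts.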
Installation
# Install QMD globally (npm or bun)
npm install -g @tobilu/qmd
# or: bun install -g @tobilu/qmd

# Verify
qmd --version   # Should show 2.0.1+
Config
{
"memory": {
"backend": "qmd"
}
}
Expected Gain: Memory search finds both exact matches AND conceptually related content. Searching for "what happened with the PHP issue" finds notes that mention "drillpointtopoint.com PHP 7.4 EOL" even though the words don't overlap.
⚠️ Trade-off: Searches take 1-3 seconds instead of being instant. First search after install downloads ~2GB of GGUF models. Worth it for accuracy.
⚠️ Rollback: Remove "backend": "qmd" from config, restart. Falls back to SQLite automatically.
🔧 Change 3: QMD Memory Search Tuning
QMD works out of the box, but three settings that are OFF by default make a significant difference:
3A. MMR (Maximal Marginal Relevance) — Diversity Re-ranking
The Problem: Without MMR, if you have daily notes that mention the same topic across multiple days, search returns 5 near-identical snippets from 5 different days instead of 5 different pieces of information. You get quantity without diversity.
What MMR does: After finding the top results, it re-ranks them to maximize diversity. If result #2 is too similar to result #1, it gets demoted in favor of a result that adds new information.
The lambda parameter (0.0 to 1.0):
- `0.0` = maximum diversity (ignore relevance, just pick different results)
- `1.0` = maximum relevance (ignore diversity, return the best matches even if redundant)
- `0.7` = recommended balance — prioritize relevance but actively suppress duplicates
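A minimal sketch of greedy MMR re-ranking (illustrative only, not QMD's implementation): each pick trades relevance to the query against similarity to results already selected.

```python
def mmr_rerank(query_sims, pairwise_sims, k=5, lam=0.7):
    """Greedy MMR: score_i = lam * relevance_i - (1 - lam) * max similarity
    of candidate i to any already-selected result."""
    candidates = list(range(len(query_sims)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((pairwise_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (similarity 0.95); doc 2 is distinct.
query_sims = [0.90, 0.89, 0.70]
pairwise = [[1.0, 0.95, 0.10],
            [0.95, 1.0, 0.10],
            [0.10, 0.10, 1.0]]
print(mmr_rerank(query_sims, pairwise, k=3))  # [0, 2, 1]: the duplicate is demoted
```

At `lam=1.0` the near-duplicate doc 1 would rank second on pure relevance; at `0.7` the distinct doc 2 jumps ahead of it, which is the "5 different pieces of information" behavior described above.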
3B. Temporal Decay — Freshness Bias
The Problem: Without temporal decay, a perfectly worded note from 3 months ago outranks yesterday's update on the same topic. The agent retrieves stale information even when fresh context exists.
How it works: Applies an exponential decay to result scores based on age. With a 30-day half-life:
| Note Age | Score Multiplier |
|---|---|
| Today | 100% |
| 30 days ago | 50% |
| 60 days ago | 25% |
| 90 days ago | 12.5% |
Old notes still appear if they're the only match — they just don't outrank fresh notes on the same topic.
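The multipliers in the table fall out of a simple half-life formula (a sketch of the math, not QMD's code):

```python
def decay_multiplier(age_days: float, half_life_days: float = 30) -> float:
    # Score halves every half_life_days: 0.5 ** (age / half_life).
    return 0.5 ** (age_days / half_life_days)

for age in (0, 30, 60, 90):
    print(f"{age:>2} days old -> {decay_multiplier(age):.1%}")
# 0 -> 100.0%, 30 -> 50.0%, 60 -> 25.0%, 90 -> 12.5%
```

Shortening `halfLifeDays` biases harder toward fresh notes; lengthening it lets older reference material stay competitive.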
3C. Embedding Cache — Cost Optimization
What it does: Saves computed embeddings so unchanged text isn't re-embedded on every reindex cycle. If you're using OpenAI for embeddings, this cuts unnecessary API calls. If you're using local models, it saves compute time.
Set maxEntries high enough to cover your total memory file content. 50,000 entries is generous for most setups.
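The idea can be sketched as a content-hash cache sitting in front of the embedder (hypothetical names, not OpenClaw's actual internals):

```python
import hashlib

class EmbeddingCache:
    """Sketch: re-embed text only when its content hash is unseen."""
    def __init__(self, max_entries: int = 50_000):
        self.max_entries = max_entries
        self.store: dict[str, list[float]] = {}
        self.misses = 0

    def get_or_embed(self, text: str, embed_fn):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            if len(self.store) >= self.max_entries:
                # Simple FIFO eviction; real caches typically use LRU.
                self.store.pop(next(iter(self.store)))
            self.store[key] = embed_fn(text)
        return self.store[key]
```

An unchanged daily note then costs one dictionary lookup per reindex cycle instead of an API call or a local-model forward pass.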
Combined Config
{
"agents": {
"defaults": {
"memorySearch": {
"cache": {
"enabled": true,
"maxEntries": 50000
},
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"candidateMultiplier": 4,
"mmr": {
"enabled": true,
"lambda": 0.7
},
"temporalDecay": {
"enabled": true,
"halfLifeDays": 30
}
}
}
}
}
}
}
Expected Gain: Memory search returns diverse, fresh, relevant results. No more 5 copies of the same note. No more 3-month-old context outranking yesterday's update. Combined with QMD's hybrid search, the agent remembers more and finds it faster.
📝 Change 4: Session Transcript Indexing
The Problem: By default, only files in the memory/ directory are searchable. If something was discussed in conversation but never written to a file, it's lost after session compaction. The agent can't recall past conversations.
What This Does
QMD indexes past session transcripts so memory_search can find things you discussed even if the agent didn't write them to a file. Set a retention window to control how far back it indexes.
Config
{
"memory": {
"qmd": {
"sessions": {
"enabled": true,
"retentionDays": 30
}
}
}
}
Expected Gain: "What did we discuss about X last week?" now returns actual results even if no one wrote it down. 30-day retention keeps the index lean.
📌 Change 5: Memory Citations
What it does: When memory_search returns results, each result includes Source: path#line so you (or the agent) can verify where the information came from. Good for audit trails and debugging memory quality.
Config
{
"memory": {
"citations": "auto"
}
}
📂 Change 6: Extra Index Paths
What it does: By default, QMD only indexes memory/ and MEMORY.md. If you have reference docs, runbooks, or SOPs in other folders, QMD won't find them unless you add them.
Config
{
"memory": {
"qmd": {
"includeDefaultMemory": true,
"paths": [
{ "path": "reference/", "name": "reference" },
{ "path": "runbooks/", "name": "runbooks" },
{ "path": "sops/", "name": "sops" }
]
}
}
}
Add whatever folders contain persistent knowledge your agents should be able to search. Adjust the paths to match your workspace structure.
🔄 Change 7: LCM v0.4.0 — Session Scoping (March 2026)
The Problem: LCM (Lossless Context Management) was storing every session in its SQLite database, including cron jobs, subagent runs, and heartbeat checks. This bloated the database and wasted compaction cycles on throwaway sessions.
LCM v0.4.0 adds session scoping — you can tell it which sessions to ignore entirely and which should be read-only.
Key New Settings
| Setting | What It Does | Recommended Value |
|---|---|---|
| ignoreSessionPatterns | Sessions matching these patterns are completely excluded from LCM — no storage, no compaction, no expansion | ["agent:*:cron:**"] |
| statelessSessionPatterns | Matching sessions can read from LCM but never write. Subagents benefit from context without polluting the database | ["agent:*:subagent:**"] |
| skipStatelessSessions | Enable stateless session behavior | true |
| freshTailCount | Number of recent messages protected from compaction | 32 (was 16) |
| incrementalMaxDepth | How deep the DAG cascades during compaction. -1 = unlimited | -1 (was 3) |
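A sketch of how session patterns like these typically match, under assumed semantics (`*` stays within one `:`-delimited segment, `**` crosses segments). This illustrates the idea, not LCM's actual matcher:

```python
import re

def compile_session_pattern(pattern: str) -> re.Pattern:
    """Translate a session glob to a regex: '**' spans ':' separators, '*' does not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # '**' matches across segments
            i += 2
        elif pattern[i] == "*":
            out.append("[^:]*")     # '*' matches within a single segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

ignore = compile_session_pattern("agent:*:cron:**")
print(bool(ignore.match("agent:main:cron:daily-report")))   # True  -> excluded from LCM
print(bool(ignore.match("agent:main:subagent:research")))   # False -> still stored
```

Testing your patterns against real session IDs before deploying is cheap insurance, since a too-broad pattern silently drops sessions from LCM entirely.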
Config
{
"plugins": {
"entries": {
"lossless-claw": {
"enabled": true,
"config": {
"freshTailCount": 32,
"incrementalMaxDepth": -1,
"contextThreshold": 0.75,
"ignoreSessionPatterns": [
"agent:*:cron:**"
],
"statelessSessionPatterns": [
"agent:*:subagent:**"
],
"skipStatelessSessions": true
}
}
}
}
}
⚠️ LCM v0.4.0 Config Gotcha: The plugin schema uses additionalProperties: false. Only these keys are allowed in the config: enabled, contextThreshold, incrementalMaxDepth, freshTailCount, leafMinFanout, condensedMinFanout, condensedMinFanoutHard, dbPath, ignoreSessionPatterns, statelessSessionPatterns, skipStatelessSessions, largeFileThresholdTokens, summaryModel, summaryProvider, expansionModel, expansionProvider. Any other key will crash the gateway on startup.
Session Persistence
Also recommended: increase OpenClaw's session idle timeout so long conversations don't reset:
{
"session": {
"reset": {
"mode": "idle",
"idleMinutes": 10080
}
}
}
10080 = 7 days. Prevents session resets during weekends or quiet periods.
Expected Gain: Cleaner LCM database (no cron noise), better context management in long sessions (32 fresh messages protected), and subagents that can see your context without adding to it.
Heartbeat Pruning (Optional)
If you run heartbeats, HEARTBEAT_OK responses accumulate in LCM. Set this environment variable (not plugin config) to clean them:
LCM_PRUNE_HEARTBEAT_OK=true
Add via systemd override or your process manager's environment config.
📦 Change 8: QMD 2.0 Update (March 2026)
What changed: QMD went from v1.0.7 to v2.0.1. Major version bump with a stable SDK API, but the CLI and OpenClaw integration work the same. Safe upgrade.
New Features Worth Knowing
- Intent parameter — Pass --intent "context" to disambiguate searches. "performance" with intent "server speed" finds different results than "performance" with intent "employee review."
- Collection ignore patterns — Exclude files from indexing: ignore: ["Sessions/**", "*.tmp"]
- Query --explain — Debug tool showing retrieval scores, RRF contributions, and reranker scores per result
- SDK / library mode — QMD can now be used as a Node.js library, not just a CLI
Update Command
# npm:
npm install -g @tobilu/qmd@latest
# bun:
bun install -g @tobilu/qmd@latest

# Verify:
qmd --version   # Should show 2.0.1+
Collections Setup
QMD 2.0 still needs collections configured. If you haven't set them up, or after a fresh install:
# Index your memory files
qmd collection add ~/path-to/memory/ --name "daily-memory"

# Index deliverables, research, etc.
qmd collection add ~/path-to/deliverables/ --name "deliverables"

# Generate embeddings (first time takes a few minutes on CPU)
qmd embed
Expected Gain: Same search quality with better internals. The intent parameter is useful for agent-driven searches where the system knows what domain you're asking about. No config changes needed — your existing QMD settings carry over.
📋 Full Config Block (Copy-Paste Ready)
This is the complete config patch (March 2026). Apply it via config.patch through your bot, or merge it into your openclaw.json manually. It triggers a gateway restart.
{
"memory": {
"backend": "qmd",
"citations": "auto",
"qmd": {
"includeDefaultMemory": true,
"paths": [
{ "path": "reference/", "name": "reference" },
{ "path": "runbooks/", "name": "runbooks" },
{ "path": "sops/", "name": "sops" }
],
"sessions": {
"enabled": true,
"retentionDays": 30
},
"update": {
"interval": "5m",
"debounceMs": 15000
},
"limits": {
"maxResults": 8,
"timeoutMs": 6000
}
}
},
"session": {
"reset": {
"mode": "idle",
"idleMinutes": 10080
}
},
"plugins": {
"entries": {
"lossless-claw": {
"enabled": true,
"config": {
"freshTailCount": 32,
"incrementalMaxDepth": -1,
"contextThreshold": 0.75,
"ignoreSessionPatterns": ["agent:*:cron:**"],
"statelessSessionPatterns": ["agent:*:subagent:**"],
"skipStatelessSessions": true
}
}
}
},
"agents": {
"defaults": {
"thinkingDefault": "high",
"subagents": {
"thinking": "high"
},
"memorySearch": {
"cache": {
"enabled": true,
"maxEntries": 50000
},
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"candidateMultiplier": 4,
"mmr": {
"enabled": true,
"lambda": 0.7
},
"temporalDecay": {
"enabled": true,
"halfLifeDays": 30
}
}
}
}
}
}
}
🛠️ Installation Steps
1 Install QMD
bun install -g @tobilu/qmd   # or: npm install -g @tobilu/qmd
If bun isn't installed: curl -fsSL https://bun.sh/install | bash
2 Apply the config patch
Either send the full config block above to your bot as a config.patch, or merge it into ~/.openclaw/openclaw.json manually.
# Via bot:
# Paste the config block and ask your agent to apply it via config.patch

# Or manually:
nano ~/.openclaw/openclaw.json
# Merge the config, save, then:
openclaw gateway restart
3 Wait for first indexing
QMD downloads ~2GB of GGUF models on first search. Let it complete. Subsequent searches are local and fast.
4 Test it
Send your agent a message that triggers memory_search. Ask something like "What did we discuss about [topic]?" and check that:
- Results include Source: path#line (citations working)
- Results are diverse (MMR working)
- Recent notes rank higher (temporal decay working)
- Tool response shows "provider": "qmd" (QMD backend active)
⚠️ Gotchas & Lessons Learned
🚨 Do NOT use small local models (e.g., llama3.2:3b) as fallbacks for real agent work
We added ollama/llama3.2:3b as a last-resort fallback for when Anthropic rate limits hit. This was a mistake. The 3B model can't use tools properly — it hallucinated fake curl commands to non-existent URLs and got stuck in a 5-minute loop spamming garbage. Two agents had their sessions corrupted.
Rule: Small local models are fine for heartbeats (just say "HEARTBEAT_OK") but never for real agent work that requires tool use. If your primary model is rate-limited and you have no capable fallback, let it fail cleanly with an error rather than falling back to a model that will do the wrong thing confidently.
⚠️ Session model overrides persist after fallback
When an agent falls back to a smaller model, the session can get a model override that persists even after the rate limit clears. The session keeps using the bad model. Fix: reset the model field in the session JSON or kill the session to force a fresh start.
⚠️ Gateway restart ≠ session reset
Restarting the gateway does NOT clear existing sessions. If a session is corrupted (stuck on wrong model, looping), you need to explicitly kill that session or reset its model override in sessions.json.
⚠️ QMD must be on the gateway's PATH
Whether you installed QMD via bun or npm, make sure the qmd binary is in the PATH for the gateway service. If the gateway can't find qmd, memory search will silently fall back to the old provider.
✅ Verification Checklist
| Check | How to Verify | Expected Result |
|---|---|---|
| QMD installed | qmd --help | Shows QMD help text |
| QMD backend active | Run memory_search, check response | "provider": "qmd" in result |
| Thinking mode HIGH | Send /status to your bot | Shows Reasoning: high or similar |
| MMR working | Search a topic mentioned in multiple daily logs | Results from different dates, not 5 copies of the same note |
| Temporal decay working | Search a topic with old + recent notes | Recent note ranks higher |
| Citations working | Any memory_search call | Results include Source: path#line |
| Session indexing working | Ask about something discussed (not written to file) | Agent finds it via session transcript search |
| Extra paths indexed | Search for content in reference/ or sops/ | Results found from those directories |
Summary
Two levers that were just sitting there:
- Thinking mode HIGH = agent actually reasons before responding
- QMD + tuning = agent actually remembers and finds things accurately
Combined: the agent reasons better AND remembers better. Same model, same cost (on subscription plans), dramatically better output.
Prepared by Mateo Vargas — BrightBlitz Marketing
Originally February 27, 2026 — Updated March 21, 2026