Memory Optimization Guide
QMD tuning, thinking modes, LCM configuration, and practical memory stack optimization.
Last updated: March 21, 2026
Prepared by Mateo Vargas, BrightBlitz Marketing
Table of Contents
- Overview — What We Changed and Why
- Change 1: Thinking Mode → HIGH
- Change 2: QMD Memory Backend
- Change 3: QMD Memory Search Tuning
- Change 4: Session Transcript Indexing
- Change 5: Memory Citations
- Change 6: Extra Index Paths
- Change 7: LCM v0.4.0 — Session Scoping
- Change 8: QMD 2.0 Update
- Full Config Block (Copy-Paste Ready)
- Installation Steps
- Gotchas & Lessons Learned
- Verification Checklist
📋 Overview — What We Changed and Why
We made eight changes to our OpenClaw configuration across two machines (a Mac Mini and a Mac Studio running 17 agents total). The goal: make agents reason better and remember better.
These are two separate problems that were both set to their lowest/default settings:
- Reasoning (Thinking Mode) — was set to `low` by default. Agents were running in "power-saving mode" for all complex analysis.
- Memory (Search Quality) — was using basic OpenAI embeddings with no deduplication, no temporal awareness, and no keyword matching. Old notes outranked recent ones, and similar notes returned duplicate results.
| Change | What | Impact | Risk |
|---|---|---|---|
| Thinking → HIGH | Enables extended reasoning for all agents + subagents | Critical | Low |
| QMD Backend | Local hybrid search replacing OpenAI embeddings | High | Low |
| MMR Re-ranking | Diversity in search results — eliminates duplicate snippets | High | Low |
| Temporal Decay | Recent notes rank higher than old ones | Medium | Low |
| Session Indexing | Past conversations become searchable via memory_search | High | Low |
| Citations | Memory search results include source file + line numbers | Medium | Low |
🧠 Change 1: Thinking Mode → HIGH
The Problem: OpenClaw's thinking/reasoning mode defaults to low. This means every agent — including your main orchestrator making all the important decisions — was running with minimal internal reasoning. Same model, same data, but the agent wasn't actually thinking before responding.
What Thinking Mode Does
Anthropic's Claude models support "extended thinking" — an internal reasoning step before generating a response. At low, the model gives quick answers. At high, it reasons through the problem step by step before responding. The difference is dramatic for complex tasks.
Real-World Example
After switching to HIGH, our agent caught its own mistakes from 10 minutes earlier in the same conversation. Same model, same context — just actually reasoning now.
Available Levels
| Level | Description | Use Case |
|---|---|---|
| off | No extended thinking | Simple acknowledgments |
| minimal | Brief reasoning | Quick Q&A |
| low | Light reasoning (DEFAULT) | Routine tasks |
| medium | Moderate reasoning | Multi-step analysis |
| high | Deep reasoning ✅ RECOMMENDED | Complex decisions, debugging, strategy |
| xhigh | Maximum reasoning | Research-grade analysis |
Config
{
"agents": {
"defaults": {
"thinkingDefault": "high",
"subagents": {
"thinking": "high"
}
}
}
}
Expected Gain: Agents catch their own errors, produce more thorough analysis, and make better decisions on complex multi-step tasks. If you're on subscription plans (Max/Pro), you're already paying for this capacity — not using it is leaving performance on the table.
⚠️ Note: Higher thinking = slightly longer response times. The agent thinks longer before responding. This is a feature, not a bug. You can also override per-task with sessions_spawn(thinking="high") for dynamic control.
💾 Change 2: QMD Memory Backend
The Problem: OpenClaw's default memory search uses OpenAI embeddings — semantic (meaning-based) search only. This means if you search for a specific IP address, error message, or plugin version, it might not find it because those are exact terms, not concepts. Semantic search finds meaning; it misses keywords.
What QMD Is
QMD (Query Markup Documents) is a local hybrid search engine by Tobi Lütke (Shopify CEO). It replaces OpenClaw's default search with three layers:
- BM25 Keyword Matching — finds exact terms (IP addresses, error codes, names, plugin versions)
- Vector Semantic Search — finds related content even when different words are used
- LLM Re-ranking — a small local model confirms which results are actually relevant
Everything runs locally. No API keys needed for the search itself. ~2GB of GGUF models auto-download on first use.
Default semantic search: "PHP 7.4 drillpointtopoint"
Result: ❌ Nothing found (semantic mismatch)

QMD hybrid search: "PHP 7.4 drillpointtopoint"
Result: ✅ Found in memory/2026-02-03.md (BM25 keyword hit)
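QMD 2.0's `--explain` output reports RRF contributions, so a reasonable mental model of how the BM25 and vector layers combine is reciprocal rank fusion. A minimal sketch of the idea (the function and the conventional `k=60` constant are illustrative, not QMD internals):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    for every document it returned; scores are summed across lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["memory/2026-02-03.md", "memory/2026-01-10.md"]   # exact-term matches
vector_hits = ["memory/2026-01-10.md", "memory/2025-12-01.md"]   # semantic matches
print(rrf_fuse([bm25_hits, vector_hits]))
# The note that both retrievers found ranks first.
```

A note that scores moderately on both keyword and semantic signals outranks one that scores highly on only one, which is exactly the behavior you want for queries mixing exact terms with concepts.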
Installation
# Install QMD globally (npm or bun)
npm install -g @tobilu/qmd
# or: bun install -g @tobilu/qmd

# Verify
qmd --version   # Should show 2.0.1+
Config
{
"memory": {
"backend": "qmd"
}
}
Expected Gain: Memory search finds both exact matches AND conceptually related content. Searching for "what happened with the PHP issue" finds notes that mention "drillpointtopoint.com PHP 7.4 EOL" even though the words don't overlap.
⚠️ Trade-off: Searches take 1-3 seconds instead of being instant. First search after install downloads ~2GB of GGUF models. Worth it for accuracy.
⚠️ Rollback: Remove "backend": "qmd" from config, restart. Falls back to SQLite automatically.
🔧 Change 3: QMD Memory Search Tuning
QMD works out of the box, but three settings that are OFF by default make a significant difference:
3A. MMR (Maximal Marginal Relevance) — Diversity Re-ranking
The Problem: Without MMR, if you have daily notes that mention the same topic across multiple days, search returns 5 near-identical snippets from 5 different days instead of 5 different pieces of information. You get quantity without diversity.
What MMR does: After finding the top results, it re-ranks them to maximize diversity. If result #2 is too similar to result #1, it gets demoted in favor of a result that adds new information.
The lambda parameter (0.0 to 1.0):
- `0.0` = maximum diversity (ignore relevance, just pick different results)
- `1.0` = maximum relevance (ignore diversity, return the best matches even if redundant)
- `0.7` = recommended balance — prioritize relevance but actively suppress duplicates
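A minimal sketch of greedy MMR re-ranking (illustrative only, not QMD's implementation): each pick trades relevance to the query against similarity to results already selected.

```python
def mmr_rerank(query_sims, pairwise_sims, k=5, lam=0.7):
    """Greedy MMR: score_i = lam * relevance_i - (1 - lam) * max similarity
    of candidate i to any already-selected result."""
    candidates = list(range(len(query_sims)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((pairwise_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (similarity 0.95); doc 2 is distinct.
query_sims = [0.90, 0.89, 0.70]
pairwise = [[1.0, 0.95, 0.10],
            [0.95, 1.0, 0.10],
            [0.10, 0.10, 1.0]]
print(mmr_rerank(query_sims, pairwise, k=3))  # [0, 2, 1]: the duplicate is demoted
```

At `lam=1.0` the near-duplicate doc 1 would rank second on pure relevance; at `0.7` the distinct doc 2 jumps ahead of it, which is the "5 different pieces of information" behavior described above.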
3B. Temporal Decay — Freshness Bias
The Problem: Without temporal decay, a perfectly worded note from 3 months ago outranks yesterday's update on the same topic. The agent retrieves stale information even when fresh context exists.
How it works: Applies an exponential decay to result scores based on age. With a 30-day half-life:
| Note Age | Score Multiplier |
|---|---|
| Today | 100% |
| 30 days ago | 50% |
| 60 days ago | 25% |
| 90 days ago | 12.5% |
Old notes still appear if they're the only match — they just don't outrank fresh notes on the same topic.
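The multipliers in the table fall out of a simple half-life formula (a sketch of the math, not QMD's code):

```python
def decay_multiplier(age_days: float, half_life_days: float = 30) -> float:
    # Score halves every half_life_days: 0.5 ** (age / half_life).
    return 0.5 ** (age_days / half_life_days)

for age in (0, 30, 60, 90):
    print(f"{age:>2} days old -> {decay_multiplier(age):.1%}")
# 0 -> 100.0%, 30 -> 50.0%, 60 -> 25.0%, 90 -> 12.5%
```

Shortening `halfLifeDays` biases harder toward fresh notes; lengthening it lets older reference material stay competitive.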
3C. Embedding Cache — Cost Optimization
What it does: Saves computed embeddings so unchanged text isn't re-embedded on every reindex cycle. If you're using OpenAI for embeddings, this cuts unnecessary API calls. If you're using local models, it saves compute time.
Set maxEntries high enough to cover your total memory file content. 50,000 entries is generous for most setups.
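The idea can be sketched as a content-hash cache sitting in front of the embedder (hypothetical names, not OpenClaw's actual internals):

```python
import hashlib

class EmbeddingCache:
    """Sketch: re-embed text only when its content hash is unseen."""
    def __init__(self, max_entries: int = 50_000):
        self.max_entries = max_entries
        self.store: dict[str, list[float]] = {}
        self.misses = 0

    def get_or_embed(self, text: str, embed_fn):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            if len(self.store) >= self.max_entries:
                # Simple FIFO eviction; real caches typically use LRU.
                self.store.pop(next(iter(self.store)))
            self.store[key] = embed_fn(text)
        return self.store[key]
```

An unchanged daily note then costs one dictionary lookup per reindex cycle instead of an API call or a local-model forward pass.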
Combined Config
{
"agents": {
"defaults": {
"memorySearch": {
"cache": {
"enabled": true,
"maxEntries": 50000
},
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"candidateMultiplier": 4,
"mmr": {
"enabled": true,
"lambda": 0.7
},
"temporalDecay": {
"enabled": true,
"halfLifeDays": 30
}
}
}
}
}
}
}
Expected Gain: Memory search returns diverse, fresh, relevant results. No more 5 copies of the same note. No more 3-month-old context outranking yesterday's update. Combined with QMD's hybrid search, the agent remembers more and finds it faster.
📝 Change 4: Session Transcript Indexing
The Problem: By default, only files in the memory/ directory are searchable. If something was discussed in conversation but never written to a file, it's lost after session compaction. The agent can't recall past conversations.
What This Does
QMD indexes past session transcripts so memory_search can find things you discussed even if the agent didn't write them to a file. Set a retention window to control how far back it indexes.
Config
{
"memory": {
"qmd": {
"sessions": {
"enabled": true,
"retentionDays": 30
}
}
}
}
Expected Gain: "What did we discuss about X last week?" now returns actual results even if no one wrote it down. 30-day retention keeps the index lean.
📌 Change 5: Memory Citations
What it does: When memory_search returns results, each result includes Source: path#line so you (or the agent) can verify where the information came from. Good for audit trails and debugging memory quality.
Config
{
"memory": {
"citations": "auto"
}
}
📂 Change 6: Extra Index Paths
What it does: By default, QMD only indexes memory/ and MEMORY.md. If you have reference docs, runbooks, or SOPs in other folders, QMD won't find them unless you add them.
Config
{
"memory": {
"qmd": {
"includeDefaultMemory": true,
"paths": [
{ "path": "reference/", "name": "reference" },
{ "path": "runbooks/", "name": "runbooks" },
{ "path": "sops/", "name": "sops" }
]
}
}
}
Add whatever folders contain persistent knowledge your agents should be able to search. Adjust the paths to match your workspace structure.
🔄 Change 7: LCM v0.4.0 — Session Scoping (March 2026)
The Problem: LCM (Lossless Context Management) was storing every session in its SQLite database, including cron jobs, subagent runs, and heartbeat checks. This bloated the database and wasted compaction cycles on throwaway sessions.
LCM v0.4.0 adds session scoping — you can tell it which sessions to ignore entirely and which should be read-only.
Key New Settings
| Setting | What It Does | Recommended Value |
|---|---|---|
| ignoreSessionPatterns | Sessions matching these patterns are completely excluded from LCM — no storage, no compaction, no expansion | ["agent:*:cron:**"] |
| statelessSessionPatterns | Matching sessions can read from LCM but never write. Subagents benefit from context without polluting the database | ["agent:*:subagent:**"] |
| skipStatelessSessions | Enable stateless session behavior | true |
| freshTailCount | Number of recent messages protected from compaction | 32 (was 16) |
| incrementalMaxDepth | How deep the DAG cascades during compaction. -1 = unlimited | -1 (was 3) |
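A sketch of how session patterns like these typically match, under assumed semantics (`*` stays within one `:`-delimited segment, `**` crosses segments). This illustrates the idea, not LCM's actual matcher:

```python
import re

def compile_session_pattern(pattern: str) -> re.Pattern:
    """Translate a session glob to a regex: '**' spans ':' separators, '*' does not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # '**' matches across segments
            i += 2
        elif pattern[i] == "*":
            out.append("[^:]*")     # '*' matches within a single segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

ignore = compile_session_pattern("agent:*:cron:**")
print(bool(ignore.match("agent:main:cron:daily-report")))   # True  -> excluded from LCM
print(bool(ignore.match("agent:main:subagent:research")))   # False -> still stored
```

Testing your patterns against real session IDs before deploying is cheap insurance, since a too-broad pattern silently drops sessions from LCM entirely.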
Config
{
"plugins": {
"entries": {
"lossless-claw": {
"enabled": true,
"config": {
"freshTailCount": 32,
"incrementalMaxDepth": -1,
"contextThreshold": 0.75,
"ignoreSessionPatterns": [
"agent:*:cron:**"
],
"statelessSessionPatterns": [
"agent:*:subagent:**"
],
"skipStatelessSessions": true
}
}
}
}
}
⚠️ LCM v0.4.0 Config Gotcha: The plugin schema uses additionalProperties: false. Only these keys are allowed in the config: enabled, contextThreshold, incrementalMaxDepth, freshTailCount, leafMinFanout, condensedMinFanout, condensedMinFanoutHard, dbPath, ignoreSessionPatterns, statelessSessionPatterns, skipStatelessSessions, largeFileThresholdTokens, summaryModel, summaryProvider, expansionModel, expansionProvider. Any other key will crash the gateway on startup.
Session Persistence
Also recommended: increase OpenClaw's session idle timeout so long conversations don't reset:
{
"session": {
"reset": {
"mode": "idle",
"idleMinutes": 10080
}
}
}
10080 = 7 days. Prevents session resets during weekends or quiet periods.
Expected Gain: Cleaner LCM database (no cron noise), better context management in long sessions (32 fresh messages protected), and subagents that can see your context without adding to it.
Heartbeat Pruning (Optional)
If you run heartbeats, HEARTBEAT_OK responses accumulate in LCM. Set this environment variable (not plugin config) to clean them:
LCM_PRUNE_HEARTBEAT_OK=true
Add via systemd override or your process manager's environment config.
📦 Change 8: QMD 2.0 Update (March 2026)
What changed: QMD went from v1.0.7 to v2.0.1. Major version bump with a stable SDK API, but the CLI and OpenClaw integration work the same. Safe upgrade.
New Features Worth Knowing
- Intent parameter — Pass --intent "context" to disambiguate searches. "performance" with intent "server speed" finds different results than "performance" with intent "employee review."
- Collection ignore patterns — Exclude files from indexing: ignore: ["Sessions/**", "*.tmp"]
- Query --explain — Debug tool showing retrieval scores, RRF contributions, and reranker scores per result
- SDK / library mode — QMD can now be used as a Node.js library, not just a CLI
Update Command
# npm:
npm install -g @tobilu/qmd@latest
# bun:
bun install -g @tobilu/qmd@latest

# Verify:
qmd --version   # Should show 2.0.1+
Collections Setup
QMD 2.0 still needs collections configured. If you haven't set them up, or after a fresh install:
# Index your memory files
qmd collection add ~/path-to/memory/ --name "daily-memory"

# Index deliverables, research, etc.
qmd collection add ~/path-to/deliverables/ --name "deliverables"

# Generate embeddings (first time takes a few minutes on CPU)
qmd embed
Expected Gain: Same search quality with better internals. The intent parameter is useful for agent-driven searches where the system knows what domain you're asking about. No config changes needed — your existing QMD settings carry over.
📋 Full Config Block (Copy-Paste Ready)
This is the complete config patch (March 2026). Apply it via config.patch through your bot, or merge it into your openclaw.json manually. It triggers a gateway restart.
{
"memory": {
"backend": "qmd",
"citations": "auto",
"qmd": {
"includeDefaultMemory": true,
"paths": [
{ "path": "reference/", "name": "reference" },
{ "path": "runbooks/", "name": "runbooks" },
{ "path": "sops/", "name": "sops" }
],
"sessions": {
"enabled": true,
"retentionDays": 30
},
"update": {
"interval": "5m",
"debounceMs": 15000
},
"limits": {
"maxResults": 8,
"timeoutMs": 6000
}
}
},
"session": {
"reset": {
"mode": "idle",
"idleMinutes": 10080
}
},
"plugins": {
"entries": {
"lossless-claw": {
"enabled": true,
"config": {
"freshTailCount": 32,
"incrementalMaxDepth": -1,
"contextThreshold": 0.75,
"ignoreSessionPatterns": ["agent:*:cron:**"],
"statelessSessionPatterns": ["agent:*:subagent:**"],
"skipStatelessSessions": true
}
}
}
},
"agents": {
"defaults": {
"thinkingDefault": "high",
"subagents": {
"thinking": "high"
},
"memorySearch": {
"cache": {
"enabled": true,
"maxEntries": 50000
},
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"candidateMultiplier": 4,
"mmr": {
"enabled": true,
"lambda": 0.7
},
"temporalDecay": {
"enabled": true,
"halfLifeDays": 30
}
}
}
}
}
}
}
🛠️ Installation Steps
1 Install QMD
bun install -g @tobilu/qmd   # or: npm install -g @tobilu/qmd
If bun isn't installed: curl -fsSL https://bun.sh/install | bash
2 Apply the config patch
Either send the full config block above to your bot as a config.patch, or merge it into ~/.openclaw/openclaw.json manually.
# Via bot:
# Paste the config block and ask your agent to apply it via config.patch

# Or manually:
nano ~/.openclaw/openclaw.json
# Merge the config, save, then:
openclaw gateway restart
3 Wait for first indexing
QMD downloads ~2GB of GGUF models on first search. Let it complete. Subsequent searches are local and fast.
4 Test it
Send your agent a message that triggers memory_search. Ask something like "What did we discuss about [topic]?" and check that:
- Results include Source: path#line (citations working)
- Results are diverse (MMR working)
- Recent notes rank higher (temporal decay working)
- Tool response shows "provider": "qmd" (QMD backend active)
⚠️ Gotchas & Lessons Learned
🚨 Do NOT use small local models (e.g., llama3.2:3b) as fallbacks for real agent work
We added ollama/llama3.2:3b as a last-resort fallback for when Anthropic rate limits hit. This was a mistake. The 3B model can't use tools properly — it hallucinated fake curl commands to non-existent URLs and got stuck in a 5-minute loop spamming garbage. Two agents had their sessions corrupted.
Rule: Small local models are fine for heartbeats (just say "HEARTBEAT_OK") but never for real agent work that requires tool use. If your primary model is rate-limited and you have no capable fallback, let it fail cleanly with an error rather than falling back to a model that will do the wrong thing confidently.
⚠️ Session model overrides persist after fallback
When an agent falls back to a smaller model, the session can get a model override that persists even after the rate limit clears. The session keeps using the bad model. Fix: reset the model field in the session JSON or kill the session to force a fresh start.
⚠️ Gateway restart ≠ session reset
Restarting the gateway does NOT clear existing sessions. If a session is corrupted (stuck on wrong model, looping), you need to explicitly kill that session or reset its model override in sessions.json.
⚠️ QMD must be on the gateway's PATH
Whether you installed QMD via bun or npm, make sure the qmd binary is in the PATH for the gateway service. If the gateway can't find qmd, memory search will silently fall back to the old provider.
✅ Verification Checklist
| Check | How to Verify | Expected Result |
|---|---|---|
| QMD installed | qmd --help | Shows QMD help text |
| QMD backend active | Run memory_search, check response | "provider": "qmd" in result |
| Thinking mode HIGH | Send /status to your bot | Shows Reasoning: high or similar |
| MMR working | Search a topic mentioned in multiple daily logs | Results from different dates, not 5 copies of the same note |
| Temporal decay working | Search a topic with old + recent notes | Recent note ranks higher |
| Citations working | Any memory_search call | Results include Source: path#line |
| Session indexing working | Ask about something discussed (not written to file) | Agent finds it via session transcript search |
| Extra paths indexed | Search for content in reference/ or sops/ | Results found from those directories |
Summary
Two levers that were just sitting there:
- Thinking mode HIGH = agent actually reasons before responding
- QMD + tuning = agent actually remembers and finds things accurately
Combined: the agent reasons better AND remembers better. Same model, same cost (on subscription plans), dramatically better output.
Prepared by Mateo Vargas — BrightBlitz Marketing
Originally February 27, 2026 — Updated March 21, 2026