Memory Management in Claude

Technical
AI

Introduction

Over the last couple of weeks, I’ve been using AI for coding intensively, especially Claude and Antigravity. The biggest challenge is memory management. Every LLM has a context window; as a session approaches that limit, performance degrades and hallucinations increase. Claude has a mechanism to auto-compact the session, which keeps the session cleaner, but some context may be lost in the process.

When, Where, and What to Save

In the past week, I’ve been thinking about how to save this memory. Three questions came up: when to save, where to save, and, just as critically, what to save.

When

I have hit the context-window threshold many times, so deciding when to save is intuitive; the challenge is how to invoke a script or app at that moment. Conveniently, while going through the Claude documentation, I found that Claude exposes many events and hooks. When Claude starts to compact, it emits a PreCompact event, and hooks can be registered to run when an event fires, so the first piece is ready.

What

As a practical matter, I want to minimize loss. That principle guided my memory management design.

All information saved to local or remote storage is summarized by an LLM via the Claude Python client. Initially, I wrote a prompt to capture everything, but after reading context/prompt engineering material from Google and others, I realized much of the data was noise. I refined the prompt to extract only the essentials:

  • Key architectures
  • Important code snippets or patterns
  • Specifications
  • Requirement changes
  • Conversations that led to empirical or final decisions

I explicitly exclude logs and the LLM’s intermediate thinking process. This keeps each memory compact and useful for later reference.
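Putting the refined prompt into code, here is a sketch of the summarization step using the anthropic Python client. The model name and token budget are placeholders to tune, and the prompt wording is a condensed version of mine:

```python
# Summarization sketch: condense a transcript into a compact memory.
# The model name and max_tokens below are placeholders, not recommendations.
MEMORY_PROMPT = """Summarize this coding session for long-term memory.
Capture ONLY:
- Key architectures
- Important code snippets or patterns
- Specifications
- Requirement changes
- Conversations that led to empirical or final decisions
Exclude logs and intermediate thinking. Be compact.

Transcript:
{transcript}"""


def build_memory_prompt(transcript: str) -> str:
    return MEMORY_PROMPT.format(transcript=transcript)


def summarize(transcript: str) -> str:
    import anthropic  # reads ANTHROPIC_API_KEY from the environment

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=1024,
        messages=[{"role": "user", "content": build_memory_prompt(transcript)}],
    )
    return msg.content[0].text
```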

Where

The next question is where to save. Initially, we can save to the project directory under .claude. At first I saved to .claude/summaries, since I assumed the files were session summaries, but later I realized they are really memories, so I renamed the directory to .claude/memories. I still want a centrally managed place where I can derive patterns for later development. I purchased Nowledge last month but didn’t have a chance to test it; now is the time to use it.

How

MCP Integration, Metadata Limitations, and API Fallback

The next question is how. Saving to local storage is straightforward; Nowledge, for its part, provides an MCP server and skill/command integration for Claude. My first thought was to use its MCP tools, which raised another question: how do I invoke them? I found a project called mcp-use that provides a client library for interacting with MCP servers. It fits this scenario perfectly, so I decided to use it.

After several days of work and debugging with Claude, I finally saved memory both to local storage and to Nowledge. But whenever I tried to attach metadata to a memory, the call failed. I checked Nowledge and found it does support metadata, so why wasn’t it working? I asked the developer on X; he replied that the MCP tool implementation doesn’t accept metadata. So I switched to the pure Nowledge API, which luckily is not complicated. I also implemented saving the thread when the session ends.
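For the API fallback, the shape of the code is roughly the following. I deliberately do not reproduce Nowledge’s real contract here: the URL, auth header, and payload keys are placeholders, so check the Nowledge API documentation for the actual endpoint and field names.

```python
# API-fallback sketch. The endpoint URL, Bearer auth scheme, and the
# "content"/"metadata" payload keys are PLACEHOLDERS, not Nowledge's
# documented API; substitute the real contract from its docs.
import json
import urllib.request


def build_memory_payload(text: str, metadata: dict) -> dict:
    """Bundle the memory with the metadata the MCP tool wouldn't accept."""
    return {"content": text, "metadata": metadata}


def post_memory(payload: dict, url: str, api_key: str) -> int:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```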

Second Remote Storage

Currently, Nowledge accepts memories of at most 1,792 characters, which may not be enough to store everything that happens between two compactions. So I will consider other options; another project called claude-mem offers similar functionality, and I will try it later.
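Until I switch, one workaround is to split a memory into ordered chunks that each fit under the limit and can be reassembled later. A sketch, with the 1,792-character limit taken from the behavior I observed rather than a documented spec:

```python
# Chunking workaround for the per-memory size cap: split the text into
# ordered pieces, each prefixed with "[i/n] " so they can be reassembled.
NOWLEDGE_LIMIT = 1792  # observed cap, not a documented constant


def chunk_memory(text: str, limit: int = NOWLEDGE_LIMIT) -> list[str]:
    header_budget = len("[999/999] ")  # reserve room for the ordering prefix
    body = limit - header_budget
    parts = [text[i:i + body] for i in range(0, len(text), body)] or [""]
    n = len(parts)
    return [f"[{i + 1}/{n}] {p}" for i, p in enumerate(parts)]
```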

Plugin

I created the plugin context-keeper, which consolidates multiple slash commands and hooks into a single package. This makes it easier to distribute and use across environments.

The plugin adds a hook that, on session start or after an auto-compaction and resume, automatically loads the latest memory to minimize interruption.

It provides three slash commands:

  • context-keeper:list-memories — lists all memories saved locally
  • context-keeper:list-sessions — lists all sessions that have saved memories
  • context-keeper:load-memory — manually loads the latest memory

What Next

  1. Continue to refine the prompt and save all memory in JSON format, which is more structured, though I’m not sure it is the best option.
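As a starting point for that JSON format, a record could mirror the categories the summarization prompt extracts. This schema is exploratory, not final:

```python
# Exploratory schema for structured memories: one record per save, with
# fields mirroring the categories the summarization prompt extracts.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MemoryRecord:
    session_id: str
    saved_at: str  # ISO-8601 timestamp
    architectures: list[str] = field(default_factory=list)
    snippets: list[str] = field(default_factory=list)
    specifications: list[str] = field(default_factory=list)
    requirement_changes: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```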