MCP Memory Server: Hosted vs Local

Parsa Barati · June 14, 2026 · 6 min read

Tags: mcp memory server, model context protocol, knowledge graph memory, cross-client memory

An MCP memory server gives an AI client persistent memory over the Model Context Protocol. The official @modelcontextprotocol/server-memory runs locally as a JSONL knowledge graph and stores exactly what you write. Hosted servers like CognitiveX add consolidation, salience-weighted decay, and one shared memory across every MCP client. The split is between a local file store and a hosted memory system.

What is an MCP memory server?

The Model Context Protocol is an open standard for connecting AI clients (Claude Desktop, Cursor, your own apps) to external tools and data. An MCP memory server is one of those connections: a server that exposes tools for writing and reading memory, so the client can remember things across sessions instead of starting from zero every chat.

There are two ways to ship one. You can run it locally, where memory lives in a file on your machine. Or you can run it hosted, where memory lives on a backend you connect to. That choice decides almost everything else: whether your memory follows you across machines, whether it learns over time, and how much it can hold.

The official server: a local JSONL knowledge graph

Anthropic ships a reference implementation, @modelcontextprotocol/server-memory, often called the Knowledge Graph Memory Server. It's a clean, correct example, and worth understanding precisely because it sets the baseline.

It stores everything in a single flat file, memory.jsonl (path configurable via MEMORY_FILE_PATH). The data model has three primitives:

Entities: nodes with a name and a type.
Relations: directed edges between entities, named in the active voice.
Observations: atomic strings attached to an entity, one fact per string.
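A minimal sketch of those three primitives and of what "one JSON object per line" means in practice. The class names are hypothetical, and the record field names (`entityType`, `relationType`, `from`, `to`) are an assumption about the on-disk shape rather than a documented contract, so check the server's own output before relying on them.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                       # node name
    entity_type: str                # e.g. "person", "company"
    observations: list[str] = field(default_factory=list)  # one fact per string

    def to_record(self) -> dict:
        # Field names are assumed, not taken from a published schema.
        return {"type": "entity", "name": self.name,
                "entityType": self.entity_type,
                "observations": self.observations}


@dataclass
class Relation:
    source: str                     # entity the edge starts at
    target: str                     # entity the edge points to
    relation_type: str              # active voice, e.g. "works_at"

    def to_record(self) -> dict:
        return {"type": "relation", "from": self.source,
                "to": self.target, "relationType": self.relation_type}


def to_jsonl(records) -> str:
    """Serialize each record as one JSON object on its own line."""
    return "\n".join(json.dumps(r.to_record()) for r in records)


graph = [
    Entity("Ada", "person", ["Prefers Rust over Go"]),
    Entity("Acme", "company"),
    Relation("Ada", "Acme", "works_at"),
]
print(to_jsonl(graph))
```

Observations hang off an entity rather than existing as nodes of their own, which is why they are stored inside the entity record here.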

It exposes nine tools, and every one of them is a create, read, or delete operation: create_entities, create_relations, add_observations, delete_entities, delete_observations, delete_relations, read_graph, search_nodes, and open_nodes. Search is plain substring matching over entity names, entity types, and observation content.
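To show what substring search does and does not find, here is a small stand-in for a `search_nodes`-style lookup. This is a sketch, not the server's code: the dictionary shape, the case-insensitive comparison, and the set of fields searched are assumptions made for illustration.

```python
def search_nodes(entities: list[dict], query: str) -> list[dict]:
    """Return entities whose name, type, or any observation contains `query`."""
    q = query.lower()
    return [
        e for e in entities
        if q in e["name"].lower()
        or q in e["entityType"].lower()
        or any(q in obs.lower() for obs in e["observations"])
    ]


entities = [
    {"name": "Ada", "entityType": "person",
     "observations": ["Prefers Rust over Go", "Lives in Lisbon"]},
    {"name": "Acme", "entityType": "company",
     "observations": ["Ships a Go SDK"]},
]

# A literal match succeeds on both entities.
print([e["name"] for e in search_nodes(entities, "go")])

# A related but differently worded query matches nothing,
# because no stored string contains it.
print(search_nodes(entities, "golang"))
```

The second query is the limitation in miniature: matching is on characters, so anything phrased differently from what was stored is invisible to search.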

That's the whole system, and it's intentional. Anthropic classifies it as a reference server: a starting point you extend, not a production memory product. It's widely used precisely because it's a clean primitive.

What the local server doesn't do (by design)

Because it's a reference CRUD store, the official server has no:

Consolidation: it never promotes repeated episodic events into durable semantic facts.
Decay or salience weighting: every observation is weighted equally, forever.
Summarization or forgetting: noise stays in the file alongside what matters.
Learned relevance: search is substring matching, not learned ranking.

None of that is a bug. It stores exactly what you tell it, verbatim, and returns it on search. But over a long relationship the JSONL file grows without bound, and the agent keeps re-reading stale, equally weighted facts. It's a notebook, not a memory.

It's also local and single-machine. The graph is a file on one computer. Claude Desktop on your laptop and Cursor on another machine do not share it unless you manually copy the file around. There's no hosted backend, so it isn't cross-client out of the box.
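For context, this is roughly what registering the local server with one client looks like. The sketch builds the `mcpServers` entry that clients such as Claude Desktop read from their JSON config file and prints it; the server key `memory` and the file path are placeholders, not values from this article.

```python
import json

# One entry per machine: the client launches the server as a local process,
# and MEMORY_FILE_PATH points at a file on that same machine.
config = {
    "mcpServers": {
        "memory": {  # placeholder label for the server
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-memory"],
            "env": {"MEMORY_FILE_PATH": "/path/to/memory.jsonl"},  # placeholder path
        }
    }
}

print(json.dumps(config, indent=2))
```

Nothing in the entry refers to a network address. A second machine with the same entry starts its own process against its own file, which is why the two graphs never meet unless the file itself is moved.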

Hosted, consolidating, cross-client: the CognitiveX approach

CognitiveX also ships as an MCP server (same protocol, same clients), but it's a different class of system. The honest one-line contrast: the official server gives Claude a notebook; CognitiveX gives it a memory that learns.

Three architectural differences:

Store vs. consolidate. The official server persists exactly what's written, weighted equally, forever. CognitiveX runs a cognition engine plus dream consolidation: episodic events get promoted into semantic facts, patterns and skills are extracted, and salience-weighted decay lets noise fade so what surfaces is what matters. (That's the mechanism; see memory consolidation for AI agents for how it works.)
Local file vs. hosted + cross-client. The official server is a JSONL file on one machine. CognitiveX is hosted, so the same memory is available across every MCP client (Claude Desktop, Cursor, your own apps) without copying files or self-hosting a container stack.
Stated lineage, not a benchmark. CognitiveX's consolidation and salience design draws on ACT-R and the Generative Agents memory architecture. That's an engineering pedigree, not a performance claim.
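To make "salience-weighted decay" and "promotion" less abstract, here is a toy sketch built on the textbook ACT-R base-level learning equation, where activation is the log of the summed, power-law-decayed ages of past accesses. It is not CognitiveX's implementation. The sample memories, the retrieval threshold, and the promotion rule are all hypothetical values chosen for illustration.

```python
import math


def base_level_activation(access_times: list[float], now: float,
                          decay: float = 0.5) -> float:
    """ACT-R base-level learning: ln(sum of age ** -decay over past accesses)."""
    ages = [now - t for t in access_times if now > t]
    if not ages:
        return float("-inf")  # never accessed before `now`
    return math.log(sum(age ** -decay for age in ages))


now = 100.0  # arbitrary time units, e.g. days
memories = {
    "one-off remark, long ago": [1.0],
    "one-off remark, recent": [95.0],
    "repeated preference": [10.0, 40.0, 70.0, 95.0],
}

# Rank by activation: repetition and recency both raise it.
ranked = sorted(memories,
                key=lambda m: base_level_activation(memories[m], now),
                reverse=True)

RETRIEVAL_THRESHOLD = -1.0  # hypothetical cutoff below which a memory fades
surfaced = [m for m in ranked
            if base_level_activation(memories[m], now) > RETRIEVAL_THRESHOLD]

PROMOTE_AFTER = 3  # hypothetical: repetitions needed to count as a durable fact
promoted = [m for m, times in memories.items() if len(times) >= PROMOTE_AFTER]

print(ranked)    # repeated preference first, old one-off last
print(surfaced)  # the old one-off falls below the threshold
print(promoted)  # only the repeated preference qualifies
```

Every step is plain arithmetic over timestamps, so the same inputs always produce the same ranking. That is the sense in which this kind of scoring can be deterministic, with no model call involved.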

The honesty hook underneath all of it: consolidation and salience are deterministic algorithms. The LLM only renders language at the very end. That's a real difference from "stuff text in a file and hope the model re-reads it."

Comparison: local reference vs. hosted memory system

| Capability | `@modelcontextprotocol/server-memory` | CognitiveX (hosted MCP) |
| --- | --- | --- |
| Storage | Local JSONL file (`memory.jsonl`) | Hosted backend |
| Classification | Anthropic reference server | Hosted memory system |
| Cross-client | Manual (copy the file) | Yes, same memory everywhere |
| Tools | 9, all create/read/delete | Recall, remember, consolidate, more |
| Search | Keyword/substring | Salience-weighted recall |
| Consolidation | None | Episodic → semantic promotion |
| Decay / forgetting | None (grows unbounded) | Salience-weighted decay |
| Memory types | Entities, relations, observations | Episodic, semantic, procedural, foundational |

Where this fits the broader market

The category is moving toward hosted memory. mem0 archived its standalone mem0-mcp repo on March 24, 2026 and redirected users to its hosted MCP server. Its local-first option, OpenMemory MCP, is cross-client, but it requires you to self-host a Docker stack (Postgres + Qdrant) to get there. The pattern is clear: cross-client, consolidating memory wants a backend, and the question is just whether you run it or someone hosts it for you.

Memory quality is sometimes measured on benchmarks like LoCoMo (1,540 questions over multi-session dialogues). Those scores are vendor-reported today, the benchmark itself has known annotation issues, and the official server-memory doesn't publish one at all, so we won't claim a number CognitiveX hasn't earned. CognitiveX's edge here is architectural: it's a learning system, not a storing one, and the mechanism is the argument.

Finding it: the MCP directory angle

MCP servers get discovered through directories. PulseMCP, Glama, and mcp.so all index thousands of servers and accept submissions, and the official local memory server already ranks high in those listings as the reference choice.

That's exactly where a hosted, consolidating memory belongs too, listed right next to the local primitive, so users browsing for "memory" find the learning alternative in the same place they'd otherwise only find the file-based one. Whether you're wiring memory into Claude Code or Cursor, the connection is the same MCP handshake; the difference is what's on the other end.

FAQ

What is an MCP memory server? A server that connects to AI clients over the Model Context Protocol and exposes tools for storing and recalling memory, so the AI remembers across sessions instead of starting fresh each chat.

Does the official MCP memory server store data locally or in the cloud? Locally. @modelcontextprotocol/server-memory keeps everything in a single JSONL file (memory.jsonl) on one machine. There's no hosted backend.

Can I share memory across Claude Desktop, Cursor, and other MCP clients? Not with the local reference server unless you manually share the file. A hosted server like CognitiveX gives every MCP client the same memory automatically.

Does @modelcontextprotocol/server-memory forget or summarize old memories? No. It's a CRUD knowledge graph: it stores exactly what you write, weighted equally, with no consolidation, decay, or summarization. Removing old data is a manual delete.

Is the official MCP memory server good for production long-term use? It's an excellent reference implementation and a fine primitive to build on. But it's intentionally a store, not a memory system; long-term, relationship-grade use wants consolidation that it doesn't attempt.

What are the alternatives to the official MCP memory server? mem0 (hosted MCP), OpenMemory (self-hosted Docker stack, cross-client), and CognitiveX (hosted, consolidating, cross-client). See our AI memory layer comparison for the full landscape.

The bottom line

The local reference server is the right tool when you want a simple, single-machine knowledge graph you fully control. A hosted MCP memory server is the right tool when you want one memory that follows you across every client and actually learns what matters, promoting episodes to facts and letting noise fade instead of accumulating forever.

CognitiveX ships as an MCP server you can point Claude, Cursor, and your own apps at, and the memory on the other end consolidates instead of just storing.