iCog

COGNITIVEX · GUIDE

Agent memory architecture: a practical guide

Most agents forget everything the moment a turn ends. A real memory architecture is the difference between a chatbot that re-reads its context window and an agent that actually accumulates knowledge. Here is how the pieces fit (working, episodic, semantic, consolidation, retrieval) and how to build them without reinventing cognition.

WHAT "MEMORY ARCHITECTURE" ACTUALLY MEANS

An agent memory architecture is the set of stores, write rules, and retrieval paths that let an agent carry knowledge across turns, sessions, and tasks, instead of starting cold every time the context window resets. It is not a vector database. A vector database is one component. The architecture is the decision about what gets written, in what form, when it gets reorganized, and how it comes back at the right moment.

The reason this matters: a bare LLM call runs query → model → response → forget. The weights do not change at inference time, and the only thing the model "remembers" is whatever you paste back into the prompt. A memory architecture closes that loop: query → living memory → reasoning → learning → evolution, so the system gets better at your work the longer it runs. That loop is exactly what the Large Cognition Model (the LCM) is built around: the memory is the model.

WORKING · EPISODIC · SEMANTIC

Three kinds of memory, three different jobs

Borrowed from cognitive science for a good reason: these tiers have genuinely different lifetimes, write costs, and retrieval patterns. Collapsing them into one big embedding table is the most common mistake.

| Tier | Lifetime | Holds | Written when | Read when |
|---|---|---|---|---|
| Working memory | One turn / one task | Live tokens, scratchpad, the current plan | Continuously, in-context | Every reasoning step |
| Episodic memory | Indefinite, time-stamped | What happened: events, sessions, decisions | After a meaningful event | "What did we do last week?" |
| Semantic memory | Indefinite, timeless | Facts, preferences, how systems work | When a durable fact is learned | "How does X work?" |
| Procedural memory | Indefinite, reinforced | Skills, repeatable how-tos, patterns | After a successful procedure | When a similar task recurs |

CognitiveX adds one more tier on top of the three long-lived tiers above (episodic, semantic, procedural): foundational memory for identity, values, and core beliefs. It matters because an agent that knows who it is for recalls differently than one that only knows facts. That makes four persistent tiers: semantic, episodic, procedural, foundational. Working memory stays in the context window.
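
To make the tiering concrete, here is a minimal sketch of how records could be typed by tier. It is not the CognitiveX schema: `Tier`, `MemoryRecord`, `PERSISTENT_TIERS`, and every field name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Tier(Enum):
    WORKING = "working"            # lives in the context window, never persisted
    EPISODIC = "episodic"          # what happened, anchored in time
    SEMANTIC = "semantic"          # timeless facts and preferences
    PROCEDURAL = "procedural"      # skills and repeatable how-tos
    FOUNDATIONAL = "foundational"  # identity, values, core beliefs


# The four persistent tiers; working memory is deliberately excluded.
PERSISTENT_TIERS = frozenset(
    {Tier.EPISODIC, Tier.SEMANTIC, Tier.PROCEDURAL, Tier.FOUNDATIONAL}
)


@dataclass
class MemoryRecord:
    """Hypothetical record shape; field names are illustrative only."""
    tier: Tier
    text: str
    salience: float = 0.5                   # assumed 0..1 weight
    occurred_at: Optional[datetime] = None  # when the event happened (episodic only)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Episodic records must be time-anchored or chronological recall breaks.
        if self.tier is Tier.EPISODIC and self.occurred_at is None:
            raise ValueError("episodic records need an occurred_at timestamp")
        # Working memory stays in-context and is not written to the store.
        if self.tier not in PERSISTENT_TIERS:
            raise ValueError("working memory is not written to the store")
```

Keeping the tier on the record, rather than in separate tables, is one design choice among several; it lets retrieval filter or weight by tier without a join.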

WHAT EARNS A WRITE

Not everything deserves to be remembered

The fastest way to ruin a memory system is to write everything. Log-everything stores fill with transcript noise, and retrieval drowns in near-duplicates. A good write path is opinionated about salience: it asks whether a span is worth carrying forward before committing it.

  • Decisions and their reasons. "We chose SSE over WebSockets because Cloudflare proxies them" is worth more than the 40 messages that produced it.
  • Durable facts about the user or domain. Preferences, constraints, names, the shape of a system. These belong in semantic memory.
  • Events, anchored in time. "Shipped the billing MCP surface on the 14th" goes in episodic memory so chronological recall works.
  • What does not earn a write: ephemeral task steps, things already obvious from the code, or a verbatim paste of a long output. That is a changelog, not a memory.
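
The rules above can be sketched as a small write gate. This is a toy heuristic, not the CognitiveX salience engine: the cue lists, the weights, `MAX_CHARS`, and both thresholds are assumptions, and `difflib` string similarity stands in for whatever duplicate detection a real system would use.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical cue lists; a real system would likely use a learned scorer.
DECISION_CUES = re.compile(r"\b(because|decided|chose|instead of|so that)\b", re.I)
EVENT_CUES = re.compile(
    r"\b(shipped|released|deployed)\b|\bon the \d{1,2}(st|nd|rd|th)\b", re.I
)
MAX_CHARS = 400  # assumed cutoff: longer spans are treated as pasted output


def salience(span: str) -> float:
    """Score a candidate span between 0 and 1 with hand-picked weights."""
    score = 0.0
    if DECISION_CUES.search(span):
        score += 0.5   # decisions and their reasons
    if EVENT_CUES.search(span):
        score += 0.3   # events anchored in time
    if len(span) > MAX_CHARS:
        score -= 0.5   # verbatim paste of a long output
    return max(0.0, min(1.0, score))


def should_write(span: str, existing: Iterable[str],
                 threshold: float = 0.3, dup_ratio: float = 0.9) -> bool:
    """Commit only spans that are salient and not near-duplicates."""
    if salience(span) < threshold:
        return False
    return all(
        SequenceMatcher(None, span.lower(), old.lower()).ratio() < dup_ratio
        for old in existing
    )
```

For example, `should_write("We chose SSE over WebSockets because Cloudflare proxies them", [])` passes the gate, while a routine status line with no decision or event cue does not.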

A useful pattern: for any shipped feature or resolved bug, write two records. One short episodic anchor ("this happened, on this date, here is why") and one or more timeless semantic nuggets ("the markup ratio is X", "this is the footgun"). Write only semantic and your timelines go blank; write only episodic and future-you can't answer "how does this work." CognitiveX's pattern-detection and salience engines run on the write path so the system itself decides what is worth keeping. See how the LCM structures cognition.
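
A minimal sketch of the two-record pattern, assuming a hypothetical `Record` type and helper name. The date in the usage note is a placeholder, since the example above only says "the 14th".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record; tier is "episodic" or "semantic"."""
    tier: str
    text: str
    occurred_on: Optional[date] = None  # set only on the episodic anchor


def records_for_change(what: str, why: str, when: date,
                       nuggets: List[str]) -> List[Record]:
    """Return one short episodic anchor plus one semantic record per nugget."""
    anchor = Record(
        tier="episodic",
        text=f"{when.isoformat()}: {what}. Why: {why}",
        occurred_on=when,
    )
    facts = [Record(tier="semantic", text=n) for n in nuggets]
    return [anchor, *facts]
```

Calling `records_for_change("Shipped the billing MCP surface", "customers asked for it", date(2024, 1, 14), ["The markup ratio is X"])` yields two records: the anchor keeps the timeline intact, and the nugget answers "how does this work" without replaying the event. The reason string and the year are made up for the example.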

THE PART EVERYONE SKIPS

Memory that reorganizes itself

Storage is the easy half. The half that separates a real architecture from a glorified log is consolidation: the offline process that compresses redundant memories, promotes repeated episodes into durable semantic facts, and links related records into a graph. In humans this happens during sleep. In CognitiveX it runs as overnight dream consolidation: the system replays the day's memories, merges duplicates, and synthesizes relationships you never explicitly stored.

Without consolidation, a memory store degrades over time: the same fact gets written fifteen slightly different ways, contradictions pile up, and retrieval returns a fog of near-matches. With it, the store gets denser and truer the longer it runs. This is also where higher-order cognition lives: pattern detection across episodes, reflection, and introspection about the agent's own state. These are algorithms with defined inputs and outputs; the language model only renders the result at the end.
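
As a rough illustration of a consolidation pass, the sketch below merges near-duplicate episodes and promotes anything repeated often enough into a semantic fact. It is not how CognitiveX's dream consolidation works internally: the function names, the greedy clustering, the `promote_after` threshold, and the use of `difflib` in place of embedding similarity are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def _similar(a: str, b: str, threshold: float) -> bool:
    # String similarity as a stand-in for embedding similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster(texts: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy clustering: each text joins the first cluster whose seed it matches."""
    clusters: List[List[str]] = []
    for text in texts:
        for group in clusters:
            if _similar(text, group[0], threshold):
                group.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def consolidate(episodes: List[str], promote_after: int = 3,
                threshold: float = 0.8) -> Dict[str, List[str]]:
    """Collapse each cluster to one record; promote frequently repeated ones."""
    kept: List[str] = []
    promoted: List[str] = []
    for group in cluster(episodes, threshold):
        representative = max(group, key=len)  # crude: keep the longest wording
        if len(group) >= promote_after:
            promoted.append(representative)   # repeated episode becomes a fact
        else:
            kept.append(representative)       # stays episodic, duplicates merged
    return {"episodic": kept, "semantic": promoted}
```

A production pass would also need to resolve contradictions and link related records, which this sketch leaves out.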

GETTING THE RIGHT MEMORY BACK

Retrieval is a ranking problem, not a lookup

The last mile is pulling the right memories into context at the right moment: not the most memories, the right ones. Naive cosine similarity over one big table gets you 70% of the way and then fails in exactly the cases that matter: it surfaces an old, loosely related memory because its similarity score happens to tie with that of the memory you actually need, and nothing else breaks the tie.

Better retrieval blends signals: semantic similarity, recency, salience weight, memory type, and an explicit notion of what the agent is currently doing. CognitiveX grounds recall in a declared current_task so past memories are treated as historical context rather than as candidates for "what we're talking about." Depth is a dial, not a constant: a shallow foundational lookup is cheap; a deep multi-hop recall across the relationship graph costs more and is worth it for hard questions. (That depth ladder is exactly what the recall-credit pricing reflects.)
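
A blended ranking function might look like the sketch below. The weights, the 30-day half-life, the `TIER_BOOST` table, and the `Memory` fields are illustrative assumptions, not CognitiveX's actual scoring. `task_vec` is a hypothetical embedding of the declared current_task, used here as one possible way to ground recall in what the agent is doing.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    tier: str                   # "episodic", "semantic", "procedural", "foundational"
    embedding: Sequence[float]
    age_days: float
    salience: float             # assumed 0..1 weight set on the write path


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical per-tier nudges; tune or drop these for your own data.
TIER_BOOST = {"foundational": 0.10, "semantic": 0.05,
              "procedural": 0.05, "episodic": 0.0}


def score(m: Memory, query_vec: Sequence[float], task_vec: Sequence[float],
          half_life_days: float = 30.0) -> float:
    similarity = cosine(m.embedding, query_vec)      # semantic similarity
    task_fit = cosine(m.embedding, task_vec)         # fit to the current task
    recency = 0.5 ** (m.age_days / half_life_days)   # exponential decay
    return (0.45 * similarity + 0.20 * task_fit + 0.15 * recency
            + 0.15 * m.salience + TIER_BOOST.get(m.tier, 0.0))


def recall(memories: List[Memory], query_vec: Sequence[float],
           task_vec: Sequence[float], k: int = 3) -> List[Memory]:
    """Return the top-k memories by blended score, best first."""
    ranked = sorted(memories, key=lambda m: score(m, query_vec, task_vec),
                    reverse=True)
    return ranked[:k]
```

With this blend, two memories that tie on cosine similarity no longer tie overall: the more recent or more salient one ranks first.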

Cross-agent recall matters here too. Because the memory lives in a model, not in one process, several agents can share the same living memory over MCP: one agent's writes become another agent's recall.

COMMON QUESTIONS

Frequently asked

Is a vector database enough for agent memory?
It is the retrieval substrate, not the architecture. You still need write rules (salience), tiering (episodic vs semantic), consolidation, and ranked retrieval on top. A vector DB with no consolidation degrades into a noisy log.

How is this different from a long context window?
A bigger context window lets you re-read more each turn, but it still forgets the moment the turn ends, and it pays full token cost every time. Memory is persistent, selective, and cheap to query. The two are complementary: memory decides what belongs in the window.
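
One way to picture "memory decides what belongs in the window" is a budgeted packing step over already-ranked memories. The sketch below is an assumption-laden illustration: `estimate_tokens` uses a crude four-characters-per-token rule of thumb and should be replaced with a real tokenizer, and both function names are made up.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude rule of thumb (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def pack_context(ranked: List[Tuple[float, str]], budget_tokens: int) -> List[str]:
    """Greedily add the highest-scoring memories that still fit the budget."""
    chosen: List[str] = []
    used = 0
    for _score, text in sorted(ranked, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip what does not fit; a smaller memory may still fit
        chosen.append(text)
        used += cost
    return chosen
```

The window then carries only the selected memories each turn, instead of paying full token cost to re-read everything.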

Do I have to build this myself?
No. The whole point of the LCM is that memory architecture is infrastructure you plug into: four-tier storage, salience-gated writes, overnight consolidation, and grounded retrieval, behind one SDK and MCP endpoint. You bring the application; the cognition layer comes built. See the comparison of memory layers and the Claude Code integration for concrete setups, and the LongMemEval benchmark for how we measure recall.

Build on a memory that learns

Skip the year of plumbing. Wire the cognitive layer into your agent and let it accumulate.
