Memory, explained

What is an AI memory layer?

Parsa Barati · June 2, 2026 · 3 min read

ai memory
persistent memory
llm
memory layer

An AI memory layer is a system that stores, organizes, and recalls information across sessions so an AI remembers you over time. Large language models are stateless by default, so every conversation starts from zero. A memory layer sits between you and the model, writing down what matters and feeding the right pieces back when they're relevant. It's the difference between an assistant that re-introduces itself every morning and one that actually knows you.

Why do LLMs forget?

A language model doesn't have memory in the way you'd hope. Inside a single conversation it has a context window, a working buffer that holds at most the last N tokens. Close the chat, and that buffer is gone. The model didn't learn anything; its weights were never updated. It simply read your messages and predicted a reply.
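To make the buffer idea concrete, here is a toy sketch in Python. The ContextWindow class is hypothetical, and splitting on whitespace is an assumption standing in for a real tokenizer; the only point is that old tokens fall out and a new session starts empty.

```python
from collections import deque


class ContextWindow:
    """Toy stand-in for a context window: keeps only the last N tokens."""

    def __init__(self, max_tokens: int) -> None:
        # A deque with maxlen silently drops the oldest items once it is full
        self.tokens = deque(maxlen=max_tokens)

    def add(self, text: str) -> None:
        # Whitespace splitting is an assumption; real tokenizers work differently
        self.tokens.extend(text.split())

    def visible(self) -> str:
        return " ".join(self.tokens)


window = ContextWindow(max_tokens=5)
window.add("my name is Ada")
window.add("I prefer terse answers")
print(window.visible())  # the earliest tokens have already fallen out

# A new session gets a fresh, empty window: nothing carries over
new_session = ContextWindow(max_tokens=5)
print(repr(new_session.visible()))
```

Nothing in this sketch writes to anything durable, which is exactly the gap a memory layer fills.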

This is the "amnesia problem," and it's structural, not a bug. The fix isn't a bigger context window (those fill up and degrade, a phenomenon people now call context rot). The fix is an external system that persists what's worth keeping and retrieves it on demand. That system is the memory layer.

What a memory layer actually does

A good memory layer does four things:

Capture: decide what's worth remembering from a conversation (a decision, a preference, a fact about you) and write it down.
Organize: store it in a way that can be searched later, usually as embeddings plus structured metadata.
Recall: when you ask something new, find the handful of past memories that are actually relevant and inject them into the model's context.
Maintain: update memories when facts change, forget what's stale, and resolve contradictions so the AI doesn't stay confidently wrong.

That last step is where most systems stop short. Storing facts is easy. Keeping a model of a person accurate over months is the hard part.
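The four steps can be sketched in a few dozen lines of Python. This is a minimal illustration, not how any particular product works: MemoryLayer, embed, and cosine are hypothetical names, the bag-of-words vector is a toy stand-in for a real embedding model, and the "decide what's worth remembering" part of capture is left to the caller.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    key: str    # structured metadata: what the memory is about
    text: str   # the remembered statement
    vector: Counter = field(default_factory=Counter)


class MemoryLayer:
    def __init__(self) -> None:
        self.memories: dict[str, Memory] = {}

    def capture(self, key: str, text: str) -> None:
        # Capture and organize: store the text with its vector, keyed by topic.
        # Maintain: writing to an existing key replaces the outdated memory.
        self.memories[key] = Memory(key, text, embed(text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Recall: rank memories by similarity to the query, keep the top k
        q = embed(query)
        scored = [(cosine(q, m.vector), m.text) for m in self.memories.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]

    def forget(self, key: str) -> None:
        # Maintain: drop a memory that is no longer true or useful
        self.memories.pop(key, None)


layer = MemoryLayer()
layer.capture("answer_style", "user prefers long detailed answers")
layer.capture("editor", "user writes code in neovim")
layer.capture("answer_style", "user prefers terse answers")  # the fact changed
print(layer.recall("how long should answers be", k=1))
```

Keying memories by topic makes the contradiction trivial to resolve here. In practice, noticing that two differently worded statements conflict is the hard part the paragraph above describes.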

Memory layer vs. RAG vs. fine-tuning

These three get conflated, so here is how they differ:

RAG (retrieval-augmented generation) pulls relevant documents into context at query time. A memory layer is a form of retrieval, but RAG usually retrieves from a static knowledge base, while a memory layer writes new memories from your conversations and recalls those. RAG answers "what does this document say?"; memory answers "what do I know about you?"
Fine-tuning bakes knowledge into the model's weights. It's expensive, slow to update, and wrong for personal memory: you can't fine-tune a model every time you mention a new preference. Memory is fast, editable, and per-user.

The short version: fine-tuning changes the model, RAG changes the prompt, and a memory layer changes what the AI knows about you specifically, and keeps changing it.
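One way to see the RAG versus memory distinction is in what each source lets you do. The sketch below is illustrative only: StaticKnowledgeBase, UserMemory, and build_prompt are hypothetical names, and word overlap is an assumed stand-in for embedding similarity. The knowledge base is read-only, the memory is written to per user, and both end up as text in the prompt while the model's weights stay untouched.

```python
def overlap(query: str, text: str) -> int:
    # Crude relevance score: number of shared lowercase words
    return len(set(query.lower().split()) & set(text.lower().split()))


class StaticKnowledgeBase:
    """RAG-style source: documents are fixed up front, conversations never change them."""

    def __init__(self, documents: list[str]) -> None:
        self._documents = tuple(documents)

    def retrieve(self, query: str) -> str:
        return max(self._documents, key=lambda d: overlap(query, d))


class UserMemory:
    """Memory-layer-style source: per-user, and written to during conversations."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[str]] = {}

    def write(self, user_id: str, memory: str) -> None:
        self._by_user.setdefault(user_id, []).append(memory)

    def recall(self, user_id: str, query: str) -> list[str]:
        return [m for m in self._by_user.get(user_id, []) if overlap(query, m) > 0]


def build_prompt(question: str, document: str, memories: list[str]) -> str:
    # Both sources become extra prompt text; nothing is baked into the model
    lines = ["Document: " + document]
    lines += ["Memory: " + m for m in memories]
    lines.append("Question: " + question)
    return "\n".join(lines)


kb = StaticKnowledgeBase([
    "The API rate limit is 100 requests per minute.",
    "Deploys run every weekday at noon.",
])
memory = UserMemory()
memory.write("alice", "alice prefers answers with the rate limit in requests per second")

question = "what is the rate limit"
prompt = build_prompt(question, kb.retrieve(question), memory.recall("alice", question))
print(prompt)
```

A different user asking the same question would get the same document but none of alice's memories, which is the per-user part of the story.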

Not all memory is the same

The richest memory layers borrow from how human memory works, separating it into types:

Episodic: things that happened ("you shipped the latency fix on Tuesday").
Semantic: facts and concepts ("you prefer terse answers").
Procedural: how-tos and patterns you've established.
Foundational: the stable, identity-level stuff that rarely changes.

Treating these differently is what lets a memory layer recall the right kind of thing for the question, and lets it do things a flat fact-store can't, like reflect on episodes to form new semantic understanding.
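Here is a small sketch of what typed memories might look like in Python. Everything in it is an assumption made for illustration: the MemoryType enum mirrors the four types above, the topic field and the second episode are made up, and the reflect rule (a topic seen in several episodes becomes a semantic memory) is a deliberately naive stand-in for what would likely be a model-driven step.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    FOUNDATIONAL = "foundational"


@dataclass(frozen=True)
class Memory:
    type: MemoryType
    text: str
    topic: str = ""  # hypothetical metadata used only by reflect()


def recall_by_type(memories: list[Memory], wanted: MemoryType) -> list[Memory]:
    # Typed storage lets recall pick the kind of memory the question calls for
    return [m for m in memories if m.type is wanted]


def reflect(memories: list[Memory], min_episodes: int = 2) -> list[Memory]:
    # Naive reflection rule: a topic that shows up in several episodes
    # is promoted to a new semantic memory
    counts = Counter(
        m.topic for m in memories if m.type is MemoryType.EPISODIC and m.topic
    )
    return [
        Memory(MemoryType.SEMANTIC, f"you often work on {topic}", topic)
        for topic, n in counts.items()
        if n >= min_episodes
    ]


store = [
    Memory(MemoryType.EPISODIC, "you shipped the latency fix on Tuesday", "latency"),
    Memory(MemoryType.EPISODIC, "you profiled the slow endpoint on Thursday", "latency"),
    Memory(MemoryType.SEMANTIC, "you prefer terse answers"),
]
store += reflect(store)

for m in recall_by_type(store, MemoryType.SEMANTIC):
    print(m.text)
```

A flat fact-store has no equivalent of reflect, because it has no notion that some records are events and others are conclusions drawn from them.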

Why it's the missing piece

You probably use several AI tools: one to think, one to code, one to draft. None of them know what you told the others. Each one is brilliant and amnesiac. A memory layer that lives outside any single model (for example, exposed over MCP so it plugs into Claude, Cursor, ChatGPT, and Codex alike) gives all of them one shared, portable memory of you.

That's the bet behind iCog: the intelligence you want isn't a smarter model. It's continuity. Not smart. Just knows you.
