Memory, explained

Do LLMs Have Memory?

Parsa BaratiJune 14, 20265 min read

do llms have memory
llm memory
stateless llm
context window

No, large language models are stateless. The model has no memory of you, your past conversations, or even your previous message unless that text is re-supplied in the current request. A trained LLM is a fixed set of weights; inference reads those weights forward and updates nothing. When ChatGPT or Claude seems to "remember" you, that isn't the model, it's an application layer around the model re-injecting saved facts into each new prompt.

Why LLMs are stateless

Think of an LLM as an engine, not a notebook. Each API call is a clean, read-only computation: the model reads the text you send, predicts a reply, and forgets the entire exchange the moment it finishes. Inference never modifies the weights, so nothing carries over between calls (IBM, Atlan).

Everything an LLM "knows" in a given call lives inside the context window, the system prompt, the conversation history, any retrieved documents, and your input, all bundled into one request. The window is working memory, not long-term memory. When the call ends, that working memory is gone. This is the structural reason your assistant can feel like it has amnesia: it does, by design.

So the right question isn't "does the AI remember?" It's what kind of memory is bolted on outside the model, and who owns it.

How ChatGPT and Claude "remember" you anyway

Both major assistants add a memory system around the stateless model. They don't change the model, they change the prompt the model sees.

ChatGPT layers two mechanisms on top of GPT. Saved memories (launched February 2024) are an explicit, user-editable list of facts. Reference chat history (launched April 10, 2025) implicitly recalls details across past chats. In June 2026, OpenAI began rolling out "Dreaming V3", a background synthesis process that reads across your conversations and updates its model of you without prompting, replacing the hand-curated saved-memories list. OpenAI's internal evaluations report gains in factual recall, preference adherence, and accuracy over time, but those are vendor numbers, not an independent benchmark (OpenAI, TechTimes).

Claude introduced memory around September 2025 and turned it on for all users in March 2026. It works by scanning your chat history and generating a synthesized summary, refreshed roughly every 24 hours. Two real limits: searching and referencing past conversations is paid-only, and the memory lives on Anthropic's servers, it works only inside Claude (Skywork, XTrace).

Both work. The shared catch is that each system is siloed inside one product. Switch tools and you start from zero. We unpack that failure mode in why ChatGPT forgets.

Context window vs. memory

A common confusion: if 2026 models have context windows from 128K up to 10M tokens, isn't that long-term memory? No. A bigger window enlarges working memory within a single call. It gives you no persistent cross-session storage, no selective retrieval from months of history, no structured organization, and no deletion or governance.

It also degrades. The landmark "Lost in the Middle" study (Liu et al., TACL 2023) found LLM performance follows a U-shaped curve: models use information best at the start or end of a long context and worse in the middle, secondary analyses put the middle drop at over 30% (Maxim AI). The pattern has been re-validated in later long-context benchmarks. And inference cost scales quadratically with context length, so you can't simply paste your whole life into every prompt.

	Context window	Memory layer
Scope	One API call	Across sessions, indefinitely
Persistence	Gone when the call ends	Stored and retrievable
Retrieval	Everything, all at once	Selective, the relevant pieces
Organization	Flat token sequence	Structured (episodic, semantic, etc.)
Governance	None	Editable, deletable, ownable
Cost	Quadratic with length	Retrieves a small, relevant set

RAG vs. memory vs. long context

These three get conflated. Long context crams more into one prompt, useful, but transient and degrading. RAG pulls relevant documents from a static knowledge base into context at query time; it answers "what does this document say?" Memory writes new facts from your conversations and recalls those later; it answers "what do I know about you?" Memory is a form of retrieval, but it's the only one that grows from your own history. For the full breakdown, see what is an AI memory layer.

What memory should actually do

Because the model is stateless no matter what, the memory system is the durable part, and it deserves more design than "summarize the chat every 24 hours." A serious memory layer doesn't just store; it consolidates: promoting episodic events into reusable semantic facts, extracting patterns and skills, and applying salience-weighted decay so stale details fade and important ones persist. That lineage comes from cognitive science, ACT-R and Generative Agents, not from a bigger prompt.

This is the bet behind iCog. The consolidation and salience logic are deterministic algorithms; the LLM only renders language at the very end. And because iCog ships as an MCP server, the same memory layer plugs into any MCP-capable client, so your memory follows you across tools instead of locking to one vendor's app.

FAQ

Do LLMs have memory? No. The model itself is stateless and retains nothing between calls. Any "memory" comes from an external system that re-injects saved context into each new prompt.

How does AI "remember" me if the model is stateless? An application layer stores facts about you (or summarizes your history) and adds them to the prompt every time you start a new message. The model reads them fresh each call.

Is a long context window the same as long-term memory? No. A context window is per-call working memory that resets when the call ends and degrades in the middle. Long-term memory is persistent, selective, and structured.

Can I move my AI memory from ChatGPT to another tool? Not with built-in memory, ChatGPT's and Claude's memory are locked to their own apps. A vendor-neutral memory layer exposed over MCP (like iCog) is portable across clients by design.

RAG vs. memory vs. long context, what's the difference? Long context enlarges one prompt; RAG retrieves from a static document store; memory writes and recalls facts from your own conversations over time.

LLMs don't remember, so memory is always an architectural choice made outside the model. The durable, portable, inspectable version is the one worth owning. Try iCog →