The 7 Types of Agent Memory: A Technical Guide for AI Engineers

Large language models are stateless by default. Each API call starts fresh. The model forgets your last message once the response returns. That is fine for a single question. It breaks the moment you build an agent.

Agents plan, call tools, and run across many steps. They need to remember. Memory is the infrastructure that fixes this. It turns a stateless model into a system that retains context. That system can learn from experience and act over time.

What Is Agent Memory?

Memory is any mechanism that carries information across a model’s reasoning steps, turns, or sessions. Some of it lives inside the context window. Some of it lives outside, in databases or model weights. Each type stores a different class of information for a different duration.

Memory varies along two axes: form and time. Form is either parametric (stored in the model’s weights) or non-parametric (stored as text outside the weights). Time is either short-term or long-term. The seven types below map onto those two axes.

The Seven Types of Agent Memory

1. In-Context / Working Memory (Short-Term): This is everything the model can currently see inside its context window. It includes the system prompt, recent messages, tool outputs, and reasoning steps. Think of it as RAM. It is fast and essential, but temporary and size-limited. Every other memory type competes for space here.
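Because the window is size-limited, something has to decide what stays in it. The sketch below shows one simple policy: keep the system prompt and as many of the newest messages as fit. The function names and the word-count token estimate are assumptions for illustration; a real agent would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text):
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

def fit_to_budget(system_prompt, messages, max_tokens):
    """Keep the system prompt, then the newest messages that still fit."""
    remaining = max_tokens - estimate_tokens(system_prompt)
    kept = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break  # older messages are dropped once the budget is spent
        kept.append(message)
        remaining -= cost
    return [system_prompt] + list(reversed(kept))

history = [
    "user: plan a trip to Lisbon",
    "tool: found 3 flights under 200 EUR",
    "user: pick the cheapest one",
]
context = fit_to_budget("system: you are a travel agent", history, max_tokens=16)
print(context)
```

With a 16-token budget only the system prompt and the latest user message survive. Anything dropped here is lost unless another memory type has stored it.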

2. Semantic Memory (Long-Term): This is a persistent store of facts, preferences, and domain knowledge. It holds entries like “the user prefers Python over JavaScript.” The knowledge is decoupled from when it was learned. It is the agent’s organized encyclopedia about a user or topic.
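A minimal way to picture this is a fact store keyed by subject and attribute, where a newer value replaces the older one. The class below is a hypothetical in-memory sketch, not a specific library's API; a production system would back it with a database or profile schema.

```python
class SemanticStore:
    """Hypothetical in-memory fact store keyed by (subject, attribute)."""

    def __init__(self):
        self._facts = {}

    def remember(self, subject, attribute, value):
        # A newer value replaces the old one: the store keeps the fact,
        # not the moment it was learned.
        self._facts[(subject, attribute)] = value

    def recall(self, subject):
        # Return every known attribute for one subject.
        return {
            attr: value
            for (subj, attr), value in self._facts.items()
            if subj == subject
        }

store = SemanticStore()
store.remember("user", "language", "JavaScript")
store.remember("user", "language", "Python")  # the preference changed
store.remember("user", "diet", "gluten-free")
print(store.recall("user"))
```

Only the current preference is returned; the history of how it changed belongs in episodic memory.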

3. Episodic Memory (Long-Term): This logs specific past events, full conversations, and task runs. It records what worked and what failed. The agent uses it to learn from experience. Systems like Reflexion and ExpeL write verbal post-mortems and store conclusions for future runs.
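The idea of storing outcomes with a short written lesson can be sketched as a log of episodes. This is an illustration only, not the Reflexion or ExpeL implementation; the `Episode` fields and the `lessons_for` helper are assumed names.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    task: str
    action: str
    succeeded: bool
    lesson: str  # short verbal post-mortem written after the run

episodes = [
    Episode("market_report", "led with the risk section", True,
            "lead with the risk section"),
    Episode("market_report", "cited a forum post", False,
            "avoid unverified forum sources"),
    Episode("password_reset", "sent reset link", True,
            "confirm the email address first"),
]

def lessons_for(task, log):
    """Return lessons from earlier runs of the same task, failures first."""
    relevant = [e for e in log if e.task == task]
    relevant.sort(key=lambda e: e.succeeded)  # False sorts before True
    return [e.lesson for e in relevant]

print(lessons_for("market_report", episodes))
```

Before a new run of the same task, the returned lessons would be placed into the context window so the agent can avoid a repeated mistake.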

4. Procedural Memory (Long-Term): This is the agent’s knowledge of how to do things. It covers skills, tool usage patterns, workflows, and behavioral rules. A support agent handling its hundredth password reset does not re-reason the workflow. It executes a learned procedure instead.
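One way to model a learned procedure is a named, ordered list of steps that the agent runs without planning again. The step functions and the `PROCEDURES` registry below are hypothetical; in practice the same knowledge may live in a system prompt or in fine-tuned weights.

```python
def verify_identity(state):
    state["verified"] = True
    return state

def send_reset_link(state):
    # Only send the link if the earlier step succeeded.
    state["link_sent"] = state.get("verified", False)
    return state

def confirm_with_user(state):
    state["closed"] = state.get("link_sent", False)
    return state

# Procedural memory: named workflows stored as ordered steps.
PROCEDURES = {
    "password_reset": [verify_identity, send_reset_link, confirm_with_user],
}

def run_procedure(name, state):
    """Execute a stored workflow step by step, with no re-planning."""
    for step in PROCEDURES[name]:
        state = step(state)
    return state

result = run_procedure("password_reset", {"user": "alice"})
print(result)
```

The agent looks up the workflow by name and executes it, which is cheaper and more predictable than reasoning through the steps on every request.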

5. External / Retrieval Memory (Short-Term + Long-Term): This is knowledge stored outside the model, typically in a vector database. It is pulled into context at inference time using similarity search. This is RAG applied to agent history or documents. Retrieval quality quickly becomes the bottleneck.
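The retrieval step can be sketched with a toy similarity search. The `embed` function here is a stand-in assumption: it builds a bag-of-words count vector instead of calling a real embedding model, and the list of chunks replaces a vector database.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "reset your password from the account settings page",
    "invoices are emailed on the first day of each month",
    "api keys can be rotated from the developer console",
]
print(retrieve("how do I reset my password", chunks))
```

The query shares only "reset" and "password" with the first chunk, and that is enough to rank it first. With word counts as embeddings a paraphrased query would miss, which is one reason retrieval quality limits the whole system.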

6. Parametric Memory (Long-Term): This is knowledge baked directly into the model’s weights during training. It holds language, reasoning patterns, and general world knowledge. The model does not look anything up. It generates from learned associations. The tradeoff is that this memory is frozen at training time.

7. Prospective Memory (Short-Term + Long-Term): This is the agent’s ability to remember future intentions and scheduled goals. It tracks things the agent planned but has not yet executed. It is critical for long-horizon and multi-step planning agents. Without it, an agent forgets its own commitments.
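A small priority queue ordered by due time is one way to hold intentions until they are executed. The `IntentionQueue` class and the example dates are assumptions for illustration; a deployed agent would more likely rely on a persistent task queue or scheduler.

```python
import heapq
from datetime import datetime, timedelta

class IntentionQueue:
    """Hypothetical store of things the agent has committed to do later."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker for intentions with equal due times

    def schedule(self, due, intention):
        heapq.heappush(self._heap, (due, self._counter, intention))
        self._counter += 1

    def pop_due(self, now):
        """Remove and return every intention whose due time has passed."""
        due_now = []
        while self._heap and self._heap[0][0] <= now:
            due_now.append(heapq.heappop(self._heap)[2])
        return due_now

start = datetime(2025, 1, 6, 9, 0)  # arbitrary example start time
queue = IntentionQueue()
queue.schedule(start + timedelta(days=4), "send the Friday report")
queue.schedule(start + timedelta(hours=1), "check build status")

first = queue.pop_due(start + timedelta(hours=2))
later = queue.pop_due(start + timedelta(days=5))
print(first, later)
```

The agent would call `pop_due` at the start of each step, so a commitment made on day one still surfaces on day five even if the original conversation has left the context window.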

Side-by-Side: How the Seven Compare

The table below maps each type to its timescale, location, and typical implementation.

| Memory type | Timescale | Where it lives | What it stores | Common implementation |
|---|---|---|---|---|
| Working / In-context | Short-term | Context window | Prompt, messages, tool outputs | Native to the LLM |
| Semantic | Long-term | External store | Facts, preferences, domain knowledge | Vector DB or profile schema |
| Episodic | Long-term | External store | Past events, task runs, outcomes | Vector DB plus event logs |
| Procedural | Long-term | Prompt or weights | Skills, workflows, behavioral rules | System prompt or fine-tune |
| Retrieval / External | Both | Vector database | Documents, history chunks | RAG pipeline |
| Parametric | Long-term | Model weights | World knowledge, language, reasoning | Pre-training or fine-tuning |
| Prospective | Both | State store | Future intentions, scheduled goals | Task queue or scheduler |

Use Cases: Which Memory Solves Which Problem

Each type answers a concrete product need. Map the need to the memory.

  • A coding assistant inside one session uses working memory. It tracks the open files and recent edits in context. Close the session and that state is gone.
  • A personal assistant that remembers you needs semantic memory. It stores “allergic to gluten” and recalls it next week. The fact survives across sessions.
  • A research agent that improves over time needs episodic memory. It recalls that risk sections landed well last month. It repeats what worked and avoids what failed.
  • A travel-booking agent needs procedural memory. It knows the flow: search flights, compare, reserve, confirm. The sequence is a learned skill, not a fresh plan.
  • A documentation chatbot needs retrieval memory. It embeds the docs and pulls relevant chunks per query. The answer stays grounded in retrieved text.
  • A long-horizon agent managing a week-long project needs prospective memory. It remembers to send the Friday report. The intention persists until execution.

A Combined Example: All Seven in One Agent

Consider an autonomous market-analysis agent. One task exercises every memory type at once.

Parametric memory supplies the base reasoning and language. Retrieval memory pulls current market data from a vector store. Semantic memory provides the user’s preferred report format. Episodic memory recalls which sources proved reliable before. Procedural memory drives the section order: sizing, then landscape, then risk. Prospective memory schedules the follow-up draft for next week. Working memory assembles all of it into the active context.

Remove any one layer and the agent gets weaker. Each handles a job the others cannot.

Implementation: A Minimal Memory Stack

Here is a stripped-down sketch in Python. It shows working, semantic, episodic, and procedural memory as separate stores.

from datetime import datetime

# Semantic memory: durable facts about the user
semantic_memory = {"diet": "vegetarian", "language_pref": "Python"}

# Episodic memory: a log of past events and outcomes
episodic_memory = [
    {"timestamp": datetime.now(),
     "event": "recipe_request",
     "result": "user liked a 20-minute meal"},
]

# Procedural memory: skills the agent can execute
def suggest_recipe(diet):
    return f"a quick {diet} recipe"

procedural_memory = {"suggest_recipe": suggest_recipe}

# Working memory: assembled fresh for each inference call
def build_context(query):
    diet = semantic_memory["diet"]
    last = episodic_memory[-1]["result"]
    skill = procedural_memory["suggest_recipe"]
    return (
        f"Query: {query}\n"
        f"Semantic: user is {diet}\n"
        f"Episodic: last time, {last}\n"
        f"Procedural: returning {skill(diet)}"
    )

print(build_context("suggest dinner"))

In production, the long-term stores typically move to a database, often a vector database. The pattern stays the same. Write to long-term memory, retrieve into working memory, then reason.
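The write, retrieve, reason cycle can be sketched as a single agent turn. Everything here is a placeholder: `long_term` is a plain list standing in for a database, `retrieve` uses substring matching instead of similarity search, and `reason` only assembles the prompt that a model call would receive.

```python
long_term = []  # stand-in for a database or vector store

def write(entry):
    long_term.append(entry)

def retrieve(keyword):
    # Substring match as a placeholder for similarity search.
    return [entry for entry in long_term if keyword in entry]

def reason(query, context):
    # Placeholder for the model call: it only assembles the prompt text.
    return "\n".join(["Known:"] + context + [f"Query: {query}"])

def agent_turn(query, keyword):
    context = retrieve(keyword)      # long-term memory into working memory
    prompt = reason(query, context)  # reason over the assembled context
    write(f"asked: {query}")         # write the new event back
    return prompt

write("diet: vegetarian")
write("language_pref: Python")
prompt = agent_turn("suggest dinner", keyword="diet")
print(prompt)
```

Swapping the list for a real store changes `write` and `retrieve` but leaves `agent_turn` untouched, which is the point of keeping the three steps separate.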

How to Layer Them: A Practical Build Order

Do not build all seven at once. Add memory only when a real need justifies the complexity.

  • Start with working memory. It ships with the model. Most simple agents need nothing more.
  • Add semantic memory when users expect the agent to remember them across sessions. This is the first long-term layer most products require.
  • Layer in episodic, procedural, and prospective memory later. Add each one only when your agent must learn from failure, reuse a workflow, or plan ahead.
  • Parametric and retrieval memory are often already present. Parametric memory is the base model itself. Retrieval memory arrives the moment you add RAG.


Sources: CoALA framework (Princeton, arXiv:2309.02427); “Memory in the Age of AI Agents” survey (arXiv:2512.13564); “From Human Memory to AI Memory” survey (arXiv:2504.15965); LangChain LangMem, MongoDB, Redis, and Neo4j agent-memory documentation; original concept notes on the seven memory types.

The post The 7 Types of Agent Memory: A Technical Guide for AI Engineers appeared first on MarkTechPost.