Designing Agent Memory: Short-Term, Long-Term, Episodic & Semantic
How to architect memory for AI agents that need to learn from past interactions: short-term context windows, long-term vector memory, episodic memory, and semantic distillation patterns.
Memory is the Hardest Agent Problem
LLMs are stateless. Every "memory" your agent has is something you explicitly retrieve and inject into its context. Designing that retrieval well is what separates a chatbot from an actual assistant.
I split agent memory into four layers, each with its own storage and access pattern.
Layer 1: Short-Term (Working Memory)
The last N turns of the current conversation. Lives in Redis with a TTL.
```python
import json
import time

# `redis` is assumed to be an async client, e.g. redis.asyncio.Redis.
async def append_turn(thread_id: str, role: str, content: str):
    key = f"thread:{thread_id}:turns"
    turn = json.dumps({"role": role, "content": content, "ts": time.time()})
    await redis.lpush(key, turn)     # Newest turn goes to the head of the list
    await redis.ltrim(key, 0, 19)    # Keep last 20
    await redis.expire(key, 86400)   # 24h TTL
```
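The write path above implies a matching read path, which later sections call as `get_turns`. A minimal sketch, assuming the same key layout and an async Redis client passed in explicitly (the `store` parameter and the `ListStore` protocol are illustrative and not part of the original code):

```python
import asyncio
import json
from typing import Any, Protocol


class ListStore(Protocol):
    """Anything exposing an async Redis-style LRANGE (e.g. redis.asyncio.Redis)."""
    async def lrange(self, key: str, start: int, stop: int) -> list[Any]: ...


async def get_turns(store: ListStore, thread_id: str, n: int = 20) -> list[dict]:
    """Return up to n of the most recent turns, oldest first."""
    raw = await store.lrange(f"thread:{thread_id}:turns", 0, n - 1)
    # LPUSH stores the newest turn at index 0, so reverse for chronological order.
    return [json.loads(item) for item in reversed(raw)]
```

Because `LPUSH` puts the newest turn at the head, the range `0..n-1` is always the most recent `n` turns, and reversing them gives the model the conversation in the order it happened.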
Layer 2: Long-Term (Semantic Memory)
Facts about the user, distilled across all their sessions. Stored as embeddings.
```python
import time
from uuid import uuid4

# `qdrant`, `llm`, and `embed` are assumed to be thin async wrappers.
async def extract_facts(thread_id: str, user_id: str):
    turns = await get_turns(thread_id)
    facts = await llm.extract_structured(
        prompt=FACT_EXTRACTION_PROMPT,
        text="\n".join(t["content"] for t in turns),
        schema=FactList,
    )
    for fact in facts.facts:
        await qdrant.upsert("user_facts", [{
            "id": uuid4().hex,
            "vector": await embed(fact.statement),
            "payload": {
                "user_id": user_id, "fact": fact.statement,
                "confidence": fact.confidence, "source_thread": thread_id,
                "ts": time.time(),
            },
        }])


async def recall_facts(user_id: str, query: str, k: int = 5) -> list[str]:
    qv = await embed(query)
    # Always filter by user_id so one user's facts never reach another's context.
    results = await qdrant.search(
        "user_facts", qv,
        query_filter={"must": [{"key": "user_id", "match": {"value": user_id}}]},
        limit=k,
    )
    return [r.payload["fact"] for r in results]
```
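One thing to watch with random `uuid4` IDs: running extraction twice over the same thread stores every fact twice. A possible alternative, sketched here under the assumption that your vector store accepts UUID strings as point IDs, is to derive the ID from the user and the normalized statement, so that an upsert overwrites instead of duplicating. `FACT_NAMESPACE`, `normalize_statement`, and `fact_point_id` are hypothetical names.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
FACT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "agent-memory/user-facts")


def normalize_statement(statement: str) -> str:
    """Lowercase and collapse whitespace so trivial variants map to one ID."""
    return " ".join(statement.lower().split())


def fact_point_id(user_id: str, statement: str) -> str:
    """Deterministic ID: the same user and statement always yield the same UUID."""
    key = f"{user_id}\x1f{normalize_statement(statement)}"
    return str(uuid.uuid5(FACT_NAMESPACE, key))
```

This only catches exact duplicates after normalization. Paraphrases ("lives in Berlin" vs. "is based in Berlin") would still need a similarity check against existing facts before insert.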
Layer 3: Episodic Memory
Specific past events with timestamps — "Last Tuesday we discussed X". Used for temporal recall:
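```python
from datetime import datetime

from pydantic import BaseModel


class Episode(BaseModel):
    user_id: str
    summary: str
    happened_at: datetime
    participants: list[str]
    outcome: str | None = None  # Optional: not every episode has an outcome
```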
Episodes are stored in Postgres rather than a vector DB because temporal queries dominate semantic ones.
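To make "temporal queries" concrete, here is a rough sketch of a time-window lookup. It assumes an asyncpg-style `fetch(query, *args)` interface and an `episodes` table whose columns mirror the `Episode` model; the table name, `last_days`, and `recall_episodes_between` are all illustrative.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Any, Optional, Protocol


class Database(Protocol):
    """Anything with an asyncpg-style fetch(query, *args) coroutine."""
    async def fetch(self, query: str, *args: Any) -> list[Any]: ...


# Table and column names are assumptions that mirror the Episode model.
EPISODES_IN_WINDOW = """
    SELECT summary, happened_at, participants, outcome
    FROM episodes
    WHERE user_id = $1 AND happened_at >= $2 AND happened_at < $3
    ORDER BY happened_at DESC
    LIMIT $4
"""


def last_days(days: int, now: Optional[datetime] = None) -> tuple[datetime, datetime]:
    """Return a (start, end) window covering the previous `days` days."""
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=days), end


async def recall_episodes_between(
    db: Database, user_id: str, start: datetime, end: datetime, k: int = 2
) -> list[Any]:
    """Fetch the k most recent episodes for a user inside a time window."""
    return await db.fetch(EPISODES_IN_WINDOW, user_id, start, end, k)
```

A question like "what did we discuss last week?" then becomes `recall_episodes_between(db, user_id, *last_days(7))`. A composite index on `(user_id, happened_at)` would typically keep this query cheap.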
Layer 4: Procedural Memory
Patterns the agent has learned about how to do its job. This is your evolving system prompt, examples library, and tool selection heuristics.
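```python
# `db` is assumed to be an asyncpg-style connection or pool.
async def get_patterns(task_type: str):
    return await db.fetch("""
        SELECT pattern, success_rate
        FROM agent_patterns
        WHERE task_type = $1 AND success_rate > 0.85
        ORDER BY success_rate DESC LIMIT 5
    """, task_type)
```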
Inject these as few-shot examples in the system prompt.
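One way the injection step might look, assuming each row exposes `pattern` and `success_rate` by key (as asyncpg records and plain dicts both do). `format_patterns` and `with_patterns` are hypothetical helpers, and the heading text is only a placeholder:

```python
from typing import Any, Mapping, Sequence


def format_patterns(patterns: Sequence[Mapping[str, Any]]) -> str:
    """Render pattern rows as a numbered list with their success rates."""
    return "\n".join(
        f"{i}. {row['pattern']} (success rate: {float(row['success_rate']):.0%})"
        for i, row in enumerate(patterns, start=1)
    )


def with_patterns(base_prompt: str, patterns: Sequence[Mapping[str, Any]]) -> str:
    """Append learned patterns to the base system prompt, if there are any."""
    if not patterns:
        return base_prompt
    return (
        f"{base_prompt}\n\n"
        f"Approaches that have worked for this task type:\n{format_patterns(patterns)}"
    )
```

Keeping the base prompt untouched when no pattern clears the threshold avoids an empty, confusing section in the prompt.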
Putting It Together: Context Assembly
```python
async def build_context(thread_id: str, user_id: str, current_query: str, task_type: str) -> str:
    short_term = await get_turns(thread_id, n=10)
    long_term = await recall_facts(user_id, current_query, k=5)
    episodic = await recall_episodes(user_id, current_query, k=2)
    patterns = await get_patterns(task_type)

    # The format_* helpers are assumed to render each list as plain text;
    # format_patterns is a hypothetical helper in the same style.
    return f"""
[Learned Patterns]
{format_patterns(patterns)}

[User Facts]
{format_facts(long_term)}

[Recent Episodes]
{format_episodes(episodic)}

[Conversation So Far]
{format_turns(short_term)}

[Current Query]
{current_query}
""".strip()
```
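The `format_*` helpers are not shown in this post. A minimal sketch of what they might look like, assuming turns are dicts with `role` and `content`, facts are plain strings, and episodes are mappings with `happened_at` and `summary` (all of these shapes are assumptions):

```python
from datetime import datetime
from typing import Any, Mapping, Sequence


def _day(ts: datetime) -> str:
    """Format a timestamp as YYYY-MM-DD."""
    return ts.strftime("%Y-%m-%d")


def format_turns(turns: Sequence[Mapping[str, Any]]) -> str:
    """Render turns as 'role: content' lines, in the order given."""
    return "\n".join(f"{t['role']}: {t['content']}" for t in turns)


def format_facts(facts: Sequence[str]) -> str:
    """Render facts as a bullet list, or a placeholder when nothing was recalled."""
    return "\n".join(f"- {fact}" for fact in facts) or "(none)"


def format_episodes(episodes: Sequence[Mapping[str, Any]]) -> str:
    """Render episodes as dated one-liners, or a placeholder when there are none."""
    return "\n".join(
        f"{_day(e['happened_at'])}: {e['summary']}" for e in episodes
    ) or "(none)"
```

An explicit `(none)` placeholder tells the model that recall ran and found nothing, which tends to be less ambiguous than an empty section.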
Privacy Hygiene
Every stored memory needs:
- Source (which session/document?)
- Confidence (how sure are we?)
- Expiration (when does this become stale?)
- User-controlled deletion (one click forgets everything)
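The checklist above could translate into a record shape along these lines. This is an in-memory sketch only: `MemoryRecord`, `usable`, and `forget_user` are hypothetical names, and the 0.5 confidence floor is an arbitrary placeholder.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRecord:
    user_id: str
    content: str
    source: str                         # which session or document it came from
    confidence: float                   # 0.0 to 1.0
    expires_at: Optional[float] = None  # Unix timestamp; None means no expiry

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True once the record has reached its expiration time."""
        if self.expires_at is None:
            return False
        return (time.time() if now is None else now) >= self.expires_at


def usable(records: list[MemoryRecord], min_confidence: float = 0.5,
           now: Optional[float] = None) -> list[MemoryRecord]:
    """Drop stale or low-confidence records before they reach the prompt."""
    return [r for r in records
            if not r.is_stale(now) and r.confidence >= min_confidence]


def forget_user(records: list[MemoryRecord], user_id: str) -> list[MemoryRecord]:
    """Return the store with every record for user_id removed."""
    return [r for r in records if r.user_id != user_id]
```

In a real deployment, the "forget everything" path would have to fan out to every layer (the Redis turn lists, the vector collection, and the Postgres tables), which is much easier when every record already carries a `user_id`.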