Agentic Memory

active
#ai #agents #memory #evaluation

There has been significant research over the past year on how AI agents should manage memory: what to store, how to retrieve it, when to forget it, and how to evaluate whether any of it actually helps.

This research thread is my exploration of that space. The goal is to build up from first principles: start with an evaluation environment that can measure memory performance, then use it to test different memory architectures and strategies.

Key questions driving this research:

  • What does “good memory” look like for an agent? How do we measure it?
  • How do different memory architectures (episodic, semantic, procedural) map to agent use cases?
  • What’s the right granularity for memory — conversations, facts, preferences, skills?
  • When does memory hurt rather than help? (stale context, conflicting memories, context pollution)
  • How do we evaluate memory in a reproducible way?
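To make the evaluation question concrete, here is a toy sketch of what a reproducible memory harness could look like: store facts as they would arrive across sessions, then probe with questions and score recall. Everything here (the `KeywordMemory` class, the keyword-overlap retriever, the episode format) is my own illustration, not a committed design.

```python
from dataclasses import dataclass, field

@dataclass
class KeywordMemory:
    """Naive memory store: keeps raw text snippets, retrieves by keyword overlap."""
    entries: list = field(default_factory=list)

    def store(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 1) -> list:
        # Score each stored entry by how many query tokens it shares.
        q = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return scored[:k]

def evaluate(memory, episodes) -> float:
    """episodes: list of (fact_to_store, probe_question, expected_substring)."""
    # Phase 1: write all facts, as they might accumulate across sessions.
    for fact, _, _ in episodes:
        memory.store(fact)
    # Phase 2: probe with each question; count how often the expected
    # answer is present in the top retrieved memory.
    hits = 0
    for _, question, expected in episodes:
        retrieved = memory.retrieve(question, k=1)
        if retrieved and expected.lower() in retrieved[0].lower():
            hits += 1
    return hits / len(episodes)

episodes = [
    ("The user's favourite editor is Neovim.",
     "Which editor does the user prefer?", "Neovim"),
    ("The staging database runs on port 5433.",
     "What port does the staging database use?", "5433"),
]
print(evaluate(KeywordMemory(), episodes))  # → 1.0
```

The point of a harness this small is that any memory architecture exposing `store`/`retrieve` can be swapped in and scored on the same episodes, which is the reproducibility property the questions above are asking for.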