Mar 2026 • Zhengyi Shen

The Art of Forgetting

Attention is finite

The instinct when AI forgets is to help it remember more. Bigger context windows. Better retrieval. More sophisticated indexing. RAG pipelines. Vector databases. Embedding layers.

This instinct is backwards.

The problem is never that AI can't hold enough information. The problem is that it holds the wrong information, or too much of it, or information that used to matter but doesn't anymore.

The paper that started all of generative AI says it in the title: Attention Is All You Need. True in a deeper sense than the authors intended. Attention is finite. Every token in the context window competes for influence over the model's output. A memory system that loads everything into context is actively harmful — it dilutes signal with noise, pushing out the information that actually helps.

Good memory is selective memory. And selective memory requires good forgetting.

Search Over Structure

How do you handle a messy database? You don't reorganize the tables. You write better queries.

Conversation history is messy by nature. Full of tangents, corrections, half-formed ideas, context-specific references that made sense at the time and are meaningless later. No amount of preprocessing will make it clean. So don't try.

Just search.

Accept noise and inefficiency in exchange for flexibility, adaptability, and zero schema friction. Don't decide in advance what's important — because you'll get it wrong, and the wrong decisions will be cemented into your index. Don't maintain an embedding layer that drifts from actual content. Just make the search good enough that the right information surfaces when you need it.

There's a lesson here from how practical AI tools actually win. The terminal beats the IDE. Simple search beats complex indexing. Not because simpler tools are better algorithms — they're worse by every computer science metric. But they're simple enough that AI uses them correctly almost every time, while complex systems introduce enough failure modes that the net result is worse. Reliability at the system level matters more than sophistication at the component level.

The Cold-Start Problem

Search sounds simple until you face the reality.

For a stateless intelligence — which has no persistent intuition about what's in its own history — every search starts cold. A human searching their email has fuzzy memory to guide them: I remember discussing this in March, the subject mentioned budgets. An LLM has no such anchor. It doesn't know what it knows.

And the fragmentation makes it worse. Different agent frameworks store conversation data differently. Session formats are incompatible. Search mechanisms vary. Your conversation history — the most valuable artifact of your work with AI — is scattered across silos that don't talk to each other.

A good memory system solves both problems. It unifies the search space across frameworks, so history is findable regardless of where the conversation happened. And it provides the cold-start intelligence: hints about what exists in the searchable space, query refinement when initial results miss, iterative exploration the same way you refine a web search when the first page doesn't have what you need.

Don't make the data smarter. Make the queries smarter. The data is a messy, append-only log spread across incompatible formats. The intelligence is in unifying the search and guiding the searcher.

Active Forgetting

Read-only search is the foundation. But a mature memory system goes further: the agent can actively manage its own working memory.

Summarize verbose exchanges into compact notes. Tag important decisions for easy retrieval. Prune context that's no longer relevant. Create structured encapsulations — cognitive compression that preserves meaning while reducing token cost.

The key distinction: the agent edits its working memory — what's currently in context — not the source of truth. The full conversation logs are never modified. Even if the agent makes bad decisions about what to keep, recovery is always possible by re-searching the raw logs.

This is forgetting as a skill, not forgetting as a failure. Deliberate compression. Active curation. The agent decides what deserves its limited attention, and the rest stays in the archive — available but not competing for influence.

Why Context Beats Coordination

If attention is the scarce resource, the worst thing you can do is split it.

The industry is converging on AI teams — multiple specialized agents coordinated by an orchestrator. The assumption is that complex tasks require multiple agents with different roles.

We think this confuses a context problem for a capability problem.

Different agents in the same framework are just different initial prompts for the same runtime. Agent A with you are a frontend developer and Agent B with you are a backend developer aren't two different intelligences. They're two different contexts. The team is a fragmented context with coordination overhead.

Every reason people reach for multi-agent orchestration reduces to a context problem. Too complex for one agent? That's a context management problem — an agent with good working memory handles far more complexity than one drowning in raw history. Need parallelism? That's independent work on separate branches, not coordination. Need specialization? That's loading a different capability, not running a separate agent. Need cheaper execution? One agent with good context that gets it right in one pass costs less than three agents with bad context that need error correction.

Price per outcome, not price per token.

The bet: one well-equipped agent with well-managed attention beats a team of poorly-equipped agents with an orchestrator. Not because one agent is smarter — because it sees the whole picture.

This doesn't mean diversity of intelligence is useless — it's profoundly valuable. But there's a crucial difference between fragmenting one context across multiple agents (which destroys information) and giving the same complete context to different minds (which multiplies perspective). We'll return to this distinction later.

The Messy Truth

Better memory won't eliminate messiness. Conversations are messy by nature. And with more efficient memory, agents will have longer sessions, generating more history. The challenge doesn't shrink — it shifts.

The answer remains: search over structure. Better queries over cleaner data. Attention over accumulation. Selective forgetting over total recall.

← Previous Trust and Costless ReversibilityNext → Designing for the Grain