I Built an AI Memory System That Forgets on Purpose. It Remembers Better Than Yours.

My AI agent wrote a perfect architecture spec for dual-storage memory. Then it ignored the whole thing and built a flat table. Seven patches later, I threw it all away and built Engram from neuroscience papers instead.

AI · Memory Systems · Open Source · AI Agents
April 5, 2026
8 min read

My AI agent wrote a perfect architecture spec: dual-storage design, separate searchable content from raw episodes, clean separation of concerns. Textbook engineering.

Then it ignored the entire thing and built a flat table instead.

I didn't catch it. Tests were passing. Code was shipping. The velocity felt like progress. It wasn't.

Seven patches and a funeral

The first bug was subtle. Search queries returned results, but the results were garbage. Heartbeat pings. System prompts. Tool call metadata. The agent was remembering everything and understanding nothing.

Patch one: content filter. Strip out obvious noise before storing. Scores improved slightly. Not enough.

Patch two: searchable_content column. Store a cleaned version alongside the raw episode. Better, but still noisy.

Patch three: RRF score normalization. Maybe the ranking was off? Tweaked the fusion weights. Marginal improvement.

Patches four through seven: query expansion, HyDE fallback, role-based scoring, threshold adjustments. Each one felt like a breakthrough in isolation. Each one was a band-aid on a broken bone.
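
For reference, reciprocal rank fusion (the technique patch three was tuning) scores each document by its rank position across multiple retrievers. A minimal standalone sketch, with the common k = 60 damping constant:

```javascript
// Reciprocal Rank Fusion: combine rankings from multiple retrievers.
// Each document scores 1/(k + rank) per list; k dampens top-rank dominance.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Two retrievers (say, BM25 and vector search) disagree on order:
const fused = rrfFuse([['a', 'b', 'c'], ['b', 'a', 'd']]);
// 'a' and 'b' rank highly in both lists, so they fuse to the top.
```

No amount of tweaking those fusion weights helps when 87% of the corpus is noise, which is the point of the story.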

After seven patches, I profiled the actual data. 14,000 episodes. 87% noise. Heartbeats, tool calls, status updates, system metadata. The real signal (conversations, decisions, learned facts) was 13% of the total. Retrieval scores sat between 0.01 and 0.03 against a 0.15 threshold. The pipeline was silently dropping every single result.

I had spent weeks tuning search over a landfill.

Here's the thing about patching. It creates the illusion of forward motion because you're writing code, shipping changes, watching numbers tick up by fractions. But when the foundation is wrong, you're just rearranging furniture in a burning building. I should have caught it at patch two. I caught it at patch seven. Five patches of wasted time because passing tests felt like progress.

That's when I stopped reading engineering blogs and started reading neuroscience papers.

Memory is not storage

Your brain processes roughly 11 million bits of sensory input per second. You consciously experience about 50. The other 10,999,950 bits get filtered before you ever notice them.

This isn't a limitation. It's the entire design.

During sleep, your hippocampus replays the day's events and runs a brutal triage. Emotional moments get reinforced. Repeated patterns crystallize into skills. Contradictory information gets flagged. Random noise gets pruned. By morning, you don't remember every conversation from Tuesday. You remember the one that mattered.

The forgetting curve (that exponential drop-off where you lose 70% of new information within 24 hours) looks like a flaw in human cognition. It's actually a compression algorithm. By letting unimportant memories fade, the important ones become easier to find. The noise floor drops. The signal gets louder.
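
The curve has a simple closed form. A toy sketch of Ebbinghaus-style exponential retention, with illustrative strength values (the constants here are for demonstration, not measured):

```javascript
// Ebbinghaus-style retention: R = e^(-t/S), where S is memory strength.
// Reinforcement raises S, which flattens the curve (the spacing effect).
const retention = (hoursElapsed, strength) => Math.exp(-hoursElapsed / strength);

const afterOneDayWeak = retention(24, 20);    // ≈ 0.30 — roughly the 70% loss
const afterOneDayStrong = retention(24, 200); // ≈ 0.89 — a consolidated memory holds
```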

Every AI memory system I've seen ignores this. They treat memory as a storage problem. Store more, embed better, search smarter. It's a filing cabinet with a fancier index.

But memory isn't storage. Memory is a process. And the most important part of that process is forgetting.

Building a brain, not a database

Engram models five cognitive memory systems. Not as a taxonomy exercise, but because each one plays a different role in how raw experience becomes understanding.

A conversation turn enters the sensory buffer: working memory. The last hundred or so items, volatile and cheap. What's on the whiteboard right now.

If it passes the salience filter (goodbye, heartbeat pings), it gets written to episodic memory. Raw, lossless ground truth. This is where every flat-table approach stops. Engram doesn't.
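
A salience filter can be as blunt as role and pattern checks. A hypothetical sketch; the role names, patterns, and thresholds below are illustrative, not Engram's actual API:

```javascript
// Hypothetical salience filter: drop structural noise before it reaches
// episodic memory. Roles and patterns here are illustrative examples.
const NOISE_ROLES = new Set(['system', 'tool']);
const NOISE_PATTERNS = [/^heartbeat/i, /^ping\b/i, /^status:/i];

function isSalient(turn) {
  if (NOISE_ROLES.has(turn.role)) return false;
  if (NOISE_PATTERNS.some((p) => p.test(turn.content))) return false;
  return turn.content.trim().length > 0;
}

const noise = isSalient({ role: 'assistant', content: 'heartbeat ok' });         // false
const signal = isSalient({ role: 'user', content: 'The migration failed again' }); // true
```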

Consolidation runs. Light sleep compresses thirty conversation turns about debugging a migration failure into a single digest: "Root cause was ORM generating CREATE INDEX before the table existed. Fix: reorder migration steps." Deep sleep extracts durable knowledge from that digest into semantic memory: "ORM migration ordering matters for index creation." And into procedural memory: "When migrations fail, check generated SQL order before blaming the schema."
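
The two stages have a natural shape as functions. A hypothetical sketch where `summarize` and `extract` stand in for LLM calls; everything here is illustrative, not Engram's internal API:

```javascript
// Light sleep: compress a run of related episodes into one digest.
function lightSleep(episodes, summarize) {
  return { type: 'digest', sources: episodes.map((e) => e.id), text: summarize(episodes) };
}

// Deep sleep: extract durable knowledge from a digest into semantic
// (facts) and procedural (how-to) memories, keeping provenance.
function deepSleep(digest, extract) {
  const { facts, procedures } = extract(digest.text);
  return [
    ...facts.map((text) => ({ type: 'semantic', text, derivedFrom: digest.sources })),
    ...procedures.map((text) => ({ type: 'procedural', text, derivedFrom: digest.sources })),
  ];
}

// Stub summarizer/extractor in place of real LLM calls:
const digest = lightSleep([{ id: 1 }, { id: 2 }], (eps) => `${eps.length} turns about a migration failure`);
const extracted = deepSleep(digest, () => ({
  facts: ['ORM migration ordering matters for index creation'],
  procedures: ['When migrations fail, check generated SQL order first'],
}));
```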

Then the dream cycle fires. It discovers that this migration issue connects to a conversation about database locking from two weeks ago. A new edge forms in the associative network, linking two experiences that happened weeks apart across different projects. Eight edge types (temporal, causal, topical, supports, contradicts, elaborates, derives_from, co_recalled) let the network model how memories relate to each other. Next time the agent encounters a locking problem, the migration context gets activated too. Not because someone programmed that rule. Because the dream cycle discovered the connection.
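
The eight edge types are straightforward to model. A minimal sketch of an edge record; the record shape is illustrative, not Engram's internal schema:

```javascript
// The eight associative edge types named above.
const EDGE_TYPES = [
  'temporal', 'causal', 'topical', 'supports',
  'contradicts', 'elaborates', 'derives_from', 'co_recalled',
];

// Append a typed, weighted edge to a flat edge list.
function linkMemories(graph, from, to, type, weight = 1.0) {
  if (!EDGE_TYPES.includes(type)) throw new Error(`unknown edge type: ${type}`);
  graph.push({ from, to, type, weight });
  return graph;
}

// The dream cycle discovering the migration/locking connection might record:
const graph = linkMemories([], 'episode:migration-failure', 'episode:db-locking', 'topical');
```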

Finally, the decay pass. That random aside about lunch preferences from three weeks ago? Its confidence score has been dropping with each cycle. Now it's below the retrieval floor. Still in the database (lossless design), but it will never surface in a search again.

The noise doesn't get deleted. It gets forgotten. The way your brain forgets: by making it inaccessible while the important memories get stronger.
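
The decay pass is the simplest piece to sketch. A hypothetical version where confidence shrinks each cycle unless the memory was recently reinforced; the rates and floor are illustrative:

```javascript
const DECAY_RATE = 0.9;       // confidence multiplier per cycle (illustrative)
const RETRIEVAL_FLOOR = 0.15; // below this, a memory never surfaces in search

// Decay every memory that wasn't reinforced since the last cycle.
function decayPass(memories) {
  for (const m of memories) {
    m.confidence *= m.reinforced ? 1.0 : DECAY_RATE;
    m.reinforced = false; // reset until the next recall touches it
  }
  return memories;
}

// Lossless: nothing is deleted, it just drops out of retrieval.
const retrievable = (memories) => memories.filter((m) => m.confidence >= RETRIEVAL_FLOOR);

let store = [
  { id: 'lunch-aside', confidence: 0.2, reinforced: false },
  { id: 'migration-fix', confidence: 0.8, reinforced: true },
];
for (let cycle = 0; cycle < 5; cycle++) store = decayPass(store);
// After five cycles, 'lunch-aside' is below the floor: 0.2 * 0.9^5 ≈ 0.118
```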

After running consolidation on that 14,000-episode disaster, retrieval scores jumped from 0.01 to 0.6+. Same data. Same queries. The architecture made the difference, not the search algorithm.

The agent that stopped making the same mistake

Architecture diagrams are nice. Let me show you what actually changes.

Before Engram, I had a coding agent that debugged a race condition in an authentication module on a Tuesday. By Thursday, same class of bug in a different module. The agent started from scratch. It had conversation history (thousands of turns of it), but thousands of turns is noise, not knowledge. There was no mechanism to extract the pattern from the raw experience.

After Engram, three sessions of debugging race conditions get consolidated. The episodes compress into a digest. Deep sleep extracts a procedural memory: "When multiple middleware functions modify the session object, check for race conditions." The associative network links it to a semantic memory about shared mutable state.

Fourth time a similar pattern appears, the agent recalls the procedural memory before it even starts debugging. It doesn't search through old conversations. It doesn't need RAG context stuffing. It knows. The way you know to look both ways before crossing a street. Not because you're retrieving a specific memory of a specific near-miss, but because hundreds of experiences consolidated into instinct.

This scales beyond a single agent. Agent A discovers that an API endpoint returns inconsistent pagination. After consolidation, that becomes a semantic memory in the shared Supabase-backed store. Agent B, working on a completely different feature that uses the same API, recalls the warning before hitting the same wall. No message passing. No explicit coordination. Shared memory. The kind of institutional knowledge that usually lives in a senior engineer's head and walks out the door when they leave.

The uncomfortable part

I should say something about how Engram came to exist, because the marketing version of the story ("I was inspired by neuroscience!") isn't the real one.

The real story is that I failed. I built the wrong architecture, ignored a correct spec that my own agent had written, and spent weeks patching a system that was broken at its core. The agent helping me build the memory engine literally documented the right answer (dual-storage, separate searchable content from raw episodes), then proceeded to build the wrong thing. And I let it happen because the tests were green and the commits were flowing.

Seven patches. Each one a little victory. Each one hiding the fact that the foundation was rotten.

The neuroscience research happened because I was out of engineering ideas, not because I had some grand vision. I was desperate. And desperation led me to papers about hippocampal replay and memory consolidation that I would never have read if the flat-table approach had worked well enough to ship.

Engram exists because a failure was so complete that the only option was starting over from first principles. I'm not proud of the failure. But I'm honest about it because the lesson matters: when you're on your third patch for the same class of bug, the architecture is wrong. Stop patching. Burn it down. Rebuild.

Get started

Three calls. No API keys for the basic setup.

```javascript
import { createMemory } from '@engram/core'
import { sqliteAdapter } from '@engram/sqlite'

const memory = createMemory({ storage: sqliteAdapter() })
await memory.ingest({ role: 'user', content: 'I prefer TypeScript over JavaScript' })
const result = await memory.recall('What languages does the user like?')
```

For Claude Code, add the MCP server and your agent gets memory_recall, memory_ingest, and memory_forget tools. Every session builds on the last one.

Start with SQLite and BM25. Add OpenAI embeddings when you need vector search. Move to Supabase when you want agents sharing memory across machines. Full cognitive engine with auto-consolidation at the top tier. The architecture scales up, but the API stays the same three functions.
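
As plain data, the progression looks something like this. The option shapes are hypothetical, only `sqliteAdapter` appears in the example above; check the Engram docs for real configuration:

```javascript
// Illustrative tier progression — names and shapes are assumptions,
// not Engram's actual configuration schema.
const tiers = [
  { name: 'local',  storage: 'sqlite',   search: 'bm25' },
  { name: 'vector', storage: 'sqlite',   search: 'bm25+vector', embeddings: 'openai' },
  { name: 'shared', storage: 'supabase', search: 'bm25+vector', embeddings: 'openai' },
  { name: 'full',   storage: 'supabase', search: 'bm25+vector', embeddings: 'openai', consolidation: 'auto' },
];
```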

Build agents that actually learn

Engram is open source under Apache 2.0 at github.com/muhammadkh4n/engram.

If your AI agents forget everything between sessions, they don't have a memory problem. They have a forgetting problem. They're not forgetting enough of the right things, and they're not strengthening enough of the important ones. Stop building filing cabinets. Give them a brain that consolidates, associates, and prunes. The forgetting is what makes the remembering work.
