Overview
The problem
When your team commits code, git captures the change. When your agent explores your codebase to make that change — the files it read, the paths it explored, the prompts that led to specific edits — nothing captures that. It's gone when the session ends.
This matters because the exploration is where the context lives. An agent that spent three sessions learning that distiller.ts and task-context.ts always change together, that globals.css is the right place for design token changes, that the MCP server wires through server.ts — that agent is meaningfully faster and more accurate than one starting from scratch. But there's no artifact that carries that knowledge forward.
The existing tools don't fill this gap:
- Git tracks what changed, not what was read or explored to get there.
- CLAUDE.md and .cursor/rules capture what you declare — conventions you remember to write down, instructions you decide to give. They don't capture behavioral patterns: which files actually get touched together, which tasks reliably land on which parts of the codebase, what the agent explored before finding the answer.
- First-party memory (Cursor Memories, Windsurf memories) captures some agent behavior — but it's per-tool, stored in a proprietary system, and invisible outside that tool. It can't be committed, diffed, reviewed in a PR, or read by a different agent. If someone on your team uses Claude Code while you use Cursor, their sessions don't inform yours.
There's no shared source of truth for agent behavior across a codebase. No committed artifact that says "here's what agents actually did, which files they touched together, which tasks led to which edits, across every session and every agent." moatlog is that artifact.
What Moatlog does
Moatlog captures what agents actually do — not what you tell them to do. Hooks fire on every file read, write, prompt, and shell command. The raw events get distilled into moat.json: a compact, committed file that any agent can read before starting work.
Three properties make this different from existing approaches:
- Observed, not declared. moat.json reflects real agent behavior — which files get touched together, which tasks led to which edits, how file importance changes over sessions. You don't maintain it manually. The more you work, the more accurate it gets.
- Git-native. moat.json lives in your repo, committed alongside your code. It's diffable, PR-reviewable, and auditable. Team members get behavioral context without replaying your session history. No external service required.
- Agent-agnostic. One moat.json, any agent. Cursor, Claude Code, and Devin all write to the same event log and read from the same moat. When you switch agents mid-project, context comes with you. When a team member uses a different tool, they still benefit from your sessions.
npx moatlog init
Scaffold hooks, MCP config, rules, and .moatlog/ in your project
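To make the "raw events get distilled" step concrete, here is a minimal sketch of one way such a distillation could work. The AgentEvent shape, the distillEvents name, and the "fileA|fileB" pair key are all assumptions made for illustration; they are not moatlog's actual event log format or moat.json schema.

```typescript
// Hypothetical event shape; moatlog's real event log may look different.
type AgentEvent = {
  sessionId: string;
  kind: "read" | "write" | "prompt" | "shell";
  path?: string; // set for file reads and writes only
};

// Hypothetical distilled output, not the real moat.json schema.
type Distilled = {
  touches: Record<string, number>;  // file -> number of sessions that touched it
  coAccess: Record<string, number>; // "fileA|fileB" -> sessions that touched both
};

function distillEvents(events: AgentEvent[]): Distilled {
  // Group touched files by session so repeated reads count once per session.
  const filesBySession = new Map<string, Set<string>>();
  events.forEach((e) => {
    if (e.path === undefined) return; // events without a path are skipped in this sketch
    const files = filesBySession.get(e.sessionId) ?? new Set<string>();
    files.add(e.path);
    filesBySession.set(e.sessionId, files);
  });

  const touches: Record<string, number> = {};
  const coAccess: Record<string, number> = {};
  filesBySession.forEach((files) => {
    const sorted = Array.from(files).sort(); // sorted so each pair has one stable key
    sorted.forEach((a, i) => {
      touches[a] = (touches[a] ?? 0) + 1;
      sorted.slice(i + 1).forEach((b) => {
        const key = `${a}|${b}`;
        coAccess[key] = (coAccess[key] ?? 0) + 1;
      });
    });
  });
  return { touches, coAccess };
}
```

Counting per session rather than per event is a deliberate choice in this sketch: a file read twenty times in one session should not outweigh a file touched once in each of five sessions.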
Retrieval quality
moatlog includes moatlog eval, an offline retrieval quality harness that measures whether get_task_context returns the right files for historical tasks. It uses leave-one-out evaluation: for each high-quality prompt window, hide it from the moat, run retrieval with its task description, and check whether the returned files match what the agent actually touched.
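The leave-one-out loop described above can be sketched as follows. This is an illustration under stated assumptions, not the moatlog eval implementation: the PromptWindow and Retriever types are hypothetical, and a "hit" is assumed here to mean that at least one returned file was actually touched in the held-out window. The real harness may score matches differently.

```typescript
// Hypothetical shapes for illustration only.
type PromptWindow = { task: string; files: string[] };
type Retriever = (moat: PromptWindow[], task: string, k: number) => string[];

function leaveOneOutHitRate(
  windows: PromptWindow[],
  retrieve: Retriever,
  k: number = 5
): number {
  if (windows.length === 0) return 0;
  let hits = 0;
  windows.forEach((held, i) => {
    // Hide the held-out window so retrieval cannot simply look up its own answer.
    const rest = windows.filter((_, j) => j !== i);
    const returned = retrieve(rest, held.task, k);
    // Assumed hit criterion: any overlap between returned and touched files.
    const touched = new Set(held.files);
    if (returned.some((f) => touched.has(f))) hits += 1;
  });
  return hits / windows.length;
}
```

Passing the retriever in as a parameter means the same loop can score both task-aware retrieval and a naive baseline on identical held-out windows.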
Current results on this repo (moatlog itself, 20 sessions):
hit rate (top 5): 64%
baseline hit rate: 75% (naive: always return the 5 hottest files)
At 20 sessions, task-aware retrieval trails the naive baseline. The failure cases are concentrated in styling tasks where the agent edited app-level files (globals.css, layout.tsx) but the task description doesn't name them. Retrieval can't make that connection without either explicit file paths in the task or enough sessions to build the association naturally.
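For reference, the naive baseline is simple enough to sketch in a few lines. The hottestFiles name and the touch-count input are assumptions for illustration; how moatlog actually ranks "hot" files is not specified here.

```typescript
// Hypothetical input: file path -> how often agents touched it.
function hottestFiles(touchCounts: Record<string, number>, k: number = 5): string[] {
  return Object.keys(touchCounts)
    .sort(
      (a, b) =>
        (touchCounts[b] ?? 0) - (touchCounts[a] ?? 0) || // most-touched first
        a.localeCompare(b)                               // stable tie-break by name
    )
    .slice(0, k);
}
```

Because this function ignores the task description entirely, it returns the same files for every task. In a small repo where a handful of files absorb most edits, that alone can be enough to score well.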
Where retrieval works well: tasks that reference specific files or packages by name return those files accurately. Cross-file retrieval via co-access works once support counts are established — "fix the distiller" reliably surfaces task-context.ts alongside distiller.ts.
The gap closes as sessions accumulate. Each session adds new keyword→file associations to taskFileSets, and co-access support counts get confirmed across more windows. The naive baseline's advantage weakens as more files become genuinely hot and retrieval can distinguish between them based on task context rather than raw frequency.
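A minimal sketch of how keyword→file associations and co-access support counts could combine at retrieval time is shown below. Only the taskFileSets name comes from the text above. The MoatSketch shape, the coAccess layout, the minSupport threshold, and the sketchTaskContext function are assumptions for illustration and should not be read as the real get_task_context logic.

```typescript
// Hypothetical, simplified view of a moat; the real moat.json schema may differ.
type MoatSketch = {
  taskFileSets: Record<string, string[]>;          // keyword -> files seen for that keyword
  coAccess: Record<string, Record<string, number>>; // file -> partner file -> support count
};

function sketchTaskContext(
  moat: MoatSketch,
  task: string,
  minSupport: number = 2, // assumed threshold for a "confirmed" co-access pair
  k: number = 5
): string[] {
  const out: string[] = [];
  const add = (f: string) => {
    if (out.indexOf(f) === -1) out.push(f);
  };

  // Step 1: direct keyword hits from the task description.
  const words = task.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  words.forEach((w) => (moat.taskFileSets[w] ?? []).forEach(add));

  // Step 2: expand each direct hit with partners whose support is high enough.
  out.slice().forEach((f) => {
    const partners = moat.coAccess[f] ?? {};
    Object.keys(partners)
      .filter((p) => (partners[p] ?? 0) >= minSupport)
      .sort((a, b) => (partners[b] ?? 0) - (partners[a] ?? 0))
      .forEach(add);
  });
  return out.slice(0, k);
}
```

Under these assumptions, a task like "fix the distiller" matches the distiller keyword directly and pulls in task-context.ts through co-access. A styling task whose words match no keyword returns nothing, which mirrors the failure case described earlier.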
v0.1.2 adds moatlog benchmark --api — cold vs. warm agent sessions on a fixed task suite, measuring token usage and file-exploration differences directly. The eval harness measures retrieval quality; the benchmark will measure agent behavior.
moatlog eval --baseline
Run leave-one-out retrieval eval on your moat.json
Built with moatlog
moatlog was built using moatlog. Every session that wrote the schema, fixed the attribution bug, built the docs renderer, and designed the merge strategy is captured in .moatlog/moat.json in this repo.
The proof section on the landing page shows the actual current output of moatlog report run on this repo. The moat.json file is in the repository, so you can read it directly and verify that the numbers match what moatlog status and moatlog report show.
Run moatlog eval on your own repo to see retrieval quality for your codebase. The numbers will differ: a codebase where tasks consistently name specific files will see higher hit rates earlier, while one with more implicit, description-only tasks will start lower and improve more slowly.
Who this is for
moatlog is most useful if:
- You use AI agents heavily enough that cold-start friction is genuinely annoying
- You use multiple agents and want shared context between them
- You work on a team where behavioral memory across developers would help
- You care about measuring whether your tooling actually helps, not just whether it feels like it does
The setup cost is low — moatlog init takes about a minute. The payoff is proportional to how much you use agents and how complex your codebase is. A small repo worked on occasionally won't accumulate enough moat data for retrieval to outperform cold exploration. A large repo with heavy agent use is where the gap compounds.