Agent Improvement Review Checklist
A practical checklist for keeping AI agents reliable after launch. It covers runtime drift, source-of-truth drift, stale routines, missing interfaces, skill bloat, approval boundaries, and follow-up actions.
The first version of an AI agent is usually cleaner than the tenth week of using it.
Prompts change. Cron jobs move. Files get renamed. Ownership becomes informal. Skills multiply. A one-time workaround becomes a hidden dependency. A runtime setting drifts away from the source-of-truth documentation. The agent still runs, so nobody notices that the operating model is no longer true.
That is how reliable agents become mysterious agents.
The fix is not another launch checklist. It is a recurring improvement review.
The failure pattern: configured once, trusted forever
Teams often treat agent setup as a project with a finish line.
They define the prompt, connect tools, add a schedule, ship the workflow, and assume the agent will keep behaving according to the original design.
But agents live inside changing systems:
- the business goal changes;
- the owner changes;
- the source-of-truth file moves;
- the schedule keeps running with old language;
- the approval boundary gets more complicated;
- the runtime state no longer matches the runbook;
- a useful local procedure never gets promoted into durable documentation;
- a stale task keeps appearing because nobody closed the loop.
This is not a reason to avoid agents. It is a reason to manage them as an operating capability instead of as static automation.
The LifeOS lesson
In LifeOS, scheduled reviews are not just reminders. They are reconciliation loops.
A review can compare what the runtime is doing against what the durable system says should be happening. It can notice stale cron metadata, missing interface documentation, unclear routing, duplicate skills, old tasks, or a gap between "what we think the agent owns" and "what the agent is actually doing."
The important shift is this:
Runtime state is not the source of truth. It is evidence to reconcile against the source of truth.
If the runtime changed, the source of truth may need an update. If the source of truth changed, the runtime may need a fix. If both changed and nobody knows why, the system needs a dated decision or event note before more automation is added.
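A minimal sketch of that reconciliation in Python, assuming both the documented jobs and the observed runtime jobs can be loaded into the same simple shape. The `Job` class, the job names, and the cron schedules are hypothetical and are not taken from LifeOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One scheduled job, described the same way in both places (hypothetical shape)."""
    name: str
    schedule: str  # for example, a cron expression


def reconcile(documented: list[Job], runtime: list[Job]) -> dict[str, list[str]]:
    """Compare documented intent against observed runtime state.

    Returns job names grouped by the kind of drift found.
    """
    doc = {job.name: job for job in documented}
    run = {job.name: job for job in runtime}
    return {
        # Running, but the source of truth does not describe it.
        "undocumented": sorted(run.keys() - doc.keys()),
        # Documented, but not actually running.
        "missing": sorted(doc.keys() - run.keys()),
        # Present in both places, but the schedules disagree.
        "changed": sorted(
            name
            for name in doc.keys() & run.keys()
            if doc[name].schedule != run[name].schedule
        ),
    }


# Illustrative data only.
documented = [Job("weekly-review", "0 9 * * 1"), Job("inbox-triage", "0 8 * * *")]
runtime = [Job("weekly-review", "0 9 * * 5"), Job("old-digest", "0 7 * * *")]

drift = reconcile(documented, runtime)
print(drift)
# {'undocumented': ['old-digest'], 'missing': ['inbox-triage'], 'changed': ['weekly-review']}
```

The function deliberately does not decide which side is right. It only reports the differences, so a human or a dated decision note can say whether the runtime or the source of truth should change.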
When to run an agent improvement review
Run this review:
- weekly for active personal or business operating agents;
- after a failed or surprising run;
- before adding a new integration;
- after changing schedules, prompts, tools, memory policy, or routing;
- before trusting an agent with higher-consequence work;
- whenever the human starts saying, "I thought the agent already knew that."
The review should be small enough to run often. Its job is not to redesign the entire system. Its job is to find drift while it is still cheap.
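The triggers above could be encoded as a small check, sketched here in Python. The event labels and the `review_due` function are hypothetical names chosen for illustration, and the seven-day default simply mirrors the weekly cadence.

```python
from datetime import date, timedelta

# Events that should trigger a review regardless of the calendar (hypothetical labels).
TRIGGER_EVENTS = {
    "failed_run",
    "surprising_run",
    "new_integration_planned",
    "config_changed",         # schedules, prompts, tools, memory policy, or routing
    "higher_stakes_planned",
    "human_had_to_restate",   # "I thought the agent already knew that."
}


def review_due(
    last_review: date,
    today: date,
    events: set[str],
    cadence: timedelta = timedelta(days=7),
) -> bool:
    """Return True when the cadence has elapsed or any trigger event occurred."""
    return today - last_review >= cadence or bool(events & TRIGGER_EVENTS)


print(review_due(date(2024, 1, 1), date(2024, 1, 3), set()))           # False
print(review_due(date(2024, 1, 1), date(2024, 1, 3), {"failed_run"}))  # True
print(review_due(date(2024, 1, 1), date(2024, 1, 8), set()))           # True
```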
The agent improvement review checklist
# Agent Improvement Review
## 1. Runtime drift
- What jobs, tools, routes, prompts, or integrations are currently active?
- Does runtime behavior match the documented intent?
- What changed since the last review?
- What is running that nobody still wants?
- What expected job or tool is missing, paused, failing, or stale?
## 2. Source-of-truth drift
- Where is the durable source of truth for this agent?
- Do runbooks, routines, tasks, and decisions describe current reality?
- Are important changes only present in chat history, logs, or local edits?
- Are there uncommitted or unpushed source-of-truth updates?
## 3. Ownership and routing
- Who owns the outcome this agent supports?
- Which system or capsule owns its durable context?
- Where should new tasks, decisions, events, and skills be written?
- Are ambiguous items going to an inbox instead of being routed to the owner?
## 4. Interface documentation
- How does the human interact with the agent?
- What platforms, channels, commands, or files are part of the interface?
- Are failures and delivery errors visible?
- Is the human-facing output still useful and concise?
## 5. Approval boundaries
- What may the agent do without asking?
- What may it draft but not execute?
- What always requires human approval?
- Did any recent run approach a boundary too casually?
- Are send, publish, spend, delete, push, and production-change gates explicit?
## 6. Context and memory hygiene
- What durable context was added recently?
- What is stale, too detailed, duplicated, or sensitive?
- What belongs in a source file instead of hot memory?
- What should be removed, summarized, or routed elsewhere?
## 7. Skill and procedure bloat
- Which reusable procedures did the agent actually use?
- Which skills are duplicated, stale, too narrow, or missing pitfalls?
- Should a local skill stay local, be patched, or be promoted after reuse?
- Did a hard-won workaround remain trapped in one session?
## 8. Outcome quality
- Did the agent advance the intended outcome?
- What did the human still have to restate?
- Which recommendations were ignored, acted on, or wrong?
- What evidence would improve the next run?
## 9. Follow-up action IDs
- What exact fixes should happen next?
- Who owns each fix?
- Where will completion be recorded?
- Which fixes are safe now, and which need explicit approval?
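Section 9 is easier to act on when each fix becomes a small record with an ID. The sketch below is one possible shape in Python, not a prescribed format: the `AIR-<date>-<n>` ID scheme, the field names, and the example fixes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FollowUp:
    action_id: str        # hypothetical ID scheme: AIR-<review date>-<sequence>
    fix: str              # what exact fix should happen next
    owner: str            # who owns the fix
    record_in: str        # where completion will be recorded
    needs_approval: bool  # False means it is safe to do now
    done: bool = False


@dataclass
class Review:
    review_date: str
    actions: list[FollowUp] = field(default_factory=list)

    def add(self, fix: str, owner: str, record_in: str, needs_approval: bool) -> FollowUp:
        """Create a follow-up with the next sequential ID for this review."""
        action = FollowUp(
            action_id=f"AIR-{self.review_date}-{len(self.actions) + 1:02d}",
            fix=fix,
            owner=owner,
            record_in=record_in,
            needs_approval=needs_approval,
        )
        self.actions.append(action)
        return action

    def safe_now(self) -> list[FollowUp]:
        """Open fixes that do not need explicit approval."""
        return [a for a in self.actions if not a.needs_approval and not a.done]

    def awaiting_approval(self) -> list[FollowUp]:
        """Open fixes that must wait for a human decision."""
        return [a for a in self.actions if a.needs_approval and not a.done]


# Illustrative data only.
review = Review("2024-01-08")
review.add("Close the stale onboarding task", "alex", "tasks.md", needs_approval=False)
review.add("Disable the unused digest job", "alex", "runbook.md", needs_approval=True)

print([a.action_id for a in review.safe_now()])           # ['AIR-2024-01-08-01']
print([a.action_id for a in review.awaiting_approval()])  # ['AIR-2024-01-08-02']
```

Stable IDs let the next review ask what happened to a specific fix instead of rediscovering the same problem from scratch.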
What good looks like
A good review produces a small set of changes:
- one stale task closed;
- one routine updated;
- one missing interface documented;
- one approval boundary clarified;
- one source-of-truth file brought back in sync;
- one runtime change postponed because the operating reason is unclear.
That may sound boring. Boring is the point. Reliable agents are maintained through many small reconciliations, not dramatic rebuilds.
One action this week
Pick one recurring agent job or AI workflow.
Do not improve the prompt first. Run the checklist first and answer only three questions:
- What is the runtime doing that the source of truth does not describe?
- What does the source of truth expect that the runtime is not doing?
- What approval boundary is still implicit?
If you cannot answer those, the next improvement is not a new model, tool, or automation. The next improvement is reconciliation.
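For the third question, one way to make a boundary explicit is to write it down as a policy table that both the agent and the human can read. The Python sketch below is a minimal illustration: the three gate levels follow the checklist's approval questions, but the action names and the gate assigned to each one are assumptions, not a recommended policy.

```python
from enum import Enum


class Gate(Enum):
    AUTO = "may do without asking"
    DRAFT_ONLY = "may draft but not execute"
    APPROVAL = "always requires human approval"


# Hypothetical policy table. Action names and assignments are illustrative.
POLICY = {
    "read": Gate.AUTO,
    "summarize": Gate.AUTO,
    "send": Gate.DRAFT_ONLY,
    "publish": Gate.DRAFT_ONLY,
    "spend": Gate.APPROVAL,
    "delete": Gate.APPROVAL,
    "push": Gate.APPROVAL,
    "production_change": Gate.APPROVAL,
}


def gate_for(action: str) -> Gate:
    """Look up the gate for an action; anything unlisted defaults to approval."""
    return POLICY.get(action, Gate.APPROVAL)


def implicit_boundaries(observed_actions: set[str]) -> list[str]:
    """Return actions the agent performed that the policy never mentions."""
    return sorted(observed_actions - set(POLICY))


print(gate_for("delete").name)                             # APPROVAL
print(gate_for("archive").name)                            # APPROVAL (unlisted action)
print(implicit_boundaries({"read", "send", "archive"}))    # ['archive']
```

Defaulting unlisted actions to approval is a design choice in this sketch: an action nobody thought about is treated as the strictest case until someone decides otherwise, and `implicit_boundaries` surfaces it for the next review.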
For the broader cadence around this work, read "The Weekly AI Operating Review That Keeps Sprawl From Coming Back." If your agents produce workflow artifacts that may become external actions, pair this review with a send gate before expanding automation.