Baseline field notes

Field notes for agent operators.

Baseline is for the moment after an agent seemed fine yesterday and feels different today. These notes explain the local loop, the failure modes it catches, and why the product measures your own workstation instead of ranking models in public.

2026-05-14

How to accept a Good Baseline.

A Good Baseline is not the first run that finishes. It is the run you are willing to compare future work against. Start with baseline setup, run baseline run --mode fast, then open the report and check the boring things: correct workspace, expected agent identity, reachable MCP tools, clean scrubber output, and no surprising latency jump.

Only then accept it with baseline accept RUN_ID --confirm "accept RUN_ID" --label clean-local. From that point on, baseline compare has a real reference point.
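The review-then-accept habit above can be sketched as a small gate. This is only an illustration: the check names in REVIEW_CHECKS and both helper functions are hypothetical labels for the five items listed above, not fields of a real Baseline report. The only thing taken from the notes is the shape of the accept command.

```python
import shlex

# The five report checks named in the notes. These keys are hypothetical
# labels for a manual review, not fields of a real Baseline report.
REVIEW_CHECKS = (
    "correct_workspace",
    "expected_agent_identity",
    "mcp_tools_reachable",
    "scrubber_output_clean",
    "no_latency_jump",
)


def failed_checks(review: dict) -> list:
    """Return the checks that are missing or not explicitly confirmed."""
    return [name for name in REVIEW_CHECKS if review.get(name) is not True]


def accept_command(run_id: str, review: dict, label: str = "clean-local") -> str:
    """Build the accept command string, refusing if any check is unconfirmed."""
    failures = failed_checks(review)
    if failures:
        raise ValueError("do not accept yet: " + ", ".join(failures))
    # Same command shape as in the notes; the run id is supplied by the caller.
    args = ["baseline", "accept", run_id,
            "--confirm", f"accept {run_id}", "--label", label]
    return shlex.join(args)
```

The function only returns a string to paste into a terminal, so the final decision stays with the person who read the report.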

2026-04-28

MCP drift looks like nothing, until it costs a day.

The painful agent failures rarely announce themselves. A server disappears from the tool list. A config file points at a stale workspace. The model can still chat, but it no longer sees the same repo, memory, or local tools it had during the clean session.

A local run creates a redacted report before the work starts. Pro history helps when the same warning repeats across days or machines, but the useful habit is smaller: run the line call before important sessions and fix the workstation before you trust the agent.
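The quiet failure described above, a server dropping out of the tool list, becomes visible once two lists are compared. A minimal sketch, assuming you can write down the tool names from a known-good session and from today as plain strings. The function and the sample names used with it are hypothetical and not part of Baseline.

```python
def tool_drift(known_good, current):
    """Compare two lists of tool names and report what changed.

    Both arguments are plain iterables of strings. How the names are
    collected is left to the operator; this sketch only does the diff.
    """
    known, now = set(known_good), set(current)
    return {
        "missing": sorted(known - now),      # tools the clean session had
        "unexpected": sorted(now - known),   # tools that appeared since
    }
```

For example, tool_drift(["repo_search", "memory_read"], ["repo_search"]) reports memory_read as missing, which is the kind of change worth fixing before the session starts.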

2026-04-09

The case against a leaderboard.

Baseline does not try to prove that one model is better than another in the abstract. Your risk is local: this agent, in this repo, with these tools, under today's config.

The score is a workstation health signal, not a trophy. Use it to decide whether to proceed, repair, or rerun. When a run is clean, accept it. When it drifts, investigate the exact probe that changed.
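"Investigate the exact probe that changed" amounts to a per-probe comparison against the accepted run. The sketch below assumes each run can be summarized as a mapping from probe name to a numeric score. That data shape, the tolerance parameter, and the function name are assumptions for illustration, not Baseline's report format.

```python
def changed_probes(baseline, current, tolerance=0.0):
    """Return probes whose score moved by more than tolerance.

    baseline and current map probe names to numeric scores. A probe
    present in only one run is always reported, with None for the
    missing side.
    """
    changed = {}
    for name in sorted(set(baseline) | set(current)):
        before, after = baseline.get(name), current.get(name)
        if before is None or after is None or abs(after - before) > tolerance:
            changed[name] = (before, after)
    return changed
```

An empty result suggests the run matches the reference point. Anything else names the probe to look at first.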

Guides

Guide / 2026-06-01
How to run a coding agent health check before work drifts.

A practical coding agent health check for memory, repo awareness, MCP visibility, latency, safety, and style drift.

Guide / 2026-06-01
Detect agent drift before it becomes a lost day.

Detect coding agent drift across memory, tools, latency, safety, and style before it costs a development day.

Guide / 2026-06-01
Run an MCP server health check without adding tool sprawl.

Verify MCP server visibility, tool count, setup, scrub preview, and recovery paths for local coding agents.

Guide / 2026-06-01
Monitor OpenClaw from the workstation outward.

Monitor OpenClaw coding agent health with local Baseline runs, Good Baseline acceptance, and MCP recovery checks.

Guide / 2026-06-01
Monitor Codex sessions with a local known-good loop.

Monitor Codex coding-agent sessions with local health checks, MCP setup, drift reports, and known-good run comparison.

Guide / 2026-06-01
The Good Baseline is a review ritual, not a score badge.

A step-by-step Good Baseline workflow for accepting, comparing, and updating known-good coding agent runs.

Guide / 2026-06-01
Catch memory regression when it is still subtle.

Identify AI agent memory regression with repeated local probes, project-awareness checks, and known-good comparison.

Guide / 2026-06-01
Agent observability should start on the workstation.

A local-first approach to coding agent observability that keeps raw prompts local while syncing redacted run history when needed.

Checklists and templates

Resource / 2026-06-01
The coding agent health checklist.

A practical checklist for checking coding agent health before a high-stakes work session.

Resource / 2026-06-01
Score agent drift in five minutes.

A scorecard for judging whether coding agent behavior has drifted from a known-good baseline.

Resource / 2026-06-01
The MCP debugging cheatsheet for agent workstations.

A cheatsheet for debugging missing MCP tools, broken local CLI setup, and coding-agent server drift.

Resource / 2026-06-01
Review before you accept the Good Baseline.

A review template for deciding whether a coding-agent run deserves to become the Good Baseline.

Resource / 2026-06-01
Monitor agency agent workstations without leaking client work.

A playbook for agencies monitoring multiple coding-agent workstations without exposing raw client prompts.