sample/api92/100
watch . +1
Last 13 runs
identitymodel / context / goal matched
mcp.configdrift from clean-local
latency.tool+312ms over 5-day median
baseline.
Local CLI + MCP drift checks
Baseline probes OpenClaw, Codex, Hermes, or any approved local runner and compares each run to a known-good baseline, so you catch model, tool, memory, repo, and latency drift before it burns a work session.
Copy and run
curl -fsSL https://trackbaseline.com/install.sh | sh
baseline setup
baseline run --mode fast
baseline report RUN_ID
baseline accept RUN_ID --confirm "accept RUN_ID" --label clean-local
baseline compare
Sample data . after three workstations sync
The dashboard is an example of what redacted Pro history can show after local runs start syncing. Your raw prompts, outputs, and repo paths stay on the workstation by default.
sample/api92/100
watch . +1
Last 13 runs
identitymodel / context / goal matched
mcp.configdrift from clean-local
latency.tool+312ms over 5-day median
sample/web97/100
ok . +2
Last 13 runs
identitystable for 14 runs
memorycontext survives across calls
safety.scrubberno leaks detected
sample/infra78/100
fail . -9
Last 13 runs
memorycontext lost mid-session
variance2 of 5 prompts diverged
repo.workspacedirty: 11 unstaged files
Run Baseline daily
Baseline stores a local SQLite history, writes report artifacts, and tells you exactly which behavior changed: tool visibility, repo awareness, memory carryover, instruction following, safety scrub, or latency.
The default set
Model, provider, context window, primary goal.
v0.1Workspace path, clean/dirty state, branch.
v0.1MCP server reachable, allowed tool surface declared.
v0.1Carried context survives between calls. Same answer twice.
v0.1P50 and P95 against the 5-day local median.
v0.1Five identical prompts. Five identical answers.
v0.1Scrubber catches paths, secrets, prompt fragments.
v0.1Repository conventions: tabs, naming, file layout.
v0.1Reports any tool / MCP / repo / config change since Good.
v0.1Two-plus-two. Date. The smoke test for a broken model.
v0.1Answer only the word. Answer only the number.
v0.1Redacted summaries verify against original on push.
v0.1Tool calls actually fire. Names match the declared surface.
v0.1A second session in the same workspace agrees with the first.
v0.1Four commands
The first hour is deliberately boring: install the binary, run the probes, read the report, accept a clean run, then compare future sessions against that standard. MCP tools let agents trigger the same loop without making the cloud required. New to the ritual? Start with the Good Baseline workflow.
01$ baseline setup
02$ baseline run --mode fast
03$ baseline accept RUN_ID --confirm "accept RUN_ID" --label clean-local
04$ baseline compare



Pricing
Baseline should prove itself on one workstation before you pay. The free local loop catches drift immediately; Pro and Team turn those local reports into retained history, alerts, and account-scoped evidence. Compare options in the local-first observability guide or use the agency monitoring playbook.
First workstation
$0
When history matters
$39/mo
When reviews need to route
$129/mo
7-day pilot
Request a 7-day setup pilot before paying. We will reply by email, confirm Pro ($39/mo) or Team ($129/mo) pricing before anything is billed, then send the invite, magic link, workspace-token setup path, and first Good Baseline checklist.
Field notes
The five-minute review ritual before a run becomes the standard your workstation compares against.
Read → 2026 . 04 . 28 MCP drift looks like nothing, until it costs a day.The quiet config and tool-surface failures that do not show up in trace dashboards.
Read → 2026 . 04 . 09 The case against a leaderboard.Why Baseline measures this workstation against its own clean run, not models against each other.
Read →Copy the installer, run a fast baseline, and only accept the run after you read the report. That is enough to catch the next quiet drift.