Score agent drift in five minutes.
Decide whether today’s agent behavior is stable, watch-worthy, or blocked before you change prompts or tools.
Updated 2026-06-01
Built for founder-CTOs and agency teams reviewing agent quality across repeat sessions.
1. Score behavior
Judge memory, repo awareness, tool visibility, latency, safety, style, and instruction following instead of relying on feel.
2. Clear verdict
Give each session one verdict: proceed, rerun, repair setup, or accept a new Good Baseline.
3. Concrete support
Use Baseline’s health score, status, warning count, duration, checks, and run ID as evidence.
What you get
- Health score changed materially.
- Warning count increased.
- Memory or repo checks failed.
- MCP tool surface changed.
- Duration jumped from recent runs.
Request the worksheet follow-up and the 7-day pilot prompt.
The checklist stays on this page. Leave an email if you want the follow-up prompt and a quick note on where this resource fits in your agent workflow.
