Catching UI Drift From AI Coding Agents Before It Ships
Ask Claude Code for a cancel button and you'll get a cancel button. You might also get a billing card that's 40px taller, a footer pushed below the fold, and a dark-mode style that quietly stopped applying. The tests pass, the types check out — the agent just never looked at the page. Developers have started calling this UI drift, and if you lean on agents for frontend work, you've almost certainly shipped some.
What is UI drift in agentic coding?
UI drift is the buildup of unintended visual changes that AI coding agents introduce as side effects of the changes you actually asked for. Each one is small — a margin collapse here, an overflowing flex container there — and each one sails through review because the diff looks plausible as code. None of them is worth a revert on its own. But after a few dozen agent sessions the UI has ended up somewhere nobody chose, and there's no single commit to blame.
A few flavors you've probably met:
- Shared-component blast radius. The agent edits Card.tsx to fix the page you asked about and quietly reflows the other eleven pages that use it.
- CSS cascade side effects. A new utility class or a reordered import changes specificity or source order, and an unrelated selector starts winning.
- Dark mode and responsive states. The agent checks its work against the markup it wrote, not against the page at 375 px wide in dark mode.
- Refactors that "shouldn't change anything." Extracting a layout component is exactly the kind of change agents make confidently, and exactly the kind that shifts every descendant by a few pixels.
Why AI coding agents can't see what they break
An agent's feedback loop is text in, text out. It checks its work with whatever tools you give it — a compiler, a type checker, ESLint, a test runner — and all of those operate on source code. None of them render anything. A component can be type-safe, lint-clean, and fully covered by passing tests, and still look broken the moment someone loads the page.
Humans close this gap without thinking about it: save, alt-tab, squint, fix. Agents don't have the reflex, and usually don't have the eyes either. So catching visual regressions falls back on you, after the fact, one manual review pass per change — which is exactly the work you brought in an agent to avoid. When a session is producing thirty commits an hour, that review pass is the first thing to go.
The fix: let the agent look at the page
The answer is to make "look at the page" a tool call the agent can make itself. That's what SnapDiff's MCP server is for: it exposes visual regression testing to any MCP-capable agent (Claude Code, Cursor, Windsurf, Cline, Zed, Continue) as a small set of tools. The one that matters for drift is snapdiff_verify_ui_change, and the loop goes like this:
- Baseline. Your project stores an approved screenshot of each page — the way it's supposed to look.
- The agent makes its change and deploys it somewhere reachable (a preview deployment, a tunnel, a deployed Storybook).
- The agent verifies. It calls snapdiff_verify_ui_change with the page URL, a plain-language statement of what it intended to change, and the regions it expected to touch.
- SnapDiff diffs against the baseline and checks whether the changed pixels fall inside the stated intent regions.
- The agent gets a verdict and a next_action, and acts on it: proceed, request human review, or roll back and retry.
The part that matters is the intent. A raw pixel diff says "2.3% of the page changed" and leaves the interpretation to the agent — which will happily rationalize anything. An intent-aware verdict says "pixels changed in the top-right corner, but you told me you were editing the billing card." That's an unexpected regression, caught right away instead of in code review.
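The geometric idea behind an intent-aware verdict can be sketched in a few lines. This is an illustration only, not SnapDiff's implementation: the Rect type, the contains and classify helpers, and the sample coordinates are all hypothetical, and rectangles are assumed to be [x, y, width, height] in page pixels.

```typescript
// Hypothetical sketch, not SnapDiff's implementation.
// Assumption: rectangles are [x, y, width, height] in page pixels.
type Rect = [number, number, number, number];

type Verdict = "expected_change_detected" | "unexpected_regression";

// True when `inner` lies entirely within `outer`.
function contains(outer: Rect, inner: Rect): boolean {
  const [ox, oy, ow, oh] = outer;
  const [ix, iy, iw, ih] = inner;
  return ix >= ox && iy >= oy && ix + iw <= ox + ow && iy + ih <= oy + oh;
}

// A changed region counts as intended if some intent region contains it.
// Assumes at least one changed region; "nothing changed" is a separate case.
function classify(changed: Rect[], intent: Rect[]): Verdict {
  const allIntended = changed.every((c) => intent.some((i) => contains(i, c)));
  return allIntended ? "expected_change_detected" : "unexpected_regression";
}

const billingCard: Rect = [40, 380, 540, 260];
const newButton: Rect = [60, 580, 200, 40]; // sits inside the billing card
const strayRibbon: Rect = [1100, 0, 180, 32]; // top-right corner of the page

const intendedOnly = classify([newButton], [billingCard]);
const withStray = classify([newButton, strayRibbon], [billingCard]);
```

Here intendedOnly comes out as expected_change_detected, while withStray comes out as unexpected_regression because the ribbon falls outside the stated region. A real service presumably has to handle tolerance, partial overlap, and layout shifts as well, which this sketch ignores.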
Setting up the MCP server
Get an API key from the SnapDiff dashboard (MCP access is included on paid plans, from $19/mo). Then register the server with your agent of choice.
Claude Code
claude mcp add snapdiff -e SNAPDIFF_API_KEY=sd_live_... -- npx -y @corralimited/snapdiff-mcp
Cursor
~/.cursor/mcp.json (or a project-level .cursor/mcp.json):
{
  "mcpServers": {
    "snapdiff": {
      "command": "npx",
      "args": ["-y", "@corralimited/snapdiff-mcp"],
      "env": { "SNAPDIFF_API_KEY": "sd_live_..." }
    }
  }
}
Windsurf, Cline, Zed, Continue
Same shape: any client with a generic stdio MCP slot runs npx -y @corralimited/snapdiff-mcp with SNAPDIFF_API_KEY in the environment. Exact snippets for each editor are in the MCP server README.
The verification loop in practice
Say the agent just added a cancel-subscription button to the billing card on your account page. Before calling the task done, it runs:
snapdiff_verify_ui_change({
  project: "my-app",
  page_name: "account-page",
  after: "https://my-app-git-add-cancel-button.vercel.app/account",
  intent: "added a Cancel subscription button to the bottom of the billing card",
  intent_regions: [
    { bbox: [40, 380, 540, 260], label: "billing card + cancel button area" }
  ]
})
and gets back something it can actually act on:
{
  "verdict": "expected_change_detected",
  "next_action": "request_human_review",
  "reasoning": "All changed regions fall inside the area you said you edited.",
  "diff_percentage": 0.7,
  "review_url": "https://snapdiff.ai/dashboard/verifications/vrf_...",
  "annotated_regions": [{ "matches_intent": true, ... }]
}
The five verdicts cover the full space of outcomes:
| Verdict | Meaning | Agent's next move |
|---|---|---|
| pass | Change is within tolerance of the baseline | Proceed |
| expected_change_detected | Changed pixels match the stated intent | Request human review to promote the new baseline |
| unexpected_regression | Pixels changed outside the intent regions | Roll back and retry: something broke that wasn't supposed to |
| no_change_detected | Page is visually identical to baseline | If a change was expected, the deploy probably didn't take; verify the deployment |
| needs_human_review | Intent can't be verified geometrically (e.g. no intent regions supplied) | Escalate to a human |
unexpected_regression is the verdict doing the real work here. The debug ribbon the agent forgot in the header and the reflow from a careless refactor both land outside the intent bounding box, and the agent gets told to roll back and retry instead of opening a PR.
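For anyone wiring this into their own agent harness, the table above maps naturally onto an exhaustive switch. The five verdict strings are the ones listed in the table; the AgentMove labels and the nextMove function are hypothetical names made up for this sketch, not values returned by the API.

```typescript
// Verdict strings are taken from the table above.
type Verdict =
  | "pass"
  | "expected_change_detected"
  | "unexpected_regression"
  | "no_change_detected"
  | "needs_human_review";

// Hypothetical labels for what the agent does next (not API values).
type AgentMove = "proceed" | "ask_human" | "rollback_and_retry" | "check_deploy";

function nextMove(verdict: Verdict, changeWasExpected: boolean): AgentMove {
  switch (verdict) {
    case "pass":
      return "proceed";
    case "expected_change_detected":
      // A human promotes the new baseline; the agent never does.
      return "ask_human";
    case "unexpected_regression":
      return "rollback_and_retry";
    case "no_change_detected":
      // An identical page after an intended change suggests a stale deploy.
      return changeWasExpected ? "check_deploy" : "proceed";
    case "needs_human_review":
      return "ask_human";
  }
}
```

Because the switch covers every member of the union with no default branch, the TypeScript compiler will flag this function if a sixth verdict is ever added to the type.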
Put it in the rules file
Agents do what their instructions say. Put the verification loop in your agent rules file (CLAUDE.md for Claude Code, .cursor/rules for Cursor, .windsurfrules for Windsurf) so it runs on every UI change, not just when you remember to ask:
## UI changes
After any change that affects rendered UI, verify it visually:
1. Deploy the change to a preview URL.
2. Call snapdiff_verify_ui_change with the project, page_name, the
   preview URL, your intent, and intent_regions covering the area
   you meant to change.
3. Do NOT mark the task complete unless the verdict is `pass` or
   `expected_change_detected`.
4. On `unexpected_regression`, follow the returned next_action:
   fix the regression and re-verify before continuing.
Now verification is an exit criterion instead of something you have to remember to ask for. There's a nice side effect, too: an agent that knows it'll be graded on rendered output gets noticeably more careful about touching shared components in the first place.
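If you drive agents from your own harness rather than relying on the rules file alone, the same exit criterion can be enforced in code. The sketch below is one possible shape, under assumptions: verify and fix are stand-ins you would supply (verify wrapping the deploy plus the snapdiff_verify_ui_change call, fix wrapping the agent's repair step), both are synchronous for brevity, and the attempt cap is an addition of this sketch that the rules above do not specify.

```typescript
type Verdict =
  | "pass"
  | "expected_change_detected"
  | "unexpected_regression"
  | "no_change_detected"
  | "needs_human_review";

// Rule 3: only these two verdicts allow the task to be marked complete.
function allowsCompletion(v: Verdict): boolean {
  return v === "pass" || v === "expected_change_detected";
}

interface Outcome {
  complete: boolean;
  attempts: number;
  lastVerdict: Verdict;
}

// Rule 4: on a regression, fix and re-verify. maxAttempts is an assumed
// safety cap so a stuck agent cannot loop forever.
function verifyUntilDone(
  verify: () => Verdict, // stand-in for deploy + snapdiff_verify_ui_change
  fix: () => void, // stand-in for the agent repairing the regression
  maxAttempts: number,
): Outcome {
  let lastVerdict: Verdict = verify();
  let attempts = 1;
  while (lastVerdict === "unexpected_regression" && attempts < maxAttempts) {
    fix();
    lastVerdict = verify();
    attempts += 1;
  }
  return { complete: allowsCompletion(lastVerdict), attempts, lastVerdict };
}

// Scripted run: the first verification regresses, the second succeeds.
const scripted: Verdict[] = ["unexpected_regression", "expected_change_detected"];
let call = 0;
let fixes = 0;
const outcome = verifyUntilDone(
  () => scripted[Math.min(call++, scripted.length - 1)],
  () => { fixes += 1; },
  3,
);
```

In the scripted run the loop stops after the second verification with complete set to true. Any verdict other than unexpected_regression ends the loop immediately, so no_change_detected and needs_human_review surface to the caller as incomplete instead of being retried blindly.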
Where you come back in
An agent shouldn't get to redefine "correct." When the verdict is expected_change_detected, the response includes a review_url. Open it and you see the agent's stated intent, the before/after/diff triptych, and each changed region tagged as inside or outside the intent. One click either approves, promoting the new screenshot as the baseline for future verifications, or rejects, leaving the baseline untouched.
And that's all the reviewing you do: one look per intentional change. The regressions mostly never reach you, because the agent rolled them back on its own.
Frequently asked questions
What is UI drift in AI-assisted coding?
UI drift is the accumulation of unintended visual changes that AI coding agents introduce as a side effect of the changes you asked for — shifted layouts, broken dark-mode styles, components that regressed because a shared file was edited. It happens because agents verify their work with compilers, type checkers, and unit tests, none of which render the page. Each regression is small; over many agent sessions the UI drifts visibly away from its intended design.
Which AI coding agents work with SnapDiff?
Any agent that speaks the Model Context Protocol: Claude Code, Cursor, Windsurf, Cline, Zed, and Continue all work. The MCP server runs locally over stdio via npx -y @corralimited/snapdiff-mcp, and the agent gets a snapdiff_verify_ui_change tool it can call after every UI edit.
Do I need an existing test suite or CI pipeline?
No. The verification loop runs entirely through the MCP tool and the SnapDiff API. You create a project, establish a baseline screenshot per page, and the agent diffs against that baseline. No Jest, no Playwright scripts, no CI config required — though a CI gate on preview URLs makes a good second layer.
How is snapdiff_verify_ui_change different from a plain screenshot diff?
A plain pixel diff tells you that something changed and by how much. snapdiff_verify_ui_change also knows what the agent intended: it checks whether changed pixels fall inside the regions the agent said it was editing, and returns a verdict plus a next_action instead of a raw percentage the agent has to interpret. For ad-hoc comparisons that don't need a verdict, the snapdiff_compare_pages tool does the raw diff.
Can the agent approve its own UI changes?
No. An expected_change_detected verdict routes to request_human_review with a review URL. A human sees the before/after/diff and the agent's stated intent, then approves (promoting the new baseline) or rejects (baseline untouched). Baselines only move with human sign-off.
Does it work against localhost?
The hosted capture workers need a publicly reachable URL, so bare localhost won't work. Verify against a preview deployment (Vercel, Netlify, Cloudflare Pages), expose your dev server through a tunnel like cloudflared or ngrok, or verify component changes against a deployed Storybook.
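A cheap guard is to reject obviously local URLs before spending a verification call on them. The helper below is a hypothetical pre-flight check, not part of SnapDiff; it only catches loopback hosts, so private network addresses such as 192.168.x.x would still slip through.

```typescript
// Hypothetical pre-flight helper, not part of SnapDiff.
// Flags loopback hosts that a hosted capture worker cannot reach.
function isLocalOnly(rawUrl: string): boolean {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return (
    /(^|\.)localhost$/.test(host) || // localhost and *.localhost
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host) || // IPv4 loopback range
    host === "[::1]" || // IPv6 loopback (hostname keeps the brackets)
    host === "0.0.0.0"
  );
}

const devServer = isLocalOnly("http://localhost:3000/account");
const preview = isLocalOnly("https://my-app-git-add-cancel-button.vercel.app/account");
```

Here devServer is true and preview is false, so a harness could swap in a tunnel or preview URL before calling the verify tool.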
How much does it cost to verify every agent change?
Plans are flat-rate: Free includes 200 diffs/month to evaluate the workflow, and MCP server access is included on paid plans starting at $19/month with 1,000 diffs. No per-screenshot billing, so a chatty agent doesn't produce a surprise invoice. See pricing.
Try it on your next agent session
One MCP server and a rules-file snippet. The free plan's 200 diffs a month are plenty to find out whether your agent has been drifting.