Catching UI Drift From AI Coding Agents Before It Ships

June 9, 2026 · 9 min read · AI Coding Agents

Ask Claude Code for a cancel button and you'll get a cancel button. You might also get a billing card that's 40px taller, a footer pushed below the fold, and a dark-mode style that quietly stopped applying. The tests pass, the types check out — the agent just never looked at the page. Developers have started calling this UI drift, and if you lean on agents for frontend work, you've almost certainly shipped some.

What is UI drift in agentic coding?

UI drift is the buildup of unintended visual changes that AI coding agents introduce as side effects of the changes you actually asked for. Each one is small — a margin collapse here, an overflowing flex container there — and each one sails through review because the diff looks plausible as code. None of them is worth a revert on its own. But after a few dozen agent sessions the UI has ended up somewhere nobody chose, and there's no single commit to blame.

A few flavors you've probably met:

- Shared-component blast radius. The agent edits Card.tsx to fix the page you asked about and quietly reflows the other eleven pages that use it.
- CSS cascade side effects. A new utility class or a reordered import changes specificity, and an unrelated selector starts winning.
- Dark mode and responsive states. The agent checks its work against the markup it wrote, not against the page at 375 px wide in dark mode.
- Refactors that "shouldn't change anything." Extracting a layout component is exactly the kind of change agents make confidently, and exactly the kind that shifts every descendant by a few pixels.

Why AI coding agents can't see what they break

An agent's feedback loop is text in, text out. It checks its work with whatever tools you give it — a compiler, a type checker, ESLint, a test runner — and all of those operate on source code. None of them render anything. A component can be type-safe, lint-clean, and fully covered by passing tests, and still look broken the moment someone loads the page.

Humans close this gap without thinking about it: save, alt-tab, squint, fix. Agents don't have the reflex, and usually don't have the eyes either. So catching visual regressions falls back on you, after the fact, one manual review pass per change — which is exactly the work you brought in an agent to avoid. When a session is producing thirty commits an hour, that review pass is the first thing to go.

The fix: let the agent look at the page

The answer is to make "look at the page" a tool call the agent can make itself. That's what SnapDiff's MCP server is for: it exposes visual regression testing to any MCP-capable agent — Claude Code, Cursor, Windsurf, Cline, Zed, Continue — as a small set of tools. The one that matters for drift is snapdiff_verify_ui_change, and the loop goes like this:

1. Baseline. Your project stores an approved screenshot of each page, showing how it's supposed to look.
2. The agent makes its change and deploys it somewhere reachable (a preview deployment, a tunnel, a deployed Storybook).
3. The agent verifies. It calls snapdiff_verify_ui_change with the page URL, a plain-language statement of what it intended to change, and the regions it expected to touch.
4. SnapDiff diffs against the baseline and checks whether the changed pixels fall inside the stated intent regions.
5. The agent gets a verdict and a next_action, then acts on it: proceed, request human review, or roll back and retry.

The part that matters is the intent. A raw pixel diff says "2.3% of the page changed" and leaves the interpretation to the agent — which will happily rationalize anything. An intent-aware verdict says "pixels changed in the top-right corner, but you told me you were editing the billing card." That's an unexpected regression, caught right away instead of in code review.
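To make the geometric idea concrete, here is a minimal sketch of an intent-containment check. It is not SnapDiff's actual algorithm, which the article does not describe. It assumes each box is laid out as [x, y, width, height] (the bbox layout is not specified above), and the names `isInside` and `classifyChange` are hypothetical. It also leaves out the tolerance handling behind the `pass` verdict.

```typescript
// Assumption: a box is [x, y, width, height] in page pixels.
type BBox = [number, number, number, number];

// True when `inner` lies entirely within `outer`.
function isInside(inner: BBox, outer: BBox): boolean {
  const [ix, iy, iw, ih] = inner;
  const [ox, oy, ow, oh] = outer;
  return ix >= ox && iy >= oy && ix + iw <= ox + ow && iy + ih <= oy + oh;
}

type SketchVerdict =
  | "no_change_detected"
  | "expected_change_detected"
  | "unexpected_regression"
  | "needs_human_review";

// Hypothetical classifier: every changed region must sit inside at least one
// intent region for the change to count as expected.
function classifyChange(changed: BBox[], intent: BBox[]): SketchVerdict {
  if (changed.length === 0) return "no_change_detected";
  // Without intent regions there is nothing to check geometrically.
  if (intent.length === 0) return "needs_human_review";
  const allExpected = changed.every((c) => intent.some((i) => isInside(c, i)));
  return allExpected ? "expected_change_detected" : "unexpected_regression";
}

// A change inside the billing card area counts as expected.
console.log(classifyChange([[60, 600, 200, 30]], [[40, 380, 540, 260]]));
// → expected_change_detected

// A stray change near the top-right corner does not.
console.log(
  classifyChange([[60, 600, 200, 30], [900, 10, 120, 40]], [[40, 380, 540, 260]]),
);
// → unexpected_regression
```

The point of the sketch is that the same 2.3% diff can be fine or alarming depending on where it lands, which is information a raw percentage cannot carry.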

Setting up the MCP server

Get an API key from the SnapDiff dashboard — MCP access is included on every plan, and the free plan needs no credit card. Then register the server with your agent of choice.

Claude Code

claude mcp add snapdiff -e SNAPDIFF_API_KEY=sd_live_... -- npx -y @corralimited/snapdiff-mcp

Cursor

~/.cursor/mcp.json (or a project-level .cursor/mcp.json):

{
  "mcpServers": {
    "snapdiff": {
      "command": "npx",
      "args": ["-y", "@corralimited/snapdiff-mcp"],
      "env": { "SNAPDIFF_API_KEY": "sd_live_..." }
    }
  }
}

Windsurf, Cline, Zed, Continue

Same shape — any client with a generic stdio MCP slot runs npx -y @corralimited/snapdiff-mcp with SNAPDIFF_API_KEY in the environment. Exact snippets for each editor are in the MCP server README.

The verification loop in practice

Say the agent just added a cancel-subscription button to the billing card on your account page. Before calling the task done, it runs:

snapdiff_verify_ui_change({
  project: "my-app",
  page_name: "account-page",
  after: "https://my-app-git-add-cancel-button.vercel.app/account",
  intent: "added a Cancel subscription button to the bottom of the billing card",
  intent_regions: [
    { bbox: [40, 380, 540, 260], label: "billing card + cancel button area" }
  ]
})

and gets back something it can actually act on:

{
  "verdict": "expected_change_detected",
  "next_action": "request_human_review",
  "reasoning": "All changed regions fall inside the area you said you edited.",
  "diff_percentage": 0.7,
  "review_url": "https://snapdiff.ai/dashboard/verifications/vrf_...",
  "annotated_regions": [{ "matches_intent": true, ... }]
}

The five verdicts cover the full space of outcomes:

| Verdict | Meaning | Agent's next move |
| --- | --- | --- |
| `pass` | Change is within tolerance of the baseline | Proceed |
| `expected_change_detected` | Changed pixels match the stated intent | Request human review to promote the new baseline |
| `unexpected_regression` | Pixels changed outside the intent regions | Roll back and retry; something broke that wasn't supposed to |
| `no_change_detected` | Page is visually identical to baseline | If a change was expected, the deploy probably didn't take; verify the deployment |
| `needs_human_review` | Intent can't be verified geometrically (e.g. no intent regions supplied) | Escalate to a human |

unexpected_regression is the verdict doing the real work here. The debug ribbon the agent forgot in the header, the reflow from a careless refactor — both land outside the intent bounding box, and the agent gets told to roll back and retry instead of opening a PR.
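The verdict table can be read as a simple dispatch. The sketch below encodes it as an exhaustive switch, which is handy if you wrap the tool in your own harness. The verdict strings come from the table; the `AgentMove` labels and the `nextMove` function are hypothetical names chosen for illustration. In practice the response already carries a next_action, so treat this as a reading aid rather than a replacement for it.

```typescript
type Verdict =
  | "pass"
  | "expected_change_detected"
  | "unexpected_regression"
  | "no_change_detected"
  | "needs_human_review";

// Hypothetical labels for the agent's next move, one per table row.
type AgentMove =
  | "proceed"
  | "request_human_review"
  | "roll_back_and_retry"
  | "check_deployment"
  | "escalate_to_human";

// Maps each verdict to the next move listed in the table.
// The switch is exhaustive, so adding a new Verdict member without
// handling it becomes a compile-time error.
function nextMove(verdict: Verdict): AgentMove {
  switch (verdict) {
    case "pass":
      return "proceed";
    case "expected_change_detected":
      return "request_human_review";
    case "unexpected_regression":
      return "roll_back_and_retry";
    case "no_change_detected":
      return "check_deployment";
    case "needs_human_review":
      return "escalate_to_human";
  }
}

console.log(nextMove("unexpected_regression")); // → roll_back_and_retry
```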

Put it in the rules file

Agents do what their instructions say. Put the verification loop in your agent rules file — CLAUDE.md for Claude Code, .cursor/rules for Cursor, .windsurfrules for Windsurf — so it runs on every UI change, not just when you remember to ask:

## UI changes

After any change that affects rendered UI, verify it visually:

1. Deploy the change to a preview URL.
2. Call snapdiff_verify_ui_change with the project, page_name, the
   preview URL, your intent, and intent_regions covering the area
   you meant to change.
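3. Do NOT mark the task complete unless the verdict is `pass` or
   `expected_change_detected`. On `expected_change_detected`, include
   the returned review_url in your summary so a human can approve it.
4. On `unexpected_regression`, follow the returned next_action:
   fix the regression and re-verify before continuing.
5. On `no_change_detected`, check that the deploy went through and
   re-verify. On `needs_human_review`, escalate to a human.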

Now verification is an exit criterion instead of something you have to remember to ask for. There's a nice side effect, too: an agent that knows it'll be graded on rendered output gets noticeably more careful about touching shared components in the first place.
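If you drive the agent from your own harness rather than a rules file, the same exit criterion can be written as a small pure function. This is a sketch under assumptions: the retry cap (`maxAttempts`) and the `Step` labels are not part of SnapDiff or of the rules above, and the fallback for verdicts the rules do not mention (stop and hand off to a human) is a conservative choice made for illustration.

```typescript
type Verdict =
  | "pass"
  | "expected_change_detected"
  | "unexpected_regression"
  | "no_change_detected"
  | "needs_human_review";

// Hypothetical outcomes of the gate.
type Step = "mark_complete" | "fix_and_reverify" | "stop_and_ask_human";

// Decides what to do after a verification, given how many attempts
// (1-based) have been made so far. The retry cap is an assumption that
// keeps a confused agent from looping forever.
function exitGate(verdict: Verdict, attempt: number, maxAttempts = 3): Step {
  if (verdict === "pass" || verdict === "expected_change_detected") {
    return "mark_complete";
  }
  if (verdict === "unexpected_regression" && attempt < maxAttempts) {
    return "fix_and_reverify";
  }
  // Out of retries, or a verdict the agent should not resolve on its own.
  return "stop_and_ask_human";
}

console.log(exitGate("unexpected_regression", 1)); // → fix_and_reverify
console.log(exitGate("unexpected_regression", 3)); // → stop_and_ask_human
```

Note that `mark_complete` on `expected_change_detected` still leaves the baseline decision with a human reviewer; the gate only says the agent's part is done.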

Where you come back in

An agent shouldn't get to redefine "correct." When the verdict is expected_change_detected, the response includes a review_url. Open it and you see the agent's stated intent, the before/after/diff triptych, and each changed region tagged as inside or outside the intent. One click approves — promoting the new screenshot as the baseline for future verifications — or rejects, leaving the baseline untouched.
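The approve/reject rule is easy to state as a state transition. The sketch below models it with hypothetical types (`Baseline`, `Decision`) and a hypothetical `applyReview` function; it illustrates the invariant that only an approval moves the baseline and says nothing about how SnapDiff stores baselines internally.

```typescript
// Hypothetical shape: which screenshot currently counts as "correct" for a page.
interface Baseline {
  pageName: string;
  screenshotId: string;
}

type Decision = "approve" | "reject";

// Returns the baseline that applies after a human review.
// Approve promotes the candidate screenshot; reject leaves the baseline as is.
function applyReview(
  current: Baseline,
  candidateScreenshotId: string,
  decision: Decision,
): Baseline {
  if (decision === "approve") {
    return { pageName: current.pageName, screenshotId: candidateScreenshotId };
  }
  return current;
}

const before: Baseline = { pageName: "account-page", screenshotId: "shot_old" };
console.log(applyReview(before, "shot_new", "approve").screenshotId); // → shot_new
console.log(applyReview(before, "shot_new", "reject").screenshotId); // → shot_old
```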

And that's all the reviewing you do: one look per intentional change. The regressions mostly never reach you, because the agent rolled them back on its own.

Frequently asked questions

What is UI drift in AI-assisted coding?

UI drift is the accumulation of unintended visual changes that AI coding agents introduce as a side effect of the changes you asked for — shifted layouts, broken dark-mode styles, components that regressed because a shared file was edited. It happens because agents verify their work with compilers, type checkers, and unit tests, none of which render the page. Each regression is small; over many agent sessions the UI drifts visibly away from its intended design.

Which AI coding agents work with SnapDiff?

Any agent that speaks the Model Context Protocol: Claude Code, Cursor, Windsurf, Cline, Zed, and Continue all work. The MCP server runs locally over stdio via npx -y @corralimited/snapdiff-mcp, or you can point clients at the hosted endpoint. Either way, the agent gets a snapdiff_verify_ui_change tool it can call after every UI edit.

Do I need an existing test suite or CI pipeline?

No. The verification loop runs entirely through the MCP tool and the SnapDiff API. You create a project, establish a baseline screenshot per page, and the agent diffs against that baseline. No Jest, no Playwright scripts, no CI config required — though a CI gate on preview URLs makes a good second layer.

How is `snapdiff_verify_ui_change` different from a plain screenshot diff?

A plain pixel diff tells you that something changed and by how much. snapdiff_verify_ui_change also knows what the agent intended: it checks whether changed pixels fall inside the regions the agent said it was editing, and returns a verdict plus a next_action instead of a raw percentage the agent has to interpret. For ad-hoc comparisons that don't need a verdict, the snapdiff_compare_pages tool does the raw diff.

Can the agent approve its own UI changes?

No. Detected expected changes route to request_human_review with a review URL. A human sees the before/after/diff and the agent's stated intent, then approves (promoting the new baseline) or rejects (baseline untouched). Baselines only move with human sign-off.

Does it work against localhost?

The hosted capture workers need a publicly reachable URL, so bare localhost won't work. Verify against a preview deployment (Vercel, Netlify, Cloudflare Pages), expose your dev server through a tunnel like cloudflared or ngrok, or verify component changes against a deployed Storybook.
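A cheap pre-flight check can stop the agent from wasting a call on a URL the capture workers will never reach. The sketch below is a heuristic and not part of SnapDiff: `isLikelyPublicUrl` is a hypothetical helper that rejects localhost, `.local` names, and private or loopback IPv4 ranges. It cannot prove a URL is reachable; it only filters the obvious misses.

```typescript
// Heuristic: returns false for URLs that clearly point at the local machine
// or a private network, true otherwise. Not a reachability test.
function isLikelyPublicUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  // localhost, *.localhost, and mDNS-style *.local names
  if (/(^|\.)localhost$/.test(host) || /\.local$/.test(host)) return false;
  // IPv6 loopback (the URL parser keeps the brackets)
  if (host === "[::1]") return false;

  // Loopback, unspecified, link-local, and private IPv4 ranges
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 0 || a === 10 || a === 127) return false;
    if (a === 169 && b === 254) return false;
    if (a === 172 && b >= 16 && b <= 31) return false;
    if (a === 192 && b === 168) return false;
  }
  return true;
}

console.log(isLikelyPublicUrl("http://localhost:3000")); // → false
console.log(
  isLikelyPublicUrl("https://my-app-git-add-cancel-button.vercel.app/account"),
); // → true
```

A tunnel URL from cloudflared or ngrok passes the check, which matches the workaround described above.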

How much does it cost to verify every agent change?

Plans are flat-rate and MCP server access is included on every plan: Free includes 200 diffs/month (no credit card) to evaluate the workflow, and paid plans start at $19/month with 1,000 diffs. No per-screenshot billing, so a chatty agent doesn't produce a surprise invoice. See pricing.

Try it on your next agent session

One MCP server and a rules-file snippet. The free plan's 200 diffs a month are plenty to find out whether your agent has been drifting.
