What is UI drift in agentic coding?

UI drift is the buildup of unintended visual changes that AI coding agents introduce as side effects of the changes you actually asked for. Each one is small — a margin collapse here, an overflowing flex container there — and each one sails through review because the diff looks plausible as code. None of them is worth a revert on its own. But after a few dozen agent sessions the UI has ended up somewhere nobody chose, and there's no single commit to blame.

Why AI coding agents can't see what they break

An agent's feedback loop is text in, text out. It checks its work with whatever tools you give it — a compiler, a type checker, ESLint, a test runner — and all of those operate on source code. None of them render anything. A component can be type-safe, lint-clean, and fully covered by passing tests, and still look broken the moment someone loads the page.

Humans close this gap without thinking about it: save, alt-tab, squint, fix. Agents don't have the reflex, and usually don't have the eyes either. So catching visual regressions falls back on you, after the fact, one manual review pass per change — which is exactly the work you brought in an agent to avoid. When a session is producing thirty commits an hour, that review pass is the first thing to go.

The fix: let the agent look at the page

The answer is to make "look at the page" a tool call the agent can make itself. That's what SnapDiff's MCP server is for: it exposes visual regression testing to any MCP-capable agent — Claude Code, Cursor, Windsurf, Cline, Zed, Continue — as a small set of tools. The one that matters for drift is snapdiff_verify_ui_change, and the loop goes like this:

  1. Baseline. Your project stores an approved screenshot of each page — the way it's supposed to look.
  2. The agent makes its change and deploys it somewhere reachable (a preview deployment, a tunnel, a deployed Storybook).
  3. The agent verifies. It calls snapdiff_verify_ui_change with the page URL, a plain-language statement of what it intended to change, and the regions it expected to touch.
  4. SnapDiff diffs against the baseline and checks whether the changed pixels fall inside the stated intent regions.
  5. The agent gets a verdict and a next_action — and acts on it: proceed, request human review, or roll back and retry.
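
Seen from the agent's side, the five steps above amount to a small control loop with a retry budget. The following is a minimal sketch, not SnapDiff's client code: `applyChange`, `verify`, and `rollBack` are hypothetical hooks, and of the three `next_action` strings only `request_human_review` appears in this post's sample response; `proceed` and `rollback_and_retry` are assumed names.

```typescript
// Assumed next_action values. Only "request_human_review" appears in the
// post's sample response; the other two names are placeholders.
type NextAction = "proceed" | "request_human_review" | "rollback_and_retry";

interface VerifyResult {
  verdict: string;
  next_action: NextAction;
}

// Hypothetical hooks. In a real agent, `verify` would wrap the
// snapdiff_verify_ui_change tool call and would be asynchronous;
// everything is synchronous here to keep the control flow easy to read.
interface LoopHooks {
  applyChange: (attempt: number) => void;
  verify: () => VerifyResult;
  rollBack: () => void;
}

type Outcome = "done" | "awaiting_review" | "gave_up";

function runVerifiedChange(hooks: LoopHooks, maxAttempts = 3): Outcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    hooks.applyChange(attempt); // step 2: make and deploy the change
    const result = hooks.verify(); // steps 3-5: verify and read the verdict
    if (result.next_action === "proceed") return "done";
    if (result.next_action === "request_human_review") return "awaiting_review";
    hooks.rollBack(); // rollback_and_retry: undo before the next attempt
  }
  return "gave_up"; // retry budget exhausted; hand the problem to a human
}
```

The retry cap is a design choice of this sketch: without it, an agent that keeps producing the same regression would loop forever instead of escalating.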

The part that matters is the intent. A raw pixel diff says "2.3% of the page changed" and leaves the interpretation to the agent — which will happily rationalize anything. An intent-aware verdict says "pixels changed in the top-right corner, but you told me you were editing the billing card." That's an unexpected regression, caught right away instead of in code review.
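
To make the idea of an intent-aware verdict concrete, here is a rough sketch of the geometric check. It is an illustration, not SnapDiff's actual algorithm. It assumes bounding boxes are `[x, y, width, height]` in page pixels, that the diff has already been reduced to a list of changed boxes, and it ignores the tolerance-based `pass` verdict entirely. `contains`, `annotate`, and `classify` are hypothetical names.

```typescript
// Assumption: a box is [x, y, width, height] in page pixels.
type BBox = [number, number, number, number];

interface IntentRegion {
  bbox: BBox;
  label: string;
}

// True when `inner` lies entirely within `outer`.
function contains(outer: BBox, inner: BBox): boolean {
  const [ox, oy, ow, oh] = outer;
  const [ix, iy, iw, ih] = inner;
  return ix >= ox && iy >= oy && ix + iw <= ox + ow && iy + ih <= oy + oh;
}

// Tag each changed box with whether some intent region fully covers it.
function annotate(changed: BBox[], intent: IntentRegion[]) {
  return changed.map((bbox) => ({
    bbox,
    matches_intent: intent.some((region) => contains(region.bbox, bbox)),
  }));
}

// Map the annotated boxes onto the verdict names used in this post.
function classify(changed: BBox[], intent: IntentRegion[]): string {
  if (changed.length === 0) return "no_change_detected";
  if (intent.length === 0) return "needs_human_review";
  return annotate(changed, intent).every((r) => r.matches_intent)
    ? "expected_change_detected"
    : "unexpected_regression";
}
```

With an intent region of `[40, 380, 540, 260]`, a changed box at `[60, 560, 200, 48]` sits inside it and classifies as `expected_change_detected`. Add a second changed box at `[1100, 0, 180, 32]` in the top-right corner and the result becomes `unexpected_regression`.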

Setting up the MCP server

Get an API key from the SnapDiff dashboard (MCP access is included on paid plans, from $19/mo). Then register the server with your agent of choice.

Claude Code

claude mcp add snapdiff -e SNAPDIFF_API_KEY=sd_live_... -- npx -y @corralimited/snapdiff-mcp

Cursor

Add the server to ~/.cursor/mcp.json (or a project-level .cursor/mcp.json):

{
  "mcpServers": {
    "snapdiff": {
      "command": "npx",
      "args": ["-y", "@corralimited/snapdiff-mcp"],
      "env": { "SNAPDIFF_API_KEY": "sd_live_..." }
    }
  }
}

Windsurf, Cline, Zed, Continue

Same shape — any client with a generic stdio MCP slot runs npx -y @corralimited/snapdiff-mcp with SNAPDIFF_API_KEY in the environment. Exact snippets for each editor are in the MCP server README.
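
Because the command, arguments, and environment variable are the same everywhere, the entry can be generated instead of copied by hand. The helper below is a hypothetical sketch: `snapdiffServerEntry` and `cursorConfig` are made-up names, and the `mcpServers` wrapper is the one from the Cursor example above. Other clients may nest the entry under a different top-level key, so check the README before reusing the wrapper.

```typescript
interface StdioServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: builds the stdio entry shown in the Cursor example.
function snapdiffServerEntry(apiKey: string): StdioServerEntry {
  if (apiKey.trim() === "") {
    throw new Error("SNAPDIFF_API_KEY must not be empty");
  }
  return {
    command: "npx",
    args: ["-y", "@corralimited/snapdiff-mcp"],
    env: { SNAPDIFF_API_KEY: apiKey },
  };
}

// Wraps the entry in the `mcpServers` object used by the Cursor config.
function cursorConfig(apiKey: string): string {
  return JSON.stringify({ mcpServers: { snapdiff: snapdiffServerEntry(apiKey) } }, null, 2);
}
```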

The verification loop in practice

Say the agent just added a cancel-subscription button to the billing card on your account page. Before calling the task done, it runs:

snapdiff_verify_ui_change({
  project: "my-app",
  page_name: "account-page",
  after: "https://my-app-git-add-cancel-button.vercel.app/account",
  intent: "added a Cancel subscription button to the bottom of the billing card",
  intent_regions: [
    { bbox: [40, 380, 540, 260], label: "billing card + cancel button area" }
  ]
})

and gets back something it can actually act on:

{
  "verdict": "expected_change_detected",
  "next_action": "request_human_review",
  "reasoning": "All changed regions fall inside the area you said you edited.",
  "diff_percentage": 0.7,
  "review_url": "https://snapdiff.ai/dashboard/verifications/vrf_...",
  "annotated_regions": [{ "matches_intent": true, ... }]
}
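
If you consume this response in your own tooling rather than inside an agent, it helps to validate it before branching on it. The sketch below is one way to do that, under stated assumptions: field names are taken from the sample above, the verdict names are the five listed in this post, and only `verdict` and `next_action` are treated as required. `parseVerifyResponse` is a hypothetical helper, and `annotated_regions` is skipped because the sample elides its contents.

```typescript
const VERDICTS = [
  "pass",
  "expected_change_detected",
  "unexpected_regression",
  "no_change_detected",
  "needs_human_review",
] as const;
type Verdict = (typeof VERDICTS)[number];

interface VerifyResponse {
  verdict: Verdict;
  next_action: string;
  reasoning?: string; // assumed optional
  review_url?: string; // assumed optional
}

function isVerdict(value: unknown): value is Verdict {
  return VERDICTS.some((v) => v === value);
}

// Parse raw JSON and reject anything without a known verdict and a next_action.
function parseVerifyResponse(raw: string): VerifyResponse {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("verification response is not a JSON object");
  }
  const fields = data as Record<string, unknown>;
  const verdict = fields.verdict;
  const nextAction = fields.next_action;
  if (!isVerdict(verdict)) {
    throw new Error(`unrecognized verdict: ${String(verdict)}`);
  }
  if (typeof nextAction !== "string") {
    throw new Error("verification response has no next_action");
  }
  return {
    verdict,
    next_action: nextAction,
    reasoning: typeof fields.reasoning === "string" ? fields.reasoning : undefined,
    review_url: typeof fields.review_url === "string" ? fields.review_url : undefined,
  };
}
```

Failing loudly on an unknown verdict is deliberate here: treating an unrecognized value as a pass would defeat the point of the check.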

The five verdicts cover the full space of outcomes:

| Verdict | Meaning | Agent's next move |
| --- | --- | --- |
| pass | Change is within tolerance of the baseline | Proceed |
| expected_change_detected | Changed pixels match the stated intent | Request human review to promote the new baseline |
| unexpected_regression | Pixels changed outside the intent regions | Roll back and retry: something broke that wasn't supposed to |
| no_change_detected | Page is visually identical to baseline | If a change was expected, the deploy probably didn't take, so verify the deployment |
| needs_human_review | Intent can't be verified geometrically (e.g. no intent regions supplied) | Escalate to a human |
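
The verdict table translates naturally into an exhaustive switch, so that a verdict nobody handled becomes a compile error instead of a silent fall-through. This is a sketch under assumptions: the `AgentMove` names are invented for illustration, and treating `no_change_detected` as fine when no change was expected is this sketch's reading of the table, not documented behavior.

```typescript
type Verdict =
  | "pass"
  | "expected_change_detected"
  | "unexpected_regression"
  | "no_change_detected"
  | "needs_human_review";

// Hypothetical move names, one per row of the table.
type AgentMove =
  | "proceed"
  | "request_human_review"
  | "rollback_and_retry"
  | "check_deployment"
  | "escalate";

function nextMove(verdict: Verdict, changeExpected = true): AgentMove {
  switch (verdict) {
    case "pass":
      return "proceed";
    case "expected_change_detected":
      return "request_human_review";
    case "unexpected_regression":
      return "rollback_and_retry";
    case "no_change_detected":
      // Assumption: an identical page is only a problem if a change was expected.
      return changeExpected ? "check_deployment" : "proceed";
    case "needs_human_review":
      return "escalate";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a verdict is added.
      const unreachable: never = verdict;
      throw new Error(`unhandled verdict: ${String(unreachable)}`);
    }
  }
}
```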

unexpected_regression is the verdict doing the real work here. The debug ribbon the agent forgot in the header, the reflow from a careless refactor — both land outside the intent bounding box, and the agent gets told to roll back and retry instead of opening a PR.

Put it in the rules file

Agents do what their instructions say. Put the verification loop in your agent rules file — CLAUDE.md for Claude Code, .cursor/rules for Cursor, .windsurfrules for Windsurf — so it runs on every UI change, not just when you remember to ask:

## UI changes

After any change that affects rendered UI, verify it visually:

1. Deploy the change to a preview URL.
2. Call snapdiff_verify_ui_change with the project, page_name, the
   preview URL, your intent, and intent_regions covering the area
   you meant to change.
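3. Do NOT mark the task complete unless the verdict is `pass` or
   `expected_change_detected`. On `expected_change_detected`, pass
   the review_url to a human so they can approve the new baseline.
4. On `unexpected_regression`, follow the returned next_action:
   roll back or fix the regression and re-verify before continuing.
5. On `no_change_detected` when a change was expected, check that
   the deploy went through. On `needs_human_review`, escalate.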

Verification is now an exit criterion rather than an optional extra step. There's a side effect, too: an agent that knows its rendered output will be checked tends to be more cautious about touching shared components in the first place.

Where you come back in

An agent shouldn't get to redefine "correct." When the verdict is expected_change_detected, the response includes a review_url. Open it and you see the agent's stated intent, the before/after/diff triptych, and each changed region tagged as inside or outside the intent. One click approves — promoting the new screenshot as the baseline for future verifications — or rejects, leaving the baseline untouched.
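
The approve/reject semantics can be modeled as a small pure function. This is an illustrative sketch only: `PageBaseline`, its fields, and `review` are hypothetical names, not SnapDiff's data model. It shows the rule described above, where only an approval changes what counts as correct.

```typescript
interface PageBaseline {
  baseline: string; // id of the currently approved screenshot
  candidate: string | null; // screenshot awaiting review, if any
}

type Decision = "approve" | "reject";

// Approve promotes the candidate to baseline; reject leaves the baseline untouched.
function review(state: PageBaseline, decision: Decision): PageBaseline {
  if (state.candidate === null) {
    throw new Error("no screenshot is awaiting review");
  }
  return decision === "approve"
    ? { baseline: state.candidate, candidate: null }
    : { baseline: state.baseline, candidate: null };
}
```

Note that the agent never appears in this function: it can produce a candidate, but only a human decision moves the baseline.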

And that's all the reviewing you do: one look per intentional change. The regressions mostly never reach you, because the agent rolled them back on its own.