
Empowering AI to Test and Debug Its Own Code: Introducing Browser DevTools MCP

How AI agents can autonomously test, debug, and validate web applications without human intervention


The Problem: AI Can Write Code, But Can It Test It?

We've reached a remarkable milestone in AI development: AI assistants can now write complex code, refactor entire codebases, and implement sophisticated features. But here's the catch—when AI generates code, especially for web applications, how does it verify that the code actually works? How does it debug issues? How does it ensure the UI matches the design?

Traditionally, this has required a human developer to:

  • Manually test the generated code
  • Debug runtime errors
  • Verify visual correctness
  • Check network requests and console logs
  • Validate accessibility and performance

This creates a bottleneck. The AI writes code, but then it needs a human to validate it. What if the AI could autonomously test and debug its own work?

The Solution: Browser DevTools MCP

Browser DevTools MCP is a powerful Model Context Protocol (MCP) server that gives AI assistants comprehensive browser automation and debugging capabilities. It enables AI agents to:

  • Autonomously test web applications they've built
  • Debug issues by inspecting console logs, network requests, and runtime state
  • Validate visual correctness by comparing against designs
  • Monitor performance and accessibility
  • Interact with production systems using real user credentials

The key insight: AI shouldn't just write code; it needs to observe and interact with the running application, just as a human developer would.

Why This Matters: Autonomous AI Development

With Browser DevTools MCP, AI can operate in a complete development loop:

  1. Write code → AI generates the implementation
  2. Test code → AI navigates to the app, interacts with it, and validates behavior
  3. Debug issues → AI inspects errors, network failures, and visual problems
  4. Fix and iterate → AI makes corrections based on what it observed
  5. Repeat → The cycle continues autonomously

This isn't just for local development. Because the MCP server can run anywhere (local machine, CI/CD, or production servers), AI can test against:

  • Local development environments
  • Staging environments
  • Production environments (with proper authentication)

The AI becomes both the developer and the tester, working autonomously without constant human oversight.


Comprehensive Tool Suite

Browser DevTools MCP provides over 30 specialized tools organized into logical categories. Let me give you a quick overview of all available tools, then dive deep into the most powerful ones.

Quick Reference: All Tools

Content & Visual Inspection:

  • content_take-screenshot - Capture screenshots (full page or elements)
  • content_get-as-html - Extract HTML with filtering options
  • content_get-as-text - Extract visible text content
  • content_save-as-pdf - Export pages as PDF documents

Browser Interaction:

  • interaction_click - Click elements by CSS selector
  • interaction_fill - Fill form inputs
  • interaction_hover - Hover over elements
  • interaction_press-key - Simulate keyboard input
  • interaction_select - Select dropdown options
  • interaction_drag - Drag and drop operations
  • interaction_scroll - Scroll viewport or containers (multiple modes)
  • interaction_resize-viewport - Resize viewport using emulation
  • interaction_resize-window - Resize real browser window (OS-level)

Navigation:

  • navigation_go-to - Navigate to URLs with configurable wait strategies
  • navigation_go-back - Navigate backward in history
  • navigation_go-forward - Navigate forward in history

Synchronization:

  • sync_wait-for-network-idle - Wait for network activity to settle

Accessibility:

  • a11y_take-aria-snapshot - Capture semantic structure and accessibility roles
  • a11y_take-ax-tree-snapshot - Combine accessibility tree with visual diagnostics

Observability:

  • o11y_get-console-messages - Capture and filter console logs
  • o11y_get-http-requests - Monitor network traffic with detailed filtering
  • o11y_get-web-vitals - Collect Web Vitals: the Core Web Vitals (LCP, INP, CLS) plus TTFB and FCP
  • monitoring_get-trace-id - Get current OpenTelemetry trace ID
  • monitoring_set-trace-id - Set custom trace ID
  • monitoring_new-trace-id - Generate new trace ID

Network Stubbing:

  • stub_intercept-http-request - Intercept and modify outgoing requests
  • stub_mock-http-response - Mock HTTP responses with configurable behavior
  • stub_list - List all installed stubs
  • stub_clear - Remove stubs

React Component Inspection:

  • react_get-component-for-element - Find React component for a DOM element
  • react_get-element-for-component - Find DOM elements rendered by a component

JavaScript Execution:

  • run_js-in-browser - Execute JavaScript in browser page context
  • run_js-in-sandbox - Execute JavaScript in Node.js VM sandbox

Design Comparison:

  • compare-page-with-design - Compare live page UI against Figma designs

Now, let's dive deep into the tools that make this truly powerful.


Deep Dive: Essential Tools for AI Testing & Debugging

Visual Debugging: Screenshots and Accessibility

content_take-screenshot

Screenshots are the AI's "eyes" into the application. This tool captures visual state at any moment, allowing AI to:

  • Verify UI correctness - "Does the button look right?"
  • Debug layout issues - "Why is this element overlapping?"
  • Document visual bugs - Capture evidence of problems

Key Features:

  • Capture full page or specific elements via CSS selector
  • Automatic image optimization (scales to fit Claude's vision API limits)
  • PNG or JPEG format with quality control
  • Smart compression that converts PNGs to JPEGs for smaller file sizes

Real-world use case: AI generates a login form, takes a screenshot, and validates that all fields are visible and properly styled.
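
For illustration, an agent's tool call might look like the sketch below; the argument names (selector, format, quality) are assumptions inferred from the feature list above, not the tool's confirmed schema.

// Hypothetical MCP tool invocation payload (argument names are assumptions)
const screenshotCall = {
  name: "content_take-screenshot",
  arguments: {
    selector: "#login-form", // omit to capture the full page
    format: "jpeg",          // or "png"
    quality: 80,             // JPEG quality, trading fidelity for size
  },
};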

a11y_take-aria-snapshot

Accessibility isn't just about compliance—it's about understanding the semantic structure of a page. ARIA snapshots reveal:

  • Component hierarchy - What roles do elements have?
  • Accessibility labels - How do screen readers see this?
  • Interactive states - What's clickable, focusable, disabled?

Why this matters for AI: When AI generates UI code, it needs to verify that the semantic structure matches the visual appearance. An element might look like a button, but if it's not marked as role="button", it's broken for assistive technologies.

Example output:

- role: button
  name: "Submit Form"
  state: { disabled: false, focusable: true }
- role: textbox
  name: "Email Address"
  state: { required: true, invalid: false }

a11y_take-ax-tree-snapshot

This is the "superpower" of accessibility debugging. It combines Chromium's accessibility tree with runtime visual diagnostics:

  • Bounding boxes - Exact pixel positions
  • Visibility state - Is it actually visible?
  • Occlusion detection - Is something covering it?
  • Computed styles - What CSS is actually applied?

The killer feature is occlusion detection. When AI clicks a button and nothing happens, this tool can reveal that an invisible element is intercepting the click, a problem that is notoriously hard to diagnose by other means.

Example scenario:

AI: "I clicked the submit button but nothing happened."
Tool: "The button is covered by an invisible overlay div (opacity: 0, but still blocking clicks)."
AI: "Ah, I need to fix the z-index."

Design Validation: Figma Comparison

compare-page-with-design

One of the most powerful features: AI can compare the live application against the original Figma design and get a similarity score.

How it works:

  1. Fetches the design snapshot from Figma API
  2. Takes a screenshot of the live page
  3. Computes multiple similarity signals:
    • MSSIM (structural similarity) - Pixel-level comparison
    • Image embedding similarity - Semantic understanding
    • Text embedding similarity - Content-aware comparison
  4. Returns a combined score (0-1) with detailed notes
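
The exact weighting is internal to the tool, but conceptually the combination could look like the toy sketch below; the weights here are purely illustrative.

// Purely illustrative: how three similarity signals might be blended
// into one 0-1 score. The real tool's weights and formula are not
// documented here and may differ.
function combinedScore(mssim, imageEmbSim, textEmbSim) {
  return 0.4 * mssim + 0.3 * imageEmbSim + 0.3 * textEmbSim;
}

combinedScore(0.95, 0.92, 0.88); // → ~0.92, a strong match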

Why this is revolutionary:

  • AI can autonomously validate that implementation matches design
  • Works with real data (not just mockups) using "semantic" mode
  • Identifies specific regions that don't match
  • No human needed to manually compare screenshots

Use cases:

  • "Does this page match the Figma design?" → Score: 0.92 ✅
  • "The header doesn't match" → Score: 0.65, notes indicate header region mismatch
  • Automated design regression testing

Execution-Level Debugging

o11y_get-console-messages

Console logs are the application's "voice"—they tell you what's happening (or what went wrong). This tool captures:

  • JavaScript errors - Syntax errors, runtime exceptions
  • Warnings - Deprecation notices, performance issues
  • Logs - Debug information, user actions
  • Advanced filtering - By level, search text, timestamp, sequence number

AI debugging workflow:

1. AI generates code
2. AI navigates to the page
3. AI checks console messages
4. Finds: "Uncaught TypeError: Cannot read property 'x' of undefined"
5. AI fixes the code: "I need to add a null check before accessing 'x'"
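
Step 3 of that loop might translate into a filtered tool call like the sketch below; the argument names (level, search) are assumptions based on the filtering options listed above.

// Hypothetical call: fetch only error-level console messages
// mentioning "TypeError" (argument names are assumptions)
const consoleCall = {
  name: "o11y_get-console-messages",
  arguments: {
    level: "error",
    search: "TypeError",
  },
};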

o11y_get-http-requests

Network requests reveal the data flow of the application. This tool captures:

  • Request/response details - Headers, body, status codes
  • Timing information - When requests started, how long they took
  • Resource types - XHR, fetch, document, stylesheet, etc.
  • Advanced filtering - By URL pattern, status code, resource type, timing

Why this matters:

  • AI can verify that API calls are being made correctly
  • AI can detect failed requests (404, 500, timeouts)
  • AI can understand the data flow: "User clicks button → API call → Response updates UI"

Example debugging:

AI: "The user list isn't loading."
Tool: "Found 1 failed request: GET /api/users → 500 Internal Server Error"
AI: "The backend endpoint is broken. I need to check the API implementation."

o11y_get-web-vitals

Performance isn't just about speed; it's about user experience. This tool collects the Core Web Vitals (LCP, INP, CLS) along with the supporting metrics TTFB and FCP:

  • LCP (Largest Contentful Paint) - How fast the main content loads
  • INP (Interaction to Next Paint) - How responsive the page feels
  • CLS (Cumulative Layout Shift) - Visual stability
  • TTFB (Time to First Byte) - Server response time
  • FCP (First Contentful Paint) - Initial render time

AI can now:

  • Identify performance bottlenecks
  • Get actionable recommendations based on Google's thresholds
  • Validate that optimizations actually improved performance

Example:

Tool: "LCP: 3.2s (needs improvement), INP: 150ms (good), CLS: 0.05 (good)"
AI: "The LCP is slow. I should optimize the hero image loading."

React Component Inspection

react_get-component-for-element

When debugging React applications, AI needs to understand the component structure. This tool answers: "What React component rendered this DOM element?"

How it works:

  • Takes a DOM element (via selector or x,y coordinates)
  • Traverses React Fiber tree to find the component
  • Returns component name, props preview, and full component stack

Example:

AI: "What component is this button?"
Tool: "Button component, props: { label: 'Submit', onClick: [Function], disabled: false }"
      Component stack: App → Form → ButtonGroup → Button

react_get-element-for-component

The reverse operation: "What DOM elements does this React component render?"

Use cases:

  • AI generates a component, wants to verify it renders correctly
  • AI needs to find all elements belonging to a specific component
  • AI wants to understand the component's "DOM footprint"

Example:

AI: "Find all elements rendered by the UserCard component"
Tool: Returns 5 DOM elements: avatar image, name text, email text, edit button, delete button

Important note: These tools work best with a persistent browser context and the React DevTools extension installed, but they can also operate in "best-effort" mode by scanning the DOM for React Fiber pointers.

JavaScript Execution

run_js-in-browser

Sometimes AI needs to execute custom JavaScript directly in the page context. This tool provides:

  • Full DOM access - document.querySelector, window, etc.
  • Web APIs - Fetch, localStorage, sessionStorage, etc.
  • React/Vue/Angular access - If frameworks expose globals
  • Custom debugging logic - Extract data, mutate state, trigger events

Example use cases:

  • "Extract all user data from localStorage"
  • "Trigger a custom event to test event handlers"
  • "Read a value from a React component's state (if exposed)"
  • "Simulate complex user interactions that aren't covered by basic tools"

run_js-in-sandbox

For server-side automation logic, this tool executes JavaScript in a Node.js VM sandbox with:

  • Playwright Page access - Full control over the browser
  • Safe built-ins - Limited Node.js APIs (no file system, no network)
  • Console logging - Debug output from sandbox code

Use cases:

  • Complex automation workflows
  • Data extraction and processing
  • Custom synchronization logic
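
A minimal sandbox script might look like the sketch below, assuming the Playwright Page mentioned above is exposed to the script as a page binding (the binding name is an assumption); every call shown is a standard Playwright API.

// Hypothetical sandbox script: `page` is assumed to be the exposed
// Playwright Page instance.
await page.goto("https://example.com");
await page.waitForLoadState("networkidle"); // custom synchronization
const title = await page.title();
console.log("Page title:", title); // debug output from the sandbox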

Network Stubbing & Mocking

stub_intercept-http-request

AI can modify outgoing HTTP requests before they're sent. This enables:

  • A/B testing - Inject different headers for different user segments
  • Security testing - Inject malformed headers or payloads
  • Feature flags - Modify requests based on feature flags
  • Auth simulation - Add API keys or tokens automatically

Example:

AI: "Intercept all requests to /api/* and add X-API-Key header"
Tool: "Stub installed. All matching requests will have the header added."

stub_mock-http-response

Even more powerful: AI can mock HTTP responses entirely. This enables:

  • Offline testing - Return cached data when APIs are down
  • Error scenario testing - Simulate 500 errors, timeouts, network failures
  • Edge case testing - Return empty data, huge payloads, special characters
  • Flaky API testing - Use probability to randomly fail requests

Example scenarios:

  • "Mock the /api/users endpoint to return 500 error (test error handling)"
  • "Mock /api/data to return empty array (test empty state UI)"
  • "Mock /api/upload with 50% failure rate (test retry logic)"

Advanced features:

  • Configurable delay (simulate slow networks)
  • Times limit (apply stub only N times, then let through)
  • Probability (flaky testing: apply stub with X% chance)
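
Combining those advanced options, a flaky-API stub might be defined like the sketch below; again, the argument names are assumptions based on the feature list, not a confirmed schema.

// Hypothetical mock definition exercising the advanced options above
const mockCall = {
  name: "stub_mock-http-response",
  arguments: {
    urlPattern: "/api/upload",
    status: 500,                              // simulate a server error
    body: { error: "Internal Server Error" },
    delayMs: 2000,                            // simulate a slow network
    times: 10,                                // stop applying after 10 matches
    probability: 0.5,                         // flaky: fail ~50% of requests
  },
};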

Highlights: Game-Changing Features

🔥 OpenTelemetry Integration: End-to-End Tracing

This is one of the most powerful features for production debugging. Browser DevTools MCP automatically injects the OpenTelemetry Web SDK into every page it navigates to.

What this enables:

  1. Automatic UI Trace Collection

    • Document load events
    • Fetch/XHR requests (automatically traced)
    • User interactions (clicks, form submissions, etc.)
    • All captured as OpenTelemetry spans
  2. Trace Context Propagation

    • Trace IDs automatically propagated in HTTP headers (traceparent; see the example after this list)
    • Frontend traces automatically correlated with backend traces
    • End-to-end visibility across the entire application stack
  3. Distributed Tracing

    • See the full request flow: User click → Frontend action → API call → Backend processing → Database query → Response
    • All in one unified trace view
    • Identify bottlenecks across frontend and backend
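
For context, the traceparent header referenced in point 2 follows the W3C Trace Context format (version, trace ID, parent span ID, trace flags):

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01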

Real-world scenario:

User reports: "The checkout process is slow"
AI uses OpenTelemetry to trace the flow:
- Frontend: Button click (50ms)
- Frontend: API call to /api/checkout (starts)
- Backend: Payment processing (2.5 seconds) ← BOTTLENECK
- Backend: Database update (100ms)
- Frontend: UI update (20ms)

AI identifies: "The payment processing is slow. I should optimize the payment gateway integration."

Why this matters for AI:

  • AI can debug production issues without needing backend access
  • AI can understand the full application flow, not just frontend
  • AI can identify performance issues across the entire stack
  • AI can correlate frontend errors with backend failures

🔥 Persistent Browser Context: Real-World Automation

This feature enables something truly powerful: AI can interact with production SaaS applications using your real credentials.

How it works:

  • Enable persistent browser context (BROWSER_PERSISTENT_ENABLE=true)
  • Browser profile persists across sessions (cookies, localStorage, extensions)
  • Log in to your accounts once, then AI can use them

What AI can do:

Google Workspace:

  • Create Google Docs, Sheets, Slides
  • Send emails via Gmail
  • Check spam folder
  • Manage Google Drive files
  • Schedule Calendar events

Other SaaS Examples:

  • Notion - Create pages, update databases, manage workspaces
  • Slack - Send messages, create channels, manage workflows
  • GitHub - Create issues, review PRs, manage repositories
  • Linear - Create tasks, update status, manage projects
  • Figma - Create designs, update components, manage files
  • Stripe - View payments, manage subscriptions, generate reports
  • Salesforce - Update records, create leads, manage accounts

The power: AI isn't just testing code—it's using your applications. It can:

  • Automate repetitive tasks
  • Generate reports from multiple sources
  • Cross-reference data across platforms
  • Perform complex workflows that span multiple services

Example workflow:

AI: "Create a Google Doc summarizing all open GitHub issues"
1. AI logs into GitHub (using persistent credentials)
2. AI fetches all open issues
3. AI logs into Google Docs (using persistent credentials)
4. AI creates a new document
5. AI writes a summary of all issues
6. AI shares the document with the team

Security note: This requires careful consideration. The AI has access to your authenticated sessions. Use with trusted AI systems and proper access controls.


Use Cases: Where This Shines

1. Autonomous Code Testing

Scenario: AI generates a new feature (e.g., user registration form)

AI workflow:

  1. Writes the code
  2. Navigates to the page
  3. Takes a screenshot → Validates visual appearance
  4. Fills the form → Tests user interaction
  5. Submits the form → Checks for errors
  6. Inspects console messages → Verifies no JavaScript errors
  7. Checks HTTP requests → Validates API calls
  8. Compares with Figma design → Ensures design match
  9. Checks Web Vitals → Validates performance
  10. Reports: "Feature implemented and tested. All checks passed."

No human intervention needed.

2. Production Debugging

Scenario: User reports a bug in production

AI workflow:

  1. Enables OpenTelemetry tracing
  2. Navigates to the problematic page
  3. Reproduces the issue
  4. Inspects console messages → Finds JavaScript error
  5. Checks HTTP requests → Identifies failed API call
  6. Uses OpenTelemetry traces → Correlates with backend logs
  7. Identifies root cause: "Backend API is returning 500 error for this specific request"
  8. Fixes the code
  9. Re-tests in production
  10. Reports: "Bug fixed. Root cause: Backend validation error for edge case input."

3. Design Validation

Scenario: AI implements a new UI component

AI workflow:

  1. Implements the component
  2. Navigates to the page
  3. Uses compare-page-with-design → Gets similarity score
  4. If score is low, takes screenshot and analyzes differences
  5. Adjusts CSS/styling
  6. Re-compares
  7. Iterates until score is acceptable
  8. Reports: "Component matches design (similarity: 0.94)"

4. Performance Optimization

Scenario: AI wants to optimize page load time

AI workflow:

  1. Measures current Web Vitals
  2. Identifies bottlenecks (e.g., slow LCP)
  3. Implements optimizations (lazy loading, image optimization, etc.)
  4. Re-measures Web Vitals
  5. Validates improvement
  6. Reports: "LCP improved from 3.2s to 1.8s (44% improvement)"

5. Accessibility Auditing

Scenario: AI wants to ensure the app is accessible

AI workflow:

  1. Takes ARIA snapshot → Checks semantic structure
  2. Takes AX tree snapshot → Checks visual accessibility
  3. Identifies issues (missing labels, incorrect roles, etc.)
  4. Fixes the code
  5. Re-audits
  6. Reports: "Accessibility issues fixed. All interactive elements now have proper ARIA labels."

6. Cross-Platform Testing

Scenario: AI wants to test responsive design

AI workflow:

  1. Tests desktop viewport (1920x1080)
  2. Takes screenshot
  3. Resizes to tablet (768x1024)
  4. Takes screenshot
  5. Resizes to mobile (375x667)
  6. Takes screenshot
  7. Validates that UI adapts correctly
  8. Reports: "Responsive design verified across all breakpoints"

Getting Started

Browser DevTools MCP is available as an npm package and can be used with any MCP-compatible AI assistant (Claude, Cursor, VS Code, Windsurf, etc.).

Quick start:

{
  "mcpServers": {
    "browser-devtools": {
      "command": "npx",
      "args": ["-y", "browser-devtools-mcp"]
    }
  }
}

Configuration:

  • Enable persistent context: BROWSER_PERSISTENT_ENABLE=true
  • Enable OpenTelemetry: OTEL_ENABLE=true
  • Configure browser mode: BROWSER_HEADLESS_ENABLE=false (for visual debugging)
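
Putting it together, these variables could be supplied through the standard MCP server env block, extending the quick-start config above (a sketch, assuming the server reads its settings from the environment as the variable names suggest):

{
  "mcpServers": {
    "browser-devtools": {
      "command": "npx",
      "args": ["-y", "browser-devtools-mcp"],
      "env": {
        "BROWSER_PERSISTENT_ENABLE": "true",
        "OTEL_ENABLE": "true",
        "BROWSER_HEADLESS_ENABLE": "false"
      }
    }
  }
}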

Documentation: GitHub Repository


Conclusion: The Future of Autonomous AI Development

Browser DevTools MCP represents a fundamental shift in how AI interacts with web applications. It's not just about writing code—it's about understanding and validating code in the same way a human developer would.

Key takeaways:

  1. AI can now autonomously test and debug - No human needed to verify AI-generated code
  2. Production-ready - Works in local, staging, and production environments
  3. Comprehensive tooling - Visual debugging, execution debugging, performance monitoring, accessibility auditing
  4. Real-world automation - Can interact with production SaaS applications using persistent credentials
  5. End-to-end visibility - OpenTelemetry integration provides full-stack debugging capabilities

The vision: AI that can:

  • Write code
  • Test code
  • Debug code
  • Optimize code
  • Validate design
  • Monitor performance
  • Ensure accessibility

All autonomously, in a continuous loop, without human intervention.

This isn't just a tool—it's a new paradigm for AI-assisted development. The AI becomes a complete development team: developer, tester, debugger, and QA engineer, all in one.

The question isn't "Can AI write code?"
The question is "Can AI verify that its code works?"

With Browser DevTools MCP, the answer is: Yes, absolutely.


Browser DevTools MCP is open source and available on GitHub. Contributions and feedback are welcome!
