
@0xdevalias
Last active March 8, 2026 01:27

AI Agent Toolkit for Automated Browser Userscript Development

Some notes on using AI agents (Codex, Claude Code, etc) alongside supporting tools (WebMCP, MCP, SKILLS.md, browser automation, etc) to automatically build, test, and debug browser userscripts.

Idea / Context

Was randomly pondering better stacks that would give AI agents all the tools they need to inspect websites and automatically write, test, and debug browser userscripts.

At minimum, something like Codex + a browser-debugging MCP server should theoretically work, where the agent can:

  • inspect the DOM and runtime state
  • write or modify a userscript
  • reload and test the page
  • capture console errors / logs
  • iterate until the script works
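A minimal TypeScript sketch of that loop, assuming two hypothetical interfaces: `BrowserHarness` (standing in for whatever browser-debugging MCP server or automation layer the agent drives) and `ScriptWriter` (standing in for the agent's code-generation step). Neither is a real library API; the sketch only shows the control flow: inspect, write, reload, check console errors, feed them back, and stop after a bounded number of attempts.

```typescript
// Result of reloading the page with a candidate userscript injected.
interface RunResult {
  consoleErrors: string[]; // console errors / uncaught exceptions captured after reload
  passed: boolean;         // outcome of whatever page-level check the agent defined
}

// Hypothetical stand-in for a browser-debugging MCP server or automation layer.
interface BrowserHarness {
  inspect(): Promise<string>; // e.g. a DOM or accessibility-tree snapshot
  reloadWithScript(code: string): Promise<RunResult>;
}

// Hypothetical stand-in for the agent's "write or modify the userscript" step.
interface ScriptWriter {
  write(snapshot: string, previous: string | null, feedback: string[]): Promise<string>;
}

interface LoopOutcome {
  code: string;
  iterations: number;
  succeeded: boolean;
}

async function iterateUserscript(
  browser: BrowserHarness,
  writer: ScriptWriter,
  maxIterations = 5,
): Promise<LoopOutcome> {
  let code: string | null = null;
  let feedback: string[] = [];

  for (let i = 1; i <= maxIterations; i++) {
    const snapshot = await browser.inspect();            // inspect DOM / runtime state
    code = await writer.write(snapshot, code, feedback); // write or modify the script
    const result = await browser.reloadWithScript(code); // reload and test the page

    // Treat any captured console error as a failure, even if the check passed.
    if (result.passed && result.consoleErrors.length === 0) {
      return { code, iterations: i, succeeded: true };
    }

    // Feed captured errors (or a generic note) back into the next attempt.
    feedback =
      result.consoleErrors.length > 0
        ? result.consoleErrors
        : ["page check failed without console errors"];
  }

  return { code: code ?? "", iterations: maxIterations, succeeded: false };
}
```

Bounding `maxIterations` keeps a stuck agent from looping forever, and treating any console error as a failure is a deliberately strict (assumed) success criterion; a real setup would likely define its own page-level checks.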

Potential Stack

Possible starting point:

The idea would be to give the agent full browser inspection + control so it can iterate on userscripts automatically.

(Note to self: There may be some other ideas / architecture / etc in this initial chat I was exploring this in, though it will only be accessible to me: https://chatgpt.com/g/g-p-69ab7c686be48191817827ada5b67af3-sideproject-ideas/c/69abb755-5804-83ab-8d3a-c432a467ac68)

Related Projects

Browser Code

  • https://dev.to/tumf/browser-code-teaching-ai-to-grow-userscripts-3npj
    • Browser Code: Teaching AI to Grow Userscripts

    • Browser Code operates as a browser extension, treating a page's DOM (Document Object Model) as a "virtual file system." It has Claude generate userscripts in JavaScript, then persists and auto-executes them via the chrome.userScripts API (a userscript persistence mechanism like Tampermonkey).

  • https://github.com/chebykinn/browser-code
    • Browser Code

    • A coding agent for userscripts with its own loader.

      Browser Code is a browser extension that gives Claude a virtual filesystem view of web pages. It generates, edits, and manages userscripts that persist to chrome.userScripts (the same API that Tampermonkey uses) and auto-run on matching URLs.

      Think Claude Code, but for the DOM.
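Both descriptions above hinge on `chrome.userScripts`. A minimal sketch of how an extension might persist an agent-generated script with it, assuming Chrome's promise-returning `chrome.userScripts.register` / `update` / `getScripts` methods and the `userScripts` manifest permission. The interfaces are a hand-written subset of the real types, declared locally so the sketch compiles on its own; `buildUserScript` and `persistUserScript` are hypothetical helpers, not Browser Code's actual implementation.

```typescript
// Local subset of the chrome.userScripts types (modelled on the Chrome
// extension API docs) so this sketch compiles without @types/chrome.
interface ScriptSource {
  code?: string; // inline JavaScript source
  file?: string; // or a path to a file packaged with the extension
}

interface RegisteredUserScript {
  id: string;
  matches?: string[];
  js?: ScriptSource[];
  runAt?: "document_start" | "document_end" | "document_idle";
  world?: "MAIN" | "USER_SCRIPT";
}

interface UserScriptsApi {
  getScripts(filter?: { ids?: string[] }): Promise<RegisteredUserScript[]>;
  register(scripts: RegisteredUserScript[]): Promise<void>;
  update(scripts: RegisteredUserScript[]): Promise<void>;
}

// Hypothetical helper: wrap agent-generated code as a registered user script.
function buildUserScript(id: string, matches: string[], code: string): RegisteredUserScript {
  return {
    id,
    matches,
    js: [{ code }],
    runAt: "document_idle", // run once the page has mostly loaded
    world: "USER_SCRIPT",   // isolated world reserved for user scripts
  };
}

// Hypothetical helper: register the script the first time and update it
// afterwards, so repeated agent iterations overwrite the previous version
// instead of failing on a duplicate id.
async function persistUserScript(
  api: UserScriptsApi,
  script: RegisteredUserScript,
): Promise<"registered" | "updated"> {
  const existing = await api.getScripts({ ids: [script.id] });
  if (existing.length > 0) {
    await api.update([script]);
    return "updated";
  }
  await api.register([script]);
  return "registered";
}
```

Inside an extension this would be called as something like `await persistUserScript(chrome.userScripts, buildUserScript("demo", ["https://example.com/*"], generatedCode))`, possibly with a cast depending on the `@types/chrome` version. Recent Chrome versions also gate this API behind a user opt-in, so registration can fail until the user enables user scripts for the extension.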

Browser Automation / Agent Tools

  • https://webmachinelearning.github.io/webmcp/
    • WebMCP

    • WebMCP API is a new JavaScript interface that allows web developers to expose their web application functionality as “tools” - JavaScript functions with natural language descriptions and structured schemas that can be invoked by agents, browser’s agents, and assistive technologies. Web pages that use WebMCP can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.

    • https://github.com/webmachinelearning/webmcp
      • WebMCP

      • Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows.

    • https://github.com/jasonjmcghee/WebMCP
      • Early WebMCP proposal / implementation - since evolved and worked on by much more capable folks that develop the web: https://github.com/webmachinelearning/webmcp

      • The idea of WebMCP has since evolved and is being worked on by much more capable folks that develop the web.

      • This implementation is not compliant with the W3C spec.

      • https://webmcp.dev
        • WebMCP Example

    • https://developer.chrome.com/blog/webmcp-epp
      • WebMCP is available for early preview

      • As the agentic web evolves, we want to help websites play an active role in how AI agents interact with them. WebMCP aims to provide a standard way for exposing structured tools, ensuring AI agents can perform actions on your site with increased speed, reliability, and precision.

        By defining these tools, you tell agents how and where to interact with your site, whether it's booking a flight, filing a support ticket, or navigating complex data. This direct communication channel eliminates ambiguity and allows for faster, more robust agent workflows.

    • https://docs.mcp-b.ai
      • WebMCP + MCP = MCP-B

      • MCP-B combines the WebMCP page API with MCP-style transport and extensions in one browser runtime. Use WebMCP for page-level tool registration. Use MCP-B for resources, prompts, relay, React hooks, and browser tooling. You can start with WebMCP and add MCP-B later.

      • https://github.com/WebMCP-org
        • MCP-B

        • Model Context Protocol for the Browser

        • MCP-B bridges the gap between WebMCP and the Model Context Protocol (MCP), serving two critical functions:

          1. API Implementation — Provides a polyfill that implements the navigator.modelContext interface for browsers lacking native support
          2. Protocol Translation — Converts between WebMCP's web-native format and the MCP protocol, enabling cross-compatibility

          MCP-B creates interoperability by enabling WebMCP-formatted tools to function with MCP clients (like Claude Desktop), and MCP-formatted tools to operate within WebMCP-enabled browsers. This allows both standards to evolve independently without breaking existing implementations.

    • https://github.com/WebMCP-org/webmcp-userscripts
      • WebMCP Userscripts

      • TypeScript monorepo for building and testing Tampermonkey scripts which give websites WebMCP capabilities

      • WebMCP Userscripts is a TypeScript monorepo for building Tampermonkey userscripts that inject MCP-B (Model Context Protocol - Browser) servers into websites. This enables AI assistants to interact with web applications through structured tools rather than brittle DOM manipulation.

  • https://github.com/prasmussen/chrome-cli
    • chrome-cli

    • chrome-cli is a command line utility for controlling Google Chrome compatible browsers on OS X. It is a native binary that uses the Scripting Bridge to communicate with Chrome.

  • https://developer.chrome.com/blog/chrome-devtools-mcp
    • Chrome DevTools (MCP) for your AI agent

    • https://github.com/ChromeDevTools/chrome-devtools-mcp
      • Chrome DevTools MCP

      • chrome-devtools-mcp lets your coding agent (such as Gemini, Claude, Cursor or Copilot) control and inspect a live Chrome browser. It acts as a Model-Context-Protocol (MCP) server, giving your AI coding assistant access to the full power of Chrome DevTools for reliable automation, in-depth debugging, and performance analysis.

  • https://github.com/microsoft/playwright-mcp
    • Playwright MCP

    • A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.

    • Playwright MCP vs Playwright CLI

      This package provides MCP interface into Playwright. If you are using a coding agent, you might benefit from using the CLI+SKILLS instead.

      • CLI: Modern coding agents increasingly favor CLI–based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient: they avoid loading large tool schemas and verbose accessibility trees into the model context, allowing agents to act through concise, purpose-built commands. This makes CLI + SKILLs better suited for high-throughput coding agents that must balance browser automation with large codebases, tests, and reasoning within limited context windows.
        Learn more about Playwright CLI with SKILLS.
      • MCP: MCP remains relevant for specialized agentic loops that benefit from persistent state, rich introspection, and iterative reasoning over page structure, such as exploratory automation, self-healing tests, or long-running autonomous workflows where maintaining continuous browser context outweighs token cost concerns.
    • https://github.com/microsoft/playwright-cli
      • playwright-cli

      • Playwright CLI with SKILLS

  • https://www.stagehand.dev
    • Stagehand

    • The AI Browser Automation Framework

    • We built an OSS alternative to Playwright that's easier to use and lets AI reliably read and write on the web.

    • https://github.com/browserbase/stagehand
      • Stagehand

      • The AI Browser Automation Framework

      • Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.

      • Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or use high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two) Stagehand is the natural choice for browser automations in production.

  • https://browser-use.com
    • Browser Use

    • Agents at scale. Undetectable browsers. Purpose-built models. The API for any website.

    • https://github.com/browser-use/browser-use
      • Browser Use

      • The AI browser agent

      • Make websites accessible for AI agents. Automate tasks online with ease.

  • https://www.browserwing.com
    • BrowserWing

    • Modern Browser Automation

    • MCP & Skill Ready

    • The bridge between AI and the Web. Instant setup, full control, limitless customization.

    • https://github.com/browserwing/browserwing
      • BrowserWing

      • Native Browser Automation Platform with AI Integration

      • BrowserWing turns your browser actions into MCP commands or Claude Skills, allowing AI agents to control browsers efficiently and reliably. Say goodbye to slow, token-heavy LLM interactions — let agents call commands directly for faster automation. Perfect for AI-driven tasks, browser automation, and boosting productivity.

  • https://www.browserable.ai
    • Browserable

    • Browser automation library for AI agents (JS)

    • Build browser agents that can navigate sites, fill out forms, and extract information.

    • https://github.com/browserable/browserable
      • Browserable

      • Open source browser automation library for AI agents

      • Browserable allows you to build browser agents that can navigate sites, fill out forms, click buttons, and extract information. It is currently at 90.4% on the Web Voyager benchmarks.
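To make the WebMCP and WebMCP Userscripts entries above more concrete, here is a minimal sketch of a userscript that exposes one page capability as a tool. It assumes a `navigator.modelContext.registerTool({ name, description, inputSchema, execute })` shape as described in the WebMCP explainer (and polyfilled by MCP-B); the spec is still evolving, so the exact names should be checked against the current WebMCP repo. The `list_headings` tool, the `makeListHeadingsTool` helper, and the `@match` URL are hypothetical, and the page root is passed in so the tool logic can be tested outside a browser.

```typescript
// ==UserScript==
// @name         Example WebMCP heading tool (hypothetical)
// @match        https://example.com/*
// @grant        none
// ==/UserScript==

// Local types for the subset of the WebMCP tool shape used here (assumed from
// the WebMCP explainer; verify against the current spec before relying on it).
interface ToolResult {
  content: { type: "text"; text: string }[];
}
interface WebMcpTool {
  name: string;
  description: string;
  inputSchema: object;
  execute(input: Record<string, unknown>): Promise<ToolResult> | ToolResult;
}
interface ModelContext {
  registerTool(tool: WebMcpTool): void;
}

// Minimal view of a DOM root, so tests can pass a fake instead of `document`.
interface HeadingSource {
  querySelectorAll(selector: string): ArrayLike<{ textContent: string | null }>;
}

// Hypothetical tool: return the page's h1-h3 headings as structured data, so
// an agent can read page structure without scraping raw DOM itself.
function makeListHeadingsTool(root: HeadingSource): WebMcpTool {
  return {
    name: "list_headings",
    description: "Return the text of the h1-h3 headings on the current page",
    inputSchema: { type: "object", properties: {}, additionalProperties: false },
    execute() {
      const texts = Array.from(root.querySelectorAll("h1, h2, h3"))
        .map((el) => (el.textContent ?? "").trim())
        .filter((t) => t.length > 0);
      return { content: [{ type: "text", text: JSON.stringify(texts) }] };
    },
  };
}

// Register only when a WebMCP implementation (native or a polyfill) and a DOM exist.
const nav = (globalThis as unknown as { navigator?: { modelContext?: ModelContext } }).navigator;
const doc = (globalThis as unknown as { document?: HeadingSource }).document;
if (nav?.modelContext && doc) {
  nav.modelContext.registerTool(makeListHeadingsTool(doc));
}
```

Returning structured JSON from a named tool is the kind of "structured tools rather than brittle DOM manipulation" trade-off the WebMCP Userscripts description points at: the agent calls `list_headings` instead of working out selectors on every run.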
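The two MCP-B roles listed above (an API polyfill plus protocol translation) can be illustrated with a toy in-memory registry. This is not MCP-B's code: `ToyModelContext` and `installIfMissing` are hypothetical names. It accepts WebMCP-style tool registrations and answers requests shaped like MCP's `tools/list` result (`{ tools: [...] }`) and `tools/call` parameters (`{ name, arguments }`), returning `content` plus an optional `isError` flag as MCP tool results do. A real bridge would additionally carry these messages over some transport between the page and the MCP client.

```typescript
// Toy stand-in for the two MCP-B roles: (1) a navigator.modelContext-like
// registry for browsers without native WebMCP, and (2) translation of the
// registered tools into MCP `tools/list` / `tools/call` shapes.
interface TextContent {
  type: "text";
  text: string;
}
interface CallToolResult {
  content: TextContent[];
  isError?: boolean;
}
interface WebMcpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<CallToolResult> | CallToolResult;
}

class ToyModelContext {
  private tools = new Map<string, WebMcpTool>();

  // Page side: WebMCP-style registration.
  registerTool(tool: WebMcpTool): void {
    this.tools.set(tool.name, tool);
  }
  unregisterTool(name: string): void {
    this.tools.delete(name);
  }

  // Bridge side: answer an MCP `tools/list` request (the execute function is
  // page-local, so only the name, description, and schema are exposed).
  listTools(): { tools: { name: string; description: string; inputSchema: Record<string, unknown> }[] } {
    return {
      tools: Array.from(this.tools.values()).map(({ name, description, inputSchema }) => ({
        name,
        description,
        inputSchema,
      })),
    };
  }

  // Bridge side: answer an MCP `tools/call` request ({ name, arguments }),
  // reporting unknown tools and thrown errors as isError results.
  async callTool(params: { name: string; arguments?: Record<string, unknown> }): Promise<CallToolResult> {
    const tool = this.tools.get(params.name);
    if (!tool) {
      return { content: [{ type: "text", text: `Unknown tool: ${params.name}` }], isError: true };
    }
    try {
      return await tool.execute(params.arguments ?? {});
    } catch (err) {
      return { content: [{ type: "text", text: String(err) }], isError: true };
    }
  }
}

// Polyfill-style install: only add the toy registry if nothing is present yet.
function installIfMissing(nav: { modelContext?: unknown }): ToyModelContext | null {
  if (nav.modelContext) return null; // native WebMCP or another polyfill already exists
  const ctx = new ToyModelContext();
  nav.modelContext = ctx;
  return ctx;
}
```

In a page this might be invoked as `installIfMissing(navigator)`; leaving an existing `modelContext` untouched is what lets native support and a polyfill coexist.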

See Also

My Other Related Deepdive Gists and Projects
