Some notes on using AI agents (Codex, Claude Code, etc) alongside supporting tools (WebMCP, MCP, SKILLS.md, browser automation, etc) to automatically build, test, and debug browser userscripts.
I was pondering which stacks would give AI agents all the tools they need to inspect websites and then write, test, and debug browser userscripts automatically.
At minimum, something like Codex + a browser debugging MCP should theoretically work, where the agent can:
- inspect the DOM and runtime state
- write or modify a userscript
- reload and test the page
- capture console errors / logs
- iterate until the script works
Possible starting point:
- Codex (or similar AI agent)
- Chrome DevTools MCP
- Chrome CLI
The idea would be to give the agent full browser inspection + control so it can iterate on userscripts automatically.
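A rough sketch of that inspect / write / reload / check loop, to make the shape concrete. Everything here is an assumption for illustration: `BrowserTools`, `Reviser`, `iterateUserscript`, and `makeFakeBrowser` are hypothetical names, not the API of Codex, Chrome DevTools MCP, or chrome-cli. A real setup would back `BrowserTools` with whatever browser tooling the agent has access to.

```typescript
// Hypothetical tool surface an agent might get from browser debugging tooling.
// None of these method names come from a real library.
interface BrowserTools {
  installUserscript(source: string): Promise<void>;
  reloadPage(): Promise<void>;
  readConsoleErrors(): Promise<string[]>;
}

// Hypothetical stand-in for the model call that revises a failing script.
type Reviser = (source: string, errors: string[]) => Promise<string>;

interface LoopResult {
  source: string;
  attempts: number;
  ok: boolean;
}

// Install, reload, read errors, revise; stop on a clean run or after maxAttempts.
async function iterateUserscript(
  tools: BrowserTools,
  revise: Reviser,
  initialSource: string,
  maxAttempts = 5,
): Promise<LoopResult> {
  let source = initialSource;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await tools.installUserscript(source);
    await tools.reloadPage();
    const errors = await tools.readConsoleErrors();
    if (errors.length === 0) {
      return { source, attempts: attempt, ok: true };
    }
    // Only ask for a revision if there is another attempt left to test it.
    if (attempt < maxAttempts) {
      source = await revise(source, errors);
    }
  }
  return { source, attempts: maxAttempts, ok: false };
}

// In-memory fake so the loop can be exercised without a real browser.
// It reports an error until the script uses querySelector.
function makeFakeBrowser(): BrowserTools {
  let installed = "";
  return {
    async installUserscript(source: string): Promise<void> {
      installed = source;
    },
    async reloadPage(): Promise<void> {},
    async readConsoleErrors(): Promise<string[]> {
      return installed.includes("querySelector")
        ? []
        : ["TypeError: $ is not a function"];
    },
  };
}
```

The attempt cap matters: without it, an agent that keeps producing broken revisions would loop forever. "No console errors" is also a weak success signal on its own, so a real loop would probably want an explicit DOM assertion as well.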
(Note to self: There may be some other ideas / architecture / etc in this initial chat I was exploring this in, though it will only be accessible to me: https://chatgpt.com/g/g-p-69ab7c686be48191817827ada5b67af3-sideproject-ideas/c/69abb755-5804-83ab-8d3a-c432a467ac68)
- https://dev.to/tumf/browser-code-teaching-ai-to-grow-userscripts-3npj
-
Browser Code: Teaching AI to Grow Userscripts
-
Browser Code operates as a browser extension, treating a page's DOM (Document Object Model) as a "virtual file system." It has Claude generate userscripts in JavaScript, then persists and auto-executes them via the chrome.userScripts API (a userscript persistence mechanism like Tampermonkey).
-
- https://github.com/chebykinn/browser-code
-
Browser Code
-
A coding agent for userscripts with its own loader.
Browser Code is a browser extension that gives Claude a virtual filesystem view of web pages. It generates, edits, and manages userscripts that persist to chrome.userScripts (the same API that Tampermonkey uses) and auto-run on matching URLs.
Think Claude Code, but for the DOM.
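For reference, a minimal sketch of what persisting a generated script through chrome.userScripts might look like. This is not Browser Code's actual implementation. The `UserScriptRegistration` shape follows my understanding of the documented `chrome.userScripts.register()` argument (`id`, `matches`, `js: [{ code }]`, `runAt`), and `buildRegistration` / `registerIfAvailable` are hypothetical helper names. The API only exists inside an extension with the userScripts permission, so the sketch checks for it before calling.

```typescript
// Assumed to mirror the argument of chrome.userScripts.register();
// check the current Chrome extension docs before relying on it.
interface UserScriptRegistration {
  id: string;
  matches: string[];
  js: { code: string }[];
  runAt?: "document_start" | "document_end" | "document_idle";
}

// Hypothetical helper: turn generated code into a registration object.
function buildRegistration(
  id: string,
  matches: string[],
  code: string,
): UserScriptRegistration {
  if (matches.length === 0) {
    throw new Error("at least one match pattern is required");
  }
  return { id, matches, js: [{ code }], runAt: "document_idle" };
}

// Hypothetical helper: register only when running inside an extension
// context where chrome.userScripts is present; otherwise report false.
async function registerIfAvailable(
  reg: UserScriptRegistration,
): Promise<boolean> {
  const api = (globalThis as any).chrome?.userScripts;
  if (!api) {
    return false;
  }
  await api.register([reg]);
  return true;
}
```

Once registered this way, the script should auto-run on pages matching `matches`, which is the "auto-run on matching URLs" behaviour described above.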
-
- https://webmachinelearning.github.io/webmcp/
-
WebMCP
-
WebMCP API is a new JavaScript interface that allows web developers to expose their web application functionality as “tools” - JavaScript functions with natural language descriptions and structured schemas that can be invoked by agents, browser’s agents, and assistive technologies. Web pages that use WebMCP can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.
- https://github.com/webmachinelearning/webmcp
-
WebMCP
-
Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows.
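A minimal sketch of what a page-side WebMCP tool might look like, based on my reading of the draft: a name, a natural language description, a JSON Schema for the input, and an `execute` function. The `registerTool` method name, the descriptor field names, and the MCP-style `{ content: [...] }` result are assumptions to verify against the current spec, and the todo example itself is made up.

```typescript
// Assumed result shape (MCP-style text content); verify against the spec.
type ToolResult = { content: { type: "text"; text: string }[] };

// Assumed descriptor shape for a WebMCP tool.
interface ToolDescriptor<I> {
  name: string;
  description: string;
  inputSchema: object;
  execute: (input: I) => Promise<ToolResult>;
}

// Stand-in for existing application state/logic that the tool reuses.
const todos: string[] = [];

const addTodoTool: ToolDescriptor<{ text: string }> = {
  name: "add-todo",
  description: "Add a new item to the user's todo list",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string", description: "The todo text" } },
    required: ["text"],
  },
  async execute(input: { text: string }): Promise<ToolResult> {
    todos.push(input.text);
    const result: ToolResult = {
      content: [{ type: "text", text: `Added todo: ${input.text}` }],
    };
    return result;
  },
};

// Register only where the API exists (a browser with WebMCP support or a
// polyfill); elsewhere this is a no-op.
const modelContext = (globalThis as any).navigator?.modelContext;
if (modelContext && typeof modelContext.registerTool === "function") {
  modelContext.registerTool(addTodoTool);
}
```

The point of this design is that the tool calls the same application code a button click would, so the agent and the user share state in the page rather than the agent driving the DOM blindly.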
-
- https://github.com/jasonjmcghee/WebMCP
-
Early WebMCP proposal / implementation - since evolved and worked on by much more capable folks that develop the web: https://github.com/webmachinelearning/webmcp
-
The idea of WebMCP has since evolved and is being worked on by much more capable folks that develop the web.
-
This implementation is not compliant with the W3C spec.
- https://webmcp.dev
-
WebMCP Example
-
-
- https://developer.chrome.com/blog/webmcp-epp
-
WebMCP is available for early preview
-
As the agentic web evolves, we want to help websites play an active role in how AI agents interact with them. WebMCP aims to provide a standard way for exposing structured tools, ensuring AI agents can perform actions on your site with increased speed, reliability, and precision.
By defining these tools, you tell agents how and where to interact with your site, whether it's booking a flight, filing a support ticket, or navigating complex data. This direct communication channel eliminates ambiguity and allows for faster, more robust agent workflows.
-
- https://docs.mcp-b.ai
-
WebMCP + MCP = MCP-B
-
MCP-B combines the WebMCP page API with MCP-style transport and extensions in one browser runtime. Use WebMCP for page-level tool registration. Use MCP-B for resources, prompts, relay, React hooks, and browser tooling. You can start with WebMCP and add MCP-B later.
- https://github.com/WebMCP-org
-
MCP-B
-
Model Context Protocol for the Browser
-
MCP-B bridges the gap between WebMCP and the Model Context Protocol (MCP), serving two critical functions:
- API Implementation: Provides a polyfill that implements the navigator.modelContext interface for browsers lacking native support
- Protocol Translation: Converts between WebMCP's web-native format and the MCP protocol, enabling cross-compatibility
MCP-B creates interoperability by enabling WebMCP-formatted tools to function with MCP clients (like Claude Desktop), and MCP-formatted tools to operate within WebMCP-enabled browsers. This allows both standards to evolve independently without breaking existing implementations.
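To illustrate what "protocol translation" could mean in practice, here is a rough sketch that maps page-side tools onto the two MCP operations I'm confident about: listing tools (name, description, inputSchema) and calling one (returning text content plus an error flag). This is not MCP-B's code. `PageTool`, `listTools`, `callTool`, and `textResult` are hypothetical names, and the payloads are a simplified subset of MCP's tools/list and tools/call shapes.

```typescript
// Page-side tool in a WebMCP-like style (assumed shape).
interface PageTool {
  name: string;
  description: string;
  inputSchema: object;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Simplified subset of an MCP tools/list entry.
interface McpToolListing {
  name: string;
  description: string;
  inputSchema: object;
}

// Simplified subset of an MCP tools/call result.
interface McpCallResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

function textResult(text: string, isError: boolean): McpCallResult {
  return { content: [{ type: "text", text }], isError };
}

// tools/list: expose only the metadata, never the execute function.
function listTools(tools: PageTool[]): { tools: McpToolListing[] } {
  return {
    tools: tools.map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.inputSchema,
    })),
  };
}

// tools/call: look up the tool by name, run it, and wrap the outcome.
async function callTool(
  tools: PageTool[],
  name: string,
  args: Record<string, unknown>,
): Promise<McpCallResult> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) {
    return textResult(`Unknown tool: ${name}`, true);
  }
  try {
    return textResult(await tool.execute(args), false);
  } catch (err) {
    return textResult(String(err), true);
  }
}
```

Failures come back as a result with `isError` set rather than as a thrown exception, so the calling agent can read the message and decide what to try next.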
- https://github.com/WebMCP-org/webmcp-userscripts
-
WebMCP Userscripts
-
TypeScript monorepo for building and testing Tampermonkey scripts which give websites WebMCP capabilities
-
WebMCP Userscripts is a TypeScript monorepo for building Tampermonkey userscripts that inject MCP-B (Model Context Protocol - Browser) servers into websites. This enables AI assistants to interact with web applications through structured tools rather than brittle DOM manipulation.
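For context on the output side of a build like this: a Tampermonkey-style userscript is a JavaScript file that starts with a `// ==UserScript==` metadata block. Below is a small sketch of how a build step (or an agent) might render one from structured metadata. `UserscriptMeta`, `renderUserscript`, and `indent` are hypothetical names, this is not how the webmcp-userscripts repo does it, and only the common `@name`, `@version`, `@match`, and `@grant` keys are covered.

```typescript
interface UserscriptMeta {
  name: string;
  version: string;
  match: string[];
  grant?: string[];
}

// Indent non-empty lines by two spaces so the body sits inside the wrapper.
function indent(code: string): string {
  return code
    .split("\n")
    .map((line) => (line ? "  " + line : line))
    .join("\n");
}

// Render a metadata block followed by the body wrapped in a strict-mode IIFE.
function renderUserscript(meta: UserscriptMeta, body: string): string {
  // Default to "@grant none" when no special APIs are requested.
  const grants = meta.grant && meta.grant.length > 0 ? meta.grant : ["none"];
  const header = [
    "// ==UserScript==",
    `// @name         ${meta.name}`,
    `// @version      ${meta.version}`,
    ...meta.match.map((m) => `// @match        ${m}`),
    ...grants.map((g) => `// @grant        ${g}`),
    "// ==/UserScript==",
  ];
  return (
    header.join("\n") +
    "\n\n(function () {\n  \"use strict\";\n" +
    indent(body) +
    "\n})();\n"
  );
}
```

Keeping the metadata as data and rendering the header last means an agent can change `match` patterns or grants without having to edit comment text by hand.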
- https://github.com/prasmussen/chrome-cli
-
chrome-cli
-
chrome-cli is a command line utility for controlling Google Chrome compatible browsers on OS X. It is a native binary that uses the Scripting Bridge to communicate with Chrome.
-
- https://developer.chrome.com/blog/chrome-devtools-mcp
-
Chrome DevTools (MCP) for your AI agent
- https://github.com/ChromeDevTools/chrome-devtools-mcp
-
Chrome DevTools MCP
-
chrome-devtools-mcp lets your coding agent (such as Gemini, Claude, Cursor or Copilot) control and inspect a live Chrome browser. It acts as a Model-Context-Protocol (MCP) server, giving your AI coding assistant access to the full power of Chrome DevTools for reliable automation, in-depth debugging, and performance analysis.
- https://github.com/microsoft/playwright-mcp
-
Playwright MCP
-
A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.
-
Playwright MCP vs Playwright CLI
This package provides MCP interface into Playwright. If you are using a coding agent, you might benefit from using the CLI+SKILLS instead.
- CLI: Modern coding agents increasingly favor CLI-based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient: they avoid loading large tool schemas and verbose accessibility trees into the model context, allowing agents to act through concise, purpose-built commands. This makes CLI + SKILLs better suited for high-throughput coding agents that must balance browser automation with large codebases, tests, and reasoning within limited context windows. Learn more about Playwright CLI with SKILLS.
- MCP: MCP remains relevant for specialized agentic loops that benefit from persistent state, rich introspection, and iterative reasoning over page structure, such as exploratory automation, self-healing tests, or long-running autonomous workflows where maintaining continuous browser context outweighs token cost concerns.
- https://github.com/microsoft/playwright-cli
-
playwright-cli
-
Playwright CLI with SKILLS
- https://www.stagehand.dev
-
Stagehand
-
The AI Browser Automation Framework
-
We built an OSS alternative to Playwright that's easier to use and lets AI reliably read and write on the web.
- https://github.com/browserbase/stagehand
-
Stagehand
-
The AI Browser Automation Framework
-
Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.
-
Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or use high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two) Stagehand is the natural choice for browser automations in production.
- https://browser-use.com
-
Browser Use
-
Agents at scale. Undetectable browsers. Purpose-built models. The API for any website.
- https://github.com/browser-use/browser-use
-
Browser Use
-
The AI browser agent
-
Make websites accessible for AI agents. Automate tasks online with ease.
- https://www.browserwing.com
-
BrowserWing
-
Modern Browser Automation
-
MCP & Skill Ready
-
The bridge between AI and the Web. Instant setup, full control, limitless customization.
- https://github.com/browserwing/browserwing
-
BrowserWing
-
Native Browser Automation Platform with AI Integration
-
BrowserWing turns your browser actions into MCP commands Or Claude Skill, allowing AI agents to control browsers efficiently and reliably. Say goodbye to slow, token-heavy LLM interactions — let agents call commands directly for faster automation. Perfect for AI-driven tasks, browser automation, and boosting productivity.
- https://www.browserable.ai
-
Browserable
-
Browser automation library for AI agents (JS)
-
Build browser agents that can navigate sites, fill out forms, and extract information.
- https://github.com/browserable/browserable
-
Browserable
-
Open source browser automation library for AI agents
-
Browserable allows you to build browser agents that can navigate sites, fill out forms, clicking buttons and extract information. It is currently at 90.4% on the Web Voyager benchmarks.
- https://github.com/0xdevalias
- https://gist.github.com/0xdevalias
- https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.
- Notes on API/userscript to improve Twitter 'Notifications Timeline' (0xdevalias' gist)
- Deobfuscating / Unminifying Obfuscated Web App Code (0xdevalias' gist)
- Reverse Engineering Webpack Apps (0xdevalias' gist)
- React Server Components, Next.js v13+, and Webpack: Notes on Streaming Wire Format (__next_f, etc) (0xdevalias' gist)
- Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)
- Bypassing Cloudflare, Akamai, etc (0xdevalias' gist)
- Debugging Electron Apps (and related memory issues) (0xdevalias' gist)
- devalias' Beeper CSS Hacks (0xdevalias' gist)