0xdevalias/ai-agent-browser-userscript-toolkit.md

## ai-agent-browser-userscript-toolkit.md

      
AI Agent Toolkit for Automated Browser Userscript Development

Some notes on using AI agents (Codex, Claude Code, etc) alongside supporting tools (WebMCP, MCP, SKILLS.md, browser automation, etc) to automatically build, test, and debug browser userscripts.

Table of Contents

Idea / Context
  Potential Stack
Related Projects
  Browser Code
  Browser Automation / Agent Tools
See Also
  My Other Related Deepdive Gists and Projects

Idea / Context

Was pondering which stack would give AI agents all the tools they need to inspect websites and automatically write, test, and debug browser userscripts.
At minimum, something like Codex + a browser debugging MCP should theoretically work, where the agent can:

inspect the DOM and runtime state
write or modify a userscript
reload and test the page
capture console errors / logs
iterate until the script works
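
The loop above could be sketched roughly as follows. This is only an illustration under stated assumptions: `BrowserSession`, `Reviser`, and `iterateUserscript` are hypothetical names invented here, not part of Codex, Chrome DevTools MCP, or any other tool mentioned in these notes. A real version would back the two session methods with whatever browser debugging tool the agent has available.

```typescript
// Hypothetical interface: stands in for a browser debugging tool.
// None of these method names come from a real library.
interface BrowserSession {
  // Install (or replace) the userscript under test.
  installUserscript(source: string): Promise<void>;
  // Reload the page and return any console errors seen afterwards.
  reloadAndCollectErrors(): Promise<string[]>;
}

// Hypothetical: asks the agent for a revised script given the latest errors.
type Reviser = (source: string, errors: string[]) => Promise<string>;

interface LoopResult {
  source: string;   // the last script that was tried
  attempts: number; // how many install + reload cycles ran
  ok: boolean;      // true if a cycle finished with no console errors
}

async function iterateUserscript(
  session: BrowserSession,
  revise: Reviser,
  initialSource: string,
  maxAttempts = 5,
): Promise<LoopResult> {
  let source = initialSource;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await session.installUserscript(source);
    const errors = await session.reloadAndCollectErrors();
    if (errors.length === 0) {
      return { source, attempts: attempt, ok: true };
    }
    // Only ask for a revision if another attempt is still allowed.
    if (attempt < maxAttempts) {
      source = await revise(source, errors);
    }
  }
  return { source, attempts: maxAttempts, ok: false };
}
```

Treating "no console errors" as success is a simplification; in practice the check would probably also need a DOM assertion that the script actually did what was intended. The `maxAttempts` cap is there so a script that never converges does not loop forever.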

Potential Stack

Possible starting point:

Codex (or similar AI agent)
Chrome DevTools MCP
  https://developer.chrome.com/blog/chrome-devtools-mcp
  https://github.com/ChromeDevTools/chrome-devtools-mcp
Chrome CLI
  https://github.com/prasmussen/chrome-cli

The idea would be to give the agent full browser inspection + control so it can iterate on userscripts automatically.
(Note to self: there may be some other ideas / architecture notes in the initial chat where I was exploring this, though it is only accessible to me: https://chatgpt.com/g/g-p-69ab7c686be48191817827ada5b67af3-sideproject-ideas/c/69abb755-5804-83ab-8d3a-c432a467ac68)

Related Projects

Browser Code


https://dev.to/tumf/browser-code-teaching-ai-to-grow-userscripts-3npj


Browser Code: Teaching AI to Grow Userscripts


Browser Code operates as a browser extension, treating a page's DOM (Document Object Model) as a "virtual file system." It has Claude generate userscripts in JavaScript, then persists and auto-executes them via the chrome.userScripts API (a userscript persistence mechanism like Tampermonkey).


https://github.com/chebykinn/browser-code


Browser Code


A coding agent for userscripts with its own loader.
Browser Code is a browser extension that gives Claude a virtual filesystem view of web pages. It generates, edits, and manages userscripts that persist to chrome.userScripts (the same API that Tampermonkey uses) and auto-run on matching URLs.
Think Claude Code, but for the DOM.
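
For context on what "auto-run on matching URLs" means in a classic userscript manager, here is a minimal sketch that builds a script with a Tampermonkey/Greasemonkey-style metadata header. Browser Code has its own loader, so it may well not use this header format at all; `UserscriptMeta` and `buildUserscript` are hypothetical helpers made up for this example.

```typescript
// Hypothetical helper types for illustration only.
interface UserscriptMeta {
  name: string;
  version: string;
  match: string[]; // URL match patterns, e.g. "https://example.com/*"
  runAt?: "document-start" | "document-end" | "document-idle";
}

// Build userscript source: a metadata comment header followed by the body
// wrapped in an IIFE so its variables do not leak into the page scope.
function buildUserscript(meta: UserscriptMeta, body: string): string {
  const header = [
    "// ==UserScript==",
    `// @name        ${meta.name}`,
    `// @version     ${meta.version}`,
    ...meta.match.map((pattern) => `// @match       ${pattern}`),
    `// @run-at      ${meta.runAt ?? "document-idle"}`,
    "// @grant       none",
    "// ==/UserScript==",
  ];
  return `${header.join("\n")}\n\n(function () {\n  "use strict";\n${body}\n})();\n`;
}
```

An agent could regenerate only the `body` on each iteration and keep the header stable, which keeps the match patterns from drifting between attempts.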


Browser Automation / Agent Tools


https://webmachinelearning.github.io/webmcp/


WebMCP


WebMCP API is a new JavaScript interface that allows web developers to expose their web application functionality as “tools” - JavaScript functions with natural language descriptions and structured schemas that can be invoked by agents, browser’s agents, and assistive technologies. Web pages that use WebMCP can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.
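
To make the "tools" idea concrete, here is a rough sketch of what a tool (a function plus a natural language description and a structured schema) might look like, with a tiny in-memory registry standing in for the browser. Everything here is an assumption for illustration: `ToolDescriptor`, `LocalToolRegistry`, and the field names are hypothetical and are not the WebMCP API; check the spec for the real interface.

```typescript
// Hypothetical shape, NOT the real WebMCP interface.
interface ToolDescriptor {
  name: string;
  description: string;                  // natural language, read by the agent
  inputSchema: Record<string, unknown>; // JSON-Schema-style input description
  execute: (input: Record<string, unknown>) => Promise<unknown>;
}

// Hypothetical stand-in for the registry a browser would provide.
class LocalToolRegistry {
  private tools = new Map<string, ToolDescriptor>();

  register(tool: ToolDescriptor): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // What an agent would see when discovering the available tools.
  list(): { name: string; description: string }[] {
    return Array.from(this.tools.values(), (t) => ({
      name: t.name,
      description: t.description,
    }));
  }

  async call(name: string, input: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.execute(input);
  }
}

// Example tool: appends a note to an in-memory list and returns the new count.
const notes: string[] = [];
const registry = new LocalToolRegistry();
registry.register({
  name: "add-note",
  description: "Append a note to the page's note list.",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  execute: async (input) => {
    notes.push(String(input.text));
    return notes.length;
  },
});
```

The relevant point for userscripts is that a script could expose page functionality this way, so the agent calls a named tool instead of clicking through the DOM.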


https://github.com/webmachinelearning/webmcp


WebMCP


Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows.


https://github.com/jasonjmcghee/WebMCP


Early WebMCP proposal / implementation - since evolved and worked on by much more capable folks that develop the web: https://github.com/webmachinelearning/webmcp


The idea of WebMCP has since evolved and is being worked on by much more capable folks that develop the web.


This implementation is not compliant with the W3C spec.


https://webmcp.dev


WebMCP Example


https://developer.chrome.com/blog/webmcp-epp


WebMCP is available for early preview


As the agentic web evolves, we want to help websites play an active role in how AI agents interact with them. WebMCP aims to provide a standard way for exposing structured tools, ensuring AI agents can perform actions on your site with increased speed, reliability, and precision.
By defining these tools, you tell agents how and where to interact with your site, whether it's booking a flight, filing a support ticket, or navigating complex data. This direct communication channel eliminates ambiguity and allows for faster, more robust agent workflows.


https://docs.mcp-b.ai


WebMCP + MCP = MCP-B


MCP-B combines the WebMCP page API with MCP-style transport and extensions in one browser runtime. Use WebMCP for page-level tool registration. Use MCP-B for resources, prompts, relay, React hooks, and browser tooling. You can start with WebMCP and add MCP-B later.


https://github.com/WebMCP-org


MCP-B


Model Context Protocol for the Browser


MCP-B bridges the gap between WebMCP and the Model Context Protocol (MCP), serving two critical functions:

API Implementation — Provides a polyfill that implements the navigator.modelContext interface for browsers lacking native support
Protocol Translation — Converts between WebMCP's web-native format and the MCP protocol, enabling cross-compatibility

MCP-B creates interoperability by enabling WebMCP-formatted tools to function with MCP clients (like Claude Desktop), and MCP-formatted tools to operate within WebMCP-enabled browsers. This allows both standards to evolve independently without breaking existing implementations.


https://github.com/WebMCP-org/webmcp-userscripts


WebMCP Userscripts


TypeScript monorepo for building and testing Tampermonkey scripts which give websites WebMCP capabilities


WebMCP Userscripts is a TypeScript monorepo for building Tampermonkey userscripts that inject MCP-B (Model Context Protocol - Browser) servers into websites. This enables AI assistants to interact with web applications through structured tools rather than brittle DOM manipulation.


https://github.com/prasmussen/chrome-cli


chrome-cli


chrome-cli is a command line utility for controlling Google Chrome compatible browsers on OS X. It is a native binary that uses the Scripting Bridge to communicate with Chrome.


https://developer.chrome.com/blog/chrome-devtools-mcp


Chrome DevTools (MCP) for your AI agent


https://github.com/ChromeDevTools/chrome-devtools-mcp


Chrome DevTools MCP


chrome-devtools-mcp lets your coding agent (such as Gemini, Claude, Cursor or Copilot) control and inspect a live Chrome browser. It acts as a Model-Context-Protocol (MCP) server, giving your AI coding assistant access to the full power of Chrome DevTools for reliable automation, in-depth debugging, and performance analysis.


https://github.com/microsoft/playwright-mcp


Playwright MCP


A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.


Playwright MCP vs Playwright CLI
This package provides MCP interface into Playwright. If you are using a coding agent, you might benefit from using the CLI+SKILLS instead.

CLI: Modern coding agents increasingly favor CLI–based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient: they avoid loading large tool schemas and verbose accessibility trees into the model context, allowing agents to act through concise, purpose-built commands. This makes CLI + SKILLs better suited for high-throughput coding agents that must balance browser automation with large codebases, tests, and reasoning within limited context windows.

Learn more about Playwright CLI with SKILLS.
MCP: MCP remains relevant for specialized agentic loops that benefit from persistent state, rich introspection, and iterative reasoning over page structure, such as exploratory automation, self-healing tests, or long-running autonomous workflows where maintaining continuous browser context outweighs token cost concerns.


https://github.com/microsoft/playwright-cli


playwright-cli


Playwright CLI with SKILLS


https://www.stagehand.dev


Stagehand


The AI Browser Automation Framework


We built an OSS alternative to Playwright that's easier to use and lets AI reliably read and write on the web.


https://github.com/browserbase/stagehand


Stagehand


The AI Browser Automation Framework


Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.


Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or use high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two) Stagehand is the natural choice for browser automations in production.


https://browser-use.com


Browser Use


Agents at scale. Undetectable browsers. Purpose-built models. The API for any website.


https://github.com/browser-use/browser-use


Browser Use


The AI browser agent


Make websites accessible for AI agents. Automate tasks online with ease.


https://www.browserwing.com


BrowserWing


Modern Browser Automation


MCP & Skill Ready


The bridge between AI and the Web. Instant setup, full control, limitless customization.


https://github.com/browserwing/browserwing


BrowserWing


Native Browser Automation Platform with AI Integration


BrowserWing turns your browser actions into MCP commands Or Claude Skill, allowing AI agents to control browsers efficiently and reliably. Say goodbye to slow, token-heavy LLM interactions — let agents call commands directly for faster automation. Perfect for AI-driven tasks, browser automation, and boosting productivity.


https://www.browserable.ai


Browserable


Browser automation library for AI agents (JS)


Build browser agents that can navigate sites, fill out forms, and extract information.


https://github.com/browserable/browserable


Browserable


Open source browser automation library for AI agents


Browserable allows you to build browser agents that can navigate sites, fill out forms, clicking buttons and extract information. It is currently at 90.4% on the Web Voyager benchmarks.


See Also

My Other Related Deepdive Gists and Projects


https://github.com/0xdevalias
https://gist.github.com/0xdevalias
https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.

Reverse engineering ChatGPT's frontend web app + deep dive explorations of the code (0xdevalias' gist)


Notes on API/userscript to improve Twitter 'Notifications Timeline' (0xdevalias' gist)
Deobfuscating / Unminifying Obfuscated Web App Code (0xdevalias' gist)
Reverse Engineering Webpack Apps (0xdevalias' gist)

React Internals (subsection)
Vue Internals (subsection)
Angular Internals (subsection)


React Server Components, Next.js v13+, and Webpack: Notes on Streaming Wire Format (__next_f, etc) (0xdevalias' gist)
Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)

JavaScript Web App Reverse Engineering - Module Identification (0xdevalias' gist)
Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias' gist)


Bypassing Cloudflare, Akamai, etc (0xdevalias' gist)
Debugging Electron Apps (and related memory issues) (0xdevalias' gist)
devalias' Beeper CSS Hacks (0xdevalias' gist)