IndyDevDan: 4-Layer Claude Code Architecture — Skills → Agents → Commands → Reusability (Playwright Browser Automation & UI Testing) | Channel: https://www.youtube.com/@indydevdan

Claude Code: Orchestration, Agents, Skills & Hooks — Quick Reference

Companion notes to IndyDevDan's 4-Layer Architecture video. Maps his concepts to Claude Code's built-in primitives.

See also the CLIAI/agentic-4layer-architecture repo for more.


1. Orchestration → Custom Commands (Reusable Prompts)

Custom commands are markdown files in .claude/commands/ that act as reusable, parameterized prompts invoked with /command-name in Claude Code.

What they are:

  • Markdown files (.md) stored in .claude/commands/ (project-scoped) or ~/.claude/commands/ (global)
  • Invoked via /command-name inside a Claude Code session
  • Can accept arguments via $ARGUMENTS placeholder
  • Can reference files with @file-path syntax
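
For instance, a minimal command using both features might look like this (the `/explain` command name and its contents are hypothetical):

```markdown
# Explain

## Purpose
Explain the code named in the arguments, using project conventions from @README.md as context.

## Workflow
1. Read the file or topic passed as $ARGUMENTS (e.g. `/explain src/app.ts`)
2. Summarize what it does and how it fits into the project
3. Flag anything surprising or risky
```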

IndyDevDan's "Orchestration Layer" maps directly to this — his UI review prompt, Amazon automation prompt, and higher-order prompts are all custom commands.

Example structure:

```
.claude/commands/
├── ui-review.md          # Orchestrates parallel QA agents
├── automate.md           # Higher-order prompt: wraps a workflow
└── deploy-check.md       # Pre-deploy validation
```

Example command (.claude/commands/ui-review.md):

```markdown
# UI Review

## Purpose
Run parallel user story validation against the application UI.

## Variables
- stories_dir: `./ai-review/stories/`
- output_dir: `./ai-review/output/`

## Workflow
1. Discover all `.story.md` files in stories_dir
2. For each story, spawn a sub-agent using @browser-qa
3. Each agent: parse steps → navigate → screenshot → report pass/fail
4. Collect results, generate summary report
```
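
Based on the video's description of a story file (name, URL, workflow), a `.story.md` might look like this hypothetical sketch:

```markdown
# Story: View top post comments

URL: https://news.ycombinator.com

## Steps
1. Navigate to the URL
2. Open the comments of the top post
3. Verify that the comment thread is visible

## Expected
The comment thread renders with at least one visible comment
```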

Key insight from the video: Commands are the orchestration layer — they compose skills and agents into repeatable workflows. Think of them as the API layer for your agentic system.


2. Custom Agents for Workflows → Using .claude/agents/

Agent definitions live in .claude/agents/ as markdown files with YAML front matter. They define specialized sub-agents that can be referenced with @agent-name in prompts or spawned programmatically.

What they are:

  • Markdown files in .claude/agents/ (project) or ~/.claude/agents/ (global)
  • YAML front matter specifies fields such as name, description, the tools the agent may use, and optionally the model
  • Referenced via @agent-name in commands or conversation
  • Can be spawned as sub-agents for parallel execution

IndyDevDan's agent layer — his Playwright Browser Agent, Claude Browser Agent, and Browser QA Agent are all custom agents that scale skills into specialized workflows.

Example structure:

```
.claude/agents/
├── browser-qa.md          # UI validation via user stories
├── playwright-browser.md  # Generic Playwright browser tasks
└── claude-browser.md      # Chrome-flag browser automation
```

Example agent (.claude/agents/browser-qa.md):

```markdown
---
name: browser-qa
description: UI validation agent that works through user stories using Playwright CLI
tools: Bash, Read, Write, Glob
---

# Browser QA Agent

## Purpose
UI validation agent that works through user stories using Playwright CLI.

## Variables
- session_name: unique per story
- screenshots_dir: `./output/{story_name}/screenshots/`

## Workflow
1. Parse user story into discrete steps
2. Create output directory for this story
3. For each step:
   - Execute via `playwright-cli` (headless)
   - Take screenshot: `playwright-cli screenshot --session {session_name}`
   - Validate expected outcome
4. Report PASS/FAIL with screenshot evidence
5. Close browser session

## Output Format
Story: {name}
URL: {url}
Result: PASS | FAIL
Steps: {completed}/{total}
Evidence: {screenshots_dir}
```

Key insight: Agents give you scale. The skill is raw capability; the agent is how you parallelize and specialize it. With Claude Code's team orchestration, you can spawn multiple agents working toward a common goal.


3. Skills to Encode SOPs → .claude/skills/

Skills are Claude Code's mechanism for encoding Standard Operating Procedures. They can include both instructions (markdown) AND executable scripts.

What they are:

  • Directory-based: .claude/skills/{skill-name}/ containing a SKILL.md and optional scripts
  • SKILL.md defines the capability, usage patterns, defaults
  • Scripts (bash, python, etc.) can live alongside the skill markdown
  • Activated in agent front matter or referenced in commands

IndyDevDan's "Capability Layer" — his Playwright Browser and Claude Browser skills are foundational capabilities that agents and commands build upon.

Example structure:

```
.claude/skills/
├── playwright-browser/
│   ├── SKILL.md              # Instructions + defaults
│   ├── setup.sh              # Install/configure Playwright
│   └── helpers/
│       └── take-screenshot.sh
├── claude-browser/
│   └── SKILL.md
└── data-pipeline/
    ├── SKILL.md
    └── validate.py           # Data validation script
```

Example skill (.claude/skills/playwright-browser/SKILL.md):

```markdown
---
name: playwright-browser
description: Token-efficient CLI for browser automation. Headless, parallel sessions, persistent profiles.
---

# Playwright Browser Skill

Token-efficient CLI for browser automation.
Runs headless, supports parallel sessions, persistent profiles.

## Setup
Run `./setup.sh` to install Playwright and browsers.

## Defaults
- Headless: true (override with `--headed` for debugging)
- Timeout: 30s per action
- Screenshots: saved to `./screenshots/` by default

## Core Commands
- `playwright-cli navigate --url {url} --session {name}`
- `playwright-cli click --selector {sel} --session {name}`
- `playwright-cli type --selector {sel} --text {text} --session {name}`
- `playwright-cli screenshot --session {name} --output {path}`
- `playwright-cli close --session {name}`

## Best Practices
- Always use named sessions for state isolation
- Take screenshots after each significant action
- Use `--wait-for` to handle dynamic content
- Prefer CSS selectors over XPath for reliability
```

Scripts bundled with skills — this is a key differentiator. Skills aren't just prompts; they can include executable code:

```bash
#!/bin/bash
# .claude/skills/playwright-browser/setup.sh
# Ensures Playwright is installed and browsers are available

if ! command -v playwright &> /dev/null; then
    npm install -g playwright
    npx playwright install chromium
    echo "Playwright installed successfully"
else
    echo "Playwright already available"
fi
```
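
The `helpers/take-screenshot.sh` stub listed in the tree above could be sketched as a thin wrapper over the skill's screenshot command. This is hypothetical: `playwright-cli` and its flags are the interface the skill assumes, and the `--dry-run` option here is purely illustrative.

```shell
#!/bin/bash
# Hypothetical sketch of .claude/skills/playwright-browser/helpers/take-screenshot.sh
# Wraps the screenshot command documented in SKILL.md.

take_screenshot() {
    local session="${1:?usage: take_screenshot <session> <output-path> [--dry-run]}"
    local output="${2:?usage: take_screenshot <session> <output-path> [--dry-run]}"
    local cmd=(playwright-cli screenshot --session "$session" --output "$output")

    if [[ "${3:-}" == "--dry-run" ]]; then
        # Print the command instead of running it; handy for debugging agent prompts.
        echo "${cmd[*]}"
    else
        mkdir -p "$(dirname "$output")"
        "${cmd[@]}"
    fi
}
# A real script would end with: take_screenshot "$@"
```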

Key insight: Skills are the foundational layer. They encode domain knowledge, tool-specific patterns, and opinionated defaults. Build skills for capabilities; build agents and commands on top of them.


4. Claude Code Hooks

Hooks are shell commands that execute automatically in response to Claude Code lifecycle events. They enable guardrails, automation, and integration without modifying your prompts or skills.

What they are:

  • Configured in .claude/settings.json (project) or ~/.claude/settings.json (global)
  • Fire on specific events in the Claude Code lifecycle
  • Can modify behavior, enforce policies, or trigger side effects
  • Run synchronously — Claude Code waits for them to complete

Hook events:

| Event | When it fires | Use case |
| --- | --- | --- |
| PreToolUse | Before a tool executes | Validate/block dangerous operations |
| PostToolUse | After a tool completes | Log actions, post-process results |
| Notification | On notifications | Route alerts to Slack, desktop, etc. |
| Stop | When the agent stops | Cleanup, final reporting |

Example configuration (.claude/settings.json):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/validate-bash.py" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/post-write-lint.sh" }
        ]
      }
    ],
    "Notification": [
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/notify-desktop.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/cleanup-screenshots.sh" }
        ]
      }
    ]
  }
}
```

Note the nesting: each matcher entry carries a `hooks` array of `{"type": "command", ...}` objects.

Example hook — block dangerous commands:

```python
#!/usr/bin/env python3
# .claude/hooks/validate-bash.py
# Reads the tool call payload from stdin, blocks dangerous patterns

import json
import sys

data = json.load(sys.stdin)
command = data.get("tool_input", {}).get("command", "")

BLOCKED = ["rm -rf /", "DROP TABLE", "format c:", "--no-verify"]
for pattern in BLOCKED:
    if pattern in command:
        # A {"decision": "block", "reason": ...} response rejects the
        # tool call and surfaces the reason back to the model.
        print(json.dumps({
            "decision": "block",
            "reason": f"Blocked dangerous pattern: {pattern}",
        }))
        sys.exit(0)

# Exiting 0 with no decision lets the tool call proceed.
sys.exit(0)
```

Example hook — auto-lint after file writes:

```bash
#!/bin/bash
# .claude/hooks/post-write-lint.sh
# Auto-format files after Claude writes them
# The hook payload arrives on stdin as JSON

FILE=$(jq -r '.tool_input.file_path // empty')
if [[ "$FILE" == *.py ]]; then
    ruff format "$FILE" 2>/dev/null
elif [[ "$FILE" == *.ts || "$FILE" == *.tsx ]]; then
    npx prettier --write "$FILE" 2>/dev/null
fi
```

Key insight: Hooks are the guardrails and glue layer. They don't appear in IndyDevDan's 4-layer stack directly, but they complement it — ensuring your agentic workflows stay safe, consistent, and integrated with your dev environment.


Putting It All Together: The Stack

```
┌─────────────────────────────────────────────┐
│  Layer 4: Just Files / Task Runners         │  ← Reusability
│  (just, make, scripts — entry points)       │
├─────────────────────────────────────────────┤
│  Layer 3: Custom Commands                   │  ← Orchestration
│  (.claude/commands/*.md — reusable prompts) │
├─────────────────────────────────────────────┤
│  Layer 2: Custom Agents                     │  ← Scale
│  (.claude/agents/*.md — specialized bots)   │
├─────────────────────────────────────────────┤
│  Layer 1: Skills                            │  ← Capability
│  (.claude/skills/*/SKILL.md + scripts)      │
├─────────────────────────────────────────────┤
│  Foundation: Hooks                          │  ← Guardrails & Glue
│  (.claude/settings.json — lifecycle events) │
└─────────────────────────────────────────────┘
```

Each layer builds on the one below it. Skills provide raw capabilities. Agents scale and specialize those capabilities. Commands orchestrate agents into repeatable workflows. Task runners (just, make) provide the top-level entry points. Hooks run throughout, enforcing safety and consistency.
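
The top layer itself stays tiny. A minimal `justfile` for this stack might look like the following sketch (recipe names are hypothetical, and it assumes the `claude` CLI's `-p` non-interactive mode accepts the slash commands defined earlier):

```just
# Hypothetical justfile: thin, reusable entry points over the command layer

# Run the parallel UI review command
ui-review:
    claude -p "/ui-review"

# Run a higher-order automation prompt, passing a workflow file
automate workflow:
    claude -p "/automate {{workflow}}"
```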


Based on IndyDevDan's 4-Layer Claude Code architecture. Video: My 4-Layer Claude Code Playwright CLI Skill

Video metadata:

- title: My 4-Layer Claude Code Playwright CLI Skill (Agentic Browser Automation)
- author: IndyDevDan
- channel_url: https://www.youtube.com/@indydevdan
- upload_date: 2025-05-19
- url:
- duration: 2344
- transcript_source: auto-generated
- may_contain_errors: true
- description: IndyDevDan breaks down his 4-layer architecture for building agentic browser automation and UI testing with Claude Code: Skills (capabilities), Agents (scale), Commands (orchestration), and Just files (reusability). Demonstrates Playwright CLI integration, parallel QA agents, and browser automation workflows.

What's up engineers? IndyDevDan here. With the right skills, your agent can do tons of work for you. You know this, but by engineering the right stack of skills, sub agents, prompts, and a reusability system, you can automate entire classes of work. If you don't have agents for the two classes of work we'll break down in this video, you're wasting time doing work your agents can do for you. This is the name of the game right now. How many problems can you hand off to your agents in a reusable, scalable way?

In the terminal, `j automate amazon` kicks off a Claude Code instance in fast mode. This is going to run a different type of workflow: an agentic browser automation workflow. Now, this is a personal workflow that it's running. You can see here it's got a whole list of items that I need to purchase, and it's going to do this for me. This is browser automation.

As engineers, there's a more important task that we need to focus on as well. So we'll open up a new terminal here and type `j ui-review`. This is going to kick off agentic UI testing. You can see we're kicking off three browsers. This is going to be a mock user test on top of Hacker News, and our UI tests are effectively going to operate a user story against Hacker News. You can have agents do your UI testing. There are several benefits to this over traditional UI testing with Jest or Vitest that we're going to cover in this video. But you can see here, at around 40k tokens each, they're completing and summarizing back to the primary agent. And you can see these user stories have passed.

Why is agentic browser use so important? Because it allows you to copy yourself into the digital world, so you can automate two key classes of engineering work: browser automation and UI testing.
Whenever I sit down to automate a problem with agents, I always ask myself, what is the right combination of skills, sub agents, prompts, and tools I can use to solve this problem in a templated way for repeat success? In this video, I want to share my four layer approach for building agents that automate and test work on your behalf. Let's break down automating work on the web.

Bowser: 4-Layer Architecture

Bowser is an opinionated structure using skills, sub agents, commands, and one additional layer we'll talk about. And the whole point here is to set up systems for agentic browser automation and UI testing. I don't just want to solve this problem for one codebase. I want a system that I can come to that's built in an agent-first way to solve browser automation and UI testing.

Core Technologies

So let's start with the core technology. The Amazon workflow is of course using Claude with Chrome. You can activate this by using the --chrome flag and it's a great way to use your existing browser session to accomplish work with agents. There are pros and cons to this approach which is why I needed another tool so that we could scale UI testing with agents. That is of course the Playwright CLI. Now this is super important and the developers know this. You want to be using CLIs, not MCP servers. MCP servers chew up your tokens and they're very rigid. You have to do it their way however the MCP server is built. This is why we always prefer CLIs and CLIs give us the massive benefit that we can build on top of it in our own opinionated way. So these are the two technologies that Bowser is built on.

A couple key things to note here. We ran three agents that QA specific user stories and they all responded to the primary agent with success. You can see each of them has their own number of steps and they all have their own screenshots. This is super critical. You can see the autocomplete already picking up on what I want to do. I'm just going to hit tab enter, open the screenshots directory and let's go ahead and see what happened there. You can see that every step of the way our agents created a screenshot of the workflow — view top post comments. We can walk through exactly what our agent did and how it validated everything. This is a simple mock example test, but imagine this running on your brand new user interface that you're building up with agents deploying very quickly. If one of your workflows goes wrong, your agents now have a trail of success and a trail of failure because they're taking screenshots along the way.

Layer 1: Skills (Capabilities)

I want to show you different ways you can layer your architecture. It's not just about skills. Everyone's very obsessed with skills. I want to show you how you can stack it up and layer it properly to get repeatable results at scale.

So, let's jump into the codebase here. We'll start with the skill. We have two key skills: Claude Browser and Playwright Browser. Let's start with the more interesting one, Playwright Browser. If I open this up, you can see we have this structure. This is a token-efficient CLI for Playwright. Runs headless, supports parallel sessions, and we have named sessions for stored state.

You can see we have just a bunch of details on how this works. And this is directly using the Playwright CLI. The nice part about building your own skill is you get to customize it however you want. They have their own opinionated skill in here. I always recommend you check out how other engineers are building their skills. There are reference files here, and then they have a SKILL.md kind of just breaking down what the help command would do. I'm breaking it down my own way. I'm setting up defaults that I want for repeated success for the way that I'm going to be building applications.

This is an important thing to mention. Code is fully commoditized. Anyone can generate code. That is not an advantage anymore. What is an advantage is your specific solution to the problem you're solving. And that boils down all the way to how you write your skills. So you can see here the big advantage we get out of this is headless by default. We get parallel sessions and we get persistent profiles. So if you are running some type of login workflow with your Playwright testing agent you can persist the session which is really important.

We also have the Claude Browser skill. There's not much to document here because when you're using Claude with the --chrome flag essentially what it does is it injects a bunch of additional tools that allows Claude to access the browser. And so the only real checks we need to add into the skill is just to make sure that the flag is turned on.

Layer 2: Agents (Scale)

The skill is the capability. This is the foundational layer. The next piece of this layered approach is going to be our agents. We have three agents. Let's start with a simple one — the Playwright Browser Agent. Now check out how I'm prompt engineering this. This is a very simple sub agent that we can spin up to do arbitrary browser work with the Playwright CLI. So we've activated the skill in the front matter and then we're mentioning it just one more time inside of the actual workflow. So you can see here this agent is very simple. All we're doing is scaling this skill into a sub agent. We can prompt over and over for UI testing tasks and really just for any browser automation task.

And then we have the Claude Browser Agent. We can use the Claude Code Chrome tools inside of a sub agent. The big problem with this is that you cannot run this in parallel. So this is one of the big limitations.

Let's move to our most important agent, the Browser QA Agent. Here's where things get interesting. This is where we're actually building out a concrete workflow. This is where things get more specialized. This is a UI validation agent that's going to work through user stories. It's going to do it in a very specific way. This is where we start templating or engineering into a system for repeat success. These agents can do a lot more than we give them credit for. It's time to start pushing them hard into specific workflows to automate classes of work.

We have a classic agent workflow here. Classic agentic prompt: purpose, variables, workflow, report, examples. This agent is going to parse a user story into specific steps, create a directory for it, work through the workflow, take screenshots, report pass or fail, and then actually close the browser. We have an opinionated workflow with a few variables where our agent is going to be recording its journey along the way and saving screenshots. Very very powerful.

We layered an agent on top of a skill and then we built an agent to use that skill. This Bowser codebase isn't like a standalone codebase — it's a codebase that you can reference to pull a consistent structure of skills, sub agents, prompts, and one additional layer for reusability. It allows you to take this and apply to any problem, any code base with a consistent structure.

The agent is where we start to specialize and scale, where the skill is just our raw capability. If you're just looking at things from the angle of a skill, you're not using agentic tooling as well as you could. There are many other pieces you can add to this to really expand what you can do with your agent. Everyone is just spamming skills right now. And that's great, I understand why, but there are layers to how you can build this up for repeat success.

Especially with these new agent orchestration features coming out of Claude Code and other agent coding tools, knowing how to build these specialized agents that you can scale is going to be ultra important. I think sub agents got a massive buff and they're going to be a centerpiece as agent orchestration becomes the primary paradigm of agentic coding.

Layer 3: Commands (Orchestration)

You can see here we have a couple prompts. So let's get into the third layer of this stack which is the actual custom commands. And I'm calling this the orchestration layer.

Once you have the skill and once you stack agents on top of your workflow, I think the next thing you're going to want to go for is a command — a custom command, also just known as a reusable prompt. You can see we have that UI review prompt that ran. This is where things get a little more interesting, a little more complex.

UI review fires off parallel story validation. We have stories glob. So, we have a bunch of variables set up here. If we open up AI review, you can see we have a single simple user story for Hacker News. And if we open this up, you can see this very simple file format that is effectively a user story for your application. It has the name, the URL your agent's going to visit, and then the actual workflow.

The true purpose of these workflows is, you know, you copy all these, you go localhost your page and then your agent validates against that specific page. This is a very agent-first approach to testing.

This isn't just a random skill. I think of skills as low-level capabilities that you give your agent. After you have that, it's up to you to compose it into something useful and valuable in a repeat scalable way. And a great way to do that is by building out sub agents so that you can scale and then commands which gives you that real power, that real control.

UI Review: Team Orchestration

In UI review, this is our orchestration prompt. We have purpose, variables, code structure, instructions, workflow, report. What this does is it's going to create an agent team. We are leveraging the new orchestration feature coming out of Claude Code. You can create teams of agents that work toward a common goal. In this case, we're creating a team that does UI review.

The workflow: discover all the UIs and set up the output directory. Spawn our agents — this is a team of agents and we're actually breaking down how to prompt each agent. We're teaching our primary agent or the orchestrator agent how to prompt the sub agents. We're being very explicit here so you can be very detailed with the results you're getting out of your sub agents. Then collect — after every teammate finishes, they're going to ping back via the task list. And then clean up and report.

We're actually using this powerful UI review prompt as a consistent way to test our UI over and over and all we have to do is just activate this prompt and our entire UI gets tested by agents.

Higher-Order Prompts

The Amazon add-to-cart workflow uses a pattern I call a "hop" — a higher order prompt. Think of this like a function that takes a function as a parameter. This is exactly what this does. Argument one actually takes another prompt as a parameter. Why? Because we want to wrap that prompt that runs in a very consistent workflow.

The consistent pieces go in the higher order prompt and then the details — the steps that you want to run — go in the lower order prompt. At any point in time, we can run the higher order prompt to automate and then pass in the workflow.
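
Encoded as a command file, a higher-order prompt might look like this sketch (the file name and structure are hypothetical, following the pattern described above):

```markdown
# Automate (Higher-Order Prompt)

## Arguments
- $ARGUMENTS: path to a lower-order workflow prompt (e.g. a shopping-list workflow)

## Workflow
1. Read the workflow file at $ARGUMENTS
2. Set up the browser session and any shared state (the consistent pieces)
3. Execute the steps from the lower-order prompt
4. Report results and close the session
```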

Layer 4: Just Files (Reusability)

After you have all these different ways to execute with your agent, you're going to want a single, repeatable place to call all these tools. And that's what you saw in the beginning. `j` is aliased to `just` — a powerful and simple command runner.

Open up the just file, you'll see all the commands we just ran, all the permutations of how we want to execute and kick off our Claude agent. And you'll see all the workflows with variables we can pass in to overwrite them. This allows you and your team and other agents to build repeat solutions and then to quickly access them.

So we have skills (capability) at the bottom. We have sub agents to scale. You can give each one of your sub agents a different skill or the same skill. And then you want commands to orchestrate. And then right at the top I use just files for reusability.

Key Takeaways

This is an important thing to mention. Code is fully commoditized. What is an advantage is your specific solution to the problem you're solving. Don't outsource learning how to build with the most important technology of our lifetime — agents. If you're outsourcing your skills, your agents, your prompts — how will you improve? How will you build unique systems?

If you can't look at a library, pull it into a skill, build it on your own, scale it with some sub agents, and then orchestrate it with a prompt — you will constantly be limited. This is one of the big differences between vibe coders and agentic engineers. Agentic engineers know what their agents are doing, and they know it so well, they don't have to look. Vibe coders don't know, and they don't look. If you master the agent, you will master knowledge work.

Specialization matters more than ever. Specialization combined with scale and agent orchestration is where the big nugget of gold is right now in the age of agents.
