@s3u
Last active March 2, 2026 04:14
Building an Agentic Developer Experience with Cursor: A Six-Layer System Design

Why the Future of Developer Experience Is Agentic

The traditional developer experience is built around dashboards, documentation, and manual workflows -- the developer is the executor. In an agentic paradigm, the developer becomes the architect: defining intent, reviewing outcomes, and steering autonomous agents that do the building. Three forces make this inevitable:

  1. Context windows are now large enough to hold entire subsystems, making multi-file reasoning feasible.
  2. Tool-use and function-calling have matured -- models can reliably invoke external APIs, run terminals, edit files, and iterate against feedback loops (tests, linters, type-checkers).
  3. The cost curve is collapsing -- separating planning (expensive, high-intelligence model) from execution (fast, cheap model) makes agentic workflows economically viable at scale.

Cursor is the first IDE to treat the agent as a first-class citizen rather than a bolt-on copilot. Its architecture -- the agent harness (user messages + tools + instructions) -- is explicitly designed so that as frontier models improve, developers get better results without changing their workflow.


System Design: The Six-Layer Agentic Architecture

graph TB
    subgraph layer1 [Layer 1: Context Engineering]
        Rules["Rules (.cursor/rules/)"]
        AgentsMD["AGENTS.md (nested)"]
        TeamRules["Team Rules (Dashboard)"]
        UserRules["User Rules (Global)"]
    end

    subgraph layer2 [Layer 2: Dynamic Knowledge]
        Skills["Agent Skills (.cursor/skills/)"]
        MCP["MCP Servers (mcp.json)"]
        SemanticIdx["Semantic Search Index"]
    end

    subgraph layer3 [Layer 3: Agent Modes]
        PlanMode["Plan Mode (Shift+Tab)"]
        AgentMode["Agent Mode (Default)"]
        AskMode["Ask Mode (Read-only)"]
        DebugMode["Debug Mode"]
    end

    subgraph layer4 [Layer 4: Automation and Hooks]
        Hooks["Hooks (hooks.json)"]
        Commands["Commands (.cursor/commands/)"]
        StopHook["Stop Hook (Grind Loop)"]
    end

    subgraph layer5 [Layer 5: Parallelism and Scale]
        Worktrees["Git Worktrees"]
        BestOfN["Best-of-N (Multi-model)"]
        CloudAgents["Cloud Agents"]
    end

    subgraph layer6 [Layer 6: Distribution]
        Plugins["Plugins (.cursor-plugin/)"]
        Marketplace["Cursor Marketplace"]
    end

    layer1 --> layer2
    layer2 --> layer3
    layer3 --> layer4
    layer4 --> layer5
    layer5 --> layer6

Layer 1: Context Engineering -- The Foundation

Context is the single most important lever. Without it, agents hallucinate. With it, they build with precision.

Project Rules (.cursor/rules/*.mdc) -- Persistent, version-controlled instructions scoped by glob patterns. Four types:

  • Always Apply -- every session (e.g., coding standards)
  • Apply Intelligently -- agent-decided based on description
  • Apply to Specific Files -- glob-matched (e.g., **/*.tsx)
  • Apply Manually -- invoked via @my-rule
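
As a concrete illustration, a glob-scoped rule file might look like the following. The frontmatter field names follow Cursor's .mdc convention; the file name and contents are illustrative, not taken from a real project.

.cursor/rules/react-components.mdc:

---
description: React component conventions
globs: **/*.tsx
alwaysApply: false
---

- Use named exports for components
- Keep the props interface at the top of the file
- Reference @frontend/components/Button.tsx as the canonical pattern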

Nested AGENTS.md -- Simpler alternative. Place in project root and subdirectories for directory-scoped instructions. More specific files take precedence.

project/
  AGENTS.md              # Global: "Use TypeScript, follow repo pattern"
  frontend/
    AGENTS.md            # "Use Tailwind, Framer Motion for animations"
    components/
      AGENTS.md          # "Props interface at top, named exports"
  backend/
    AGENTS.md            # "Use zod validation, export types from schemas"

Team Rules -- Centrally managed from the Cursor Dashboard. Can be enforced (cannot be disabled by team members) for compliance.

Best Practice: Rules should be concise (<500 lines), reference files with @filename.ts instead of copying content, and be added only when the agent makes the same mistake repeatedly.


Layer 2: Dynamic Knowledge -- Skills and MCP

This layer gives agents on-demand capabilities beyond what's in the codebase. Skills teach agents how to do things; MCP gives them access to things.

Agent Skills

(.cursor/skills/ or .agents/skills/) -- Portable, version-controlled packages with SKILL.md files. Unlike Rules (always loaded), Skills are loaded progressively when the agent determines relevance. Skills can include:

  • scripts/ -- executable code the agent can run
  • references/ -- additional docs loaded on demand
  • assets/ -- templates, config files

Skill 1: Troubleshoot CI Pipeline Failures

.cursor/skills/
  troubleshoot-ci/
    SKILL.md
    scripts/
      fetch-ci-logs.sh
      parse-failures.py
    references/
      CI_RUNBOOK.md
    assets/
      known-flaky-tests.json

SKILL.md contents:

---
name: troubleshoot-ci
description: Diagnose and fix CI pipeline failures. Use when a CI/CD
  pipeline has failed, tests are flaky, or builds are broken in GitLab
  CI, GitHub Actions, or similar systems.
---

# Troubleshoot CI Pipeline

## When to Use
- A CI pipeline has failed and the developer asks "why did CI fail?"
- Flaky tests are blocking merges
- Pipeline timeouts or resource exhaustion

## Instructions

1. **Gather evidence**: Run `scripts/fetch-ci-logs.sh <pipeline-id>`
   to pull the latest failed job logs. If using GitLab MCP, call the
   pipeline jobs API to get logs directly.
2. **Classify the failure** into one of:
   - Compilation/build error -- read the error output, find the file
     and line, propose a fix
   - Test failure -- check `assets/known-flaky-tests.json` first; if
     the test is known-flaky, suggest a re-run; otherwise diagnose
   - Infrastructure failure (OOM, timeout, runner issue) -- check
     resource limits in the CI config and suggest increases
   - Dependency failure -- check lock files, registry availability
3. **Cross-reference** with `references/CI_RUNBOOK.md` for team-
   specific remediation steps (e.g., how to retrigger, who to page)
4. **Fix and verify**: Apply the fix, then instruct the developer to
   push and monitor the next pipeline run
5. If the root cause is a flaky test, update
   `assets/known-flaky-tests.json` with the test name and date

Skill 2: Debug Build Failures

.cursor/skills/
  debug-build/
    SKILL.md
    scripts/
      clean-build.sh
      dependency-tree.sh
    references/
      BUILD_PATTERNS.md

SKILL.md contents:

---
name: debug-build
description: Diagnose and resolve build failures including compilation
  errors, dependency conflicts, and configuration issues. Use when
  builds fail locally or in CI, or when dependency resolution breaks.
---

# Debug Build Failures

## When to Use
- Local or CI build fails with compilation errors
- Dependency version conflicts or resolution failures
- Webpack/Vite/esbuild/tsc errors after package upgrades
- Docker image build failures

## Instructions

1. **Reproduce locally**: Run the project's build command (check
   AGENTS.md or .cursor/rules for the correct command). Capture the
   full error output.
2. **Parse the error**:
   - TypeScript: Look for the TS error code (e.g., TS2345), find the
     file:line, read surrounding context, and fix the type mismatch
   - Docker: Identify the failing layer, check if it is a missing
     dependency, wrong base image, or COPY path issue
   - Native/compiled: Check compiler version, missing headers, or
     linker errors
3. **Dependency conflicts**: Run `scripts/dependency-tree.sh` to
   visualize the dependency graph. Look for duplicate or incompatible
   versions. Consult `references/BUILD_PATTERNS.md` for team-approved
   resolution strategies (e.g., `overrides`, `resolutions`).
4. **Clean build**: If the error is stale cache, run
   `scripts/clean-build.sh` which removes node_modules, dist, .next,
   __pycache__, and similar artifacts, then rebuilds.
5. After fixing, run the build again to verify. If in CI, push and
   monitor.

Skill 3: Troubleshoot CD / Deployment Failures

.cursor/skills/
  troubleshoot-cd/
    SKILL.md
    scripts/
      deploy.sh
      rollback.sh
      validate-deploy.py
    references/
      DEPLOYMENT_RUNBOOK.md
      INFRA_TOPOLOGY.md
    assets/
      env-config-template.json

SKILL.md contents:

---
name: troubleshoot-cd
description: Diagnose and resolve deployment failures across staging
  and production environments. Use when deployments fail, health checks
  don't pass, or rollbacks are needed. Covers AWS ECS/EKS, Kubernetes,
  and Terraform-based deployments.
---

# Troubleshoot CD / Deployment

## When to Use
- A deployment to staging or production has failed
- Health checks are failing after deploy
- Terraform plan/apply errors
- Need to perform an emergency rollback

## Instructions

1. **Identify the deployment target**: Read
   `references/INFRA_TOPOLOGY.md` to understand the environment
   topology (which services, which regions, which orchestrator).
2. **Gather deployment logs**: Use the AWS MCP or GitLab MCP to pull
   deployment logs. For ECS, check task stopped reasons. For K8s,
   check pod events and container logs.
3. **Classify the failure**:
   - Image pull failure -- check ECR/registry permissions, image tag
   - Health check failure -- verify the health endpoint, check env
     vars against `assets/env-config-template.json`
   - Terraform error -- read the plan diff, check for state drift or
     resource conflicts
   - Permission/IAM error -- check the role trust policy and attached
     policies via AWS MCP
4. **Rollback if needed**: Run `scripts/rollback.sh <environment>` to
   revert to the last known-good deployment. Follow the rollback
   procedure in `references/DEPLOYMENT_RUNBOOK.md`.
5. **Fix forward**: Once root cause is identified, apply the fix, run
   `scripts/validate-deploy.py <environment>` for pre-flight checks,
   then deploy with `scripts/deploy.sh <environment>`.
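
A minimal sketch of the env-var pre-flight check that `scripts/validate-deploy.py` might perform against `assets/env-config-template.json`. The template shape (variable name mapped to a description) is an assumption:

```python
import json
import os

def missing_env_vars(template_path: str, environ=os.environ) -> list:
    """Return required variable names from the template that are unset.

    Assumes the template maps variable names to descriptions, e.g.
    {"DATABASE_URL": "Postgres connection string"}.
    """
    with open(template_path) as f:
        required = json.load(f)
    return sorted(k for k in required if k not in environ)

# Wiring sketch: exit non-zero if anything is missing, e.g.
# missing = missing_env_vars("assets/env-config-template.json")
# if missing: sys.exit("Missing required env vars: " + ", ".join(missing))
```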

Skill 4: Investigate Production Incidents with Observability Data

.cursor/skills/
  investigate-incident/
    SKILL.md
    scripts/
      fetch-metrics.sh
      correlate-events.py
    references/
      INCIDENT_PLAYBOOK.md
      SERVICE_DEPENDENCIES.md

SKILL.md contents:

---
name: investigate-incident
description: Investigate production incidents by correlating logs,
  metrics, and traces from Datadog and Splunk. Use when there is a
  production alert, elevated error rate, latency spike, or customer-
  reported issue.
---

# Investigate Production Incident

## When to Use
- PagerDuty/Datadog alert fires
- Error rate or latency spikes in a service
- Customer reports an issue that needs root cause analysis

## Instructions

1. **Establish timeline**: Ask the developer for the approximate start
   time. Use the Datadog MCP to query metrics for the affected service
   over that window (error rate, p99 latency, CPU/memory).
2. **Pull logs**: Use the Splunk MCP (or Datadog Logs MCP) to search
   for error-level logs in the affected service within the time
   window. Look for stack traces, error codes, and upstream failures.
3. **Trace the request path**: Consult
   `references/SERVICE_DEPENDENCIES.md` for the service dependency
   graph. Check upstream and downstream services for correlated
   failures.
4. **Correlate with deployments**: Use GitLab MCP to check if any
   deployment happened just before the incident start time. If yes,
   this is likely a regression -- switch to the `troubleshoot-cd`
   skill for rollback.
5. **Propose a fix or mitigation**: Based on evidence, either:
   - Propose a code fix (with the file and line identified from logs)
   - Suggest a config change (feature flag, env var, scaling)
   - Recommend a rollback with `scripts/rollback.sh`
6. Follow `references/INCIDENT_PLAYBOOK.md` for post-incident steps
   (write-up, timeline, action items).
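
Step 4's deployment-correlation check is simple enough to sketch. This is a hypothetical helper (assuming ISO-8601 timestamps), not part of the skill as shipped:

```python
from datetime import datetime, timedelta

def deploys_near_incident(incident_start: str, deploy_times: list,
                          window_minutes: int = 30) -> list:
    """Return deploys that landed within `window_minutes` before the
    incident start -- the prime regression suspects."""
    start = datetime.fromisoformat(incident_start)
    window = timedelta(minutes=window_minutes)
    return [t for t in deploy_times
            if start - window <= datetime.fromisoformat(t) <= start]
```

If this returns a non-empty list, the incident is likely a regression and the `troubleshoot-cd` skill's rollback path applies.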

Skill 5: Debug Docker and Container Issues

.cursor/skills/
  debug-containers/
    SKILL.md
    scripts/
      inspect-container.sh
      check-resources.sh
    references/
      DOCKER_PATTERNS.md

SKILL.md contents:

---
name: debug-containers
description: Debug Docker build failures, container runtime issues,
  and orchestration problems in ECS/EKS/K8s. Use when containers
  crash, fail to start, or exhibit resource issues.
---

# Debug Docker and Container Issues

## When to Use
- Dockerfile build fails
- Container exits with non-zero code (CrashLoopBackOff in K8s)
- OOMKilled or resource limit issues
- Networking or service discovery failures

## Instructions

1. **Build failures**: Read the Dockerfile and the build output.
   Common issues: missing build args, wrong base image architecture
   (amd64 vs arm64), COPY paths that don't exist in build context.
2. **Runtime crashes**: Run `scripts/inspect-container.sh <container>`
   to get the last 100 lines of logs and the exit code. Check if the
   entrypoint is correct and env vars are set.
3. **Resource issues**: Run `scripts/check-resources.sh` to compare
   configured limits vs actual usage. If OOMKilled, recommend
   increasing memory limits or investigating memory leaks.
4. **Networking**: Check that the container is listening on the
   expected port, security groups allow traffic, and service discovery
   (DNS/envoy/service mesh) is configured correctly.
5. Consult `references/DOCKER_PATTERNS.md` for team-standard
   Dockerfile patterns (multi-stage builds, layer caching, .dockerignore).
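
Step 3's limit-vs-usage comparison can be sketched as follows; the warning ratio and verdict strings are illustrative defaults, not what `scripts/check-resources.sh` actually emits:

```python
def resource_verdict(limit_mb: float, peak_usage_mb: float,
                     warn_ratio: float = 0.8) -> str:
    """Compare a container's configured memory limit with observed peak
    usage and report whether it is at risk of being OOMKilled."""
    if peak_usage_mb >= limit_mb:
        return "oom-risk"
    if peak_usage_mb / limit_mb >= warn_ratio:
        return "near-limit"
    return "ok"
```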

MCP Server Integrations

(.cursor/mcp.json) -- Connect the agent to external systems. More than 100 integrations are available. Transport types:

  • stdio -- local processes (single user)
  • SSE / Streamable HTTP -- remote servers (multi-user, OAuth)

Recommended MCP Stack for a Production DevEx Platform:

graph LR
    subgraph vcs [Version Control]
        GitLab["GitLab MCP"]
        GitHub["GitHub MCP"]
    end

    subgraph cloud [Cloud Infrastructure]
        AWS["AWS MCP"]
        AWSBilling["AWS Billing MCP"]
        AWSDocs["AWS Docs MCP"]
    end

    subgraph observability [Observability]
        Datadog["Datadog MCP"]
        Splunk["Splunk MCP"]
        Sentry["Sentry MCP"]
    end

    subgraph pm [Project Management]
        Linear["Linear MCP"]
        Slack["Slack MCP"]
    end

    subgraph infra [Infrastructure]
        Terraform["Terraform MCP"]
        Docker["Docker MCP"]
    end

    Agent["Cursor Agent"] --> vcs
    Agent --> cloud
    Agent --> observability
    Agent --> pm
    Agent --> infra

GitLab MCP -- DevSecOps platform integration. Gives the agent access to merge requests, pipelines, CI job logs, issues, and repository data. Essential for teams using GitLab CI/CD.

{
  "mcpServers": {
    "gitlab": {
      "url": "https://your-gitlab-instance.com/api/v4/mcp"
    }
  }
}

Use cases: "Why did the pipeline fail on MR !456?", "Show me the diff for the last merge to main", "Create an MR with these changes"

AWS MCP -- Access AWS services through natural language. Covers EC2, ECS, Lambda, S3, IAM, CloudFormation, and more. Combine with the AWS Billing MCP for cost analysis and AWS Documentation MCP for up-to-date service docs.

{
  "mcpServers": {
    "aws": {
      "command": "uvx",
      "args": ["mcp-proxy-for-aws@latest",
               "https://aws-mcp.us-east-1.api.aws/mcp",
               "--metadata", "AWS_REGION=us-west-2"]
    },
    "aws-docs": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"]
    },
    "aws-billing": {
      "command": "uvx",
      "args": ["awslabs.billing-cost-management-mcp-server@latest"]
    }
  }
}

Use cases: "Check why ECS tasks are failing in staging", "What IAM permissions does this Lambda need?", "How much did us-east-1 EC2 cost last month?", "Show me the AWS docs for ECS task networking"

Datadog MCP -- Query metrics, logs, traces, and monitors. The agent can investigate latency spikes, error rate increases, and infrastructure alerts without leaving the IDE.

{
  "mcpServers": {
    "datadog": {
      "url": "https://mcp.datadoghq.com/mcp",
      "headers": {
        "DD-API-KEY": "${env:DD_API_KEY}",
        "DD-APPLICATION-KEY": "${env:DD_APP_KEY}"
      }
    }
  }
}

Use cases: "Show me p99 latency for the payments service over the last hour", "Pull error logs from the order-service for the last 30 minutes", "What monitors are currently alerting?"

Splunk MCP -- Search logs, run SPL queries, and analyze security events. Particularly valuable for teams using Splunk for centralized log management or SIEM.

{
  "mcpServers": {
    "splunk": {
      "command": "npx",
      "args": ["mcp-remote", "https://your-splunk.example.com/mcp"],
      "env": {
        "SPLUNK_TOKEN": "${env:SPLUNK_TOKEN}"
      }
    }
  }
}

Use cases: "Search Splunk for 500 errors in the auth-service in the last 2 hours", "Run an SPL query for failed login attempts by IP", "Correlate this error with recent deployment events"

How Skills and MCP Work Together:

The key design insight is that MCP provides deterministic tool integration (API calls, data fetching), while Skills provide adaptive context and workflows (domain knowledge, multi-step procedures). They are complementary:

  • The troubleshoot-ci skill tells the agent how to diagnose a CI failure (the methodology, the classification, the team-specific runbook)
  • The GitLab MCP gives the agent access to the actual pipeline logs and job data
  • The investigate-incident skill teaches the agent how to correlate signals across services
  • The Datadog and Splunk MCPs give the agent access to the actual metrics and logs

Without Skills, the agent has tools but no methodology. Without MCP, the agent has methodology but no data. Together, they create agents that can reason about production systems the way a senior engineer would.


Layer 3: Agent Modes -- The Right Mode for the Right Task

| Mode  | When to Use                                | Key Capability                              |
|-------|--------------------------------------------|---------------------------------------------|
| Plan  | Complex features, unclear requirements     | Creates reviewable plan before coding       |
| Agent | Implementation, refactoring                | Autonomous multi-file editing               |
| Ask   | Learning, onboarding, exploration          | Read-only, no changes                       |
| Debug | Regressions, race conditions, memory leaks | Hypothesis generation + log instrumentation |

The Plan-then-Execute Pattern is the most impactful workflow:

  1. Activate Plan Mode (Shift+Tab)
  2. Use a powerful model (e.g., Claude Opus) to generate a detailed plan with file paths, function signatures, and logic
  3. Review and edit the plan (saved as markdown in .cursor/plans/)
  4. Execute with a faster model (e.g., Sonnet) -- it follows the plan as a diligent builder
  5. If the result is wrong, revert to checkpoint, refine the plan, and re-execute

Debug Mode is uniquely powerful for hard-to-reproduce bugs: it instruments code with logging, asks you to reproduce the bug, analyzes runtime evidence, then makes a targeted fix -- rather than guessing.


Layer 4: Automation and Hooks -- The Agent Loop

This is where agentic DevEx becomes truly autonomous.

Custom Commands (.cursor/commands/*.md) -- Reusable workflows triggered with / in chat:

  • /pr -- commit, push, create PR with gh
  • /review -- run linters, flag issues, summarize
  • /fix-issue [number] -- fetch issue from GitHub, find code, implement fix, open PR
  • /deploy-staging -- run tests, build, push to staging
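
A command is just a markdown file of instructions the agent follows when the command is invoked. A hypothetical .cursor/commands/review.md (contents illustrative):

# Review

1. Run the project's linter and type-checker (see AGENTS.md for the commands).
2. Summarize warnings and errors grouped by file.
3. Flag any TODO/FIXME comments introduced on this branch.
4. Post a short summary of the riskiest changes before approving.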

Hooks (.cursor/hooks.json) -- Scripts that run at defined stages of the agent loop. Two types:

  • Command-based: Shell scripts receiving JSON via stdin, returning JSON via stdout
  • Prompt-based: LLM-evaluated natural language conditions (policy enforcement without code)

Key hook events:

  • sessionStart / sessionEnd -- inject context, run cleanup
  • beforeShellExecution / afterShellExecution -- gate risky commands
  • beforeReadFile / afterFileEdit -- run formatters, scan for secrets
  • preToolUse / postToolUse -- generic tool lifecycle
  • stop -- the grind loop: return a followup_message to keep the agent iterating until tests pass or a scratchpad says "DONE"
  • subagentStart / subagentStop -- control Task tool execution
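
As a sketch, a command-based beforeShellExecution guardrail might look like the following. The exact JSON fields on stdin/stdout are assumptions based on the command-based contract above, and the pattern list is illustrative:

```python
import re

# Commands we refuse to let the agent run unattended (illustrative list).
RISKY_PATTERNS = [
    r"\brm\s+-rf\s+/",          # recursive delete from the filesystem root
    r"\bgit\s+push\s+--force",  # history rewrite on a shared branch
    r"\bdrop\s+(table|database)\b",
]

def gate(command: str) -> dict:
    """Decide whether a shell command may run (response schema assumed)."""
    for pattern in RISKY_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return {"permission": "deny",
                    "reason": f"Matched risky pattern: {pattern}"}
    return {"permission": "allow"}

# Wiring sketch: the hook receives JSON on stdin and replies on stdout, e.g.
# print(json.dumps(gate(json.load(sys.stdin).get("command", ""))))
```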

The Grind Loop Pattern -- Use a stop hook to create agents that iterate autonomously until a verifiable goal is met:

{
  "version": 1,
  "hooks": {
    "stop": [{ "command": "bun run .cursor/hooks/grind.ts" }]
  }
}

The script checks if a scratchpad contains "DONE" or if max iterations are reached. If not, it returns a followup_message to continue. This is TDD on autopilot.
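
The referenced script is TypeScript, but the core logic is simple enough to sketch in Python. The scratchpad path, iteration cap, and JSON field names here are assumptions, not the actual contract:

```python
from pathlib import Path

MAX_ITERATIONS = 10                         # safety cap (illustrative)
SCRATCHPAD = Path(".cursor/scratchpad.md")  # hypothetical progress file

def decide(iteration: int, scratchpad: str) -> dict:
    """Return an empty object to let the agent stop, or a
    followup_message to push it into another iteration."""
    if "DONE" in scratchpad or iteration >= MAX_ITERATIONS:
        return {}
    return {"followup_message":
            "Not done yet: run the tests, fix the first failure, and "
            "write DONE to the scratchpad only when everything passes."}

# Wiring sketch: read hook state from stdin, reply on stdout, e.g.
# state = json.load(sys.stdin)
# text = SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""
# print(json.dumps(decide(state.get("iteration", 0), text)))
```

The iteration cap matters: without it, a goal the agent cannot reach becomes an infinite loop that burns tokens.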


Layer 5: Parallelism and Scale

Git Worktrees -- Each parallel agent runs in an isolated worktree with its own files. Configured via .cursor/worktrees.json for dependency installation and environment setup.

Best-of-N -- Run the same prompt across multiple models simultaneously. Compare results side-by-side. Cursor suggests which solution is best. Especially valuable for hard problems.

Cloud Agents -- Delegate tasks to cloud-hosted agents that run in sandboxes. Start from cursor.com/agents, the editor, or even your phone. They clone the repo, create a branch, work autonomously, and open a PR when finished. Trigger from Slack with @Cursor.


Layer 6: Distribution -- Plugins and Marketplace

Package everything into a Plugin (.cursor-plugin/plugin.json) that bundles:

  • Rules, Skills, Agents, Commands, Hooks, MCP Servers

Distribute through the Cursor Marketplace (manually reviewed for security). Multi-plugin repositories are supported via marketplace.json.

This is how you scale agentic DevEx across an organization: encode your team's architectural patterns, compliance requirements, deployment workflows, and domain knowledge into a plugin that every developer gets automatically.


Recommended Implementation Roadmap

Week 1-2: Foundation

  • Create .cursor/rules/ with coding standards, architecture patterns, and key commands
  • Add nested AGENTS.md for directory-specific guidance
  • Set up 2-3 essential MCP integrations (GitHub, Linear/Jira, your database)

Week 3-4: Workflows

  • Build custom commands for your top 5 repeatable workflows (/pr, /review, /test, /deploy, /fix-issue)
  • Create Agent Skills for domain-specific tasks (deployment, data migrations, API design)
  • Implement afterFileEdit hooks for auto-formatting

Week 5-6: Autonomy

  • Implement the grind loop (stop hook) for TDD workflows
  • Set up beforeShellExecution hooks for security guardrails
  • Configure worktrees for parallel agent execution

Week 7-8: Scale

  • Package as a Plugin for team distribution
  • Set up Cloud Agents for async task delegation
  • Establish Team Rules on the Cursor Dashboard for org-wide compliance

Key Design Principles

  1. Context over prompting -- Well-structured Rules, Skills, and MCP integrations eliminate the need for repetitive prompt engineering
  2. Plan before execute -- Separate architectural thinking (expensive model) from implementation (fast model) to get better results at lower cost
  3. Verifiable goals -- Use typed languages, linters, and tests to give agents clear success criteria; use the grind loop to iterate automatically
  4. Progressive complexity -- Start with Rules, add Skills when you need dynamic capabilities, add Hooks when you need automation, add Plugins when you need distribution
  5. Human as architect -- The developer defines intent, reviews plans, and approves outcomes; the agent handles the implementation details