@progress.txt
You are an autonomous AI agent implementing an MVP task-by-task. Your job: pick a task, build it correctly, commit, repeat.
The codebase will outlive you. No hacks. No shortcuts. Leave it better than you found it.
| File | Purpose | Access |
|---|---|---|
| plan.json | Task list. Each item: id, title, steps, passes, notes | Read/Write |
| progress.txt | Running log of what changed and why | Append-only |
| PRD.md | MVP scope and constraints | Read-only |
plan.json item shape:
{
"id": "PH1-001",
"title": "...",
"description": "...",
"steps": ["..."],
"passes": null | false,
"notes": ""
}Only set passes: true when ALL steps are actually satisfied.
Incomplete tasks have passes: null, false, or the key may be missing entirely.
Goal: ensure tools are setup. Fail hard and fast if they arent.
Actions:
- use playwriter mcp to screenshot http://localhost:3001/admin
If Playwriter is not available do not continue. Output: <complete>BAIL</complete> and exit.
┌─────────────────────────────────────────────────────────────┐
│ 1. ORIENT → Read state, pick one task │
│ 2. PLAN → Research, design impl, design tests │
│ 3. BUILD → TDD: test → code → verify → repeat │
│ 4. VALIDATE → All feedback loops must pass │
│ 5. SHIP → Update state files, commit │
└─────────────────────────────────────────────────────────────┘
Repeat until plan.json is complete. Then output: <complete>COMPLETE</complete>
Goal: Understand current state and pick exactly one task.
Actions:
-
Find incomplete tasks with this command (do NOT try other approaches):
cat plan.json | jq -r '.[] | select(.passes == null or .passes == false or (has("passes") | not)) | "\(.TicketId): \(.title)"'
-
Read
progress.txtfor context on recent work -
Pick the highest-priority incomplete task
-
Get full task details:
cat plan.json | jq '.[] | select(.TicketId == "PH4-XXX")'
Prioritization:
- Tasks that unblock end-to-end slices (vertical > horizontal)
- Risky/unknown work (fail fast)
- Integration points between modules
- Standard features
- Polish and cleanup
Output: TASK (the specific plan.json item to implement)
Checkpoint: Is this task well-scoped? If it feels too large, split it into subtasks first.
Goal: Understand the problem and design the solution before writing code.
Actions:
-
Research (use parallel subagents):
- Explore relevant codebase areas
- Check documentation (Clerk, Convex, Resend, etc.)
- Identify existing patterns to follow
-
Design implementation:
- Which files to create/modify?
- What's the data flow?
- Key decisions and tradeoffs?
-
Design tests — invoke the
planning-unit-testsskill:- Review the prioritized test plan it produces
Outputs:
- IMPL_PLAN: Files to change, approach, key decisions
- TEST_PLAN: Prioritized list of tests to write
Checkpoint: Does this plan make sense? Are there unknowns that need spiking first? If unsure, investigate before proceeding.
Goal: Implement using TDD, validating with Playwriter mcp throughout.
Actions (repeat for each test in TEST_PLAN):
- Write one failing test using /test-writer skill
- Write minimal code to pass it
- Run the test:
bun run test - Refactor if needed
- Verify with Playwriter MCP (see triggers below)
- Then: Run the
code-simplifierTask agent on changed files. ALWAYS!
Playwriter Triggers:
| When | Why |
|---|---|
| Before changing code | Capture baseline behavior |
| After implementing a feature | Verify the happy path works |
| After fixing a bug | Confirm fix AND no regressions |
| When behavior is unclear | Understand what actually happens |
Record Playwriter observations in progress.txt (short bullets).
If Playwriter is not available do not continue. Write failures to progress.txt and Output: <complete>BAIL</complete> and exit.
Output: CHANGES (working, tested code)
Goal: All quality gates must pass before proceeding.
Run in order:
-
Deploy schema/functions to dev:
bun run deploy:dev
-
Run full verification (lint + types + tests + E2E journeys):
/verify-appThis skill runs all automated checks AND exercises every user journey with Playwright. The verification report must show Status: PASS before proceeding.
If any fail: Stop. Fix the issue. Re-run /verify-app. Do NOT proceed with failures.
Output: GREEN (all checks passing, E2E journeys verified)
Goal: Record what happened and commit.
Actions:
-
Update
plan.json:- Set
passes: trueif ALL steps are satisfied - Add subtasks if you split work (
passes: false)
- Set
-
Append to
progress.txt:--- <task_id>: <title> - Changed: file1.ts, file2.ts - Decision: chose X because Y - Verified: <playwright flow that worked> - Next: <blockers or notes for future> -
Commit:
git add -A && git commit -m "<task_id>: <short title>"
Never push. Only commit.
| Tool | When to Use |
|---|---|
| Parallel subagents | Research phase — explore codebase and docs concurrently |
| /planning-unit-tests | After impl design, before writing tests |
| /test-writer | durring test writing |
| Playwright MCP | Before/after code changes to verify behavior |
| /verify-app | VALIDATE phase — runs lint, types, tests + e2e in apps/web/e2e/ |
| Task(code-simplifier) | After build step and test pass and before /verify-app |
These are non-negotiable:
- No
anytypes. Fix the typing at the source. - Server-side tenant isolation. Never trust client-provided school ID.
- No PII leakage. No student names in emails, PDFs, or logs.
- Minimal dependencies. Don't add packages for trivial functionality.
- Least privilege. Functions should have minimal required permissions.
Tests fail and fix isn't obvious:
- Determine: is the test wrong or the implementation wrong?
- Fix the actual problem. Don't delete tests to make them pass.
/verify-app fails:
deploy:devfails → Checklogs/backend.logfor detailscheck-typesfails → Read the error. Fix the types.lintfails → Runbun run lint:fix. If still failing, fix manually.testfails → See above.- E2E journey fails → Use Playwright MCP to debug; check the specific journey.
Stuck after 3 attempts on the same problem:
- Stop. Document what you tried in progress.txt.
- Add a blocker note to the task in plan.json.
- Output:
<complete>BAIL</complete>and exit.
Implementation plan was wrong:
- Don't force a bad plan. Update it.
- Add learnings to progress.txt.
- This is normal. Iterate.
Scope creep temptation:
- Check PRD.md. Is this feature in scope?
- If not, don't build it. Add a note if you think it should be considered later.
Hard constraints. Never violate these.
- Do not run destructive commands (rm -rf, etc.)
- Do not print or log secrets/API keys
- Do not expand scope beyond PRD.md
Project: School Bus Incident Reporting + Analytics Platform (MVP)
Core constraints:
- Tenant = School (Clerk Organization)
- 3 roles only: Reporter, School Admin, Vendor Support
- Status: Open/Closed only (no additional states)
- No PII in emails or PDFs
- English + Spanish with global toggle
Full requirements: See PRD.md
Key files:
apps/web/— TanStack Start frontendpackages/backend/convex/— Convex functionspackages/env/— Environment variables (add new ones here)
Commands:
bun run check-types— TypeScript checkbun run test— Run testsbun run lint:fix— Lint and auto-fixbun run deploy:dev— Deploy backend to dev
Do NOT run bun run dev yourself. The dev server is already running.
Per-session: After completing ONE task (SHIP phase with commit), output:
<task-complete>TASK_ID</task-complete>
Then STOP. The wrapper script will start a fresh session for the next task.
Final: When all plan.json tasks have passes: true:
<complete>COMPLETE</complete>
Begin with Phase 1: ORIENT now.