@oneryalcin · Created October 13, 2025
Skills Pattern: Just-In-Time Context Injection for LLM Agents


Right. Focus on injection mechanics and why it works, not what skills exist.

Core Mechanics

1. Just-In-Time Context Injection

Not "here's all documentation upfront" - it's pull, not push:

Session starts → Minimal bootstrap injected
Agent encounters task → Agent searches for skill
Agent finds match → Agent reads SKILL.md
Agent uses skill → Follows instructions

Key: Context only loaded when needed. Main context stays clean.
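A minimal sketch of the pull model, assuming a local directory of SKILL.md files; the path, the bootstrap wording, and load_skill are illustrative, not the actual plugin code:

```python
from pathlib import Path

# Hypothetical skills location; the real layout depends on your plugin.
SKILLS_DIR = Path.home() / ".config" / "my-agent" / "skills"

def bootstrap_message() -> str:
    """Injected once at session start: rules only, no skill bodies."""
    return ("You have a skills library. Search it with find-skills before any "
            "non-trivial task. If a skill matches, you MUST read and follow it.")

def load_skill(name: str) -> str:
    """Pulled into context only when the agent decides it needs this skill."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()

context = [bootstrap_message()]           # session start: bootstrap only

wanted = SKILLS_DIR / "systematic-debugging" / "SKILL.md"
if wanted.exists():                       # later: inject the matched skill just in time
    context.append(load_skill("systematic-debugging"))
```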

2. SKILL.md Format = Executable Instructions

Not documentation. Not guidelines. Executable process:

---
name: Thing To Do
when_to_use: Specific trigger conditions
---

# Overview
Core principle in 1-2 sentences

# Quick Reference
Pattern matching

# Implementation
Step-by-step checklist

# Supporting files
Tools/scripts if needed

Why it works:

  • Frontmatter: Machine-parseable metadata
  • Checklist: Unambiguous steps (→ TodoWrite)
  • when_to_use: Clear trigger conditions
  • Markdown: LLM-native format, easy to read/write
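For illustration, a small parser for that frontmatter, assuming the ---‑delimited YAML block shown above (field names follow the example; pyyaml is an assumed dependency):

```python
import yaml  # assumed dependency: pyyaml

def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into (frontmatter metadata, markdown body)."""
    if not text.startswith("---"):
        return {}, text
    _, meta_block, body = text.split("---", 2)
    return (yaml.safe_load(meta_block) or {}), body.strip()

meta, body = parse_skill(
    "---\nname: Thing To Do\nwhen_to_use: Specific trigger conditions\n---\n# Overview\n..."
)
print(meta["when_to_use"])   # -> Specific trigger conditions
```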

3. Enforcement Through Adversarial Testing

Skills tested against realistic failure modes:

Write skill → Test on subagent → Find ways subagent skips it
→ Strengthen language → Retest → Iterate until bulletproof

Uses persuasion principles not as hack but as reliability mechanism:

  • Authority framing makes instructions stick
  • Commitment patterns force acknowledgment
  • Scarcity/pressure scenarios ensure compliance under stress

Result: Skills that survive adversarial testing actually work in practice.

4. Mandatory Compliance Pattern

Bootstrap establishes operating rules:

1. You have skills
2. Skills are discoverable (search tool)
3. If skill exists for task → YOU MUST USE IT

Not "helpful docs" - it's required procedure. Like aviation checklists.

Reinforced by:

  • EXTREMELY_IMPORTANT framing
  • Mandatory announcement ("I'm using X skill...")
  • TodoWrite tracking for checklists
  • Tested scenarios where agent tries to skip → fails test → skill strengthened

5. Search-Then-Read Pattern

Two-phase discovery:

Phase 1: Search

find-skills [pattern]  # Returns list with when_to_use

Agent matches pattern to task.

Phase 2: Read & Execute

Read skill → Announce usage → Follow checklist

Why separation matters:

  • Search is cheap (grep/metadata)
  • Read is expensive (full file in context)
  • Only read what's needed
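A sketch of the cheap phase, assuming SKILL.md files in per-skill folders: it scans only the frontmatter lines and never loads full bodies into context (the paths and line cutoff are illustrative):

```python
from pathlib import Path

def find_skills(pattern: str, skills_dir: Path) -> list[Path]:
    """Phase 1: match against metadata only; the full file is read later, once."""
    matches = []
    for skill_md in skills_dir.glob("*/SKILL.md"):
        # Frontmatter lives at the top of the file, so a few lines are enough.
        head = "\n".join(skill_md.read_text().splitlines()[:20]).lower()
        if pattern.lower() in head:
            matches.append(skill_md)
    return matches
```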

6. Self-Improvement Loop

Meta-skill that writes skills:

Agent learns pattern → Extracts to SKILL.md → Tests on subagents
→ If passes: Add to library → Future agents use it

Genius part: System gets smarter over time. Lessons codified, not lost.

7. Subagent Delegation for Context Management

Don't pollute main context with exploration:

Need to search? → Dispatch subagent with query
→ Subagent returns synthesis only
→ Main agent uses result

Principle: Keep main context for active work. Delegate research/search.

Why This Works

Compared to RAG:

  • RAG: Retrieves chunks, agent figures out what to do
  • Skills: Retrieves instructions, agent executes

Compared to system prompts:

  • System prompt: Fixed, everything upfront, context bloat
  • Skills: Dynamic, loaded on-demand, scales indefinitely

Compared to docs:

  • Docs: Reference material, agent interprets
  • Skills: Executable procedures, agent follows

The Meta-Pattern

Skills are code for LLMs.

Not data. Not documentation. Executable procedures that:

  1. Load on-demand (JIT)
  2. Have clear triggers (when_to_use)
  3. Provide unambiguous steps (checklists)
  4. Are tested adversarially (TDD for docs)
  5. Use persuasion principles for reliability
  6. Self-improve (meta-skill writes skills)

The injection system:

  • Minimal bootstrap (establish rules)
  • Search tool (discovery)
  • SKILL.md format (standard interface)
  • Mandatory compliance (enforcement)
  • Adversarial testing (quality gate)

Result: System that scales because context cost is O(1) per task, not O(n) for all knowledge.

@oneryalcin (Author)

Two Repos vs One vs Three: Architecture Analysis

Two repos, not one. Skills separation is non-negotiable.

Critical Distinction

The repo split isn't about technical complexity—it's about evolution mechanics.

What Breaks with Single Repo

Self-improvement loop dies:

Agent learns pattern → Wants to write new skill
Single repo: Skill location = plugin directory (code territory)
Multi-repo: Skill location = ~/.config/my-agent/skills/ (content territory)

When agent writes to plugin directory:

  • Mixes executable code with content
  • User edits = risk breaking plugin
  • Updates require plugin reinstall
  • Can't fork skills independently
  • Version-locking (skills tied to plugin version)

The pattern REQUIRES skills to evolve independently from plugin code.

What Each Repo Does

Plugin repo (code):

  • Bootstrap mechanism (session-start.sh)
  • Discovery tool (find-skill)
  • Changes: Rare, by developer
  • Size: ~10 files, stays small

Skills repo (content):

  • SKILL.md files
  • Changes: Constant, by agent + users
  • Size: Grows organically (10 → 50+ files)
  • Editable without plugin reinstall

Marketplace repo (catalog):

  • Just pointer
  • Changes: When adding plugins
  • Size: Tiny (1 JSON file)

The Real Question: Two or Three?

Three repos makes sense if:

  • Building marketplace with multiple plugins
  • Teaching marketplace pattern
  • Planning to aggregate community plugins

Two repos makes sense if:

  • Personal agent (just one plugin)
  • Marketplace abstraction overkill
  • Want simplicity without breaking core mechanics

Two-Repo Pattern (Recommended)

Recommended compromise:

1. my-agent/ (plugin + marketplace.json)
   ├── .claude-plugin/
   │   ├── plugin.json
   │   └── marketplace.json      # Self-referencing
   ├── commands/
   └── hooks/

2. my-agent-skills/ (skills, separate repo)
   ├── getting-started/
   └── skills/

Installation:

/plugin marketplace add you/my-agent
/plugin install my-agent@my-agent

Works because:

  • Marketplace.json in plugin repo points to "."
  • Plugin still clones skills repo separately
  • Self-improvement intact
  • Skills forkable
  • One less repo to manage

What You CANNOT Consolidate

Skills must be separate repo. Here's why:

  1. Fork workflow:

    User forks my-agent-skills
    Edits locally at ~/.config/my-agent/skills/
    Commits, pushes to fork
    PRs back
    

    If skills in plugin repo → editing skills = editing plugin = dangerous

  2. Update workflow:

    Skills updated (new SKILL.md added)
    session-start.sh does: git pull
    New skills available, no reinstall
    

    If skills live in the plugin → every skill change needs a reinstall (see the bootstrap sketch after this list)

  3. Agent writes skills:

    Agent learns pattern
    Writes ~/.config/my-agent/skills/skills/new-skill/SKILL.md
    User: git commit, git push
    

    If skills in plugin directory → agent writing to code directory = bad

  4. Multiple plugins share skills:

    plugin-a uses my-agent-skills
    plugin-b also uses my-agent-skills
    One skills repo, multiple consumers
    

    Can't do this if skills embedded in plugin
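As referenced in the update-workflow item above, here is an illustrative Python stand-in for what a session-start bootstrap might do; the real plugin would likely use a shell script, and the paths and remote URL are assumptions:

```python
import subprocess
from pathlib import Path

SKILLS_DIR = Path.home() / ".config" / "my-agent" / "skills"
SKILLS_REMOTE = "https://github.com/you/my-agent-skills.git"  # hypothetical fork

def ensure_skills_repo() -> None:
    """Clone the skills repo on first run, fast-forward it on every later session."""
    if (SKILLS_DIR / ".git").exists():
        # check=False: a failed pull (e.g., offline) should not break the session.
        subprocess.run(["git", "-C", str(SKILLS_DIR), "pull", "--ff-only"], check=False)
    else:
        SKILLS_DIR.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", SKILLS_REMOTE, str(SKILLS_DIR)], check=False)
```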

Architecture Principle

Code vs Content vs Catalog:

Code (plugin):          Changes rarely, executable, security-sensitive
Content (skills):       Changes constantly, markdown, user-editable  
Catalog (marketplace):  Points to code, aggregation layer

Anti-pattern:

Everything in one:      Can't tell what's safe to edit
                        Can't update independently
                        Can't fork selectively
                        Breaking change risk high

My Recommendation

Use two repos:

  1. my-agent (plugin + embedded marketplace.json)
  2. my-agent-skills (skills)

Why:

  • Preserves self-improvement mechanics
  • Keeps skills forkable
  • Reduces complexity (vs three repos)
  • Marketplace abstraction less critical for single plugin

Skip three repos UNLESS:

  • Building marketplace for multiple plugins
  • Teaching others to build marketplaces
  • Planning plugin ecosystem

Never use one repo because:

  • Breaks self-improvement
  • Breaks fork workflow
  • Breaks update mechanism
  • Violates code/content separation

Proof Point

Superpowers uses three because obra/superpowers-marketplace is designed to grow. It's currently one plugin but architected for many.

If you're building one agent for personal use, two repos is pragmatic.

If you're building a template others will use to create marketplaces, keep three to teach the pattern properly.

Bottom line: Skills separation = non-negotiable. Marketplace separation = situational.

@oneryalcin (Author)

• Read through docs/README.md, docs/claude/hooks.md, and docs/claude/hooks-guide.md, then dug into the superpowers marketplace (sp/superpowers-marketplace/README.md), the core plugin (sp/superpowers/README.md), and sampled skills (e.g., sp/superpowers-skills/skills/testing/test-driven-development/SKILL.md). Here’s the picture I have now:

  • Hooks are configured via settings JSON, scoped by event + matcher, and run shell commands with structured stdin; they can block/allow tool calls, inject context, or even halt the main loop using either exit codes or richer JSON outputs.
    Plugins contribute their own hooks that merge automatically, using ${CLAUDE_PLUGIN_ROOT} to stay portable.
  • Each hook event has specific timing and capabilities (PreToolUse/PostToolUse gating, UserPromptSubmit context injection/blocking, Stop/SubagentStop loop control, SessionStart context seeding, etc.), and the runtime treats hooks as parallel,
    timeout-bound processes with strict security caveats—essentially deterministic guardrails surrounding Claude’s tool usage.
  • The superpowers marketplace is a thin catalog that lets you add curated plugins; installing the superpowers@superpowers-marketplace plugin gives you the “skills engine”.
  • The superpowers plugin itself is intentionally minimal: on SessionStart it ensures a local clone of ~/.config/superpowers/skills, registers hooks, and exposes slash commands (/brainstorm, /write-plan, /execute-plan) plus helper scripts like
    find-skills and skill-run.
  • Skills live as executable instructions (Markdown with front-matter metadata) inside superpowers-skills. They read like prescriptive workflows—TDD cycles, debugging checklists, collaboration rituals—and the supporting scripts treat them as
    JIT-injectable procedures. When a skill is invoked (manually or via hooks), Claude gets a concrete, enforceable playbook rather than relying on ad-hoc prompting.

So, the core idea is: hooks are the delivery mechanism, plugins (like superpowers) curate and register them, and skills are the portable, executable instruction sets that can be injected exactly when needed—very much a JIT guidance system for
the agent. Ready to dive into your Nexus (nx) implementation when you want.

• Skill System Blueprint

  • Think of a skill as an executable procedure: YAML/MD frontmatter for routing + a compact checklist + validation hooks. The key insight is keeping skills O(1) in context: load only what’s needed, right when it’s needed.
  • Runtime should have three layers: discovery (index tagged metadata, expose via /find-skill or auto-matchers), injection (load skill into active context, set “must follow” flag), and enforcement (hooks/subagents watch execution to ensure
    every step completed, tests run, evidence captured).
  • Subagents become the executors: the dispatcher spawns a purpose-built subagent with the skill’s prompt/checklist, restricted tools, and maybe cached transcripts. Slash commands remain the manual override (/skill), but most invocations should be automatic via matchers.
  • Hooks glue everything: UserPromptSubmit to auto-suggest skills, PreToolUse to block disallowed actions until required steps done, PostToolUse to capture telemetry/feedback, Stop to insist on validation before finishing.
  • Feedback loop needs first-class state: success/failure logs per skill, user annotations, diff snapshots, structured “what went wrong” data. Feed that into: (1) skill quality reports, (2) automatic TODOs for maintainers, (3) JIT warnings when
    a weak skill is about to run.
  • Marketplace/plugins stay the distribution layer: plugin repo ships commands/hooks/subagents, skills repo ships content. Opinionated tooling (cookiecutter or a skill-new command) scaffolds the dual-repo pattern so people don’t collapse them into docs again.
  • Learning path: start with rule-based matching + manual tagging; graduate to vector/rule hybrid; eventually capture runtime traces to auto-synthesize new skills or patch existing ones (with human review).
  • Optional but intriguing: meta-skill agents that propose refinements after each run (“Skill tuning” subagent), and a policy layer that enforces organizational rules (who can add skills, approval workflows).

Happy to dive deeper into any axis—matching strategies, telemetry schema, or how to retrofit Nexus once you want.

• Why Skills Matter
Picture the agent as a pilot flying on instruments. When a new task appears—debugging a race condition, refactoring billing logic—we don’t want the pilot improvising every dial turn. We hand them a laminated checklist that’s proven to land
the plane safely. That laminated card is a skill: an executable, time-tested protocol the agent loads only when needed, then stows away. Skills keep reasoning sharp and predictable while avoiding the cost of stuffing every possible instruction
into the core prompt.

Core Building Blocks

  • Skill Artifact in Markdown with frontmatter metadata (tags, triggers, dependencies) plus a tight checklist and validation steps.
  • Discovery Service that indexes metadata and runtime telemetry to suggest the right skill at the right moment.
  • Injection Runtime that pulls the skill into context, spins up a purpose-built subagent, and sets “must follow” flags.
  • Enforcement Hooks listening on events (prompt submit, pre/post tool use, stop) to verify steps, block risky actions, and log outcomes.
  • Feedback Loop capturing successes, failures, and user corrections so the skill library evolves.

flowchart TD
User[User or Test]
Intent[Intake & Classification]
Discovery[Skill Discovery]
Injection[Skill Injection]
Execution[Subagent + Tools]
Enforcement[Hooks & Telemetry]
Feedback[Learning & Updates]

  User --> Intent --> Discovery --> Injection --> Execution --> Enforcement --> Feedback
  Feedback --> Discovery
  Feedback --> SkillRepo[(Skill Repo)]
  Injection --> SkillRepo

Three Runtime Layers

  1. Discovery acts like air-traffic control. It matches incoming tasks to candidate skills using metadata, tags, keywords, or lightweight embeddings. It powers /find-skill, auto-suggestions, and “no skill found” logging that spawns backlog
    items.
  2. Injection is the cargo loader. Once a skill is selected, the system copies it into a dedicated subagent prompt, scopes tool permissions, and announces required steps to the user. Slash commands remain the manual override when humans want to
    force a specific skill.
  3. Enforcement is mission control. Hooks watch every event:
    • UserPromptSubmit can insist the agent confirm the chosen skill or offer alternatives.
    • PreToolUse blocks edits until prerequisites (read docs, run tests) are checked off.
    • PostToolUse records evidence and can trigger follow-up checks.
    • Stop refuses to end the loop if validation or reporting is missing.
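To make the enforcement idea concrete, a hedged sketch of a PreToolUse-style hook in Python: it reads the structured event from stdin and blocks edits until checklist evidence exists on disk. The field name tool_name, the scratchpad path, and the exit-code convention are assumptions to verify against the hook documentation referenced earlier.

```python
#!/usr/bin/env python3
import json
import sys
from pathlib import Path

event = json.load(sys.stdin)                       # structured event on stdin
tool = event.get("tool_name", "")                  # assumed field name
evidence = Path("run_logs/current/checklist.md")   # hypothetical scratchpad path

if tool in {"Edit", "Write"} and not evidence.exists():
    # Message goes to stderr; a non-zero exit signals "block this tool call".
    print("Blocked: complete and log the skill checklist before editing.", file=sys.stderr)
    sys.exit(2)

sys.exit(0)  # allow the call
```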

How Subagents Fit
Subagents behave like specialized copilots. Each skill invocation spawns or reuses a subagent configured with the skill’s instructions, constrained toolset, and fresh context. They execute the checklist end-to-end, report findings, then hand
control back. This keeps the main conversation clean and allows parallel work—documentation subagent, test-captain subagent, migration subagent—without context bloat.

Distribution & Governance
Plugins and marketplaces are the packaging and app store:

  • Plugin repo ships the runtime glue (hooks, subagents, slash commands, utilities).
  • Skills repo holds the actual procedures and can version independently.
  • Marketplace metadata simplifies install/updates and lets teams share curated skill bundles.

ASCII architecture sketch:

+------------------------------+        +--------------------------+
| Claude Session               |<------>| Skill Telemetry Store    |
|  - Core prompt               |        |  - Run history           |
|  - Conversation context      |        |  - Feedback/events       |
|  - Hooks & slash commands    |        +------------+-------------+
|    (UserPrompt, Pre/Post,    |                     |
|     Stop)                    |            (improves matching)
+---------------+--------------+                     |
                |                                    |
                v                                    |
        +-------------------+                        |
        | Skill Dispatcher  |<-----------------------+
        |  - Discovery      |
        |  - Selection      |
        +----+---------+----+
             |         |
             v         v
   +---------------+  +----------------+
   | Skill Repo    |  | Plugin Repo    |
   | (checklists)  |  | (hooks/agents) |
   +-------+-------+  +--------+-------+
           |                   |
           v                   v
 +---------------------+  +----------------------+
 | Subagent Runtime    |  | Tool Permissions     |
 |  - Skill checklist  |  |  - Bash/Edit/etc.    |
 |  - Evidence log     |  |  - MCP integrations  |
 +---------------------+  +----------------------+

End-to-End Flow

  1. Intent intake: user request or tool output is classified.
  2. Discovery: metadata + telemetry surfaces candidate skills; if none, log the gap.
  3. Injection: load skill, announce requirements, spin up subagent with limited tools.
  4. Execution: subagent runs through steps, using hooks to guard actions and gather proof.
  5. Enforcement: no step left behind—hooks validate, block risky moves, and demand reports.
  6. Feedback: success/failure plus user notes go back to telemetry, informing discovery, raising alerts for stale skills, and helping authors iterate.

Learning & Evolution

  • Short term: manual reviews of telemetry highlight missing steps, refine prompts, and spawn new skills.
  • Mid term: pattern mining auto-suggests patches or new skills when repeated failures occur.
  • Long term: meta-skills propose improvements (“Skill Tuner” agent) with human approval, closing the loop from discovery to continuous improvement.

Analogy That Sticks
Think DevOps runbooks meeting on-call dashboards. Classic runbooks exist as wiki pages; responders must remember where they live, hope they’re current, and then maybe follow them. Skill-driven agents turn each runbook into code: discoverable,
enforceable, auditable. Instead of trusting tribal knowledge or giant static prompts, we load only the one page needed, run it flawlessly, and learn from every execution.

Where To Experiment Next

  • Define a metadata schema (tags, context-signals, requires, produces) and build a searchable index.
  • Wire UserPromptSubmit hook to auto-suggest skills, with /skill slash command as fallback.
  • Prototype subagent templates that accept a skill bundle and enforce checklist execution.
  • Capture telemetry (start time, stop time, success, manual overrides) so skill quality becomes visible.
  • Establish a backlog process: every “no skill found” or recurring failure becomes a tracked task for skill authors.
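One possible shape for that metadata schema, sketched as a Python dataclass; every field name here is a proposal, not an existing spec:

```python
from dataclasses import dataclass, field

@dataclass
class SkillMeta:
    name: str
    tags: list[str] = field(default_factory=list)             # e.g. ["research", "excel"]
    context_signals: list[str] = field(default_factory=list)  # phrases that should trigger it
    requires: list[str] = field(default_factory=list)         # tools the skill expects
    produces: list[str] = field(default_factory=list)         # artifacts it should leave behind

# A searchable index is then just a dict keyed by name (or by tag).
index: dict[str, SkillMeta] = {}
```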

With that architecture, we move from helpful heuristics to a disciplined operating system for reasoning agents—one where every critical workflow has a checklist, every checklist can evolve, and the agent can get smarter without swelling its
brain.

› OK, one thing to consider: matching skills and tools. Skills are things like semantic data analysis or Excel understanding; tools are specialized CLI commands like fetching data or converting PPTX to PDF with soffice. Tools are not skills, but they can be part of a skill, and since a tool is just a tool it can be used independently depending on the scenario (custom small Python scripts can be considered makeshift tools wherever possible). We also know that subagent-to-main-agent communication is stateless, so when the main agent asks a question the subagent can try to answer it, but it may discover it cannot, or that it needs more data, information, or other skills, since questions might require multiple skills and tools. This creates complexity. Considering only one skill at a time is probably not the most useful approach, but I'm a big fan of YAGNI and the John Carmack approach. So if there is a way to simplify things, such as using the disk as state (simply writing to files and allowing other agents to read them), that might be a way. Many messy thoughts; how do you approach this?

• I’ll create an accessible explanation advocating a layered approach starting simple and decoupling tools from skills, using filesystem-based state to coordinate conversation handoff and multi-skill orchestration. I plan to suggest primitives
like manifests and handshakes, sketch a phased skill composition flow from discovery to fallback, and emphasize incremental development to avoid overengineering. Including bullet sections and possibly ASCII diagrams will help illustrate the
aggregator concept and coordination mechanics.


• Shared Vocabulary

  • Tools = capabilities (CLI scripts, Python helpers, MCP endpoints) that do work.
  • Skills = procedures that say when and how to string capabilities together. They live at the reasoning layer and can call tools, but neither replaces the other.
  • Subagents = stateless executors of one skill run. They need a scratchpad to remember what happened between calls.

Keep It Simple, Layer by Layer

  1. Describe requirements explicitly: each skill’s frontmatter lists the tools it expects (requires-tools: ["bash:soffice", "python:excel_diff.py"]) and the data it consumes/produces (consumes: Excel, produces: Markdown report). Discovery can
    match those declarations without guessing.
  2. Mount a shared scratchpad: before launching a subagent, carve out run_logs/<skill>/<run_id>/ with context.md, inputs/, outputs/. Subagent writes checkpoints; the main agent (or a follow-up skill) reads them. Disk becomes the coordination layer—cheap, inspectable, versionable.
  3. One skill at a time… but with hand-offs: each skill ends with a verdict: complete, needs-input, recommend-skill:<name>, delegate-tool:<name>. Discovery handles the hand-offs by picking the next skill or direct tool. That keeps orchestration dumb but extensible.
  4. Tool invocations stay atomic: whether called inside a skill or on demand, tools should log their inputs/outputs to the same scratchpad. Now every step has an audit trail.
  5. Prefer composition over concurrency: when a subagent realizes it can’t proceed, have it write a request.md in the scratchpad (e.g., “Need database dump”). Main agent sees it, gathers data (maybe via another skill), and re-invokes. This
    yields a simple loop instead of complex multi-skill scheduling.

ASCII routing sketch:

User Request
     |
     v
Skill Discovery --(no match)--> Gap log
     |
     v
Skill Runner (subagent, stateless)
     |  writes to run_logs/<skill>/<run_id>/
     |
     +--> tools (atomic, logged to the same scratchpad)
     |
     v
Completion manifest (complete / needs-input / recommend-skill / delegate-tool)
     |
     v
Discovery / Main agent picks the next hand-off
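A sketch of the subagent side of that hand-off: checkpoint notes and a final verdict land on disk, and nothing else is assumed to persist. The run_logs layout and the verdict vocabulary mirror the list above; the file names are assumptions.

```python
import json
from pathlib import Path

def finish_run(run_dir: Path, verdict: str, detail: str = "") -> None:
    """verdict: complete | needs-input | recommend-skill:<name> | delegate-tool:<name>."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "context.md").write_text(detail or "No additional notes.\n")
    (run_dir / "manifest.json").write_text(json.dumps({"verdict": verdict, "detail": detail}))

# A stateless subagent that hits a wall records what it needs, then simply stops.
finish_run(Path("run_logs/extract-products/run-001"),
           "needs-input", "Need a database dump before extraction can continue.")
```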

Why This Works

  • Tools remain general-purpose; multiple skills can reuse them.
  • Skills keep orchestration logic centralized and easy to evolve.
  • Shared disk state gives you persistence without building a message bus.
  • Hand-offs are explicit, so you avoid unbounded skill chaining until you really need it.
  • YAGNI is preserved: we’re only adding “shared folder + metadata + manifests,” which buys a lot of flexibility before you need a full orchestration engine.

From here you can add niceties—graph scoring for multi-skill plans, “skill bundles” that preload multiple procedures, richer feedback loops—but the core loop (discover → run → log → hand off) stays understandable and debuggable.

› consumes/produces feels a bit tight in your definition, no? I'm not sure I'm happy with the approach; maybe it's good, I'm not sure, but I'm happy to try. Just think of a problem, for example the one below.

From the list provided with company names and company websites, produce a report for each company with the following sections from the website sources.

1. Business Description & Model
• 2–3 sentences on what the company does and how it makes money.
• Quote mission statements or product names verbatim from the source.
• Then a Keywords line with 5–10 strategic phrases that characterise the model.

2. Core Products
• Bullet list of main product lines (include families / platforms).

3. Core Service Offering
• Bullet list of major services (maintenance, integration, consulting, rental, etc.).

4. Typical Customers & Industries Served
• Bullet list — archetype customers, sectors, or NAICS/SIC codes where explicit.

5. Geography of Operations
• Bullet list — HQ country, major production sites, sales regions, expansion notes.

6. Additional Notes  (optional)
• Use this section to record data gaps, conflicting figures, or unclear claims.

Data sources
• Company’s public website — scrape all relevant pages, not just the homepage.
• Credible third-party sources — press/news releases, SEC filings, analyst notes, reputable media, etc.
• Prefer sources that are 24 months old or newer unless providing historical context (e.g., founding year).
• Prioritise information from the company site. Use external sources only to fill gaps or corroborate.

Based on the research, classify the companies into primary segments and sub-segments based on the excel file "Final_Industrial_Classification_Framework"

There are many bits in this request, and doing it might require a lot of things: maybe start with one company, see how it goes, and scale with parallel subagents... but also Excel, web search, web fetch, and probably persisting to disk. So real-world questions are messy and might not allow a very linear approach.

• I’m thinking about how to explain a flexible architecture for complex skill orchestration that handles messy tasks by using an orchestrator skill to manage stages like data ingestion and per-company pipelines, with a focus on capabilities,
artifacts, modular tools, and disk-based state management—all explained accessibly and supported by diagrams or ASCII.


• Here’s how I’d tackle a messy, multi-step brief like your company research + classification example while staying true to YAGNI and keeping the system understandable.

———

Think In Roles, Not Monolithic Skills

For this job you don’t want one giant skill that does everything. Break the work into four reusable skill archetypes:

  1. Planner – reads the user request, cracks it into jobs (“one company at a time”), sets up per-company work folders.
  2. Retriever – orchestrates web search/fetch tools, saves raw pages, snapshots sources, records metadata (timestamp, URL, confidence).
  3. Extractor – turns raw pages into structured notes (business model.md, products.json, etc.) and flags gaps.
  4. Synthesizer – assembles the polished write-up + citation list, using the notes and raw files.

A separate Classifier skill can run after synthesis to map each company onto your “Final_Industrial_Classification_Framework”.

Every skill still “consumes” and “produces”—but in practice you just tag the artifacts they expect and create. For example, the Synthesizer looks for notes/business-model.md, notes/products.json, etc. Think of them as capability tags rather
than rigid types.

———

Shared Disk = Memory Bus

Give each company its own directory (work/<company_slug>/) with this layout:

work/
  acme-industries/
    inputs/
      brief.md
      industrial_framework.xlsx
    raw/
      sources/
        01_homepage.html
        02_press_release.html
    notes/
      business-model.md
      products.json
      missing-data.md
    outputs/
      report.md
      citations.csv
    manifest.yaml        # status log, open questions, next actions

  • Subagents write everything there.
  • Other skills read from it.
  • Telemetry (timestamps, failures) gets appended to manifest.yaml.
  • If two subagents need to collaborate, they just read/write the same files.

This gives you persistence, audit trails, and makes parallelism possible without building a messaging system.

———

Orchestration Loop

At the top level, the main agent runs a tight loop:

for each company in input_excel:
    ensure work/<company>/ exists
    run skill: planner -> manifest.next_actions
    while manifest.next_actions not empty:
        action = pop_next_action()
        dispatch to matching skill (retriever / extractor / ...)
        update manifest with results or follow-ups
    run skill: synthesizer
    run skill: classifier
aggregate all outputs into final deliverables

Skills communicate needs (e.g., “Need SEC filing after 2024-01-01”) by appending to manifest.yaml. The main loop keeps dispatching until nothing is pending.

———

Tool / Skill Matching

Tools stay modular: web-fetch, markitdown, extract_tables.py, classify_company.py, etc.

Each skill declares:

requires-tools:
  - web_fetch
  - markitdown
expects-artifacts:
  - raw/sources/*
produces-artifacts:
  - notes/business-model.md

This is just structured metadata; discovery uses it to confirm prerequisites and to pick the right skill when multiple options exist. Tools themselves can be invoked directly if necessary (e.g., manual quality check), but the default is that
skills wrap them in procedure and logging.
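A small prerequisite check driven by that metadata, as a sketch; the field names match the YAML above, and treating every required tool as a CLI on PATH is a simplifying assumption:

```python
import shutil
from glob import glob

def prerequisites_met(meta: dict, workdir: str) -> list[str]:
    """Return missing prerequisites; an empty list means the skill can be dispatched."""
    missing = []
    for tool in meta.get("requires-tools", []):
        if shutil.which(tool) is None:
            missing.append(f"tool not on PATH: {tool}")
    for pattern in meta.get("expects-artifacts", []):
        if not glob(f"{workdir}/{pattern}"):
            missing.append(f"no artifact matching: {pattern}")
    return missing
```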

———

Handling “It Didn’t Work”

Because subagents are stateless, every skill run ends by updating the manifest:

status: blocked
reason: "No recent product information on company site."
request: "Run retriever with query 2024 product launch."

The main agent sees that, queues a retriever run with different parameters, and the next skill continues. You can also spawn a “feedback skill” (think meta-agent) that collects these blockers into a backlog for humans or long-term learning.
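On the main-agent side, a sketch of turning blocked manifests into new queue items; the YAML fields match the example above, while the directory layout and the pyyaml dependency are assumptions:

```python
import yaml  # assumed dependency: pyyaml
from pathlib import Path

def collect_followups(work_root: Path) -> list[dict]:
    """Scan per-company manifests and turn every blocked run into a queued action."""
    followups = []
    for manifest in work_root.glob("*/manifest.yaml"):
        data = yaml.safe_load(manifest.read_text()) or {}
        if data.get("status") == "blocked":
            followups.append({
                "company": manifest.parent.name,
                "action": data.get("request", "retry"),
                "reason": data.get("reason", ""),
            })
    return followups   # the dispatch loop queues these as next_actions
```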

———

Scaling Up Gradually

  1. Start with single-company flow, one skill at a time, logging to disk.
  2. Add parallelism by running the loop concurrently for multiple companies (each job is isolated by directory).
  3. Introduce a “Skill Discovery” helper that chooses between general skills and company-specific variants based on tags.
  4. Layer in telemetry dashboards (e.g., reports/telemetry.csv aggregated from manifests) once you care about QA at scale.
  5. Only when you hit real limits do you consider dynamic skill graphs or more complex planners.

———

Why This Feels Reasonable

  • Skills stay focused and composable; you don’t invent a grand orchestrator on day one.
  • Tools remain reusable building blocks—whether called inside a skill or directly.
  • Disk provides cheap, resilient state across stateless subagent invocations.
  • The manifest-driven loop keeps orchestration logic explicit and auditable.
  • You can plug in learning later: manifests plus telemetry give you everything you need to discover missing skills or optimize existing ones.

Messy real-world requests become manageable: iterate per company, keep artifacts clean, enforce checklists inside skills, and let the main agent orchestrate with a very small, understandable set of primitives.

› So skills are not "Excel skill" etc.? Not like Anthropic does it?

• Right—when I say “skill” I don’t mean “Excel skill” or “Docx skill” as in the Anthropic dump we looked at. Those are essentially long-form reference guides. I’m talking about small, executable playbooks: a focused procedure for a concrete
task (e.g., “gather company profile”, “extract product data from a page”) that can call whatever tools are appropriate—Excel helpers, web fetch scripts, classifiers—but isn’t tied to a single tool or format. The skill sits above the tools,
orchestrating when and how they’re used. Tools stay general-purpose; the skill combines them into a reusable workflow.

› Wouldn't it be overkill if the question is simply "how is the weather in NY today?"

• Asking “what’s the weather in New York today?” shouldn’t trigger a multi-stage research machine. Practical approaches:

  • Micro-skill / direct tool: keep a tiny “ask-weather” skill whose checklist is literally “call weather service, report result,” or let the system detect it’s a simple request and hit the weather API straight away.
  • Lightweight discovery: the classifier sees it’s a high-confidence, low-complexity query and routes to the simple skill without spinning up retrievers or manifests.
  • Cost caps: set heuristics (“no external fetchers, no subagents”) for trivial tasks; if the query later expands into something richer, the system can escalate to the heavier pipeline.

In short: we only pay the orchestration overhead when the problem actually needs it.
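A toy version of that cost cap, purely illustrative: a crude pattern check routes trivial lookups straight to a tool and everything else into the skill pipeline.

```python
import re

TRIVIAL_PATTERNS = [r"\bweather\b", r"\btime in\b", r"\bconvert \d+"]

def route(query: str) -> str:
    """Crude complexity gate; a real classifier could sit behind the same interface."""
    if any(re.search(p, query.lower()) for p in TRIVIAL_PATTERNS):
        return "direct-tool"       # single API call, no subagents, no manifests
    return "skill-pipeline"        # discovery -> injection -> enforcement

print(route("How is the weather in NY today?"))   # -> direct-tool
```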

› OK, there are a few interesting things in superpowers, like the brainstorm / planner / executor flow, but that repo is aimed at software engineering tasks. Still, I believe we can borrow many ideas from there and mix them with Claude-style skills to solve real-life problems. And the network effect of tools and procedures/skills will make it compoundingly important?

• Exactly. Superpowers gives us the scaffolding—brainstorm → plan → execute, tight checklists, clear hand-offs—that works beautifully for software work. If we generalize those patterns into domain-agnostic skill templates, then layer in broader
procedure libraries (research, ops, finance, etc.) we get:

  • Executable playbooks for real-life tasks rather than just code.
  • Reusable tool library where each new script or connector makes multiple skills stronger.
  • Compounding feedback: every run adds telemetry, highlighting gaps, driving new skills. The more skills + tools we have, the more combinations unlock, and each improvement makes the whole mesh smarter.

So yes—the network effect kicks in once tools, procedures, and discovery feed each other. With disciplined encoding (short checklists, enforced verification) and shared state/logging, we can build a general-purpose “skills OS” for agents, not
just a dev helper.

› Summarize recent commits

• Yes—that’s the north star. “Skills OS” is the runtime that ties everything together:

  • Skill layer: executable playbooks with metadata, checklists, and enforcement hooks.
  • Tool layer: CLI scripts, APIs, MCP servers—portable capabilities any skill can invoke.
  • Orchestration layer: discovery, dispatch, subagent execution, shared scratchpads, telemetry, and feedback loops.

Together they let an agent load the right procedure just-in-time, run it deterministically, call the right tools, capture evidence, and learn from every run. Tools make skills powerful; skills make tools usable; orchestration keeps it all
disciplined. That’s the operating system we’re aiming to build.

I'm laying out a detailed explanation of how to build a Skills OS by leveraging core features like composability and frontmatter metadata, proposing core components such as ingestion, metadata, runtime, and discovery, and highlighting how
these enable generalization to real-world tasks and skill templates with autopilot capabilities. I plan to organize the explanation with clear structure, practical references to example files with line numbers, and actionable next steps for
extending skill design into an OS framework.


• The problem-solving skills in superpowers already embody many traits we need for a Skills OS:

  • Clear metadata and trigger phrases in frontmatter (name, description, when_to_use, semantic version) make each procedure easy to classify and route. sp/superpowers-skills/skills/problem-solving/simplification-cascades/SKILL.md:1
  • Compact, high-signal bodies: symptoms tables, dot graphs for dispatch, step-by-step “Process” sections. sp/superpowers-skills/skills/problem-solving/simplification-cascades/SKILL.md:19, sp/superpowers-skills/skills/problem-solving/when-stuck/SKILL.md:18
  • Skills reference each other, forming a lightweight network (e.g., when-stuck dispatching to simplification-cascades, collision-zone-thinking, etc.). sp/superpowers-skills/skills/problem-solving/when-stuck/SKILL.md:57

Skills OS Blueprint

  • Core primitives: keep the Markdown + YAML format but standardize a richer schema (tags, required tools, expected artifacts, validation hooks) while preserving the concise, human-readable body.
  • Discovery engine: index metadata + symptom tables so “stuck” states or user prompts map to the right skill or dispatch skill; capture gaps as TODOs.
  • Execution runtime: each skill invocation spins up a stateless subagent seeded with the skill body; it works out of a scratch directory and logs progress, evidence, and follow-up requests for the main agent.
  • Tool mesh: tools remain independent capabilities (CLI/Python/MCP). Skills list the tools they depend on; tools log their work products to the same scratch space.
  • Workflow scaffolding: allow dispatch skills (like when-stuck) to orchestrate other skills, and planner/brainstorm/executor patterns to provide initial structure even outside software engineering.
  • Telemetry & learning: every run updates a manifest (status, blockers, recommended next step). Telemetry feeds discovery (better suggestions) and informs skill authors when to refine or extend procedures.
  • Incremental adoption: start with the existing superpowers skill library, add domain-neutral templates (research, operations, finance), then grow via contributions; hook framework enforces checklists and captures feedback.

What we take from superpowers is the discipline: short, pointed procedures, symptom-driven discovery, and composability. By wrapping those patterns in shared state, discovery, and tool orchestration, we evolve them into a general-purpose
Skills OS that can serve messy real-world tasks as effectively as it serves software engineering.


• What the Superpowers Library Gives Us

  • Structured playbooks everywhere – Every skill is a short, high-signal procedure with frontmatter metadata (name, when_to_use, version) and a crisp spine (symptoms tables, flowcharts, five-step checklists). Examples span problem-solving
    (simplification cascades, collision-zone thinking), debugging (systematic debugging’s four phases), collaboration (brainstorming, dispatching-parallel-agents), testing (TDD), meta-maintenance (pulling-updates-from-skills-repository),
    research (tracing-knowledge-lineages), and the mandatory “using-skills” discipline.
  • Intent-aware dispatch – Skills like problem-solving/when-stuck and collaboration/dispatching-parallel-agents behave as routers: they map symptoms (“complexity spiraling”, “multiple independent failures”) to the specific skill to load next.
    That’s the seed of a discovery/triage layer.
  • Process enforcement baked in – Many skills insist on observable behavior (announce you’re using it, create TodoWrite tasks for every checklist item, read the whole file, follow the four debug phases). They already encode the guardrails we
    want our runtime to enforce via hooks/subagents.
  • Composable ecosystem – The library is a mesh: skills reference each other (debugging links to testing, dispatch skill calls problem-solving skills), tools live beside them (using-skills/find-skills, remembering-conversations search utility),
    and meta-skills keep the library healthy (pull updates, test skills with subagents). It’s a template for how procedures + tools can co-evolve.

Toward a Skills OS

With those ingredients we can outline the operating system we’ve been circling:

  1. Skill Layer
    • Store procedures exactly like these: Markdown body, enriched frontmatter (tags, required tools, expected artifacts, follow-up suggestions).
    • Promote the existing dispatch patterns to first-class routing skills; they’ll sit at the top of the stack for triage.
    • Encode hard requirements in a machine-readable block (e.g., must_announce, requires_todowrite, pre_read: true) so hooks can enforce them automatically.
  2. Tool Layer
    • Treat scripts and CLIs (find-skills, search-conversations, Soffice converters, custom Python utilities) as standalone capabilities advertised via metadata (provides: pdf→markdown, category: web-retrieval).
    • Give skills simple references (uses-tools: [web_fetch, markitdown, search-conversations]), but keep tools independently invokable for quick tasks.
  3. Orchestration Layer
    • Discovery & Dispatch: Use the when_to_use text, tags, and symptom tables to power a matcher that either auto-suggests or auto-selects skills. Dispatch skills become the gatekeepers when confidence is low.
    • Subagent Execution: Spin up a stateless subagent with the chosen skill, mount a per-task scratch directory (run_logs/...), and log every step/evidence file there.
    • Hooks & Enforcement: Hooks ensure the agent read the skill, created TodoWrite items, and respected mandatory announcements before allowing risky actions (PreToolUse blocks edits until checklists are logged, Stop refuses completion until
      validation evidence appears).
    • Telemetry & Feedback: Manifests in each run log capture status, blockers, and recommended follow-on skills. Aggregate telemetry (success rates, repeated blockers) powers skill quality dashboards and growth backlog.

Why Superpowers Is the Perfect Seed

  • It already has the discipline (mandatory workflows, precise language, rigid checklists) we want the OS to enforce globally.
  • Skill categories cover more than pure coding—problem solving, collaboration, research—so we can extend them into “real world” domains without inventing structure from scratch.
  • Tool scripts and meta-skills illustrate how to maintain the library itself; Skills OS can ship with the same self-healing routines (update/fork/test skills).
  • Dispatch skills show how multiple skills cooperate without building a complex planner; the OS can orchestrate via simple hand-offs, honoring YAGNI until real complexity appears.

Next Steps Toward Skills OS

  1. Enrich existing skills with explicit metadata for required tools, artifacts, and enforcement flags; build a lightweight indexer on top.
  2. Wrap find-skills/dispatch patterns into a discovery service hooked to UserPromptSubmit so the agent suggests or auto-loads skills.
  3. Implement subagent scratchpads + manifests per skill run, logging evidence, blockers, and follow-up requests on disk.
  4. Wire hooks to enforce the rules spelled out in skills (read-before-use, TodoWrite checklists, announcements, validation).
  5. Iterate in new domains (research reports, operations, analytics) following the same format, letting tools + procedures compound.
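For step 2, a hedged sketch of a UserPromptSubmit-style suggestion hook in Python: it matches the prompt against when_to_use lines and prints a suggestion, on the assumption (to verify against the hook docs cited earlier) that stdout from this event is injected as context. The stdin field name and the skills path are also assumptions.

```python
#!/usr/bin/env python3
import json
import sys
from pathlib import Path

prompt = json.load(sys.stdin).get("prompt", "").lower()   # assumed field name
skills_dir = Path.home() / ".config" / "my-agent" / "skills"

for skill_md in skills_dir.glob("*/SKILL.md"):
    head = skill_md.read_text().splitlines()[:10]
    when = next((line.split(":", 1)[1].strip()
                 for line in head if line.startswith("when_to_use:")), "")
    # Crude keyword overlap; the discovery service described above would do real matching.
    if when and any(word in prompt for word in when.lower().split()):
        print(f"Consider the skill at {skill_md} (when_to_use: {when}).")
        break

sys.exit(0)
```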

By mining and systematizing what’s already in superpowers, we’re not just borrowing ideas for code work—we’re laying the foundation for a full Skills OS where any complex workflow (software, business ops, research) becomes a deterministic,
discoverable, improvable playbook executed by an agent that actually follows the rules.

› OK, and superpowers also refers to Microsoft's amplifier repo, which is in the ref/amplifier directory. Please explore that library thoroughly, build a solid understanding of it, and see if there are bits of information that might be helpful to us.

• Explored
└ List amplifier
Read README.md, AMPLIFIER_VISION.md, AGENTS.md

• Ran rg -n "recipe" -g"*.md" ref/amplifier | head
└ ref/amplifier/README.md:328:/ultrathink-task I'm new to the concepts of "metacognitive recipes" - what are some
ref/amplifier/README.md:330:the value of "metacognitive recipes"? Especially any that would demonstrate how such
… +6 lines
ref/amplifier/scenarios/transcribe/HOW_TO_CREATE_YOUR_OWN.md:177:- The recipe: Generate perspectives → Analyze independently → Detect conflicts → Debate → Synthesize
ref/amplifier/scenarios/transcribe/HOW_TO_CREATE_YOUR_OWN.md:182:- The recipe: Baseline → Spawn variants → Benchmark → Analyze winners → Combine best → Mutate

• Explored
└ Read HOW_TO_CREATE_YOUR_OWN.md, THIS_IS_THE_WAY.md
List ai_context
Read IMPLEMENTATION_PHILOSOPHY.md, DISCOVERIES.md, CLAUDE.md

• Ran rg -n "Knowledge" docs -g"*.md"
└ (no output)

• Explored
└ Search *.md in docs
Read KNOWLEDGE_WORKFLOW.md, MEMORY_SYSTEM.md, ROADMAP.md


• Amplifier Overview

  • Amplifier wraps Claude Code in a full “agentic lab” with specialized subagents, scripted workflows, knowledge extraction, and memory; it treats the repo itself as both product and demo. ref/amplifier/README.md:16
  • Everything is designed around ruthless simplicity and modular “bricks”: short files, minimal abstractions, regenerate-on-demand philosophy. ref/amplifier/ai_context/IMPLEMENTATION_PHILOSOPHY.md:12
  • The project continuously captures learnings—DISCOVERIES, decision logs, knowledge graphs, transcripts, memories—so each run leaves the system smarter than before. ref/amplifier/DISCOVERIES.md:1
  • “Metacognitive recipes” (natural-language workflows that mix philosophy, tactics, and checks) drive tool generation and automation; users describe the thinking, Amplifier builds the tooling. ref/amplifier/scenarios/transcribe/HOW_TO_CREATE_YOUR_OWN.md:23

Patterns Worth Carrying into Skills OS

  • Skill scaffolding – Problem-solving and collaboration skills derived here become compact, symptom-driven playbooks (e.g., “When Stuck” dispatch). They translate agent behaviors into reusable procedures. sp/superpowers-skills/skills/problem-solving/when-stuck/SKILL.md:1
  • Agent catalog & philosophy – AGENTS.md and CLAUDE.md show how to encode mandatory behaviors (plan first, use subagents, parallelize) and load shared context automatically. This is a template for global Skills OS policies. ref/amplifier/CLAUDE.md:8
  • Knowledge synth pipeline – Makefile targets plus JSONL stores let agents mine documents, fingerprint concepts, detect contradictions, and surface query results. Lightweight, file-based, and self-updating—ideal for plugging into skills as
    evidence sources. ref/amplifier/docs/KNOWLEDGE_WORKFLOW.md:1
  • Memory + transcripts – Hooks save conversation transcripts pre-compaction and extract memories per session; retrieval happens at session start. Treat skills the same way: every run produces context, checkpoints, and reusable knowledge with
    zero manual effort. ref/amplifier/docs/MEMORY_SYSTEM.md:1
  • Decision/completion logs – DISCOVERIES.md, decision folders, and manifest logging patterns demonstrate how to capture root causes, mitigations, and prevention tips so future runs inherit the wisdom. ref/amplifier/DISCOVERIES.md:6
  • Resilience patterns – Incremental processing, partial-failure handling, defensive utilities for LLM results—all encoded as checklist items to prevent brittle tool behavior. ref/amplifier/DISCOVERIES.md:49
  • Parallel exploration – Worktree automation, modular builder, and orchestration guides embody “try variants concurrently” with clean isolation—mirrors our idea of spinning multiple subagents or skills per directory. ref/amplifier/docs/THIS_IS_THE_WAY.md:33

Skills OS Implications

  • Adopt Amplifier’s philosophy docs as the OS-level “kernel”: global rules (plan, parallelize, log decisions) become enforced skill policies.
  • Pair every skill with a scratch directory + manifest (Amplifier logging style) so runs produce artifacts, evidence, blockers, and follow-up requests automatically.
  • Build a knowledge subsystem that ingests skill outputs, transcripts, and DISCOVERIES-style notes; discovery can then match future tasks to proven recipes plus supporting evidence.
  • Reuse the metacognitive recipe idea: let users describe workflows in natural language, generate new skills or tool wrappers, then feed them back into the library.
  • Keep everything file-based: tools, skills, memory, knowledge, telemetry. It preserves YAGNI, enables git/versioning, and mirrors Amplifier’s “kernel/userland” roadmap.

Amplifier shows how to turn agent behaviors into disciplined, self-improving systems. Folding those ideas into a Skills OS—guided playbooks, shared state, knowledge compounding, and lightweight orchestration—gives us a blueprint that scales
beyond software tasks to any domain where procedures + tools + learning loops matter.
