Right. Focus on injection mechanics and why it works, not what skills exist.
Not "here's all documentation upfront" - it's pull, not push:
Session starts → Minimal bootstrap injected
Agent encounters task → Agent searches for skill
Agent finds match → Agent reads SKILL.md
Agent uses skill → Follows instructions
Key: Context only loaded when needed. Main context stays clean.
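A minimal sketch of that pull loop, in Python, assuming a hypothetical prebuilt index of when_to_use strings and a read_skill callable (none of this is the actual plumbing):
# Hypothetical sketch of the pull model: the session starts with only the
# bootstrap; a skill body enters context only after a task matches it.
BOOTSTRAP = ("You have skills. Search for one before acting. "
             "If a matching skill exists, you MUST use it.")

SKILL_INDEX = {                              # assumed to be built from frontmatter at startup
    "test-driven-development": "when writing or changing production code",
    "systematic-debugging": "when a bug's cause is unknown",
}

def handle_task(task: str, read_skill) -> list[str]:
    context = [BOOTSTRAP]                                   # push: this is all that gets pushed
    for name, when_to_use in SKILL_INDEX.items():
        if any(word in when_to_use for word in task.lower().split()):
            context.append(read_skill(name))                # pull: full SKILL.md only on a match
    return context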
Not documentation. Not guidelines. Executable process:
---
name: Thing To Do
when_to_use: Specific trigger conditions
---
# Overview
Core principle in 1-2 sentences
# Quick Reference
Pattern matching
# Implementation
Step-by-step checklist
# Supporting files
Tools/scripts if needed
Why it works:
- Frontmatter: Machine-parseable metadata
- Checklist: Unambiguous steps (→ TodoWrite)
- when_to_use: Clear trigger conditions
- Markdown: LLM-native format, easy to read/write
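To make the machine-parseable point concrete, here is a small sketch of splitting a SKILL.md into metadata and body; it assumes PyYAML and the frontmatter layout shown above:
# Sketch: split a SKILL.md into frontmatter (machine-readable) and body (procedure).
# Assumes PyYAML is installed and every skill starts with a --- block.
import yaml

def parse_skill(text: str) -> tuple[dict, str]:
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)        # e.g. {'name': ..., 'when_to_use': ...}
    return meta, body

example = """---
name: Thing To Do
when_to_use: Specific trigger conditions
---
# Overview
Core principle in 1-2 sentences
"""
meta, _ = parse_skill(example)
print(meta["when_to_use"])                    # -> Specific trigger conditions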
Skills tested against realistic failure modes:
Write skill → Test on subagent → Find ways subagent skips it
→ Strengthen language → Retest → Iterate until bulletproof
Uses persuasion principles not as hack but as reliability mechanism:
- Authority framing makes instructions stick
- Commitment patterns force acknowledgment
- Scarcity/pressure scenarios ensure compliance under stress
Result: Skills that survive adversarial testing actually work in practice.
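A sketch of that loop in code form; run_subagent, complied, and strengthen are hypothetical stand-ins (stubbed here) for dispatching a fresh subagent, checking compliance, and sharpening the skill's language:
# Sketch of the adversarial test loop. The three helpers are stubs standing in
# for: dispatch a fresh subagent, check it actually complied, sharpen the skill.
def run_subagent(scenario: str, skill: str) -> str:
    return "transcript"                                    # stub: real version returns the run transcript

def complied(transcript: str) -> bool:
    return "I'm using" in transcript                       # stub: real check looks for announcement + checklist evidence

def strengthen(skill: str, transcript: str) -> str:
    return skill + "\nYOU MUST announce the skill first."  # stub: real version tightens the weak spot

def harden(skill: str, scenarios: list[str], max_rounds: int = 5) -> str:
    for scenario in scenarios:
        for _ in range(max_rounds):
            transcript = run_subagent(scenario, skill)
            if complied(transcript):
                break                                      # this pressure scenario no longer shakes the skill
            skill = strengthen(skill, transcript)          # sharpen the language and retest
    return skill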
Bootstrap establishes operating rules:
1. You have skills
2. Skills are discoverable (search tool)
3. If skill exists for task → YOU MUST USE IT
Not "helpful docs" - it's required procedure. Like aviation checklists.
Reinforced by:
- EXTREMELY_IMPORTANT framing
- Mandatory announcement ("I'm using X skill...")
- TodoWrite tracking for checklists
- Tested scenarios where agent tries to skip → fails test → skill strengthened
Two-phase discovery:
Phase 1: Search
find-skills [pattern]   # Returns list with when_to_use
Agent matches pattern to task.
Phase 2: Read & Execute
Read skill → Announce usage → Follow checklist
Why separation matters:
- Search is cheap (grep/metadata)
- Read is expensive (full file in context)
- Only read what's needed
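A sketch of the two phases under an assumed skills/<name>/SKILL.md layout; only the second call pulls a full file into context:
# Sketch: phase 1 scans only frontmatter and returns names; phase 2 reads the
# whole file. Assumes every SKILL.md carries a --- frontmatter block.
from pathlib import Path

def find_skills(skills_dir: Path, pattern: str) -> list[str]:
    hits = []
    for skill_md in skills_dir.glob("*/SKILL.md"):
        frontmatter = skill_md.read_text().split("---")[1]   # metadata only, cheap
        if pattern.lower() in frontmatter.lower():
            hits.append(skill_md.parent.name)
    return hits

def read_skill(skills_dir: Path, name: str) -> str:
    return (skills_dir / name / "SKILL.md").read_text()      # full procedure, paid only after a match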
Meta-skill that writes skills:
Agent learns pattern → Extracts to SKILL.md → Tests on subagents
→ If passes: Add to library → Future agents use it
Genius part: System gets smarter over time. Lessons codified, not lost.
Don't pollute main context with exploration:
Need to search? → Dispatch subagent with query
→ Subagent returns synthesis only
→ Main agent uses result
Principle: Keep main context for active work. Delegate research/search.
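A sketch of that boundary; dispatch() is a hypothetical stand-in for spawning a subagent, and only its returned synthesis crosses back into the caller's context:
# Sketch: the subagent explores in its own context; only a synthesis returns.
def dispatch(query: str) -> str:
    # Hypothetical stand-in for spawning a subagent with a fresh context.
    # It may read dozens of files; none of that lands in the caller's context.
    return f"Synthesis for: {query}"

def research(main_context: list[str], query: str) -> None:
    summary = dispatch(query)            # exploration happens elsewhere
    main_context.append(summary)         # only the distilled result is kept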
- RAG: Retrieves chunks, agent figures out what to do
- Skills: Retrieves instructions, agent executes
- System prompt: Fixed, everything upfront, context bloat
- Skills: Dynamic, loaded on-demand, scales indefinitely
- Docs: Reference material, agent interprets
- Skills: Executable procedures, agent follows
Skills are code for LLMs.
Not data. Not documentation. Executable procedures that:
- Load on-demand (JIT)
- Have clear triggers (when_to_use)
- Provide unambiguous steps (checklists)
- Are tested adversarially (TDD for docs)
- Use persuasion principles for reliability
- Self-improve (meta-skill writes skills)
The injection system:
- Minimal bootstrap (establish rules)
- Search tool (discovery)
- SKILL.md format (standard interface)
- Mandatory compliance (enforcement)
- Adversarial testing (quality gate)
Result: System that scales because context cost is O(1) per task, not O(n) for all knowledge.
• Read through docs/README.md, docs/claude/hooks.md, and docs/claude/hooks-guide.md, then dug into the superpowers marketplace (sp/superpowers-marketplace/README.md), the core plugin (sp/superpowers/README.md), and sampled skills (e.g., sp/superpowers-skills/skills/testing/test-driven-development/SKILL.md). Here’s the picture I have now:
- Plugins contribute their own hooks that merge automatically, using ${CLAUDE_PLUGIN_ROOT} to stay portable.
- Hooks run as timeout-bound processes with strict security caveats—essentially deterministic guardrails surrounding Claude’s tool usage.
- Skill discovery and invocation go through find-skills and skill-run.
- Skills themselves are JIT-injectable procedures. When a skill is invoked (manually or via hooks), Claude gets a concrete, enforceable playbook rather than relying on ad-hoc prompting.
So, the core idea is: hooks are the delivery mechanism, plugins (like superpowers) curate and register them, and skills are the portable, executable instruction sets that can be injected exactly when needed—very much a JIT guidance system for
the agent. Ready to dive into your Nexus (nx) implementation when you want.
• Skill System Blueprint
every step completed, tests run, evidence captured).
invocations should be automatic via matchers.
a weak skill is about to run.
again.
Happy to dive deeper into any axis—matching strategies, telemetry schema, or how to retrofit Nexus once you want.
• Why Skills Matter
Picture the agent as a pilot flying on instruments. When a new task appears—debugging a race condition, refactoring billing logic—we don’t want the pilot improvising every dial turn. We hand them a laminated checklist that’s proven to land
the plane safely. That laminated card is a skill: an executable, time-tested protocol the agent loads only when needed, then stows away. Skills keep reasoning sharp and predictable while avoiding the cost of stuffing every possible instruction
into the core prompt.
Core Building Blocks
flowchart TD
User[User or Test]
Intent[Intake & Classification]
Discovery[Skill Discovery]
Injection[Skill Injection]
Execution[Subagent + Tools]
Enforcement[Hooks & Telemetry]
Feedback[Learning & Updates]
Three Runtime Layers
items.
force a specific skill.
How Subagents Fit
Subagents behave like specialized copilots. Each skill invocation spawns or reuses a subagent configured with the skill’s instructions, constrained toolset, and fresh context. They execute the checklist end-to-end, report findings, then hand
control back. This keeps the main conversation clean and allows parallel work—documentation subagent, test-captain subagent, migration subagent—without context bloat.
Distribution & Governance
Plugins and marketplaces are the packaging and app store:
ASCII architecture sketch:
+-----------------------------+ +-------------------------+
| Claude Session | | Skill Telemetry Store |
| - Core prompt |<---->| - Run history |
| - Conversation context | | - Feedback/events |
| | +-----------^-------------+
| Hooks & Slash Commands | |
| (UserPrompt, Pre/Post, Stop)| (improves matching)
+--------------|--------------+ |
v |
+------+-------+ |
| Skill Dispatcher|----------------------+
| - Discovery |
| - Selection |
+------+-------+-+
| |
+---------+ +-------------+
v v
+------------+ +-------------+
| Skill Repo | | Plugin Repo |
| (checklists)| | (hooks/agents)
+------------+ +-------------+
| |
v v
+------------------+ +--------------------+
| Subagent Runtime | | Tool Permissions |
| - Skill checklist| | - Bash/Edit/etc. |
| - Evidence log | | - MCP integrations |
+------------------+ +--------------------+
End-to-End Flow
Learning & Evolution
Analogy That Sticks
Think DevOps runbooks meeting on-call dashboards. Classic runbooks exist as wiki pages; responders must remember where they live, hope they’re current, and then maybe follow them. Skill-driven agents turn each runbook into code: discoverable,
enforceable, auditable. Instead of trusting tribal knowledge or giant static prompts, we load only the one page needed, run it flawlessly, and learn from every execution.
Where To Experiment Next
With that architecture, we move from helpful heuristics to a disciplined operating system for reasoning agents—one where every critical workflow has a checklist, every checklist can evolve, and the agent can get smarter without swelling its
brain.
› ok one thing to consider, for example matching skills and tools (skills are for example semantic data analysis / Excel understanding etc., and tools are specialized CLI commands like fetching data, or converting pptx to pdf with soffice etc.; tools are not skills but can be part of a skill, and since a tool is a tool it can be used independently depending on the scenario; custom small Python scripts, wherever possible, can be considered makeshift tools). And we know that
subagent-to-main-agent communication is also stateless. So when the main agent asks a question, the subagent can try to do it, but may discover it cannot do it or needs more data/information or other skills, as questions might require multiple skills/tools. So this creates complexity. Just considering one skill at a time is probably not going to be the most useful, but I'm a big fan of YAGNI and the John Carmack approach. So if there is a way to simplify things, such as using disk as state
(simply writing to files etc. and allowing other agents to read), that might be a way. Many messy thoughts; how do you approach this?
• I’ll create an accessible explanation advocating a layered approach starting simple and decoupling tools from skills, using filesystem-based state to coordinate conversation handoff and multi-skill orchestration. I plan to suggest primitives
like manifests and handshakes, sketch a phased skill composition flow from discovery to fallback, and emphasize incremental development to avoid overengineering. Including bullet sections and possibly ASCII diagrams will help illustrate the
aggregator concept and coordination mechanics.
─ Worked for 7s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• Shared Vocabulary
Keep It Simple, Layer by Layer
match those declarations without guessing.
layer—cheap, inspectable, versionable.
dumb but extensible.
yields a simple loop instead of complex multi-skill scheduling.
ASCII routing sketch:
User Request
|
v
Skill Discovery --(no match)--> Gap log
|
v
Skill Runner (subagent, stateless)
|
writes to run_logs//<run_id>/
|
+--------+---------------------------+
| | |
tools<-+ | next skill?
| v
| completion manifest (complete / needs / delegate)
+-------------------------------> Discovery / Main agent
Why This Works
From here you can add niceties—graph scoring for multi-skill plans, “skill bundles” that preload multiple procedures, richer feedback loops—but the core loop (discover → run → log → hand off) stays understandable and debuggable.
› consumes/produces feel a bit tight in your definition; I'm not sure I'm happy with the approach, maybe it's good, I'm not sure, happy to try. Just think of a problem, for example the one below.
There are many bits in this request, and doing this one might require a lot of things; maybe start with one company, see how it goes, and scale, with parallel subagents, but also Excel, web search, web fetch, probably persisting to disk. So real-world
questions are messy and might require a not-very-linear approach.
• I’m thinking about how to explain a flexible architecture for complex skill orchestration that handles messy tasks by using an orchestrator skill to manage stages like data ingestion and per-company pipelines, with a focus on capabilities,
artifacts, modular tools, and disk-based state management—all explained accessibly and supported by diagrams or ASCII.
─ Worked for 12s ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• Here’s how I’d tackle a messy, multi-step brief like your company research + classification example while staying true to YAGNI and keeping the system understandable.
———
Think In Roles, Not Monolithic Skills
For this job you don’t want one giant skill that does everything. Break the work into four reusable skill archetypes: Planner, Retriever, Extractor, and Synthesizer (the same roles the orchestration loop below dispatches to).
A separate Classifier skill can run after synthesis to map each company onto your “Final_Industrial_Classification_Framework”.
Every skill still “consumes” and “produces”—but in practice you just tag the artifacts they expect and create. For example, the Synthesizer looks for notes/business-model.md, notes/products.json, etc. Think of them as capability tags rather
than rigid types.
———
Shared Disk = Memory Bus
Give each company its own directory (work/company_slug/) with this layout:
work/
  acme-industries/
    inputs/
      brief.md
      industrial_framework.xlsx
    raw/
      sources/
        01_homepage.html
        02_press_release.html
    notes/
      business-model.md
      products.json
      missing-data.md
    outputs/
      report.md
      citations.csv
    manifest.yaml        # status log, open questions, next actions
This gives you persistence, audit trails, and makes parallelism possible without building a messaging system.
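A sketch of scaffolding that layout per company; the directory names mirror the tree above, while the manifest fields and PyYAML dependency are assumptions, not a fixed schema:
# Sketch: create the per-company workspace and seed its manifest.
# Directory names mirror the layout above; the manifest keys are illustrative.
from pathlib import Path
import yaml   # assumed available

def init_workspace(root: Path, company_slug: str) -> Path:
    base = root / company_slug
    for sub in ("inputs", "raw/sources", "notes", "outputs"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    manifest = {"company": company_slug, "status": "new",
                "next_actions": ["planner"], "open_questions": []}
    (base / "manifest.yaml").write_text(yaml.safe_dump(manifest))
    return base

workspace = init_workspace(Path("work"), "acme-industries")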
———
Orchestration Loop
At the top level, the main agent runs a tight loop:
for each company in input_excel:
    ensure work/company/ exists
    run skill: planner -> manifest.next_actions
    while manifest.next_actions not empty:
        action = pop_next_action()
        dispatch to matching skill (retriever/extractor/...)
        update manifest with results or follow-ups
    run skill: synthesizer
    run skill: classifier
aggregate all outputs into final deliverables
Skills communicate needs (e.g., “Need SEC filing after 2024-01-01”) by appending to manifest.yaml. The main loop keeps dispatching until nothing is pending.
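In roughly runnable form the same loop might look like this; run_skill() is a hypothetical stand-in for dispatching a subagent with the named skill against the workspace:
# Sketch of the orchestration loop. run_skill() stands in for "dispatch a
# subagent with this skill against this workspace"; it is not a real API.
from pathlib import Path
import yaml   # assumed available

def load_manifest(ws: Path) -> dict:
    return yaml.safe_load((ws / "manifest.yaml").read_text())

def save_manifest(ws: Path, manifest: dict) -> None:
    (ws / "manifest.yaml").write_text(yaml.safe_dump(manifest))

def run_skill(name: str, ws: Path) -> list[str]:
    return []   # stub: a real run executes the skill's checklist and returns follow-up actions

def process_company(ws: Path) -> None:
    manifest = load_manifest(ws)
    while manifest["next_actions"]:
        action = manifest["next_actions"].pop(0)
        follow_ups = run_skill(action, ws)              # planner / retriever / extractor / ...
        manifest["next_actions"].extend(follow_ups)
        save_manifest(ws, manifest)                     # state lives on disk, not in context
    for final in ("synthesizer", "classifier"):
        run_skill(final, ws)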
———
Tool / Skill Matching
Tools stay modular: web-fetch, markitdown, extract_tables.py, classify_company.py, etc.
Each skill declares:
requires-tools:
  - web_fetch
  - markitdown
expects-artifacts:
  - raw/sources/*
produces-artifacts:
  - notes/business-model.md
This is just structured metadata; discovery uses it to confirm prerequisites and to pick the right skill when multiple options exist. Tools themselves can be invoked directly if necessary (e.g., manual quality check), but the default is that
skills wrap them in procedure and logging.
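Discovery can check those declarations mechanically before dispatch. A sketch, assuming the metadata above has been parsed into a dict and the runtime knows which tools it exposes:
# Sketch: confirm a skill's declared prerequisites before dispatching it.
# `meta` mirrors the YAML above; `available_tools` would come from the runtime.
from pathlib import Path

def prerequisites_met(meta: dict, available_tools: set, workspace: Path) -> bool:
    tools_ok = set(meta.get("requires-tools", [])) <= available_tools
    artifacts_ok = all(list(workspace.glob(p)) for p in meta.get("expects-artifacts", []))
    return tools_ok and artifacts_ok

meta = {"requires-tools": ["web_fetch", "markitdown"],
        "expects-artifacts": ["raw/sources/*"]}
print(prerequisites_met(meta, {"web_fetch", "markitdown", "bash"}, Path("work/acme-industries")))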
———
Handling “It Didn’t Work”
Because subagents are stateless, every skill run ends by updating the manifest:
status: blocked
reason: "No recent product information on company site."
request: "Run retriever with query 2024 product launch."
The main agent sees that, queues a retriever run with different parameters, and the next skill continues. You can also spawn a “feedback skill” (think meta-agent) that collects these blockers into a backlog for humans or long-term learning.
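A sketch of how the main loop might consume that blocked entry; the field names follow the example above and are illustrative rather than a spec:
# Sketch: turn a blocked manifest entry into a newly queued action.
# Field names follow the example above; they are illustrative, not a spec.
def handle_blocked(manifest: dict) -> dict:
    if manifest.get("status") == "blocked":
        request = manifest.get("request", "")
        manifest.setdefault("next_actions", []).append(("retriever", request))
        manifest["status"] = "in_progress"       # the main loop keeps dispatching
    return manifest

manifest = {"status": "blocked",
            "reason": "No recent product information on company site.",
            "request": "Run retriever with query 2024 product launch."}
print(handle_blocked(manifest)["next_actions"])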
———
Scaling Up Gradually
———
Why This Feels Reasonable
Messy real-world requests become manageable: iterate per company, keep artifacts clean, enforce checklists inside skills, and let the main agent orchestrate with a very small, understandable set of primitives.
› So skills are not "Excel skill" etc.? Not like Anthropic does?
• Right—when I say “skill” I don’t mean “Excel skill” or “Docx skill” in the Anthropic dump we looked at. Those are essentially long-form reference guides. I’m talking about small, executable playbooks: a focused procedure for a concrete
task (e.g., “gather company profile”, “extract product data from a page”) that can call whatever tools are appropriate—Excel helpers, web fetch scripts, classifiers—but isn’t tied to a single tool or format. The skill sits above the tools,
orchestrating when and how they’re used. Tools stay general-purpose; the skill combines them into a reusable workflow.
› wouldn't it be overkill if the question is simply how is the weather in NY today?
• Asking “what’s the weather in New York today?” shouldn’t trigger a multi-stage research machine. Practical approaches:
So, in short: we only pay the orchestration overhead when the problem actually needs it.
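A sketch of that gate; the keyword heuristic and the two stand-in functions are deliberately dumb placeholders, the point being that the orchestrator is one branch rather than the default:
# Sketch: only escalate to orchestration when the request actually needs it.
# The keyword check and both helpers are placeholders, not a real router.
def call_tool(name: str, query: str) -> str:
    return f"[{name}] {query}"                    # stub: single tool call, no workspace, no manifest

def run_orchestrator(brief: str) -> str:
    return f"[orchestrator] {brief}"              # stub: planner/retriever/... pipeline with disk state

def answer(question: str) -> str:
    multi_step = any(marker in question.lower()
                     for marker in ("each company", "classify", "research", "compare"))
    return run_orchestrator(question) if multi_step else call_tool("web_search", question)

print(answer("how is the weather in NY today?"))  # -> [web_search] how is the weather in NY today?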
› ok there are a few interesting things in superpowers, like brainstorm, planner, executor, etc., but that repo is more for software engineering tasks; however I believe we can borrow many ideas from there and mix them with Claude-type skills to solve
problems for real life. And the network effect of tools and procedures/skills will make it compoundingly important?
• Exactly. Superpowers gives us the scaffolding—brainstorm → plan → execute, tight checklists, clear hand-offs—that works beautifully for software work. If we generalize those patterns into domain-agnostic skill templates, then layer in broader
procedure libraries (research, ops, finance, etc.) we get:
So yes—the network effect kicks in once tools, procedures, and discovery feed each other. With disciplined encoding (short checklists, enforced verification) and shared state/logging, we can build a general-purpose “skills OS” for agents, not
just a dev helper.
› Summarize recent commits
• Yes—that’s the north star. “Skills OS” is the runtime that ties everything together:
Together they let an agent load the right procedure just-in-time, run it deterministically, call the right tools, capture evidence, and learn from every run. Tools make skills powerful; skills make tools usable; orchestration keeps it all
disciplined. That’s the operating system we’re aiming to build.
I'm laying out a detailed explanation of how to build a Skills OS by leveraging core features like composability and frontmatter metadata, proposing core components such as ingestion, metadata, runtime, and discovery, and highlighting how
these enable generalization to real-world tasks and skill templates with autopilot capabilities. I plan to organize the explanation with clear structure, practical references to example files with line numbers, and actionable next steps for
extending skill design into an OS framework.
─ Worked for 20s ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• The problem-solving skills in superpowers already embody many traits we need for a Skills OS:
stuck/SKILL.md:18
Skills OS Blueprint
What we take from superpowers is the discipline: short, pointed procedures, symptom-driven discovery, and composability. By wrapping those patterns in shared state, discovery, and tool orchestration, we evolve them into a general-purpose
Skills OS that can serve messy real-world tasks as effectively as it serves software engineering.
─ Worked for 58s ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• What the Superpowers Library Gives Us
(simplification cascades, collision-zone thinking), debugging (systematic debugging’s four phases), collaboration (brainstorming, dispatching-parallel-agents), testing (TDD), meta-maintenance (pulling-updates-from-skills-repository),
research (tracing-knowledge-lineages), and the mandatory “using-skills” discipline.
That’s the seed of a discovery/triage layer.
want our runtime to enforce via hooks/subagents.
and meta-skills keep the library healthy (pull updates, test skills with subagents). It’s a template for how procedures + tools can co-evolve.
Toward a Skills OS
With those ingredients we can outline the operating system we’ve been circling:
validation evidence appears).
Why Superpowers Is the Perfect Seed
Next Steps Toward Skills OS
By mining and systematizing what’s already in superpowers, we’re not just borrowing ideas for code work—we’re laying the foundation for a full Skills OS where any complex workflow (software, business ops, research) becomes a deterministic,
discoverable, improvable playbook executed by an agent that actually follows the rules.
› ok and superpowers also refers to Microsoft's amplifier repo which is in the ref/amplifier directory. Please explore that library thoroughly, build a very good understanding, and see if there are bits of information that might be helpful to us
• Explored
└ List amplifier
Read README.md, AMPLIFIER_VISION.md, AGENTS.md
• Ran rg -n "recipe" -g"*.md" ref/amplifier | head
└ ref/amplifier/README.md:328:/ultrathink-task I'm new to the concepts of "metacognitive recipes" - what are some
ref/amplifier/README.md:330:the value of "metacognitive recipes"? Especially any that would demonstrate how such
… +6 lines
ref/amplifier/scenarios/transcribe/HOW_TO_CREATE_YOUR_OWN.md:177:- The recipe: Generate perspectives → Analyze independently → Detect conflicts → Debate → Synthesize
ref/amplifier/scenarios/transcribe/HOW_TO_CREATE_YOUR_OWN.md:182:- The recipe: Baseline → Spawn variants → Benchmark → Analyze winners → Combine best → Mutate
• Explored
└ Read HOW_TO_CREATE_YOUR_OWN.md, THIS_IS_THE_WAY.md
List ai_context
Read IMPLEMENTATION_PHILOSOPHY.md, DISCOVERIES.md, CLAUDE.md
• Ran rg -n "Knowledge" docs -g"*.md"
└ (no output)
• Explored
└ Search *.md in docs
Read KNOWLEDGE_WORKFLOW.md, MEMORY_SYSTEM.md, ROADMAP.md
─ Worked for 2m 13s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• Amplifier Overview
HOW_TO_CREATE_YOUR_OWN.md:23
Patterns Worth Carrying into Skills OS
solving/when-stuck/SKILL.md:1
CLAUDE.md:8
evidence sources. ref/amplifier/docs/KNOWLEDGE_WORKFLOW.md:1
zero manual effort. ref/amplifier/docs/MEMORY_SYSTEM.md:1
THIS_IS_THE_WAY.md:33
Skills OS Implications
Amplifier shows how to turn agent behaviors into disciplined, self-improving systems. Folding those ideas into a Skills OS—guided playbooks, shared state, knowledge compounding, and lightweight orchestration—gives us a blueprint that scales
beyond software tasks to any domain where procedures + tools + learning loops matter.