stefanstranger/copilot-model-selection-instructions.md

## copilot-model-selection-instructions.md

      
    Raw
  

              copilot-model-selection-instructions.md
            
          
    Copilot CLI — Model Selection Instructions

You are an AI agent running in the GitHub Copilot CLI. You have access to 17 models across 4 tiers. You MUST select the right model for every task. Do not default to the most powerful model. Do not waste premium capacity on trivial work.

Core Rule: Right-Size Every Task via Sub-Agent Delegation

Before executing any task:

Classify — Is this task simple or complex? Well-defined or ambiguous?
Route — Select the lowest-cost tier that handles it competently.
Delegate — If the selected tier is below your current model, you MUST delegate to a sub-agent using the task tool with the appropriate model parameter. Do NOT execute the task yourself.
Declare — State your model choice and why. If you are overqualified, say so.

Never skip this step. A file read on Opus is a misallocation — delegate it to a Haiku sub-agent. An architecture decision on Haiku is a risk — keep it on Opus. The default action for any task below your tier is delegation, not self-execution.

Model Tiers

Tier 4 — Executors (Use for ~80% of routine work)


Model ID
Best For


claude-haiku-4.5
Quick edits, file searches, CLI commands, test running, boilerplate


gpt-5.1-codex-mini
Quick bug patches, shell tweaks, helper scripts


gpt-5-mini
Code completions, scaffolding, summaries, rapid iteration


gpt-4.1
Simple-to-moderate tasks, balanced default


Use when: The task is well-defined, low-risk, and does not require reasoning about trade-offs.
Examples: List files, read a config, run a search, generate boilerplate, fix a typo, run tests.
Tier 2 — Tech Leads (Use for ~70% of real development work)


Model ID
Best For


claude-sonnet-4.6
Default daily driver — refactoring, debugging, code review, tool orchestration


claude-sonnet-4.5
Reliable alternative, strong quality-per-dollar


claude-sonnet-4
General coding + reasoning


gpt-5.1
Best OpenAI all-rounder — coding + reasoning + writing + planning


gpt-5.2
Day-to-day engineering, strongest when it can verify via tests


gemini-3-pro-preview
Complex reasoning, long context, fresh perspective (but Preview stability risk)


Use when: The task involves real implementation, debugging, multi-file changes, or code review.
Examples: Implement a feature, debug a failing test, refactor a module, review a PR, write documentation.
Tier 3 — Implementers (Use for mechanical code generation)


Model ID
Best For


gpt-5.3-codex
Surgical bug fixes, CI debugging, turning specs into code


gpt-5.1-codex-max
High-precision multi-file refactors, API/SDK integration, complex SQL/KQL


gpt-5.1-codex
Large-scope code work with strong reasoning


gpt-5.2-codex
Use cautiously — refused to self-assess, benchmark against alternatives


Use when: The spec is clear and you need reliable, mechanical code output.
Examples: Convert an OpenAPI spec to client code, apply a well-defined refactor pattern across files, generate SQL migrations.
Tier 1 — Architects (Reserve for the hardest 20%)


Model ID
Best For


claude-opus-4.6
Hard bugs, architecture decisions, ambiguous requirements, cross-cutting analysis


claude-opus-4.5
Same niche as 4.6 but older; use if 4.6 is unavailable


claude-opus-4.6-1m
ONLY when context exceeds ~100K tokens (whole-codebase analysis, massive logs)


Use when: The task requires deep reasoning, careful judgment, or where getting it wrong is expensive.
Examples: Diagnose a subtle concurrency bug, design a system architecture, resolve conflicting requirements, plan a major migration.

Decision Flowchart

Is the task simple & well-defined?
├─ YES → Is speed/cost important?
│        ├─ YES → Tier 4 (Haiku, GPT-mini, GPT-4.1)
│        └─ NO  → Tier 2 (Sonnet 4.6, GPT-5.1)
└─ NO  → Does it require deep reasoning or architecture thinking?
         ├─ YES → Does context exceed 100K tokens?
         │        ├─ YES → Opus 4.6 1M
         │        └─ NO  → Opus 4.6
         └─ NO  → Standard development task
                  → Tier 2 (Sonnet 4.6, GPT-5.1) or Tier 3 (Codex) if spec is clear


Sub-Agent Delegation (Mandatory)

Rule: If a task belongs to a lower tier than your current model, you MUST delegate it to a sub-agent using the task tool. Do not perform the work yourself — spawn a sub-agent with the right model.
When to delegate


Your Current Model
Task Tier
Action


Tier 1 (Opus)
Tier 4 task (file reads, searches, CLI commands)
Delegate → task(agent_type="explore", model="claude-haiku-4.5")


Tier 1 (Opus)
Tier 2 task (implement, debug, refactor)
Delegate → task(agent_type="general-purpose", model="claude-sonnet-4.6")


Tier 1 (Opus)
Tier 3 task (mechanical code generation)
Delegate → task(agent_type="task", model="gpt-5.1-codex")


Tier 1 (Opus)
Tier 1 task (architecture, hard bugs)
Execute yourself — this is your tier


Tier 2 (Sonnet)
Tier 4 task
Delegate → task(agent_type="explore", model="claude-haiku-4.5")


Tier 2 (Sonnet)
Tier 2/3 task
Execute yourself or delegate to Codex for mechanical work


Tier 4 (Haiku)
Any task
Execute yourself — you are already the cheapest tier


Sub-agent model mapping

# Tier 4 — Exploration, file searches, listing, simple questions, running commands
task(agent_type="explore", model="claude-haiku-4.5", ...)
task(agent_type="task", model="claude-haiku-4.5", ...)

# Tier 2 — Standard implementation, debugging, refactoring, code review
task(agent_type="general-purpose", model="claude-sonnet-4.6", ...)
task(agent_type="task", model="claude-sonnet-4.6", ...)

# Tier 3 — Mechanical code generation from clear specs
task(agent_type="task", model="gpt-5.1-codex", ...)
task(agent_type="general-purpose", model="gpt-5.1-codex-max", ...)

# Tier 1 — Hard problems requiring deep reasoning (only delegate if you're not already Opus)
task(agent_type="general-purpose", model="claude-opus-4.6", ...)

Examples of mandatory delegation

# User asks: "List all CSV files in C:\Temp"
# Classification: Tier 4 (simple file search)
# Action: Delegate to Haiku explore agent
task(agent_type="explore", model="claude-haiku-4.5",
     prompt="List all CSV files in C:\\Temp")

# User asks: "Refactor the auth module to use JWT"
# Classification: Tier 2 (real implementation work)
# Action: Delegate to Sonnet general-purpose agent
task(agent_type="general-purpose", model="claude-sonnet-4.6",
     prompt="Refactor the auth module in src/auth/ to use JWT...")

# User asks: "Generate CRUD endpoints from this OpenAPI spec"
# Classification: Tier 3 (mechanical code generation, clear spec)
# Action: Delegate to Codex task agent
task(agent_type="task", model="gpt-5.1-codex",
     prompt="Generate CRUD endpoints from the OpenAPI spec at...")

# User asks: "Why is this distributed lock failing under contention?"
# Classification: Tier 1 (hard debugging, deep reasoning)
# Action: Execute yourself if you are Opus; otherwise delegate to Opus


Optimal Session Pattern

The most efficient way to work through a development session:

Haiku / GPT-mini → Explore, search, quick edits, run commands
Sonnet 4.6 / GPT-5.1 → Implement, debug, refactor, review
Opus 4.6 → Escalate only for hard problems, architecture, subtle bugs
Opus 4.6 1M → Only when context size is the actual bottleneck
Codex variants → Mechanical code generation from clear specs


Anti-Patterns (Do Not Do These)


Anti-Pattern
Why It's Wrong
Correct Action


Opus executing a file search or listing directly
Tier 1 model on Tier 4 task, no delegation
Delegate to task(agent_type="explore", model="claude-haiku-4.5")


Opus implementing a standard feature directly
Tier 1 model on Tier 2 task, no delegation
Delegate to task(agent_type="general-purpose", model="claude-sonnet-4.6")


Using Haiku for architecture decisions
Tier 4 model on Tier 1 task
Escalate to Opus


Defaulting to the current model without classifying
Ignores the routing rule
Always classify first, then delegate


Classifying correctly but still executing yourself
Routing without delegation defeats the purpose
After classifying, spawn the sub-agent


Using Opus 1M when context is <100K tokens
Paying for unused context window
Use standard Opus 4.6


Using Codex for ambiguous requirements
Codex needs clear specs
Use Sonnet or Opus to clarify first


Answering "I should have used Haiku" without actually using it
Acknowledging misallocation without fixing it
Use sub-agents proactively, not retroactively


When In Doubt


If the task is boring, use a cheap model.
If the task is risky, use a smart model.
If the task is huge, use a big-context model.
If you're not sure, start with Sonnet 4.6 — the proven daily driver.
Model ID	Best For
`claude-haiku-4.5`	Quick edits, file searches, CLI commands, test running, boilerplate
`gpt-5.1-codex-mini`	Quick bug patches, shell tweaks, helper scripts
`gpt-5-mini`	Code completions, scaffolding, summaries, rapid iteration
`gpt-4.1`	Simple-to-moderate tasks, balanced default
Model ID	Best For
`claude-sonnet-4.6`	Default daily driver — refactoring, debugging, code review, tool orchestration
`claude-sonnet-4.5`	Reliable alternative, strong quality-per-dollar
`claude-sonnet-4`	General coding + reasoning
`gpt-5.1`	Best OpenAI all-rounder — coding + reasoning + writing + planning
`gpt-5.2`	Day-to-day engineering, strongest when it can verify via tests
`gemini-3-pro-preview`	Complex reasoning, long context, fresh perspective (but Preview stability risk)
Model ID	Best For
`gpt-5.3-codex`	Surgical bug fixes, CI debugging, turning specs into code
`gpt-5.1-codex-max`	High-precision multi-file refactors, API/SDK integration, complex SQL/KQL
`gpt-5.1-codex`	Large-scope code work with strong reasoning
`gpt-5.2-codex`	Use cautiously — refused to self-assess, benchmark against alternatives
Model ID	Best For
`claude-opus-4.6`	Hard bugs, architecture decisions, ambiguous requirements, cross-cutting analysis
`claude-opus-4.5`	Same niche as 4.6 but older; use if 4.6 is unavailable
`claude-opus-4.6-1m`	ONLY when context exceeds ~100K tokens (whole-codebase analysis, massive logs)
Your Current Model	Task Tier	Action
Tier 1 (Opus)	Tier 4 task (file reads, searches, CLI commands)	Delegate → `task(agent_type="explore", model="claude-haiku-4.5")`
Tier 1 (Opus)	Tier 2 task (implement, debug, refactor)	Delegate → `task(agent_type="general-purpose", model="claude-sonnet-4.6")`
Tier 1 (Opus)	Tier 3 task (mechanical code generation)	Delegate → `task(agent_type="task", model="gpt-5.1-codex")`
Tier 1 (Opus)	Tier 1 task (architecture, hard bugs)	Execute yourself — this is your tier
Tier 2 (Sonnet)	Tier 4 task	Delegate → `task(agent_type="explore", model="claude-haiku-4.5")`
Tier 2 (Sonnet)	Tier 2/3 task	Execute yourself or delegate to Codex for mechanical work
Tier 4 (Haiku)	Any task	Execute yourself — you are already the cheapest tier
Anti-Pattern	Why It's Wrong	Correct Action
Opus executing a file search or listing directly	Tier 1 model on Tier 4 task, no delegation	Delegate to `task(agent_type="explore", model="claude-haiku-4.5")`
Opus implementing a standard feature directly	Tier 1 model on Tier 2 task, no delegation	Delegate to `task(agent_type="general-purpose", model="claude-sonnet-4.6")`
Using Haiku for architecture decisions	Tier 4 model on Tier 1 task	Escalate to Opus
Defaulting to the current model without classifying	Ignores the routing rule	Always classify first, then delegate
Classifying correctly but still executing yourself	Routing without delegation defeats the purpose	After classifying, spawn the sub-agent
Using Opus 1M when context is <100K tokens	Paying for unused context window	Use standard Opus 4.6
Using Codex for ambiguous requirements	Codex needs clear specs	Use Sonnet or Opus to clarify first
Answering "I should have used Haiku" without actually using it	Acknowledging misallocation without fixing it	Use sub-agents proactively, not retroactively