oneryalcin/agent_instructions_writing_guide.md

## agent_instructions_writing_guide.md

      
    Raw
  

              agent_instructions_writing_guide.md
            
          
    Agent Instruction Writing Guide

Purpose: How to write effective, maintainable instructions for autonomous AI agents.
Audience: Engineers and product teams creating agent-based systems.

Core Principle

Agent instructions are NOT user documentation.

User docs explain WHY and provide context for understanding
Agent instructions specify WHAT to do and WHEN to do it
Users learn incrementally; agents execute deterministically

Golden Rule: Every sentence must inform a decision or specify an action. Eliminate everything else.

Architectural Patterns That Work

1. Just-In-Time Knowledge Injection

Pattern:
## Step 1: Analyze Inputs & Choose Path
Based on inputs, select path: A, B, C, or D

## Step 2: Load Relevant Guides (MANDATORY)
Path A: Read guide-1.md, guide-2.md
Path B: Read guide-1.md, guide-3.md
Path C: Read guide-2.md, guide-4.md
Path D: Read guide-3.md, guide-4.md
Why: Agent loads only needed knowledge when needed. Massive token savings, reduced confusion.
Anti-pattern:
Here's everything you might need (loads all 4 guides regardless of path)

2. External Audit Trail

Pattern:
## Step 0: Create Work Log (MANDATORY - DO THIS FIRST)
Copy template from /path/to/template.md to work_log_[PROJECT].md

**Update after EVERY step:**
- Step 1: Document inputs detected, path chosen
- Step 2: Document guides loaded (with line counts)
- Step 3: Document decisions made
Why:

Human can audit agent's reasoning
Agent can't skip steps
Debugging becomes traceable

Anti-pattern: "Document your decisions" (where? how? optional?)

3. Gate Checks at Critical Points

Pattern:
## Step 4: Dimension Planning (MANDATORY)
Calculate dimensions for ALL slides. Target 470-480px height.

**GATE CHECK:** Before writing ANY HTML, verify:
- [ ] Calculations documented
- [ ] All totals ≤480px
- [ ] work_log updated

**If dimension planning incomplete, STOP. Do not proceed to Step 5.**
Why: Prevents cascading errors, forces deliberation before irreversible actions.
Anti-pattern: "Make sure you plan dimensions before coding" (agent proceeds anyway)

4. Path-Based Routing with Explicit Matrix

Pattern:
| Inputs Available | Path | Strategy |
|------------------|------|----------|
| Content + Template + Reference | A | Template structure + Reference styling |
| Content + Reference | B | Infer structure + Reference styling |
| Content + Template | C | Template structure + Inferred styling |
| Content only | D | Infer both structure and styling |

Document in work_log: Path selected + Reasoning
Why: Zero ambiguity about which workflow to follow.
Anti-pattern: "If you have a reference, use it. If you have a template, follow it." (what if both?)

5. Modular Knowledge Base (One Concern Per Guide)

Pattern:
guides/
  ├── dimension-planning.md      # ONE concern: height calculations
  ├── technical-requirements.md  # ONE concern: HTML specs
  ├── style-extraction.md        # ONE concern: getting colors/fonts
  └── pattern-library.md         # ONE concern: layout structures

Why: Agent reads only relevant module, each module focused and maintainable.
Anti-pattern: comprehensive-guide.md (5000 lines, agent drowns in irrelevant content)

6. Single Source of Truth + References

Pattern:
# agent-instructions.md
**Charts (MANDATORY):**
- MUST use JSON data islands
- See technical-requirements.md "Chart Integration" section for complete spec

# technical-requirements.md
## Chart Integration with Plotly.js
[... complete specification here ...]
Why: Fact lives once, updated once, referenced everywhere.
Anti-pattern: Chart spec repeated in agent instructions + technical guide (diverges over time)

Writing Techniques

Technique 1: Imperative Commands, Not Explanations

✅ DO:
Use flex-1 for dynamic content (NOT hardcoded heights).
❌ DON'T:
**Why flex-1?**
- Responsive: Automatically distributes available space
- Safe: Prevents overflow (respects parent container)
- Maintainable: No magic numbers to update
Rule: If it doesn't inform a decision, delete it.

Technique 2: Examples Over Prose

✅ DO:
❌ BAD:
<div style="height: 458px;">...</div>

✅ GOOD:
<div class="flex-1">...</div>
❌ DON'T:
Hardcoded heights are problematic because if upstream elements change size, the fixed height container won't adapt, leading to overflow or wasted space. Instead, use flex-1 which dynamically fills available space.
Rule: Show, don't tell. Code examples communicate faster than essays.

Technique 3: Checklists as Verification Gates

✅ DO:
## Step 4 Complete When:
- [ ] Title height calculated: [X]px
- [ ] Content area calculated: [Y]px
- [ ] Total ≤480px verified
- [ ] work_log updated with calculations
❌ DON'T:
Make sure you've calculated dimensions properly before proceeding.
Rule: Verifiable conditions, not vague requests.

Technique 4: Emphasis Hierarchy

✅ DO:
**MANDATORY:** Create work_log (blocking requirement)
**CRITICAL:** Read guides with no limits (essential for quality)
**IMPORTANT:** Use brand colors (significantly improves output)
❌ DON'T:
Important: Do all these things (which is most important?)
Rule: Three-level hierarchy: MANDATORY (must) > CRITICAL (should) > IMPORTANT (best practice)

Technique 5: Naming Conventions as Programmatic Gates

✅ DO:
**MANDATORY NAMING:** All output files MUST end with `_presentation.html`
- Frontend identifies canvas-editable files by this suffix
- Format: `{descriptive_name}_presentation.html`
❌ DON'T:
Name files appropriately so the frontend can find them.
Rule: Enforce conventions programmatically, not through hope.

Technique 6: Separate Conceptual from Absolute Constraints

✅ DO:
**Template JSON provides CONCEPTUAL guidance, not pixel-perfect dimensions.**
- ✅ Use for: Slide count, layout types, content relationships
- ❌ Don't use for: Exact pixel positions (JSON may reference 1280×720 but ppt-canvas is 992×558px)

**Use RELATIVE relationships:**
- 4 elements at similar sizes → Create equal grid
- 2 elements side-by-side → Create 50/50 split
❌ DON'T:
Follow the template dimensions but also respect ppt-canvas constraints.
Rule: When multiple specs exist, explicitly state which is authoritative for what.

Technique 7: Progressive Refinement (Sequential Phases)

✅ DO:
Workflow:
Step 1 → Analyze inputs
Step 2 → Load guides (depends on Step 1 path)
Step 3 → Plan structure (depends on Step 2 knowledge)
Step 4 → Calculate dimensions (depends on Step 3 plan)
Step 5 → Write HTML (depends on Step 4 calculations)
Step 6 → Verify (depends on Step 5 output)
❌ DON'T:
Steps:
1. Analyze inputs
2. Plan structure
3. Write HTML
4. Load guides as needed
(No dependencies, agent skips ahead)
Rule: Each step builds on previous, dependencies explicit.

Technique 8: Treat Context as Finite Resource (Right Altitude)

✅ DO:
## System Prompt Guidelines

**Right altitude:** Specific enough to guide behavior, flexible enough to allow model reasoning.

❌ TOO LOW (brittle):
If user asks for chart AND data has >10 rows AND file size >1MB, use pagination with 50 rows per page, UNLESS...

✅ RIGHT ALTITUDE:
For large datasets, break into manageable chunks. Consider pagination for tables with 50+ rows.

❌ TOO HIGH (vague):
Handle data appropriately based on context.
❌ DON'T:
Hardcode complex branching logic or provide vague high-level guidance.
Rule: LLMs have finite attention budgets. Every token depletes focus. Find the Goldilocks zone between over-specification and under-specification.

Technique 9: Heuristics Over Rigid Rules

✅ DO:
## Effort Scaling

**Simple tasks:** Quick validation (grep for patterns, check file exists)
**Medium tasks:** Moderate exploration (read 3-5 files, test one approach)
**Complex tasks:** Deep analysis (read 10+ files, try multiple approaches, document trade-offs)

Let task complexity guide thoroughness.
❌ DON'T:
ALWAYS read every file in the directory.
ALWAYS try 5 different approaches.
ALWAYS document every decision in 200+ words.
Rule: Proportional effort. Simple tasks shouldn't require enterprise-level rigor.

Technique 10: Agent Self-Improvement Loop

For standalone agents:
## Meta-Instruction (Optional)

After completing 5 tasks, review failures and suggest instruction improvements:
1. Which steps caused confusion?
2. Which examples would have helped?
3. Which rules were too rigid?

Document suggestions in instruction_improvements.md.
For orchestrated subagents (conditional trigger):
## Workflow Learnings (Debug Mode - Optional)

**ONLY IF main agent explicitly requests workflow feedback:**

Document in workflow_learnings_[TASK_NAME].md if you discovered:
- Guide instructions unclear or conflicting
- Step sequence inefficient for this use case
- Missing guidance for specific scenario

**Template:**
### [Date] - [Title]
**Context:** [What you were doing]
**Issue:** [What instruction was unclear]
**What you did:** [How you resolved it]
**Suggestion:** [How to improve instructions]

**Note:** This adds latency. Main agent enables for debugging, not routine operation.
❌ DON'T:
Assume instructions are perfect. Production use reveals failure modes.
Rule: Let agents improve their own instructions. Anthropic saw 40% improvement when Claude edited prompt engineering guides. For orchestrated subagents, make reflection conditional to control latency cost.

Technique 11: Start Small on Evals

✅ DO:
## Evaluation Strategy

1. Create 20 representative test cases (mix of common + edge cases)
2. Run agent, measure success rate
3. Fix failures, iterate
4. Expand to 50 cases once stable
5. Add new cases as production reveals gaps
❌ DON'T:
Create 100 test cases before first run.
Wait for "perfect" coverage before deploying.
Rule: 20 diverse cases > 100 similar cases. Start small, iterate based on real failures.

Technique 12: Extended Thinking for Debugging

✅ DO:
## Debugging Mode (When Needed)

For complex failures, enable extended thinking:
- Agent shows reasoning before acting
- Exposes decision logic (e.g., "File A vs File B: A has more recent timestamp")
- Useful for diagnosing why agent chose wrong path

**When to use:** Repeated failures, unclear decision points
**When not to use:** Simple tasks, well-understood workflows (adds latency)
❌ DON'T:
Always run in extended thinking mode.
(Slows every task, including trivial ones)
Rule: Extended thinking is a diagnostic tool. Use when agent behavior is opaque, disable when clear.

Technique 13: External Verification Tools

✅ DO:
## Step 6: Verification

Run verification script:
```bash
bash /path/to/verify.sh output.html
Script checks:

✅ Required wrappers present
✅ No forbidden patterns
✅ Content within limits


**❌ DON'T:**
```markdown
Check that your output follows all the rules.

Rule: Programmatic verification > manual checklist where possible.

Common Anti-Patterns

Anti-Pattern 1: Explanatory Fluff Masquerading as Instructions

❌ WRONG:
**Rationale:**
- Try exact fonts first (corporate environments often have them)
- Fall back to web alternatives that match design character
- Zero licensing issues, graceful degradation
Why wrong: Legal/performance justifications don't inform agent actions.
✅ FIX:
CSS font stacks: `'Exact Font', 'Web Alternative', generic-family`

Anti-Pattern 2: Non-Generalizable Examples

❌ WRONG:
Example output (City Design PPTX): Fonts: headline=Felix Titling...
**Example (City Design PPTX):**
... (mentioned 6 times)
Why wrong: Agent thinks "City Design PPTX" is THE standard, not AN example.
✅ FIX:
Example output: Fonts: headline=[YOUR_FONT]...
**Example with placeholders:**

Anti-Pattern 3: Redundant Documentation Across Files

❌ WRONG:
# agent.md
Charts use JSON data islands:
<script type="application/json">...</script>
[... 120 lines of examples ...]

# technical-requirements.md
Charts use JSON data islands:
<script type="application/json">...</script>
[... 120 lines of examples ...]
Why wrong: Specs diverge, maintenance nightmare.
✅ FIX:
# agent.md
**Charts:** See technical-requirements.md "Chart Integration" section

# technical-requirements.md
[... canonical spec here ...]

Anti-Pattern 4: Verbose ASCII Art for Simple Concepts

❌ WRONG:
**A. Centered (Minimal)**
┌─────────────────────────┐
│                         │
│    [Large Title]        │
│    [Subtitle/Date]      │
│    [Company Logo]       │
│                         │
└─────────────────────────┘
(15 lines to show centered layout)

Why wrong: Takes 15 lines to communicate "centered."
✅ FIX:
**Centered:** Title, subtitle, logo vertically centered

Anti-Pattern 5: Ambiguous Targets Without Clarification

❌ WRONG:
Target maximum: 480px
Content area: 498px
Why wrong: Which is the target? Are both valid?
✅ FIX:
Target maximum: 470px for high-density content (28px safety margin)
Target maximum: 480px for standard content (18px safety margin)
Absolute limit: 498px (will overflow if exceeded)

**High-density:** Tables 10+ rows, lists 8+ items
**Standard:** Charts, simple layouts

Anti-Pattern 6: Optional Critical Steps

❌ WRONG:
Step 4: Dimension Planning
Calculate dimensions to ensure content fits.
Why wrong: Agent skips, causes overflow errors.
✅ FIX:
## Step 4: Dimension Planning (MANDATORY)
**GATE CHECK:** Before writing HTML, calculate dimensions.
**If dimension planning incomplete, STOP. Do not proceed.**

Real-World Example: Before & After

BEFORE (Verbose, Ambiguous)

# Presentation Agent

You're a helpful agent that creates presentations. You should read the content file and make slides. If there's a template, use it. If there's a reference PPTX, try to match the style. Make sure the slides look good and don't overflow. Use good colors and fonts that match the brand.

When you create charts, use Plotly. Try to make them look nice. Tables should be formatted well. Remember to check your work before finishing.
Problems:

No workflow structure
"If there's" = ambiguous routing
"Make sure" = no verification mechanism
"Try to" = optional, agent may skip
"Look good" = subjective
No audit trail
No gate checks


AFTER (Structured, Deterministic)

# Presentation Agent

## Step 0: Create Work Log (MANDATORY)
Copy `/tmp/guides/work_log_TEMPLATE.md` to `work_log_[PROJECT].md`
Update after EVERY step.

## Step 1: Analyze Inputs & Choose Path

| Content | Template | Reference | Path |
|---------|----------|-----------|------|
| ✓ | ✓ | ✓ | A |
| ✓ | ✗ | ✓ | B |
| ✓ | ✓ | ✗ | C |
| ✓ | ✗ | ✗ | D |

Document in work_log: Path + inputs detected

## Step 2: Load Guides (CRITICAL)
Path A: Read pptx-style-transfer.md, template-interpretation.md
Path B: Read pptx-style-transfer.md, slide-patterns.md
Path C: Read brand-style-matrix.md, template-interpretation.md
Path D: Read brand-style-matrix.md, slide-patterns.md

Read FULL files, no limits.

## Step 3: Plan Structure & Styling
Extract/infer:
- Slide count, layout types
- Colors (exact hex from reference OR inferred from brand)
- Fonts (CSS stacks with fallbacks)
Document in work_log.

## Step 4: Dimension Planning (MANDATORY)
Read `/tmp/guides/dimension-planning.md` (NO limits)
Calculate heights for ALL slides. Target ≤480px.

**GATE CHECK:**
- [ ] Calculations documented
- [ ] All totals ≤480px
- [ ] work_log updated

**If incomplete, STOP. Do not proceed.**

## Step 5: HTML Creation
Read `/tmp/guides/technical-requirements.md` for specs.

**MANDATORY:** Files MUST end with `_presentation.html`
**Charts:** MUST use JSON data islands (see technical-requirements.md)

Document in work_log: Filename, slide count, styling applied.

## Step 6: Verification
Run: `bash /tmp/guides/verify.sh output.html`
Read `/tmp/guides/layout-verification.md`

**Only mark complete when ALL checks pass.**
Improvements:

Sequential workflow with dependencies
Path-based routing (no ambiguity)
External audit trail (work_log)
Gate checks (blocks progression)
Specific file references (no guessing)
MANDATORY emphasis (non-negotiable)
Verifiable completion (programmatic checks)


Agent Instruction Audit Checklist

Before finalizing agent instructions, verify:
Structure:

 Sequential workflow with clear steps?
 Dependencies explicit (Step N requires Step N-1)?
 Path-based routing if multiple scenarios?
 External audit trail required?

Clarity:

 Every sentence informs a decision or specifies action?
 No explanatory fluff (removed all "Why this works" sections)?
 Examples used instead of prose explanations?
 Emphasis hierarchy consistent (MANDATORY > CRITICAL > IMPORTANT)?
 Right altitude achieved (not too rigid, not too vague)?
 Effort scales to task complexity (heuristics over rigid rules)?

Completeness:

 Gate checks at critical points?
 Checklists for verification?
 Single source of truth for each fact?
 Naming conventions enforced programmatically?

Maintenance:

 Modular knowledge base (one concern per guide)?
 No redundant documentation across files?
 No non-generalizable examples (specific company names 6 times)?
 Specs live in canonical location, referenced elsewhere?

Testability:

 External verification scripts where possible?
 Verifiable completion conditions?
 Human can audit agent's decisions via work_log?
 Eval strategy defined (start with 20 representative cases)?
 Self-improvement mechanism considered (standalone: after N tasks / orchestrated: conditional debug mode)?
 Extended thinking mode documented for debugging (when to use/not use)?


Summary: The Recipe


Break task into sequential phases (Step 0 → Step N)
Require external audit trail (work_log updated every step)
Use path-based routing for multiple scenarios
Load knowledge just-in-time (Step 2 based on Step 1 path)
Insert gate checks before irreversible actions
Modularize knowledge base (one guide per concern)
Eliminate all fluff (imperative commands only)
Show examples, not explanations (code > prose)
Add verification checklists (human and programmatic)
Enforce single source of truth (canonical spec, referenced elsewhere)
Find right altitude (specific enough to guide, flexible enough for model reasoning)
Scale effort to complexity (simple tasks get simple treatment)
Enable self-improvement (standalone: after N tasks; orchestrated subagents: conditional debug mode)
Start with 20 test cases (iterate based on real failures, not hypothetical perfection)
Use extended thinking for debugging (diagnostic tool, not default mode)

Result: Agent executes deterministically, human can audit, system is maintainable.

Guide complete.
No results found