swapp1990/discuss-1773125666915-gist-1773170622.md

## discuss-1773125666915-gist-1773170622.md

      
Here's my analysis of whether the revision agent is over-engineered.
THE FULL PIPELINE (for one revision):
1. plan_revision_instruction()     — 2 LLM calls
   ├─ _llm_plan()                  — 1 call: which surfaces to touch
   └─ _build_surface_notes()       — 1 call per surface (usually 1-2)

2. apply_revision_plan() per surface:
   ├─ _edit_title()                — 1 call (if needed)
   ├─ _edit_synopsis()             — 1 call (if needed)
   ├─ _edit_chapter_outlines()     — 1 call (if needed)
   └─ _revise_chapter_paragraphs() — THE BIG ONE:
       ├─ plan_beat_actions()      — 1 call: decide KEEP/EDIT/REMOVE/NEW per beat
       ├─ phase1_map_paragraphs()  — 0-1 calls: map paragraphs to beats
       └─ phase2_execute_actions() — 1 call per non-KEEP beat

3. evaluate_revision()             — 1 call
4. Retry loop (up to 2x)          — repeats step 2-3

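Typical revision: 5-8 LLM calls. For example, a 4-beat chapter where 2 beats change costs plan(1) + notes(1) + beat_actions(1) + phase1(0, ranges already stored) + phase2(4: 2 EDIT rewrites + 2 KEEP continuity checks) + eval(1) = 8 calls, before any retries. Note that the KEEP continuity checks run inside phase2 on top of the "1 call per non-KEEP beat" listed in the diagram.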
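The arithmetic above can be checked with a small tally. This is a minimal sketch, not the agent's code: the function name `estimate_llm_calls` and its parameters are hypothetical, and it assumes every KEEP beat has a description and therefore triggers one continuity check.

```python
def estimate_llm_calls(beat_actions, surfaces=1, phase1_stored=True):
    """Tally LLM calls for one revision pass (no retries).

    Hypothetical helper. Assumes one phase2 call per beat: a rewrite for
    EDIT/REMOVE/NEW beats, a continuity check for KEEP beats.
    """
    breakdown = {
        "plan": 1,                            # _llm_plan()
        "notes": surfaces,                    # one note call per surface
        "beat_actions": 1,                    # plan_beat_actions()
        "phase1": 0 if phase1_stored else 1,  # skipped when ranges are stored
        "phase2": len(beat_actions),          # one call per beat (see docstring)
        "eval": 1,                            # evaluate_revision()
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# The 4-beat example from the text: 2 beats change, 2 are kept.
example = estimate_llm_calls(["EDIT", "EDIT", "KEEP", "KEEP"])
print(example["total"])
```

With the defaults this reproduces the 8-call figure. Passing `phase1_stored=False` adds the one mapping call that a first revision of a chapter would need.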
THE CORE QUESTION: Is the 3-phase paragraph approach (plan_beat_actions → phase1 → phase2) too complex?
My assessment: it IS complex, but justified. Here's why each piece exists:
Why not just send the whole chapter + instruction to the LLM?
The naive approach — "here's 20 paragraphs and an instruction, revise them" — has three fatal problems:

1. Context window: A chapter can run 5000+ words. With the instruction and beats added to the prompt, a full rewrite forces the LLM to reproduce the ENTIRE chapter in its output. With max_tokens at 16K, you hit limits fast. Beat-by-beat processing keeps each call small.
2. Precision loss: When you ask an LLM to rewrite 20 paragraphs, it rewrites ALL of them, introducing subtle tone shifts, word choice changes, and dropped details. The beat-level approach lets KEEP beats pass through untouched, preserving the author's original voice in scenes that don't need changes.
3. Tracking what changed: Without beat mapping, you can't tell the user "beats 2 and 5 were modified". You just get a blob of new text. The beat structure gives you an edit log.
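The second and third points can be sketched as follows. This is an illustration under assumptions, not the agent's implementation: the `Beat` dataclass, `revise_by_beats`, and the `rewrite` callback (a stand-in for one LLM call) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Beat:
    beat_id: int
    action: str  # "KEEP", "EDIT", "REMOVE", or "NEW"
    start: int   # first paragraph index (inclusive)
    end: int     # end paragraph index (exclusive); start == end for NEW


def revise_by_beats(
    paragraphs: List[str],
    beats: List[Beat],
    rewrite: Callable[[Beat, List[str]], List[str]],
) -> Tuple[List[str], List[str]]:
    """Apply per-beat actions and return (revised paragraphs, edit log)."""
    revised: List[str] = []
    edit_log: List[str] = []
    for beat in beats:
        original = paragraphs[beat.start:beat.end]
        if beat.action == "KEEP":
            revised.extend(original)  # passes through byte-for-byte
            continue
        if beat.action != "REMOVE":
            # EDIT and NEW: one call that sees only this beat's slice.
            revised.extend(rewrite(beat, original))
        edit_log.append(f"beat {beat.beat_id}: {beat.action}")
    return revised, edit_log


# Fake rewrite so the sketch runs without an LLM.
def fake_rewrite(beat: Beat, paras: List[str]) -> List[str]:
    return [p.upper() for p in paras]


chapter = ["p0", "p1", "p2", "p3"]
plan = [Beat(1, "KEEP", 0, 2), Beat(2, "EDIT", 2, 3), Beat(3, "REMOVE", 3, 4)]
new_chapter, log = revise_by_beats(chapter, plan, fake_rewrite)
print(new_chapter, log)
```

Because KEEP slices are copied rather than regenerated, the untouched scenes cannot drift, and the log falls out of the loop for free.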

Why separate plan_beat_actions from phase1?
plan_beat_actions sees ONLY beat descriptions (no paragraphs) — it decides "what should change" at a structural level. phase1 sees the actual paragraphs — it maps them to beats. Separation means:

- The planner doesn't get distracted by prose details
- The mapper doesn't try to decide what should change
- Stored paragraph ranges let phase1 skip its LLM call entirely on subsequent revisions
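The last point, skipping the phase1 call when ranges are already stored, might look roughly like this. The function name `map_paragraphs_cached`, the `beat_ranges` key, and the dictionary layout of `chapter` are all assumptions made for the sketch; the real storage format is not shown in this analysis.

```python
def map_paragraphs_cached(chapter, llm_map):
    """Return (beat ranges, LLM calls used), reusing stored ranges if present.

    Hypothetical helper: `chapter` is a plain dict and `llm_map` stands in
    for the LLM-backed mapper.
    """
    stored = chapter.get("beat_ranges")
    if stored:
        return stored, 0  # the "0" in "0-1 calls"
    ranges = llm_map(chapter["paragraphs"], chapter["beats"])
    chapter["beat_ranges"] = ranges  # persist for the next revision
    return ranges, 1


def fake_llm_map(paragraphs, beats):
    # Stand-in for the real LLM call: split paragraphs evenly across beats.
    size = max(1, len(paragraphs) // len(beats))
    return {beat: (i * size, (i + 1) * size) for i, beat in enumerate(beats)}


chapter = {"paragraphs": ["p0", "p1", "p2", "p3"], "beats": ["beat-1", "beat-2"]}
first_ranges, first_calls = map_paragraphs_cached(chapter, fake_llm_map)
second_ranges, second_calls = map_paragraphs_cached(chapter, fake_llm_map)
print(first_calls, second_calls)
```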

Why not merge phase1 + phase2?
phase1 is a mapping step (which paragraphs belong to which beat). phase2 is an execution step (rewrite this slice). If you merge them, you need one massive LLM call that maps paragraphs, decides changes, AND rewrites — which fails because the context is too large and the task is too compound.
WHAT I WOULD SIMPLIFY:


1. The revision_planner.py _llm_plan() call is mostly wasted. It identifies surfaces (title, synopsis, outlines, paragraphs), but in practice almost every revision touches paragraph_content. The surface-by-surface approach adds 1-2 LLM calls that rarely change the outcome. You could default to "always revise paragraph_content" and skip the planning LLM call.

2. Surface notes (_build_surface_notes) are cosmetic: they generate one-sentence descriptions for the UI. That is 1 LLM call per surface for a tooltip. They could be dropped or made heuristic-only.

3. The evaluator retry loop rarely triggers. In practice, revisions either pass on the first try or the feedback is too vague to help. The retry adds latency for marginal quality gains. It could be made opt-in.

4. The legacy _phase2_sequential_beat_rewrites is dead code: phase2_execute_actions replaced it, so it can be deleted.

5. KEEP beat continuity checks are the most debatable. Every KEEP beat with a description triggers an LLM call to check whether upstream changes broke continuity. This adds 2-4 LLM calls per revision (2 of the 8 calls in the example above). The question is whether the continuity fixes are worth those extra calls.
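As an illustration of the opt-in evaluator idea from the list above, here is a minimal sketch. The function `revise_with_optional_eval` and both callbacks are hypothetical; `apply_plan` stands in for step 2 of the pipeline and `evaluate` for step 3, and the default of two retries mirrors the "up to 2x" in the pipeline diagram.

```python
def revise_with_optional_eval(apply_plan, evaluate, enable_eval=False, max_retries=2):
    """Run the revision once; evaluate and retry only when opted in.

    Returns (final result, number of retries used).
    """
    result = apply_plan(None)  # first attempt, no evaluator feedback yet
    if not enable_eval:
        return result, 0       # default path: zero evaluator calls
    retries = 0
    while True:
        passed, feedback = evaluate(result)
        if passed or retries >= max_retries:
            return result, retries
        retries += 1
        result = apply_plan(feedback)  # redo step 2 with the feedback


# Fakes so the sketch runs without an LLM.
calls = {"apply": 0, "evaluate": 0}


def fake_apply(feedback):
    calls["apply"] += 1
    return f"draft {calls['apply']}"


def fake_evaluate(result):
    calls["evaluate"] += 1
    return False, "still too vague"  # worst case: never passes


default_result, default_retries = revise_with_optional_eval(fake_apply, fake_evaluate)
default_calls = dict(calls)
print(default_result, default_calls)
```

With the flag off, the evaluator is never invoked, which is where the latency saving would come from. With `enable_eval=True` and an evaluator that never passes, the loop stops after two retries.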


BOTTOM LINE:
The core 3-phase approach (plan beats → map paragraphs → execute per-beat) is the right architecture. It solves real problems. But the wrapping around it — surface planning, surface notes, evaluator retries — adds 3-4 LLM calls that could be cut. The quickest win would be:

1. Skip _llm_plan(), hardcode paragraph_content=true
2. Drop surface notes LLM calls
3. Make evaluator optional

That would cut a typical revision from 8 calls to 5 while keeping the core beat-level precision intact.
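The 8-to-5 figure can be reproduced by treating the three quick wins as switches. This is a sketch only: `RevisionOptions`, its field names, and `total_calls` are hypothetical, and the core cost of 5 comes from the 4-beat example (1 beat_actions call plus 4 phase2 calls, with phase1 ranges already stored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionOptions:
    """Hypothetical switches for the three wrapper steps."""
    use_llm_plan: bool = True           # _llm_plan() surface planning
    use_llm_surface_notes: bool = True  # _build_surface_notes() per surface
    run_evaluator: bool = True          # evaluate_revision()


def total_calls(options, core_calls=5, surfaces=1):
    """Core beat-level calls plus whichever wrapper calls are enabled."""
    wrapper = 0
    if options.use_llm_plan:
        wrapper += 1
    if options.use_llm_surface_notes:
        wrapper += surfaces
    if options.run_evaluator:
        wrapper += 1
    return core_calls + wrapper


current = total_calls(RevisionOptions())
lean = total_calls(RevisionOptions(False, False, False))
print(current, lean)
```

Keeping the switches in one options object would let the evaluator be turned back on per request without touching the beat-level code path.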