
@bigsnarfdude
Last active March 9, 2026 15:57
multi-agent.md
  ┌───────────────────────────────────────────────────────────┐
  │                  OUTER LOOP (Claude)                      │
  │                                                           │
  │  "Meta-parameters" — control Claude's search behavior     │
  │  ┌──────────────────────────────────────────────────┐     │
  │  │ • experiment_budget    = 5 min (outer step size) │     │
  │  │ • memory_depth         = progress.md (momentum)  │     │
  │  │ • agent_count          = 1 or 3 (batch size)     │     │
  │  │ • boldness             = how big each change is  │     │
  │  │ • exploration_strategy = sequential vs parallel  │     │
  │  │ • context_window       = how much history Claude │     │
  │  │                          sees per decision       │     │
  │  │ • coordinator_freq     = how often strategy      │     │
  │  │                          updates (multi only)    │     │
  │  └──────────────────────────────────────────────────┘     │
  │                         │                                 │
  │              Claude proposes edit                         │
  │                         ▼                                 │
  │  ┌──────────────────────────────────────────────────┐     │
  │  │              INNER LOOP (Training)               │     │
  │  │                                                  │     │
  │  │  "Hyperparameters" — standard ML knobs           │     │
  │  │  • matrix_lr, embed_lr, unembed_lr, scalar_lr    │     │
  │  │  • warmdown_frac, FINAL_LR_FRAC                  │     │
  │  │  • depth, model_dim, n_heads                     │     │
  │  │  • weight_decay, softcap, x0_lambda              │     │
  │  │  • DEVICE_BATCH_SIZE, TOTAL_BATCH_SIZE           │     │
  │  │                                                  │     │
  │  │        trains for N steps → val_bpb              │     │
  │  └──────────────────────────────────────────────────┘     │
  │                         │                                 │
  │                   val_bpb flows back                      │
  │                         ▼                                 │
  │              Claude reads loss, updates strategy          │
  │              (= "backprop" through the outer loop)        │
  │                                                           │
  └───────────────────────────────────────────────────────────┘
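
The nested structure above can be sketched in a few lines of Python. `claude_propose` and `run_training` are invented stand-ins: in the real setup the proposal step is a Claude call and the inner loop is the actual training script reporting val_bpb.

```python
# Hedged sketch of the two nested loops, not the actual harness.
# claude_propose() and run_training() are stubs for illustration.

def run_training(config, budget_min=5):
    """Inner loop: train for whatever steps fit in the budget and
    return val_bpb (lower is better). Stub: pretend the optimum is
    matrix_lr = 0.06."""
    return 1.0 + abs(config["matrix_lr"] - 0.06)

def claude_propose(progress):
    """Outer loop step: read the history (memory_depth) and propose
    one edit. Stub: nudge matrix_lr up from the best run so far."""
    if not progress:
        return {"matrix_lr": 0.04}
    best = min(progress, key=lambda e: e["val_bpb"])
    return {"matrix_lr": best["config"]["matrix_lr"] * 1.1}

progress = []                                     # plays the role of progress.md
for _ in range(10):                               # outer loop
    config = claude_propose(progress)             # "Claude proposes edit"
    val_bpb = run_training(config, budget_min=5)  # inner loop, 5 min budget
    progress.append({"config": config, "val_bpb": val_bpb})  # val_bpb flows back

best = min(progress, key=lambda e: e["val_bpb"])
```

Everything the rest of this note discusses is a knob on this skeleton: how long `run_training` gets, how much of `progress` the proposer sees, and how many proposals run per outer step.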

The meta-parameters we can name:

OUTER LOOP (no standard names — proposing these) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**experiment_budget**: What we set to 5 min. Controls signal quality vs. iteration speed. This is the outer learning rate: too short = noisy gradient; too long = slow convergence.

**memory_type**: progress.md (full history) vs. strategy.md (compressed summary). Analogous to Adam vs. SGD: does the optimizer keep moving averages, or only the latest gradient?

**memory_depth**: How many past experiments Claude sees, i.e. the context window of the optimizer. Single-ralph: all 42. Multi-ralph: the strategy.md summary (~20 lines). Deeper memory gives better structural insights but is slower per step.
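
The two memory types can be illustrated with a toy summarizer. The experiment entries and the summary format below are invented for the sketch; the real strategy.md is written by the coordinator agent.

```python
# Toy illustration of memory_type: full history vs. compressed summary.
# Entries and summary format are invented for this sketch.

def full_memory(progress):
    """progress.md style: the optimizer sees every past experiment."""
    return progress

def compressed_memory(progress, keep=3):
    """strategy.md style: keep only the best few runs, the way Adam
    keeps moving averages instead of every raw gradient."""
    best = sorted(progress, key=lambda e: e["val_bpb"])[:keep]
    return "\n".join(f"{e['name']}: val_bpb={e['val_bpb']:.3f}" for e in best)

progress = [
    {"name": "exp1 lr=0.04",  "val_bpb": 1.021},
    {"name": "exp2 lr=0.08",  "val_bpb": 0.998},
    {"name": "exp3 lr=0.12",  "val_bpb": 1.005},
    {"name": "exp4 depth=12", "val_bpb": 1.090},
]
print(compressed_memory(progress, keep=2))
# exp2 lr=0.08: val_bpb=0.998
# exp3 lr=0.12: val_bpb=1.005
```

The trade-off is exactly the one named above: `full_memory` preserves structural detail, `compressed_memory` is cheap to read every step.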

**agent_count**: 1 / 3 / 5. The outer loop's batch size. More agents = a lower-variance gradient estimate, but less context per agent.

**boldness**: How far Claude jumps per experiment, e.g. LR 0.04→0.08 (2×) vs. 0.04→0.12 (3×). There is no explicit control; it emerges from Claude's learned prior, though it could be prompted ("be bold" vs. "be conservative").

**exploration_dim**: Which subspace Claude searches: LR space, architecture space, schedule space. Single-ralph naturally traverses these sequentially; multi-ralph hits all of them at once.

**coordinator_frequency**: Multi-ralph only. How often the search strategy is re-evaluated (every N experiments), i.e. how often the outer optimizer recomputes the gradient direction.

**keep/discard_threshold**: Implicit; Claude decides whether an experiment "beats" the current best. Like a line search: only accept the step if the loss decreased.
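
Combined with agent_count > 1, the implicit accept rule amounts to "take the best of the batch, but only if it beats the incumbent". A minimal sketch, with made-up candidate configs and scores:

```python
# Sketch of the implicit keep/discard rule with agent_count > 1:
# evaluate a batch of candidate edits, accept the best only if it
# improves on the incumbent, like a line search rejecting steps
# that do not decrease the loss. Candidates are illustrative.

def accept_best(best_bpb, candidates):
    """candidates: list of (config, val_bpb) pairs. Returns the
    updated best score and the accepted config, or None if the
    whole batch is discarded."""
    config, bpb = min(candidates, key=lambda c: c[1])
    if bpb < best_bpb:           # keep: the step decreased the loss
        return bpb, config
    return best_bpb, None        # discard the whole batch

# agent_count = 3: each agent evaluated one edit this round
candidates = [
    ({"matrix_lr": 0.08},    0.998),
    ({"depth": 12},          1.090),
    ({"warmdown_frac": 0.3}, 1.004),
]
best_bpb, accepted = accept_best(1.010, candidates)
```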

THE COUPLING PROBLEM ━━━━━━━━━━━━━━━━━━━━

Outer and inner loops interact through one bottleneck:

  experiment_budget ←→ steps_per_run ←→ model_size

  5 min + depth 5  = 358 steps  (clean signal, small model)
  5 min + depth 12 = 70 steps   (noisy signal, big model)
  30 min + depth 12 = 420 steps (clean signal, big model)

  The outer loop's "learning rate" (budget) constrains which
  inner loop configurations are even evaluable. Claude can't
  distinguish "depth 12 is bad" from "depth 12 needs more
  steps" without reasoning about this coupling.

  Run 1's agent DID reason about it at experiment 17:
  "Speed > capacity on this GPU"
  = Claude understood the coupling and exploited it.
  This is the outer optimizer becoming aware of its own
  learning rate. Meta-cognition as optimization.
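
The coupling can be made explicit in code. The seconds-per-step figures below are back-solved from the table above (300 s / 358 steps at depth 5, 300 s / 70 steps at depth 12); they are specific to that GPU and that training script, not general constants.

```python
# steps_per_run as a function of the outer "learning rate" (budget)
# and model size. SEC_PER_STEP values are back-solved from the
# numbers in the table above; hardware-specific, not general.

SEC_PER_STEP = {5: 300 / 358, 12: 300 / 70}   # depth -> seconds per step

def steps_per_run(budget_min, depth):
    """How many inner-loop steps fit in one outer-loop evaluation."""
    return round(budget_min * 60 / SEC_PER_STEP[depth])

for budget, depth in [(5, 5), (5, 12), (30, 12)]:
    print(f"{budget} min + depth {depth} = {steps_per_run(budget, depth)} steps")
# 5 min + depth 5 = 358 steps
# 5 min + depth 12 = 70 steps
# 30 min + depth 12 = 420 steps
```

An outer optimizer that knows this function can tell "depth 12 is bad" apart from "depth 12 got too few steps"; one that doesn't cannot.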

This is genuinely new territory. Traditional hyperparameter search (Bayesian optimization, random search, grid search) doesn't have "memory" or "reasoning" — the optimizer is stateless. Claude's outer loop has state (progress.md), learned priors (training data), and the ability to reason about the coupling between outer and inner loops. There's no existing vocabulary because no previous optimizer could do this.
