@nibzard
Last active March 10, 2026 01:45
SkillPA optimization report: opt-chatgpt-20260308-153903
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Optimization Report - chatgpt</title>
<style>
:root {
--bg: #f6f7f3;
--surface: #ffffff;
--ink: #14212e;
--muted: #5b6673;
--accent: #126b79;
--line: #d6dbe2;
--ok: #2c7a42;
--bad: #9a2e2e;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif;
color: var(--ink);
background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%);
}
.wrap { max-width: 1180px; margin: 0 auto; padding: 24px; }
h1 { margin: 0 0 8px; font-size: 30px; }
.sub { color: var(--muted); margin-bottom: 18px; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; }
.card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; }
.k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; }
.v { font-size: 24px; font-weight: 650; margin-top: 4px; }
section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; }
h3 { margin: 0 0 6px; font-size: 15px; }
h2 { margin: 0 0 10px; font-size: 18px; }
table { width: 100%; border-collapse: collapse; font-size: 13px; }
th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; }
th { color: var(--muted); font-weight: 600; }
ul { margin: 0; padding-left: 18px; }
.lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; }
.lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); }
.lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); }
.lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); }
.lineage-title { font-weight: 700; margin-bottom: 2px; }
.lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.iteration-block { margin-top: 14px; }
.iteration-head { margin-bottom: 8px; }
.candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; }
.candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; }
.candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; }
.candidate-title { font-weight: 700; }
.badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; }
.badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; }
.badge-mode { background: #eef7fb; color: #25556f; }
.badge-validated { background: #eef7ef; color: #2b6540; }
.badge-selected { background: #fff4dc; color: #8b5a00; }
.badge-promoted { background: #e8f7e9; color: #21643a; }
.badge-missing { background: #fdecec; color: #8e3434; }
.candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; }
.metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; }
.metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; }
.candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; }
.journey-step summary { cursor: pointer; font-weight: 650; }
.grid-line { stroke: #e6ebf0; stroke-width: 1; }
.axis-line { stroke: #8b97a5; stroke-width: 1; }
.axis-label { fill: #647182; font-size: 11px; }
.score-line { stroke: var(--accent); stroke-width: 2.3; }
.line-point { fill: #126b79; }
.cost-bar { fill: #2d6b9f; opacity: .82; }
.diff-file { color: #9ec1ff; display: block; }
.diff-hunk { color: #ffd479; display: block; }
.diff-add { color: #65d28d; background: #123321; display: block; }
.diff-del { color: #ff9da1; background: #3b1818; display: block; }
.diff-neutral { color: #d9e4ef; display: block; }
.foot { color: var(--muted); font-size: 12px; margin-top: 16px; }
</style>
</head>
<body>
<div class="wrap">
<h1>Optimization Report</h1>
<div class="sub">Domain: <strong>chatgpt</strong> | Run completed: 2026-03-08T17:08:15.750400+00:00</div>
<div class="cards">
<div class="card"><div class="k">Baseline score</div><div class="v">0.6583</div></div>
<div class="card"><div class="k">Final test score</div><div class="v">1.0000</div></div>
<div class="card"><div class="k">Delta</div><div class="v">0.3417</div></div>
<div class="card"><div class="k">Total cost</div><div class="v">$8.3342</div></div>
<div class="card"><div class="k">Wall clock</div><div class="v">1h 29m 12s</div></div>
<div class="card"><div class="k">Total invocations</div><div class="v">110</div></div>
<div class="card"><div class="k">Input tokens</div><div class="v">383,992</div></div>
<div class="card"><div class="k">Output tokens</div><div class="v">124,665</div></div>
<div class="card"><div class="k">Cache read tokens</div><div class="v">14,233,088</div></div>
<div class="card"><div class="k">Total tokens (incl. cache)</div><div class="v">14,741,745</div></div>
<div class="card"><div class="k">Input from cache</div><div class="v">97.37%</div></div>
</div>
<section>
<h2>Score Over Iterations</h2>
<svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,202.0 278.0,158.1 510.0,142.5 742.0,20.0" class="score-line" /><circle cx="46.0" cy="202.0" r="3.5" class="line-point"><title>baseline-val: 0.6583</title></circle><circle cx="278.0" cy="158.1" r="3.5" class="line-point"><title>iter-1-selected-val: 0.7408</title></circle><circle cx="510.0" cy="142.5" r="3.5" class="line-point"><title>iter-2-selected-val: 0.7700</title></circle><circle cx="742.0" cy="20.0" r="3.5" class="line-point"><title>final-test: 1.0000</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">1.00</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.91</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.83</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.74</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.66</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="278.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-2-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Cost Over Phases (USD)</h2>
<svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="51.0" y="66.9" width="23.2" height="107.1" class="cost-bar"><title>baseline-val: 0.5955</title></rect><rect x="84.1" y="161.5" width="23.2" height="12.5" class="cost-bar"><title>baseline-core: 0.0697</title></rect><rect x="117.3" y="139.9" width="23.2" height="34.1" class="cost-bar"><title>iter-1-mutation: 0.1896</title></rect><rect x="150.4" y="71.9" width="23.2" height="102.1" class="cost-bar"><title>iter-1-train-candidate-00: 0.5676</title></rect><rect x="183.5" y="28.7" width="23.2" height="145.3" class="cost-bar"><title>iter-1-train-candidate-01: 0.8073</title></rect><rect x="216.7" y="101.4" width="23.2" height="72.6" class="cost-bar"><title>iter-1-train-candidate-02: 0.4034</title></rect><rect x="249.8" y="136.0" width="23.2" height="38.0" class="cost-bar"><title>iter-1-train-candidate-03: 0.2113</title></rect><rect x="283.0" y="16.0" width="23.2" height="158.0" class="cost-bar"><title>iter-1-val-candidate-00: 0.8781</title></rect><rect x="316.1" y="92.5" width="23.2" height="81.5" class="cost-bar"><title>iter-1-val-candidate-01: 0.4531</title></rect><rect x="349.3" y="169.0" width="23.2" height="5.0" class="cost-bar"><title>iter-1-core-candidate-00: 0.0279</title></rect><rect x="382.4" y="166.5" width="23.2" height="7.5" class="cost-bar"><title>iter-1-core-candidate-01: 0.0416</title></rect><rect x="415.5" y="158.7" width="23.2" height="15.3" class="cost-bar"><title>iter-2-mutation: 0.0851</title></rect><rect x="448.7" y="94.7" width="23.2" height="79.3" class="cost-bar"><title>iter-2-train-candidate-00: 0.4406</title></rect><rect x="481.8" y="57.5" width="23.2" height="116.5" class="cost-bar"><title>iter-2-train-candidate-01: 0.6474</title></rect><rect x="515.0" y="96.2" width="23.2" height="77.8" class="cost-bar"><title>iter-2-train-candidate-02: 0.4323</title></rect><rect x="548.1" y="72.1" width="23.2" height="101.9" 
class="cost-bar"><title>iter-2-train-candidate-03: 0.5666</title></rect><rect x="581.3" y="34.1" width="23.2" height="139.9" class="cost-bar"><title>iter-2-val-candidate-00: 0.7775</title></rect><rect x="614.4" y="73.0" width="23.2" height="101.0" class="cost-bar"><title>iter-2-val-candidate-01: 0.5615</title></rect><rect x="647.5" y="169.5" width="23.2" height="4.5" class="cost-bar"><title>iter-2-core-candidate-00: 0.0249</title></rect><rect x="680.7" y="169.7" width="23.2" height="4.3" class="cost-bar"><title>iter-2-core-candidate-01: 0.0238</title></rect><rect x="713.8" y="78.7" width="23.2" height="95.3" class="cost-bar"><title>final-test: 0.5295</title></rect><text x="62.6" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="95.7" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="128.9" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="162.0" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="195.1" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="228.3" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="261.4" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="294.6" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="327.7" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-01</text><text x="360.9" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="394.0" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-01</text><text x="427.1" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="460.3" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="493.4" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="526.6" y="204" 
text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="559.7" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="592.9" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="626.0" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-01</text><text x="659.1" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="692.3" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-01</text><text x="725.4" y="204" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Overlay Lineage</h2>
<div class="sub">How the champion evolved across iterations, including steps where the champion was held because the gates rejected the selected candidate.</div>
<div class="lineage">
<div class="lineage-node lineage-baseline">
<div class="lineage-title">Initial overlay</div>
<div class="lineage-sub">starting champion</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 1</div>
<div class="lineage-sub">selected candidate-00</div>
<div class="lineage-path">candidates/iter-1/candidate-00.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">champion after gates: iter-1 candidate-00</div><div class="lineage-detail">selected val: 0.7408</div><div class="lineage-detail">improvement: 0.0825</div><div class="lineage-detail">result: promoted</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 2</div>
<div class="lineage-sub">selected candidate-01</div>
<div class="lineage-path">candidates/iter-2/candidate-01.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-01</div><div class="lineage-detail">champion after gates: iter-2 candidate-01</div><div class="lineage-detail">selected val: 0.7700</div><div class="lineage-detail">improvement: 0.0292</div><div class="lineage-detail">result: promoted</div>
</div>
<div class="lineage-node lineage-promoted">
<div class="lineage-title">Promoted overlay</div>
<div class="lineage-sub">final published artifact</div>
<div class="lineage-path">promoted.overlay.md</div>
<div class="lineage-state">present</div>
</div>
</div>
</section>
<section>
<h2>Iteration Candidates</h2>
<div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 1</h3>
<div class="sub">selected candidate: candidate-00 | result: promoted</div>
<div class="sub">gate reasons: none</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.9733</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.7408</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.9333</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.6658</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6133</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.3400</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 0 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
</div>
</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 2</h3>
<div class="sub">selected candidate: candidate-01 | result: promoted</div>
<div class="sub">gate reasons: none</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7208</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.6058</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7167</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.7700</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 2 | val 3 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6733</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6550</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
</div>
</div>
</section>
<section>
<h2>Promotion Decisions</h2>
<table>
<thead>
<tr>
<th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th>
</tr>
</thead>
<tbody>
<tr><td>1</td><td>0</td><td>0.7408</td><td>0.0825</td><td>yes</td><td></td></tr><tr><td>2</td><td>1</td><td>0.7700</td><td>0.0292</td><td>yes</td><td></td></tr>
</tbody>
</table>
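<div class="sub">The Improvement column matches each selected candidate's val score minus the val score of the champion it replaced. For iteration 1, the baseline val score is the reference. This rule is inferred from the figures in the table, not from the report generator. Worked check:</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">iteration 1: 0.7408 (iter-1 candidate-00) - 0.6583 (baseline val)      = 0.0825
iteration 2: 0.7700 (iter-2 candidate-01) - 0.7408 (iter-1 champion)    = 0.0292</pre>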
</section>
<section>
<h2>Token and Cost Breakdown</h2>
<table>
<thead>
<tr>
<th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>246,002</td><td>16,436</td><td>192,683</td><td>0</td><td>$1.0424</td></tr><tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>280,596</td><td>95,782</td><td>13,568,256</td><td>0</td><td>$6.3490</td></tr><tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>75,752</td><td>19,231</td><td>508,288</td><td>0</td><td>$0.6682</td></tr><tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr><tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>27,644</td><td>9,652</td><td>156,544</td><td>0</td><td>$0.2747</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Provider Invocation Breakdown</h2>
<table>
<thead>
<tr>
<th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>claude</td><td>110</td><td>6</td><td>52</td><td>52</td><td>383,992</td><td>124,665</td><td>14,233,088</td><td>$8.3342</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Run Diagnostics</h2>
<table>
<thead>
<tr>
<th>Metric</th><th>Value</th>
</tr>
</thead>
<tbody>
<tr><td>Run started at (UTC)</td><td>2026-03-08T15:39:03.283376+00:00</td></tr>
<tr><td>Run ended at (UTC)</td><td>2026-03-08T17:08:15.750400+00:00</td></tr>
<tr><td>Wall clock duration</td><td>1h 29m 12s</td></tr>
<tr><td>Active runtime (eval + mutator)</td><td>1h 24m 4s</td></tr>
<tr><td>Orchestration overhead</td><td>5m 8s</td></tr>
<tr><td>Eval runtime total</td><td>1h 22m 8s</td></tr>
<tr><td>Mutator runtime total</td><td>1m 56s</td></tr>
<tr><td>Eval tasks with runtime</td><td>52</td></tr>
<tr><td>Mutator calls with duration</td><td>6</td></tr>
<tr><td>Candidate overlays generated</td><td>8</td></tr>
<tr><td>Candidate metadata records</td><td>8</td></tr>
<tr><td>Candidate overlays train-tested</td><td>8</td></tr>
<tr><td>Candidate overlays val/core-tested</td><td>4</td></tr>
<tr><td>Promotion events</td><td>2</td></tr>
</tbody>
</table>
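<div class="sub">Several headline and diagnostic values follow directly from the raw figures in this report. The arithmetic below reproduces them. These formulas are inferred from the numbers and may not match how the generator computes them. Note that the headline delta compares a test-split score against a validation-split baseline.</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">delta            = final test 1.0000 - baseline val 0.6583            = 0.3417
wall clock       = 17:08:15.750 - 15:39:03.283                         = 1h 29m 12s
active runtime   = eval 1h 22m 8s + mutator 1m 56s                     = 1h 24m 4s
overhead         = wall clock 1h 29m 12s - active runtime 1h 24m 4s    = 5m 8s
invocations      = mutator 6 + executor 52 + judge 52                  = 110
total tokens     = input 383,992 + output 124,665 + cache 14,233,088   = 14,741,745
input from cache = 14,233,088 / (383,992 + 14,233,088)                 = 97.37%</pre>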
</section>
<section>
<h2>Quality Flags</h2>
<ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul>
</section>
<section>
<h2>Overlay Diff (Initial vs Promoted)</h2>
<div class="sub">Initial: /home/agent/skillpa/reports/opt-chatgpt-20260308-153903/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-chatgpt-20260308-153903/promoted.overlay.md</div>
<div class="sub">Initial lines: 93 | Promoted lines: 94 | +3 / -2</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-chatgpt-20260308-153903/inputs/initial.overlay.md</span>
<span class="diff-file">+++ opt-chatgpt-20260308-153903/promoted.overlay.md</span>
<span class="diff-hunk">@@ -45,7 +45,6 @@</span>
<span class="diff-neutral"> fi</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> printf &#x27;%s\n&#x27; &quot;$SNAPSHOT&quot;</span>
<span class="diff-del">-steel browser get text main --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> steel browser stop --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> ```</span>
<span class="diff-neutral"> </span>
<span class="diff-hunk">@@ -58,7 +57,6 @@</span>
<span class="diff-neutral"> steel browser wait 20000 --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> SNAPSHOT=&quot;$(steel browser snapshot --session &quot;$SESSION&quot;)&quot;</span>
<span class="diff-neutral"> printf &#x27;%s\n&#x27; &quot;$SNAPSHOT&quot;</span>
<span class="diff-del">-steel browser get text main --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> ```</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Failure policy</span>
<span class="diff-hunk">@@ -75,12 +73,15 @@</span>
<span class="diff-neutral"> Use explicit selector `#prompt-textarea`.</span>
<span class="diff-neutral"> - `Expected string, received null`:</span>
<span class="diff-neutral"> Avoid role queries without explicit names and stable scope.</span>
<span class="diff-add">+- `unknown subcommand`:</span>
<span class="diff-add">+ Verify command syntax matches `steel browser &lt;action&gt; [options]`. Do not use `get text` as separate command.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Do not</span>
<span class="diff-neutral"> - Do not use ephemeral refs from old snapshots (`@eNN`) across new snapshots.</span>
<span class="diff-neutral"> - Do not click ambiguous text labels like `Continue` or `What can I help with?`.</span>
<span class="diff-neutral"> - Do not use unquoted URLs containing `?` or `&amp;`.</span>
<span class="diff-neutral"> - Do not leave a session running.</span>
<span class="diff-add">+- Do not use `steel browser get text main` — `get` is not a valid subcommand.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Overlay path parity</span>
<span class="diff-neutral"> Keep this same overlay text in both paths:</span></pre>
</section>
<section>
<h2>Stepwise Overlay Diffs</h2>
<div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div>
<details class="journey-step">
<summary>Iteration 1: selected candidate-00</summary>
<div class="sub">opt-chatgpt-20260308-153903/inputs/initial.overlay.md -> opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md</div>
<div class="sub">selected val 0.7408 | improvement 0.0825 | promoted</div>
<div class="sub">lines: 93 -> 93 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
<details class="journey-step">
<summary>Iteration 2: selected candidate-01</summary>
<div class="sub">opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md -> opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md</div>
<div class="sub">selected val 0.7700 | improvement 0.0292 | promoted</div>
<div class="sub">lines: 93 -> 94 | +3 / -2</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md</span>
<span class="diff-file">+++ opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md</span>
<span class="diff-hunk">@@ -45,7 +45,6 @@</span>
<span class="diff-neutral"> fi</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> printf &#x27;%s\n&#x27; &quot;$SNAPSHOT&quot;</span>
<span class="diff-del">-steel browser get text main --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> steel browser stop --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> ```</span>
<span class="diff-neutral"> </span>
<span class="diff-hunk">@@ -58,7 +57,6 @@</span>
<span class="diff-neutral"> steel browser wait 20000 --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> SNAPSHOT=&quot;$(steel browser snapshot --session &quot;$SESSION&quot;)&quot;</span>
<span class="diff-neutral"> printf &#x27;%s\n&#x27; &quot;$SNAPSHOT&quot;</span>
<span class="diff-del">-steel browser get text main --session &quot;$SESSION&quot;</span>
<span class="diff-neutral"> ```</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Failure policy</span>
<span class="diff-hunk">@@ -75,12 +73,15 @@</span>
<span class="diff-neutral"> Use explicit selector `#prompt-textarea`.</span>
<span class="diff-neutral"> - `Expected string, received null`:</span>
<span class="diff-neutral"> Avoid role queries without explicit names and stable scope.</span>
<span class="diff-add">+- `unknown subcommand`:</span>
<span class="diff-add">+ Verify command syntax matches `steel browser &lt;action&gt; [options]`. Do not use `get text` as separate command.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Do not</span>
<span class="diff-neutral"> - Do not use ephemeral refs from old snapshots (`@eNN`) across new snapshots.</span>
<span class="diff-neutral"> - Do not click ambiguous text labels like `Continue` or `What can I help with?`.</span>
<span class="diff-neutral"> - Do not use unquoted URLs containing `?` or `&amp;`.</span>
<span class="diff-neutral"> - Do not leave a session running.</span>
<span class="diff-add">+- Do not use `steel browser get text main` — `get` is not a valid subcommand.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Overlay path parity</span>
<span class="diff-neutral"> Keep this same overlay text in both paths:</span></pre>
</details>
<details class="journey-step">
<summary>Final champion to promoted overlay</summary>
<div class="sub">opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md -> opt-chatgpt-20260308-153903/promoted.overlay.md</div>
<div class="sub">artifact diff</div>
<div class="sub">lines: 94 -> 94 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
</section>
<div class="foot">Report schema v1 | Generated: 2026-03-10T01:45:04.462281+00:00</div>
</div>
</body>
</html>