Last active
March 10, 2026 01:45
SkillPA optimization report: opt-chatgpt-20260308-153903
| <!doctype html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> | |
| <title>Optimization Report - chatgpt</title> | |
| <style> | |
| :root { | |
| --bg: #f6f7f3; | |
| --surface: #ffffff; | |
| --ink: #14212e; | |
| --muted: #5b6673; | |
| --accent: #126b79; | |
| --line: #d6dbe2; | |
| --ok: #2c7a42; | |
| --bad: #9a2e2e; | |
| } | |
| * { box-sizing: border-box; } | |
| body { | |
| margin: 0; | |
| font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif; | |
| color: var(--ink); | |
| background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%); | |
| } | |
| .wrap { max-width: 1180px; margin: 0 auto; padding: 24px; } | |
| h1 { margin: 0 0 8px; font-size: 30px; } | |
| .sub { color: var(--muted); margin-bottom: 18px; } | |
| .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; } | |
| .card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; } | |
| .k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; } | |
| .v { font-size: 24px; font-weight: 650; margin-top: 4px; } | |
| section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; } | |
| h3 { margin: 0 0 6px; font-size: 15px; } | |
| h2 { margin: 0 0 10px; font-size: 18px; } | |
| table { width: 100%; border-collapse: collapse; font-size: 13px; } | |
| th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; } | |
| th { color: var(--muted); font-weight: 600; } | |
| ul { margin: 0; padding-left: 18px; } | |
| .lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; } | |
| .lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); } | |
| .lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); } | |
| .lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); } | |
| .lineage-title { font-weight: 700; margin-bottom: 2px; } | |
| .lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .iteration-block { margin-top: 14px; } | |
| .iteration-head { margin-bottom: 8px; } | |
| .candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; } | |
| .candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; } | |
| .candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; } | |
| .candidate-title { font-weight: 700; } | |
| .badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; } | |
| .badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; } | |
| .badge-mode { background: #eef7fb; color: #25556f; } | |
| .badge-validated { background: #eef7ef; color: #2b6540; } | |
| .badge-selected { background: #fff4dc; color: #8b5a00; } | |
| .badge-promoted { background: #e8f7e9; color: #21643a; } | |
| .badge-missing { background: #fdecec; color: #8e3434; } | |
| .candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; } | |
| .metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; } | |
| .metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; } | |
| .candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; } | |
| .journey-step summary { cursor: pointer; font-weight: 650; } | |
| .grid-line { stroke: #e6ebf0; stroke-width: 1; } | |
| .axis-line { stroke: #8b97a5; stroke-width: 1; } | |
| .axis-label { fill: #647182; font-size: 11px; } | |
| .score-line { stroke: var(--accent); stroke-width: 2.3; } | |
| .line-point { fill: #126b79; } | |
| .cost-bar { fill: #2d6b9f; opacity: .82; } | |
| .diff-file { color: #9ec1ff; display: block; } | |
| .diff-hunk { color: #ffd479; display: block; } | |
| .diff-add { color: #65d28d; background: #123321; display: block; } | |
| .diff-del { color: #ff9da1; background: #3b1818; display: block; } | |
| .diff-neutral { color: #d9e4ef; display: block; } | |
| .foot { color: var(--muted); font-size: 12px; margin-top: 16px; } | |
| </style> | |
| </head> | |
| <body> | |
| <div class="wrap"> | |
| <h1>Optimization Report</h1> | |
| <div class="sub">Domain: <strong>chatgpt</strong> | Run ended (UTC): 2026-03-08T17:08:15.750400+00:00</div> | |
| <div class="cards"> | |
| <div class="card"><div class="k">Baseline val score</div><div class="v">0.6583</div></div> | |
| <div class="card"><div class="k">Final test score</div><div class="v">1.0000</div></div> | |
| <div class="card"><div class="k">Delta (test vs. baseline val)</div><div class="v">0.3417</div></div> | |
| <div class="card"><div class="k">Total cost</div><div class="v">$8.3342</div></div> | |
| <div class="card"><div class="k">Wall clock</div><div class="v">1h 29m 12s</div></div> | |
| <div class="card"><div class="k">Total invocations</div><div class="v">110</div></div> | |
| <div class="card"><div class="k">Input tokens</div><div class="v">383,992</div></div> | |
| <div class="card"><div class="k">Output tokens</div><div class="v">124,665</div></div> | |
| <div class="card"><div class="k">Cache read tokens</div><div class="v">14,233,088</div></div> | |
| <div class="card"><div class="k">Total tokens (incl cache)</div><div class="v">14,741,745</div></div> | |
| <div class="card"><div class="k">Input from cache</div><div class="v">97.37%</div></div> | |
| </div> | |
| <section> | |
| <h2>Score Over Iterations</h2> | |
| <svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,202.0 278.0,158.1 510.0,142.5 742.0,20.0" class="score-line" /><circle cx="46.0" cy="202.0" r="3.5" class="line-point"><title>baseline-val: 0.6583</title></circle><circle cx="278.0" cy="158.1" r="3.5" class="line-point"><title>iter-1-selected-val: 0.7408</title></circle><circle cx="510.0" cy="142.5" r="3.5" class="line-point"><title>iter-2-selected-val: 0.7700</title></circle><circle cx="742.0" cy="20.0" r="3.5" class="line-point"><title>final-test: 1.0000</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">1.00</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.91</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.83</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.74</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.66</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="278.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-2-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
| <section> | |
| <h2>Cost Over Phases (USD)</h2> | |
| <svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="51.0" y="66.9" width="23.2" height="107.1" class="cost-bar"><title>baseline-val: 0.5955</title></rect><rect x="84.1" y="161.5" width="23.2" height="12.5" class="cost-bar"><title>baseline-core: 0.0697</title></rect><rect x="117.3" y="139.9" width="23.2" height="34.1" class="cost-bar"><title>iter-1-mutation: 0.1896</title></rect><rect x="150.4" y="71.9" width="23.2" height="102.1" class="cost-bar"><title>iter-1-train-candidate-00: 0.5676</title></rect><rect x="183.5" y="28.7" width="23.2" height="145.3" class="cost-bar"><title>iter-1-train-candidate-01: 0.8073</title></rect><rect x="216.7" y="101.4" width="23.2" height="72.6" class="cost-bar"><title>iter-1-train-candidate-02: 0.4034</title></rect><rect x="249.8" y="136.0" width="23.2" height="38.0" class="cost-bar"><title>iter-1-train-candidate-03: 0.2113</title></rect><rect x="283.0" y="16.0" width="23.2" height="158.0" class="cost-bar"><title>iter-1-val-candidate-00: 0.8781</title></rect><rect x="316.1" y="92.5" width="23.2" height="81.5" class="cost-bar"><title>iter-1-val-candidate-01: 0.4531</title></rect><rect x="349.3" y="169.0" width="23.2" height="5.0" class="cost-bar"><title>iter-1-core-candidate-00: 0.0279</title></rect><rect x="382.4" y="166.5" width="23.2" height="7.5" class="cost-bar"><title>iter-1-core-candidate-01: 0.0416</title></rect><rect x="415.5" y="158.7" width="23.2" height="15.3" class="cost-bar"><title>iter-2-mutation: 0.0851</title></rect><rect x="448.7" y="94.7" width="23.2" height="79.3" class="cost-bar"><title>iter-2-train-candidate-00: 0.4406</title></rect><rect x="481.8" y="57.5" width="23.2" height="116.5" class="cost-bar"><title>iter-2-train-candidate-01: 0.6474</title></rect><rect x="515.0" y="96.2" width="23.2" height="77.8" class="cost-bar"><title>iter-2-train-candidate-02: 0.4323</title></rect><rect x="548.1" y="72.1" width="23.2" 
height="101.9" class="cost-bar"><title>iter-2-train-candidate-03: 0.5666</title></rect><rect x="581.3" y="34.1" width="23.2" height="139.9" class="cost-bar"><title>iter-2-val-candidate-00: 0.7775</title></rect><rect x="614.4" y="73.0" width="23.2" height="101.0" class="cost-bar"><title>iter-2-val-candidate-01: 0.5615</title></rect><rect x="647.5" y="169.5" width="23.2" height="4.5" class="cost-bar"><title>iter-2-core-candidate-00: 0.0249</title></rect><rect x="680.7" y="169.7" width="23.2" height="4.3" class="cost-bar"><title>iter-2-core-candidate-01: 0.0238</title></rect><rect x="713.8" y="78.7" width="23.2" height="95.3" class="cost-bar"><title>final-test: 0.5295</title></rect><text x="62.6" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="95.7" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="128.9" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="162.0" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="195.1" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="228.3" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="261.4" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="294.6" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="327.7" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-01</text><text x="360.9" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="394.0" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-01</text><text x="427.1" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="460.3" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="493.4" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="526.6" 
y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="559.7" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="592.9" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="626.0" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-01</text><text x="659.1" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="692.3" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-01</text><text x="725.4" y="204" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
| <section> | |
| <h2>Overlay Lineage</h2> | |
| <div class="sub">How the champion evolved across iterations, including held-champion steps when gates rejected the selected candidate.</div> | |
| <div class="lineage"> | |
| <div class="lineage-node lineage-baseline"> | |
| <div class="lineage-title">Initial overlay</div> | |
| <div class="lineage-sub">starting champion</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 1</div> | |
| <div class="lineage-sub">selected candidate-00</div> | |
| <div class="lineage-path">candidates/iter-1/candidate-00.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">champion after gates: iter-1 candidate-00</div><div class="lineage-detail">selected val: 0.7408</div><div class="lineage-detail">improvement: 0.0825</div><div class="lineage-detail">result: promoted</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 2</div> | |
| <div class="lineage-sub">selected candidate-01</div> | |
| <div class="lineage-path">candidates/iter-2/candidate-01.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-01</div><div class="lineage-detail">champion after gates: iter-2 candidate-01</div><div class="lineage-detail">selected val: 0.7700</div><div class="lineage-detail">improvement: 0.0292</div><div class="lineage-detail">result: promoted</div> | |
| </div> | |
| <div class="lineage-node lineage-promoted"> | |
| <div class="lineage-title">Promoted overlay</div> | |
| <div class="lineage-sub">final published artifact</div> | |
| <div class="lineage-path">promoted.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Iteration Candidates</h2> | |
| <div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 1</h3> | |
| <div class="sub">selected candidate: candidate-00 | result: promoted</div> | |
| <div class="sub">gate reasons: none</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.9733</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.7408</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.9333</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.6658</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6133</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.3400</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 0 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 2</h3> | |
| <div class="sub">selected candidate: candidate-01 | result: promoted</div> | |
| <div class="sub">gate reasons: none</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7208</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.6058</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 2 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span><span class="badge badge-promoted">promoted</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7167</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.7700</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 3 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6733</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6550</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Promotion Decisions</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>1</td><td>candidate-00</td><td>0.7408</td><td>0.0825</td><td>yes</td><td>none</td></tr><tr><td>2</td><td>candidate-01</td><td>0.7700</td><td>0.0292</td><td>yes</td><td>none</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Token and Cost Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>246,002</td><td>16,436</td><td>192,683</td><td>0</td><td>$1.0424</td></tr><tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>280,596</td><td>95,782</td><td>13,568,256</td><td>0</td><td>$6.3490</td></tr><tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>75,752</td><td>19,231</td><td>508,288</td><td>0</td><td>$0.6682</td></tr><tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr><tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>27,644</td><td>9,652</td><td>156,544</td><td>0</td><td>$0.2747</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Provider Invocation Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>claude</td><td>110</td><td>6</td><td>52</td><td>52</td><td>383,992</td><td>124,665</td><td>14,233,088</td><td>$8.3342</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Run Diagnostics</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Metric</th><th>Value</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>Run started at (UTC)</td><td>2026-03-08T15:39:03.283376+00:00</td></tr> | |
| <tr><td>Run ended at (UTC)</td><td>2026-03-08T17:08:15.750400+00:00</td></tr> | |
| <tr><td>Wall clock duration</td><td>1h 29m 12s</td></tr> | |
| <tr><td>Active runtime (eval + mutator)</td><td>1h 24m 4s</td></tr> | |
| <tr><td>Orchestration overhead</td><td>5m 8s</td></tr> | |
| <tr><td>Eval runtime total</td><td>1h 22m 8s</td></tr> | |
| <tr><td>Mutator runtime total</td><td>1m 56s</td></tr> | |
| <tr><td>Eval tasks with runtime</td><td>52</td></tr> | |
| <tr><td>Mutator calls with duration</td><td>6</td></tr> | |
| <tr><td>Candidate overlays generated</td><td>8</td></tr> | |
| <tr><td>Candidate metadata records</td><td>8</td></tr> | |
| <tr><td>Candidate overlays train-tested</td><td>8</td></tr> | |
| <tr><td>Candidate overlays val/core-tested</td><td>4</td></tr> | |
| <tr><td>Promotion events</td><td>2</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Quality Flags</h2> | |
| <ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul> | |
| </section> | |
| <section> | |
| <h2>Overlay Diff (Initial vs Promoted)</h2> | |
| <div class="sub">Initial: /home/agent/skillpa/reports/opt-chatgpt-20260308-153903/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-chatgpt-20260308-153903/promoted.overlay.md</div> | |
| <div class="sub">Initial lines: 93 | Promoted lines: 94 | +3 / -2</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-chatgpt-20260308-153903/inputs/initial.overlay.md</span> | |
| <span class="diff-file">+++ opt-chatgpt-20260308-153903/promoted.overlay.md</span> | |
| <span class="diff-hunk">@@ -45,7 +45,6 @@</span> | |
| <span class="diff-neutral"> fi</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> printf '%s\n' "$SNAPSHOT"</span> | |
| <span class="diff-del">-steel browser get text main --session "$SESSION"</span> | |
| <span class="diff-neutral"> steel browser stop --session "$SESSION"</span> | |
| <span class="diff-neutral"> ```</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-hunk">@@ -58,7 +57,6 @@</span> | |
| <span class="diff-neutral"> steel browser wait 20000 --session "$SESSION"</span> | |
| <span class="diff-neutral"> SNAPSHOT="$(steel browser snapshot --session "$SESSION")"</span> | |
| <span class="diff-neutral"> printf '%s\n' "$SNAPSHOT"</span> | |
| <span class="diff-del">-steel browser get text main --session "$SESSION"</span> | |
| <span class="diff-neutral"> ```</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Failure policy</span> | |
| <span class="diff-hunk">@@ -75,12 +73,15 @@</span> | |
| <span class="diff-neutral"> Use explicit selector `#prompt-textarea`.</span> | |
| <span class="diff-neutral"> - `Expected string, received null`:</span> | |
| <span class="diff-neutral"> Avoid role queries without explicit names and stable scope.</span> | |
| <span class="diff-add">+- `unknown subcommand`:</span> | |
| <span class="diff-add">+ Verify command syntax matches `steel browser &lt;action&gt; [options]`. Do not use `get text` as separate command.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Do not</span> | |
| <span class="diff-neutral"> - Do not use ephemeral refs from old snapshots (`@eNN`) across new snapshots.</span> | |
| <span class="diff-neutral"> - Do not click ambiguous text labels like `Continue` or `What can I help with?`.</span> | |
| <span class="diff-neutral"> - Do not use unquoted URLs containing `?` or `&amp;`.</span> | |
| <span class="diff-neutral"> - Do not leave a session running.</span> | |
| <span class="diff-add">+- Do not use `steel browser get text main` — `get` is not a valid subcommand.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Overlay path parity</span> | |
| <span class="diff-neutral"> Keep this same overlay text in both paths:</span></pre> | |
| </section> | |
| <section> | |
| <h2>Stepwise Overlay Diffs</h2> | |
| <div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div> | |
| <details class="journey-step"> | |
| <summary>Iteration 1: selected candidate-00</summary> | |
| <div class="sub">opt-chatgpt-20260308-153903/inputs/initial.overlay.md -> opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md</div> | |
| <div class="sub">selected val 0.7408 | improvement 0.0825 | promoted</div> | |
| <div class="sub">lines: 93 -> 93 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 2: selected candidate-01</summary> | |
| <div class="sub">opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md -> opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md</div> | |
| <div class="sub">selected val 0.7700 | improvement 0.0292 | promoted</div> | |
| <div class="sub">lines: 93 -> 94 | +3 / -2</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-chatgpt-20260308-153903/candidates/iter-1/candidate-00.md</span> | |
| <span class="diff-file">+++ opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md</span> | |
| <span class="diff-hunk">@@ -45,7 +45,6 @@</span> | |
| <span class="diff-neutral"> fi</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> printf '%s\n' "$SNAPSHOT"</span> | |
| <span class="diff-del">-steel browser get text main --session "$SESSION"</span> | |
| <span class="diff-neutral"> steel browser stop --session "$SESSION"</span> | |
| <span class="diff-neutral"> ```</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-hunk">@@ -58,7 +57,6 @@</span> | |
| <span class="diff-neutral"> steel browser wait 20000 --session "$SESSION"</span> | |
| <span class="diff-neutral"> SNAPSHOT="$(steel browser snapshot --session "$SESSION")"</span> | |
| <span class="diff-neutral"> printf '%s\n' "$SNAPSHOT"</span> | |
| <span class="diff-del">-steel browser get text main --session "$SESSION"</span> | |
| <span class="diff-neutral"> ```</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Failure policy</span> | |
| <span class="diff-hunk">@@ -75,12 +73,15 @@</span> | |
| <span class="diff-neutral"> Use explicit selector `#prompt-textarea`.</span> | |
| <span class="diff-neutral"> - `Expected string, received null`:</span> | |
| <span class="diff-neutral"> Avoid role queries without explicit names and stable scope.</span> | |
| <span class="diff-add">+- `unknown subcommand`:</span> | |
| <span class="diff-add">+ Verify command syntax matches `steel browser &lt;action&gt; [options]`. Do not use `get text` as separate command.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Do not</span> | |
| <span class="diff-neutral"> - Do not use ephemeral refs from old snapshots (`@eNN`) across new snapshots.</span> | |
| <span class="diff-neutral"> - Do not click ambiguous text labels like `Continue` or `What can I help with?`.</span> | |
| <span class="diff-neutral"> - Do not use unquoted URLs containing `?` or `&amp;`.</span> | |
| <span class="diff-neutral"> - Do not leave a session running.</span> | |
| <span class="diff-add">+- Do not use `steel browser get text main` — `get` is not a valid subcommand.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Overlay path parity</span> | |
| <span class="diff-neutral"> Keep this same overlay text in both paths:</span></pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Final champion to promoted overlay</summary> | |
| <div class="sub">opt-chatgpt-20260308-153903/candidates/iter-2/candidate-01.md -> opt-chatgpt-20260308-153903/promoted.overlay.md</div> | |
| <div class="sub">artifact diff</div> | |
| <div class="sub">lines: 94 -> 94 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| </section> | |
| <div class="foot">Report schema v1 | Generated: 2026-03-10T01:45:04.462281+00:00</div> | |
| </div> | |
| </body> | |
| </html> |
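
The derived figures in the summary cards and the promotion table can be re-checked from the raw values the report prints. The sketch below is only a sanity check, not part of SkillPA: the function name `summarize` is hypothetical, and the formula for "Input from cache" (cache reads divided by cache reads plus uncached input) is an assumption that happens to reproduce the reported percentage.

```python
# Raw figures copied from the report's summary cards and score chart.
baseline_val = 0.6583
iter1_val = 0.7408
iter2_val = 0.7700
final_test = 1.0000
input_tokens = 383_992
output_tokens = 124_665
cache_read_tokens = 14_233_088


def summarize(baseline, final, inp, out, cache):
    """Recompute the derived summary cards from the raw ones (hypothetical helper)."""
    delta = round(final - baseline, 4)
    total_tokens = inp + out + cache
    # Assumption: share of input served from cache = cache / (cache + uncached input).
    cache_share = round(100 * cache / (cache + inp), 2)
    return delta, total_tokens, cache_share


delta, total_tokens, cache_share = summarize(
    baseline_val, final_test, input_tokens, output_tokens, cache_read_tokens
)

# Per-iteration improvements, each measured against the previous champion's val score.
improvements = [
    round(iter1_val - baseline_val, 4),
    round(iter2_val - iter1_val, 4),
]

print(delta, total_tokens, cache_share, improvements)
```

Note that the delta card compares a test-set score with a validation-set baseline, so it is not directly comparable to the per-iteration improvements, which are validation-only.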