Created March 10, 2026 22:05
SkillPA optimization report: opt-booking-20260310-152921
| <!doctype html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> | |
| <title>Optimization Report - booking</title> | |
| <style> | |
| :root { | |
| --bg: #f6f7f3; | |
| --surface: #ffffff; | |
| --ink: #14212e; | |
| --muted: #5b6673; | |
| --accent: #126b79; | |
| --line: #d6dbe2; | |
| --ok: #2c7a42; | |
| --bad: #9a2e2e; | |
| } | |
| * { box-sizing: border-box; } | |
| body { | |
| margin: 0; | |
| font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif; | |
| color: var(--ink); | |
| background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%); | |
| } | |
| .wrap { max-width: 1180px; margin: 0 auto; padding: 24px; } | |
| h1 { margin: 0 0 8px; font-size: 30px; } | |
| .sub { color: var(--muted); margin-bottom: 18px; } | |
| .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; } | |
| .card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; } | |
| .k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; } | |
| .v { font-size: 24px; font-weight: 650; margin-top: 4px; } | |
| section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; } | |
| h3 { margin: 0 0 6px; font-size: 15px; } | |
| h2 { margin: 0 0 10px; font-size: 18px; } | |
| table { width: 100%; border-collapse: collapse; font-size: 13px; } | |
| th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; } | |
| th { color: var(--muted); font-weight: 600; } | |
| ul { margin: 0; padding-left: 18px; } | |
| .lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; } | |
| .lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); } | |
| .lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); } | |
| .lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); } | |
| .lineage-title { font-weight: 700; margin-bottom: 2px; } | |
| .lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .iteration-block { margin-top: 14px; } | |
| .iteration-head { margin-bottom: 8px; } | |
| .candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; } | |
| .candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; } | |
| .candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; } | |
| .candidate-title { font-weight: 700; } | |
| .badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; } | |
| .badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; } | |
| .badge-mode { background: #eef7fb; color: #25556f; } | |
| .badge-validated { background: #eef7ef; color: #2b6540; } | |
| .badge-selected { background: #fff4dc; color: #8b5a00; } | |
| .badge-promoted { background: #e8f7e9; color: #21643a; } | |
| .badge-missing { background: #fdecec; color: #8e3434; } | |
| .candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; } | |
| .metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; } | |
| .metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; } | |
| .candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; } | |
| .journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; } | |
| .journey-step summary { cursor: pointer; font-weight: 650; } | |
| .grid-line { stroke: #e6ebf0; stroke-width: 1; } | |
| .axis-line { stroke: #8b97a5; stroke-width: 1; } | |
| .axis-label { fill: #647182; font-size: 11px; } | |
| .score-line { stroke: var(--accent); stroke-width: 2.3; } | |
| .line-point { fill: #126b79; } | |
| .cost-bar { fill: #2d6b9f; opacity: .82; } | |
| .diff-file { color: #9ec1ff; display: block; } | |
| .diff-hunk { color: #ffd479; display: block; } | |
| .diff-add { color: #65d28d; background: #123321; display: block; } | |
| .diff-del { color: #ff9da1; background: #3b1818; display: block; } | |
| .diff-neutral { color: #d9e4ef; display: block; } | |
| .foot { color: var(--muted); font-size: 12px; margin-top: 16px; } | |
| </style> | |
| </head> | |
| <body> | |
| <div class="wrap"> | |
| <h1>Optimization Report</h1> | |
| <div class="sub">Domain: <strong>booking</strong> | Run generated: 2026-03-10T21:44:06.229944+00:00</div> | |
| <div class="cards"> | |
| <div class="card"><div class="k">Baseline score</div><div class="v">0.5850</div></div> | |
| <div class="card"><div class="k">Final test score</div><div class="v">0.8700</div></div> | |
| <div class="card"><div class="k">Delta</div><div class="v">0.2850</div></div> | |
| <div class="card"><div class="k">Total cost</div><div class="v">$91.3182</div></div> | |
| <div class="card"><div class="k">Wall clock</div><div class="v">6h 14m 45s</div></div> | |
| <div class="card"><div class="k">Total invocations</div><div class="v">314</div></div> | |
| <div class="card"><div class="k">Input tokens</div><div class="v">3,724,902</div></div> | |
| <div class="card"><div class="k">Output tokens</div><div class="v">715,516</div></div> | |
| <div class="card"><div class="k">Cache read tokens</div><div class="v">167,222,656</div></div> | |
| <div class="card"><div class="k">Total tokens (incl cache)</div><div class="v">171,663,074</div></div> | |
| <div class="card"><div class="k">Input from cache</div><div class="v">97.82%</div></div> | |
| </div> | |
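The derived cards above (Delta, Total tokens, Input from cache) can be reproduced from the raw figures. The sketch below is a sanity check, not the report generator's code. The variable names are illustrative, and the formula for "Input from cache" (cache reads divided by fresh input plus cache reads) is an assumption that happens to match the reported 97.82%.

```python
# Figures copied from the summary cards in this report.
input_tokens = 3_724_902
output_tokens = 715_516
cache_read_tokens = 167_222_656
baseline_score = 0.5850
final_test_score = 0.8700

# Total tokens (incl cache): plain sum of the three token counters.
total_tokens = input_tokens + output_tokens + cache_read_tokens

# Assumption: "Input from cache" = cache reads / (fresh input + cache reads).
cache_share = cache_read_tokens / (input_tokens + cache_read_tokens)

# Delta: final test score minus baseline score.
delta = round(final_test_score - baseline_score, 4)

print(f"total tokens: {total_tokens:,}")       # total tokens: 171,663,074
print(f"input from cache: {cache_share:.2%}")  # input from cache: 97.82%
print(f"delta: {delta:.4f}")                   # delta: 0.2850
```

The score chart labels the 0.5850 point as baseline-val, so the Delta card appears to compare a validation baseline against a test-split final score.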
| <section> | |
| <h2>Score Over Iterations</h2> | |
| <svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,202.0 162.0,46.2 278.0,90.5 394.0,77.9 510.0,103.0 626.0,20.0 742.0,37.6" class="score-line" /><circle cx="46.0" cy="202.0" r="3.5" class="line-point"><title>baseline-val: 0.5850</title></circle><circle cx="162.0" cy="46.2" r="3.5" class="line-point"><title>iter-1-selected-val: 0.8550</title></circle><circle cx="278.0" cy="90.5" r="3.5" class="line-point"><title>iter-2-selected-val: 0.7783</title></circle><circle cx="394.0" cy="77.9" r="3.5" class="line-point"><title>iter-3-selected-val: 0.8002</title></circle><circle cx="510.0" cy="103.0" r="3.5" class="line-point"><title>iter-4-selected-val: 0.7566</title></circle><circle cx="626.0" cy="20.0" r="3.5" class="line-point"><title>iter-5-selected-val: 0.9005</title></circle><circle cx="742.0" cy="37.6" r="3.5" class="line-point"><title>final-test: 0.8700</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">0.90</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.82</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.74</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.66</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.58</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="162.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="278.0" y="228" text-anchor="middle" 
class="axis-label">iter-2-selected-val</text><text x="394.0" y="228" text-anchor="middle" class="axis-label">iter-3-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-4-selected-val</text><text x="626.0" y="228" text-anchor="middle" class="axis-label">iter-5-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
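The plotted y-coordinates in the score chart are consistent with a linear mapping that places the lowest score (0.5850) on the bottom axis at y=202 and the highest (0.9005) on the top grid line at y=20. The sketch below reconstructs that mapping from the SVG values. The function name and parameters are hypothetical, and the generator's actual scaling code is not shown in this report.

```python
def score_to_y(score, lo=0.5850, hi=0.9005, top=20.0, bottom=202.0):
    """Map a score linearly onto the SVG y-axis (larger score = smaller y)."""
    return bottom - (score - lo) / (hi - lo) * (bottom - top)

# Scores taken from the <title> elements of the chart points.
scores = {
    "baseline-val": 0.5850,
    "iter-1-selected-val": 0.8550,
    "iter-2-selected-val": 0.7783,
    "iter-3-selected-val": 0.8002,
    "iter-4-selected-val": 0.7566,
    "iter-5-selected-val": 0.9005,
    "final-test": 0.8700,
}

# Points are spaced 116 px apart on the x-axis, starting at x=46.
for i, (label, score) in enumerate(scores.items()):
    x = 46.0 + 116.0 * i
    print(f"{label}: ({x:.1f}, {score_to_y(score):.1f})")
```

Because the axis starts at 0.58 rather than 0, the vertical swings between iterations look larger than they would on a 0 to 1 scale.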
| <section> | |
| <h2>Cost Over Phases (USD)</h2> | |
| <svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="47.2" y="130.4" width="12.0" height="43.6" class="cost-bar"><title>baseline-val: 1.6705</title></rect><rect x="61.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>baseline-core: 0.0262</title></rect><rect x="76.2" y="172.5" width="12.0" height="1.5" class="cost-bar"><title>iter-1-mutation: 0.0557</title></rect><rect x="90.8" y="97.8" width="12.0" height="76.2" class="cost-bar"><title>iter-1-train-candidate-00: 2.9211</title></rect><rect x="105.2" y="160.7" width="12.0" height="13.3" class="cost-bar"><title>iter-1-train-candidate-01: 0.5091</title></rect><rect x="119.8" y="76.8" width="12.0" height="97.2" class="cost-bar"><title>iter-1-train-candidate-02: 3.7284</title></rect><rect x="134.2" y="122.3" width="12.0" height="51.7" class="cost-bar"><title>iter-1-train-candidate-03: 1.9807</title></rect><rect x="148.8" y="76.6" width="12.0" height="97.4" class="cost-bar"><title>iter-1-val-candidate-00: 3.7353</title></rect><rect x="163.2" y="76.4" width="12.0" height="97.6" class="cost-bar"><title>iter-1-val-candidate-01: 3.7440</title></rect><rect x="177.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>iter-1-core-candidate-00: 0.0276</title></rect><rect x="192.2" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-1-core-candidate-01: 0.0813</title></rect><rect x="206.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-mutation: 0.0811</title></rect><rect x="221.2" y="121.2" width="12.0" height="52.8" class="cost-bar"><title>iter-2-train-candidate-00: 2.0257</title></rect><rect x="235.8" y="103.3" width="12.0" height="70.7" class="cost-bar"><title>iter-2-train-candidate-01: 2.7098</title></rect><rect x="250.2" y="129.9" width="12.0" height="44.1" class="cost-bar"><title>iter-2-train-candidate-02: 1.6896</title></rect><rect x="264.8" y="113.7" width="12.0" height="60.3" 
class="cost-bar"><title>iter-2-train-candidate-03: 2.3128</title></rect><rect x="279.2" y="96.0" width="12.0" height="78.0" class="cost-bar"><title>iter-2-val-candidate-00: 2.9921</title></rect><rect x="293.8" y="107.6" width="12.0" height="66.4" class="cost-bar"><title>iter-2-val-candidate-01: 2.5477</title></rect><rect x="308.2" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-core-candidate-00: 0.0817</title></rect><rect x="322.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-core-candidate-01: 0.0798</title></rect><rect x="337.2" y="171.8" width="12.0" height="2.2" class="cost-bar"><title>iter-3-mutation: 0.0836</title></rect><rect x="351.8" y="103.2" width="12.0" height="70.8" class="cost-bar"><title>iter-3-train-candidate-00: 2.7148</title></rect><rect x="366.2" y="124.9" width="12.0" height="49.1" class="cost-bar"><title>iter-3-train-candidate-01: 1.8820</title></rect><rect x="380.8" y="110.3" width="12.0" height="63.7" class="cost-bar"><title>iter-3-train-candidate-02: 2.4433</title></rect><rect x="395.2" y="109.1" width="12.0" height="64.9" class="cost-bar"><title>iter-3-train-candidate-03: 2.4877</title></rect><rect x="409.8" y="72.1" width="12.0" height="101.9" class="cost-bar"><title>iter-3-val-candidate-00: 3.9088</title></rect><rect x="424.2" y="49.3" width="12.0" height="124.7" class="cost-bar"><title>iter-3-val-candidate-03: 4.7799</title></rect><rect x="438.8" y="172.9" width="12.0" height="1.1" class="cost-bar"><title>iter-3-core-candidate-00: 0.0414</title></rect><rect x="453.2" y="172.0" width="12.0" height="2.0" class="cost-bar"><title>iter-3-core-candidate-03: 0.0766</title></rect><rect x="467.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-4-mutation: 0.0806</title></rect><rect x="482.2" y="127.5" width="12.0" height="46.5" class="cost-bar"><title>iter-4-train-candidate-00: 1.7849</title></rect><rect x="496.8" y="106.0" width="12.0" height="68.0" 
class="cost-bar"><title>iter-4-train-candidate-01: 2.6091</title></rect><rect x="511.2" y="125.6" width="12.0" height="48.4" class="cost-bar"><title>iter-4-train-candidate-02: 1.8548</title></rect><rect x="525.8" y="115.3" width="12.0" height="58.7" class="cost-bar"><title>iter-4-train-candidate-03: 2.2514</title></rect><rect x="540.2" y="63.4" width="12.0" height="110.6" class="cost-bar"><title>iter-4-val-candidate-00: 4.2398</title></rect><rect x="554.8" y="65.4" width="12.0" height="108.6" class="cost-bar"><title>iter-4-val-candidate-03: 4.1638</title></rect><rect x="569.2" y="173.2" width="12.0" height="0.8" class="cost-bar"><title>iter-4-core-candidate-00: 0.0312</title></rect><rect x="583.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>iter-4-core-candidate-03: 0.0275</title></rect><rect x="598.2" y="171.7" width="12.0" height="2.3" class="cost-bar"><title>iter-5-mutation: 0.0901</title></rect><rect x="612.8" y="118.2" width="12.0" height="55.8" class="cost-bar"><title>iter-5-train-candidate-00: 2.1393</title></rect><rect x="627.2" y="94.3" width="12.0" height="79.7" class="cost-bar"><title>iter-5-train-candidate-01: 3.0575</title></rect><rect x="641.8" y="91.7" width="12.0" height="82.3" class="cost-bar"><title>iter-5-train-candidate-02: 3.1568</title></rect><rect x="656.2" y="135.7" width="12.0" height="38.3" class="cost-bar"><title>iter-5-train-candidate-03: 1.4685</title></rect><rect x="670.8" y="16.0" width="12.0" height="158.0" class="cost-bar"><title>iter-5-val-candidate-00: 6.0584</title></rect><rect x="685.2" y="20.0" width="12.0" height="154.0" class="cost-bar"><title>iter-5-val-candidate-02: 5.9045</title></rect><rect x="699.8" y="173.4" width="12.0" height="0.6" class="cost-bar"><title>iter-5-core-candidate-00: 0.0232</title></rect><rect x="714.2" y="173.2" width="12.0" height="0.8" class="cost-bar"><title>iter-5-core-candidate-02: 0.0292</title></rect><rect x="728.8" y="149.8" width="12.0" height="24.2" 
class="cost-bar"><title>final-test: 0.9295</title></rect><text x="53.2" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="67.8" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="82.2" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="96.8" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="111.2" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="125.8" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="140.2" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="154.8" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="169.2" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-01</text><text x="183.8" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="198.2" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-01</text><text x="212.8" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="227.2" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="241.8" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="256.2" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="270.8" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="285.2" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="299.8" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-01</text><text x="314.2" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="328.8" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-01</text><text x="343.2" y="204" text-anchor="middle" 
class="axis-label">iter-3-mutation</text><text x="357.8" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-00</text><text x="372.2" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-01</text><text x="386.8" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-02</text><text x="401.2" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-03</text><text x="415.8" y="204" text-anchor="middle" class="axis-label">iter-3-val-candidate-00</text><text x="430.2" y="204" text-anchor="middle" class="axis-label">iter-3-val-candidate-03</text><text x="444.8" y="204" text-anchor="middle" class="axis-label">iter-3-core-candidate-00</text><text x="459.2" y="204" text-anchor="middle" class="axis-label">iter-3-core-candidate-03</text><text x="473.8" y="204" text-anchor="middle" class="axis-label">iter-4-mutation</text><text x="488.2" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-00</text><text x="502.8" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-01</text><text x="517.2" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-02</text><text x="531.8" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-03</text><text x="546.2" y="204" text-anchor="middle" class="axis-label">iter-4-val-candidate-00</text><text x="560.8" y="204" text-anchor="middle" class="axis-label">iter-4-val-candidate-03</text><text x="575.2" y="204" text-anchor="middle" class="axis-label">iter-4-core-candidate-00</text><text x="589.8" y="204" text-anchor="middle" class="axis-label">iter-4-core-candidate-03</text><text x="604.2" y="204" text-anchor="middle" class="axis-label">iter-5-mutation</text><text x="618.8" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-00</text><text x="633.2" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-01</text><text x="647.8" y="204" text-anchor="middle" 
class="axis-label">iter-5-train-candidate-02</text><text x="662.2" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-03</text><text x="676.8" y="204" text-anchor="middle" class="axis-label">iter-5-val-candidate-00</text><text x="691.2" y="204" text-anchor="middle" class="axis-label">iter-5-val-candidate-02</text><text x="705.8" y="204" text-anchor="middle" class="axis-label">iter-5-core-candidate-00</text><text x="720.2" y="204" text-anchor="middle" class="axis-label">iter-5-core-candidate-02</text><text x="734.8" y="204" text-anchor="middle" class="axis-label">final-test</text></svg> | |
| </section> | |
| <section> | |
| <h2>Overlay Lineage</h2> | |
| <div class="sub">How the champion evolved across iterations, including held-champion steps when gates rejected the selected candidate.</div> | |
| <div class="lineage"> | |
| <div class="lineage-node lineage-baseline"> | |
| <div class="lineage-title">Initial overlay</div> | |
| <div class="lineage-sub">starting champion</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 1</div> | |
| <div class="lineage-sub">selected candidate-01</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-01</div><div class="lineage-detail">selected val: 0.8550</div><div class="lineage-detail">improvement: 0.2346</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 2</div> | |
| <div class="lineage-sub">selected candidate-00</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.7783</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 3</div> | |
| <div class="lineage-sub">selected candidate-00</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.8002</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 4</div> | |
| <div class="lineage-sub">selected candidate-00</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.7566</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-iteration"> | |
| <div class="lineage-title">Iteration 5</div> | |
| <div class="lineage-sub">selected candidate-02</div> | |
| <div class="lineage-path">inputs/initial.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| <div class="lineage-detail">selected: candidate-02</div><div class="lineage-detail">selected val: 0.9005</div><div class="lineage-detail">improvement: 0.0425</div><div class="lineage-detail">result: held previous champion</div> | |
| </div> | |
| <div class="lineage-node lineage-promoted"> | |
| <div class="lineage-title">Promoted overlay</div> | |
| <div class="lineage-sub">final published artifact</div> | |
| <div class="lineage-path">promoted.overlay.md</div> | |
| <div class="lineage-state">present</div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Iteration Candidates</h2> | |
| <div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 1</h3> | |
| <div class="sub">selected candidate: 1 | result: held previous champion</div> | |
| <div class="sub">core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7598</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.6204</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.8662</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 4 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.9542</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.8550</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.8900</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 5 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.5209</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">fallback_copy</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.5197</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 2</h3> | |
| <div class="sub">selected candidate: 0 | result: held previous champion</div> | |
| <div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6761</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.7783</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 5 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7525</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.5783</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 3 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6400</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6739</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
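The gate messages in the iteration headers suggest three checks that a selected candidate must pass before it replaces the champion. The following is a minimal sketch of how such checks could be evaluated, assuming strict comparisons against the thresholds printed in the messages. `GateInputs` and `failed_gates` are hypothetical names, not the tool's API, and the real comparison semantics may differ.

```python
from dataclasses import dataclass

@dataclass
class GateInputs:
    improvement: float     # validation improvement of the selected candidate
    win_count: int         # repeated-validation wins for the candidate
    candidate_core: float  # core score of the selected candidate
    baseline_core: float   # core score of the baseline overlay

def failed_gates(g, min_improvement=0.0100, min_wins=2, core_tolerance=0.0200):
    """Return the names of the gates that reject the candidate (assumed logic)."""
    failures = []
    if g.improvement < min_improvement:
        failures.append("val_improvement_below_threshold")
    if g.win_count < min_wins:
        failures.append("validation_repeat_gate_failed")
    # Assumption: the core score may drop by at most `core_tolerance`.
    if g.candidate_core < g.baseline_core - core_tolerance:
        failures.append("core_regression_gate_failed")
    return failures

# Values reported for Iteration 2 in this report.
iter2 = GateInputs(improvement=0.0000, win_count=0,
                   candidate_core=0.9100, baseline_core=0.9550)
print(failed_gates(iter2))
```

With the Iteration 2 values, all three gates fail under these assumptions, matching the three reasons listed in that iteration's header. A candidate whose core score equals the baseline (0.9550) would pass the core gate, which is consistent with iterations that report no core regression failure.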
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 3</h3> | |
| <div class="sub">selected candidate: 0 | result: held previous champion</div> | |
| <div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">1.0000</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.8002</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-3/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 5 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.4933</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-3/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 1 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6367</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-3/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.9153</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.6200</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-3/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 4 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| </div> | |
| </div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 4</h3> | |
| <div class="sub">selected candidate: 0 | result: held previous champion</div> | |
| <div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.5908</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.7566</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.9350</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-4/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val 5 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7450</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-4/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.5908</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-4/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.8592</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.7270</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.8900</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-4/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| </div> | |
| </div> | |
| <div class="iteration-block"> | |
| <div class="iteration-head"> | |
| <h3>Iteration 5</h3> | |
| <div class="sub">selected candidate: 2 | result: held previous champion</div> | |
| <div class="sub">core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</div> | |
| </div> | |
| <div class="candidate-grid"> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-00</div> | |
| <div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.8867</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.8579</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">1.0000</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-5/candidate-00.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-01</div> | |
| <div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.7542</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-5/candidate-01.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-02</div> | |
| <div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.8737</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">0.9005</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">0.8808</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-5/candidate-02.md</div> | |
| <div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div> | |
| <div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div> | |
| </div> | |
| <div class="candidate-card"> | |
| <div class="candidate-head"> | |
| <div class="candidate-title">candidate-03</div> | |
| <div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div> | |
| </div> | |
| <div class="candidate-metrics"> | |
| <div><span class="metric-k">Train</span><span class="metric-v">0.6833</span></div> | |
| <div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div> | |
| <div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div> | |
| </div> | |
| <div class="candidate-detail">overlay: candidates/iter-5/candidate-03.md</div> | |
| <div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div> | |
| <div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div> | |
| </div> | |
| </div> | |
| </div> | |
| </section> | |
| <section> | |
| <h2>Promotion Decisions</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>1</td><td>1</td><td>0.8550</td><td>0.2346</td><td>no</td><td>core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</td></tr> | |
| <tr><td>2</td><td>0</td><td>0.7783</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</td></tr> | |
| <tr><td>3</td><td>0</td><td>0.8002</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</td></tr> | |
| <tr><td>4</td><td>0</td><td>0.7566</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</td></tr> | |
| <tr><td>5</td><td>2</td><td>0.9005</td><td>0.0425</td><td>no</td><td>core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
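<section>
<div class="sub">The reason strings in the table above point to three numeric gates. The sketch below is one possible reading of them, not the optimizer's actual code: the comparison directions, the boundary handling, and the names GateInputs and gate_failures are assumptions inferred from the printed values. The validation_hard_fail_count_nonzero gate is left out because the report prints no count for it.</div>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateInputs:
    """Values copied from one row of the Promotion Decisions table."""
    improvement: float               # as printed in the Improvement column
    candidate_core: float
    baseline_core: float
    win_count: Optional[int] = None  # None when the report does not print it


def gate_failures(g: GateInputs,
                  val_threshold: float = 0.0100,
                  core_tolerance: float = 0.0200,
                  win_threshold: int = 2) -> List[str]:
    """Return the gate names that fail under the assumed comparisons."""
    reasons = []
    # Assumed rule: the validation improvement must reach the threshold.
    if not g.improvement >= val_threshold:
        reasons.append("val_improvement_below_threshold")
    # Assumed rule: repeated validation wins must reach the threshold.
    if g.win_count is not None and not g.win_count >= win_threshold:
        reasons.append("validation_repeat_gate_failed")
    # Assumed rule: core may sit at most `core_tolerance` below the baseline.
    if not g.candidate_core >= g.baseline_core - core_tolerance:
        reasons.append("core_regression_gate_failed")
    return reasons


iteration_3 = GateInputs(improvement=0.0000, candidate_core=0.9550,
                         baseline_core=0.9550, win_count=0)
iteration_5 = GateInputs(improvement=0.0425, candidate_core=0.8808,
                         baseline_core=0.9550)

print(gate_failures(iteration_3))
print(gate_failures(iteration_5))
```

<div class="sub">Under these assumptions, iteration 5 clears the validation threshold (0.0425 against 0.0100) but fails the core gate, because 0.8808 is below 0.9550 - 0.0200 = 0.9350. This agrees with the single reason printed for that row.</div>
</section>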
| <section> | |
| <h2>Token and Cost Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>5,517,828</td><td>146,840</td><td>1,626,342</td><td>0</td><td>$19.2440</td></tr> | |
| <tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>3,568,119</td><td>631,729</td><td>165,255,424</td><td>0</td><td>$69.7569</td></tr> | |
| <tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>146,566</td><td>64,830</td><td>1,713,408</td><td>0</td><td>$1.9262</td></tr> | |
| <tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr> | |
| <tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>10,217</td><td>18,957</td><td>253,824</td><td>0</td><td>$0.3912</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <section> | |
| <h2>Provider Invocation Breakdown</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>claude</td><td>314</td><td>14</td><td>150</td><td>150</td><td>3,724,902</td><td>715,516</td><td>167,222,656</td><td>$91.3182</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
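<section>
<div class="sub">The provider totals can be cross-checked against the per-role rows of the Token and Cost Breakdown table. The sketch below copies the printed figures by hand; the names ROWS, PROVIDER, INVOCATIONS and column_sums are illustrative and not part of the report tooling. Costs are stored in units of $0.0001 so the arithmetic stays in integers. The judge/sonnet row is omitted because all of its values are n/a.</div>

```python
# (role, model, input, output, cache_read, cost in units of $0.0001)
ROWS = [
    ("executor", "glm-4.5-air", 5_517_828, 146_840, 1_626_342, 192_440),
    ("executor", "glm-4.7", 3_568_119, 631_729, 165_255_424, 697_569),
    ("judge", "glm-4.7", 146_566, 64_830, 1_713_408, 19_262),
    ("mutator", "glm-4.7", 10_217, 18_957, 253_824, 3_912),
]

# Figures from the single "claude" row of the Provider Invocation Breakdown.
PROVIDER = {"input": 3_724_902, "output": 715_516,
            "cache_read": 167_222_656, "cost": 913_182}
INVOCATIONS = {"total": 314, "mutator": 14, "executor": 150, "judge": 150}


def column_sums(rows):
    """Sum each numeric column over the given rows."""
    return {
        "input": sum(r[2] for r in rows),
        "output": sum(r[3] for r in rows),
        "cache_read": sum(r[4] for r in rows),
        "cost": sum(r[5] for r in rows),
    }


all_rows = column_sums(ROWS)
glm47_rows = column_sums([r for r in ROWS if r[1] == "glm-4.7"])

for key in ("input", "output", "cache_read"):
    print(key, "glm-4.7 rows:", glm47_rows[key], "provider row:", PROVIDER[key])
print("cost, all rows:", all_rows["cost"], "provider row:", PROVIDER["cost"])
print("invocations:", INVOCATIONS["mutator"] + INVOCATIONS["executor"]
      + INVOCATIONS["judge"], "of", INVOCATIONS["total"])
```

<div class="sub">With the printed figures, the provider token columns equal the sum of the three glm-4.7 rows, while the provider cost matches the sum of all four priced rows to within $0.0001. The report does not say why the glm-4.5-air tokens are absent from the provider token totals.</div>
</section>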
| <section> | |
| <h2>Run Diagnostics</h2> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th>Metric</th><th>Value</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr><td>Run started at (UTC)</td><td>2026-03-10T15:29:21.555323+00:00</td></tr> | |
| <tr><td>Run ended at (UTC)</td><td>2026-03-10T21:44:06.229944+00:00</td></tr> | |
| <tr><td>Wall clock duration</td><td>6h 14m 45s</td></tr> | |
| <tr><td>Active runtime (eval + mutator)</td><td>10h 0m 15s</td></tr> | |
| <tr><td>Orchestration overhead</td><td>0s</td></tr> | |
| <tr><td>Eval runtime total</td><td>9h 57m 3s</td></tr> | |
| <tr><td>Mutator runtime total</td><td>3m 12s</td></tr> | |
| <tr><td>Eval tasks with runtime</td><td>150</td></tr> | |
| <tr><td>Mutator calls with duration</td><td>14</td></tr> | |
| <tr><td>Candidate overlays generated</td><td>20</td></tr> | |
| <tr><td>Candidate metadata records</td><td>20</td></tr> | |
| <tr><td>Candidate overlays train-tested</td><td>20</td></tr> | |
| <tr><td>Candidate overlays val/core-tested</td><td>10</td></tr> | |
| <tr><td>Promotion events</td><td>0</td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
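<section>
<div class="sub">The duration rows above can be reproduced from the two timestamps and the two runtime totals. This is a sketch only: the helper fmt and the clamping of overhead at zero are assumptions, since the report does not state how "Orchestration overhead" is computed. Active runtime exceeding wall clock time would be consistent with eval tasks running concurrently, which is also an assumption.</div>

```python
from datetime import datetime, timedelta


def fmt(delta: timedelta) -> str:
    """Format a duration as 'Hh Mm Ss', rounded to whole seconds."""
    total = int(round(delta.total_seconds()))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


# Timestamps copied from the Run Diagnostics table.
started = datetime.fromisoformat("2026-03-10T15:29:21.555323+00:00")
ended = datetime.fromisoformat("2026-03-10T21:44:06.229944+00:00")
wall = ended - started

# Runtime totals copied from the same table.
eval_total = timedelta(hours=9, minutes=57, seconds=3)
mutator_total = timedelta(minutes=3, seconds=12)
active = eval_total + mutator_total

# Assumption: overhead is wall minus active, clamped at zero.
overhead = max(wall - active, timedelta(0))

print("wall clock:", fmt(wall))
print("active runtime:", fmt(active))
print("overhead:", fmt(overhead))
```
</section>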
| <section> | |
| <h2>Quality Flags</h2> | |
| <ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul> | |
| </section> | |
| <section> | |
| <h2>Overlay Diff (Initial vs Promoted)</h2> | |
| <div class="sub">Initial: /home/agent/skillpa/reports/opt-booking-20260310-152921/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-booking-20260310-152921/promoted.overlay.md</div> | |
| <div class="sub">Initial lines: 41 | Promoted lines: 41 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </section> | |
| <section> | |
| <h2>Stepwise Overlay Diffs</h2> | |
| <div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div> | |
| <details class="journey-step"> | |
| <summary>Iteration 1: selected candidate-01</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-1/candidate-01.md</div> | |
| <div class="sub">selected val 0.8550 | improvement 0.2346 | not promoted | core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</div> | |
| <div class="sub">lines: 41 -> 69 | +38 / -10</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260310-152921/inputs/initial.overlay.md</span> | |
| <span class="diff-file">+++ opt-booking-20260310-152921/candidates/iter-1/candidate-01.md</span> | |
| <span class="diff-hunk">@@ -6,6 +6,14 @@</span> | |
| <span class="diff-neutral"> - Return concrete hotel options that satisfy user constraints.</span> | |
| <span class="diff-neutral"> - Preserve traceability: include constraints, chosen options, and final URL.</span> | |
| <span class="diff-neutral"> - Keep command sequences executable under `steel browser` syntax.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Execution contract</span> | |
| <span class="diff-add">+All sessions MUST follow this lifecycle:</span> | |
| <span class="diff-add">+1. `steel browser start --session &lt;name&gt;` - initialize session</span> | |
| <span class="diff-add">+2. Execute search and filter commands using `--session &lt;name&gt;` consistently</span> | |
| <span class="diff-add">+3. Extract results and capture final state</span> | |
| <span class="diff-add">+4. `steel browser stop --session &lt;name&gt;` - ALWAYS call in both success and failure paths</span> | |
| <span class="diff-add">+5. Never leave zombie sessions; use explicit cleanup on error</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Query shaping</span> | |
| <span class="diff-neutral"> 1. Build searches using destination + check-in/check-out + adults/rooms first.</span> | |
| <span class="diff-hunk">@@ -25,17 +33,37 @@</span> | |
| <span class="diff-neutral"> - If exact numbers are missing, report as unavailable instead of hallucinating.</span> | |
| <span class="diff-neutral"> - Return at least the minimum option count requested by the user.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-del">-## Reliability tactics</span> | |
| <span class="diff-del">-- After opening search results, use `snapshot -i` before selecting elements.</span> | |
| <span class="diff-del">-- Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span> | |
| <span class="diff-del">-- Use `wait --load networkidle` or `wait --text` for synchronization.</span> | |
| <span class="diff-del">-- Always capture final URL via `steel browser get url`.</span> | |
| <span class="diff-del">-- Use `fill` with the positional value argument, e.g. `steel browser fill e16 "Rome" --session booking-search`.</span> | |
| <span class="diff-del">-- Do not use deprecated `--value`; use value as the second positional argument.</span> | |
| <span class="diff-del">-- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span> | |
| <span class="diff-del">-- Always call `steel browser stop` in success and failure paths.</span> | |
| <span class="diff-add">+## Valid steel-browser patterns</span> | |
| <span class="diff-add">+- Session lifecycle: `steel browser start --session s1` / `steel browser stop --session s1`</span> | |
| <span class="diff-add">+- Navigation: `steel browser goto "&lt;url&gt;" --session s1`</span> | |
| <span class="diff-add">+- Screenshots: `steel browser screenshot --session s1 --file out.png`</span> | |
| <span class="diff-add">+- Fills: `steel browser fill @e16 "Rome" --session s1` (value as 2nd positional arg)</span> | |
| <span class="diff-add">+- Clicks: `steel browser click @e5 --session s1`</span> | |
| <span class="diff-add">+- Snapshots: `steel browser snapshot --session s1`</span> | |
| <span class="diff-add">+- Waits: `steel browser wait --load networkidle --session s1` or `--text "Search"</span> | |
| <span class="diff-add">+- URL capture: `steel browser get url --session s1`</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Failure policy</span> | |
| <span class="diff-add">+- On any failure: immediately call `steel browser stop --session &lt;name&gt;`</span> | |
| <span class="diff-add">+- Capture error state with screenshot before cleanup if possible</span> | |
| <span class="diff-add">+- Report specific failure point (search load, filter apply, extraction)</span> | |
| <span class="diff-add">+- If search returns no results: verify destination spelling, try broader search</span> | |
| <span class="diff-add">+- If filters fail: apply incrementally, fallback to manual result extraction</span> | |
| <span class="diff-add">+- If session hangs: use `--timeout` on wait commands, then force stop</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Known error map</span> | |
| <span class="diff-add">+| Invalid Pattern | Correct Pattern |</span> | |
| <span class="diff-add">+|----------------|----------------|</span> | |
| <span class="diff-add">+| `steel screenshot` (no session) | `steel browser screenshot --session s1`</span> | |
| <span class="diff-add">+| `fill --value "text"` (deprecated flag) | `fill @e5 "text"` (positional) |</span> | |
| <span class="diff-add">+| `steel goto` (missing browser) | `steel browser goto "url" --session s1`</span> | |
| <span class="diff-add">+| Missing `--session` on any command | Always include `--session &lt;name&gt;` |</span> | |
| <span class="diff-add">+| Forgetting `stop` on error paths | Always cleanup in success AND failure |</span> | |
| <span class="diff-add">+| Combining 5+ filters at once | Apply 2-3 filters, verify, continue |</span> | |
| <span class="diff-add">+| CSS selectors without snapshot | Use `snapshot` then `@e1` refs |</span> | |
| <span class="diff-add">+| Assuming page ready instantly | Use `wait --load networkidle` or `--text` |</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Output discipline</span> | |
| <span class="diff-del">-- Commands must include session lifecycle (`start` and `stop`).</span> | |
| <span class="diff-add">+- Commands must include session lifecycle (start and stop).</span> | |
| <span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span> | |
| <span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span></pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 2: selected candidate-00</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-2/candidate-00.md</div> | |
| <div class="sub">selected val 0.7783 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</div> | |
| <div class="sub">lines: 41 -> 41 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 3: selected candidate-00</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-3/candidate-00.md</div> | |
| <div class="sub">selected val 0.8002 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</div> | |
| <div class="sub">lines: 41 -> 41 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 4: selected candidate-00</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-4/candidate-00.md</div> | |
| <div class="sub">selected val 0.7566 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</div> | |
| <div class="sub">lines: 41 -> 41 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Iteration 5: selected candidate-02</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-5/candidate-02.md</div> | |
| <div class="sub">selected val 0.9005 | improvement 0.0425 | not promoted | core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</div> | |
| <div class="sub">lines: 41 -> 85 | +64 / -20</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260310-152921/inputs/initial.overlay.md</span> | |
| <span class="diff-file">+++ opt-booking-20260310-152921/candidates/iter-5/candidate-02.md</span> | |
| <span class="diff-hunk">@@ -7,35 +7,79 @@</span> | |
| <span class="diff-neutral"> - Preserve traceability: include constraints, chosen options, and final URL.</span> | |
| <span class="diff-neutral"> - Keep command sequences executable under `steel browser` syntax.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-add">+## Execution contract (REQUIRED)</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+Every booking workflow MUST follow this lifecycle:</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+```</span> | |
| <span class="diff-add">+1. steel browser start --session &lt;session-name&gt;</span> | |
| <span class="diff-add">+2. [navigation, fill, wait, snapshot, extract commands]</span> | |
| <span class="diff-add">+3. steel browser get url --session &lt;session-name&gt;</span> | |
| <span class="diff-add">+4. steel browser stop --session &lt;session-name&gt;</span> | |
| <span class="diff-add">+```</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+**Critical rules:**</span> | |
| <span class="diff-add">+- ALWAYS start with `steel browser start --session &lt;name&gt;`</span> | |
| <span class="diff-add">+- ALWAYS end with `steel browser stop --session &lt;name&gt;` in both success and failure paths</span> | |
| <span class="diff-add">+- Include `--session &lt;name&gt;` in EVERY steel browser command</span> | |
| <span class="diff-add">+- Capture final URL before stopping the session</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Failure policy</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+If extraction or filtering fails:</span> | |
| <span class="diff-add">+1. Attempt recovery with retry (max 2 retries) using `wait --load networkidle`</span> | |
| <span class="diff-add">+2. If unrecoverable, still call `steel browser stop --session &lt;name&gt;` to clean up</span> | |
| <span class="diff-add">+3. Report partial results with explicit gap description: "Could not extract [field]; marked as unavailable"</span> | |
| <span class="diff-add">+4. Never omit `stop` command—session cleanup is mandatory even on error</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+## Known error map / invalid-command prevention</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+| Error pattern | Correct approach | Why |</span> | |
| <span class="diff-add">+|--------------|------------------|-----|</span> | |
| <span class="diff-add">+| `fill e1 --value "text"` | `fill e1 "text"` | `--value` flag is deprecated; use positional value |</span> | |
| <span class="diff-add">+| `steel screenshot` | `steel browser screenshot --session X` | Must use browser subcommand with session |</span> | |
| <span class="diff-add">+| `click @e1` | `click @e1 --session X` | All commands require `--session` flag |</span> | |
| <span class="diff-add">+| `fill e16 "Rome"` | `steel browser fill e16 "Rome" --session X` | Must prefix with `steel browser` |</span> | |
| <span class="diff-add">+| Missing `stop` command | Always append `steel browser stop --session X` | Prevents session leaks |</span> | |
| <span class="diff-add">+| CSS selectors like `#button` | Use snapshot refs like `@e1` after `snapshot -i` | More reliable after dynamic rendering |</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-neutral"> ## Query shaping</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-neutral"> 1. Build searches using destination + check-in/check-out + adults/rooms first.</span> | |
| <span class="diff-del">-2. Prefer URL query parameters for deterministic bootstrapping when possible.</span> | |
| <span class="diff-add">+2. Prefer URL query parameters for deterministic bootstrapping when possible:</span> | |
| <span class="diff-add">+ - `ss=destination&amp;checkin=YYYY-MM-DD&amp;checkout=YYYY-MM-DD&amp;group_adults=N`</span> | |
| <span class="diff-neutral"> 3. Apply filters incrementally and snapshot after each major change.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-del">-## Filter strategy</span> | |
| <span class="diff-del">-- Prioritize these filters in order when relevant:</span> | |
| <span class="diff-del">- 1. Price ceiling/range</span> | |
| <span class="diff-del">- 2. Guest count and stay duration</span> | |
| <span class="diff-del">- 3. Review score / star rating</span> | |
| <span class="diff-del">- 4. Property type and distance preferences</span> | |
| <span class="diff-del">-- Avoid combining too many filters in one step; verify list updates between steps.</span> | |
| <span class="diff-add">+## Filter strategy (ordered priority)</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+1. Price ceiling/range</span> | |
| <span class="diff-add">+2. Guest count and stay duration </span> | |
| <span class="diff-add">+3. Review score / star rating</span> | |
| <span class="diff-add">+4. Property type and distance preferences</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+Avoid combining too many filters in one step; verify list updates between steps.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Extraction strategy</span> | |
| <span class="diff-del">-- Extract at least: property name, nightly/total price, rating/review count if visible.</span> | |
| <span class="diff-del">-- If exact numbers are missing, report as unavailable instead of hallucinating.</span> | |
| <span class="diff-del">-- Return at least the minimum option count requested by the user.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+Extract at minimum:</span> | |
| <span class="diff-add">+- Property name</span> | |
| <span class="diff-add">+- Nightly price and total price</span> | |
| <span class="diff-add">+- Rating and review count if visible</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+If exact numbers are missing, report as "unavailable" instead of guessing.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+Return at least the minimum option count requested by the user.</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Reliability tactics</span> | |
| <span class="diff-del">-- After opening search results, use `snapshot -i` before selecting elements.</span> | |
| <span class="diff-del">-- Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span> | |
| <span class="diff-del">-- Use `wait --load networkidle` or `wait --text` for synchronization.</span> | |
| <span class="diff-del">-- Always capture final URL via `steel browser get url`.</span> | |
| <span class="diff-del">-- Use `fill` with the positional value argument, e.g. `steel browser fill e16 "Rome" --session booking-search`.</span> | |
| <span class="diff-del">-- Do not use deprecated `--value`; use value as the second positional argument.</span> | |
| <span class="diff-del">-- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span> | |
| <span class="diff-del">-- Always call `steel browser stop` in success and failure paths.</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-add">+- After opening search results, use `steel browser snapshot -i --session <name>` before selecting elements.</span> | |
| <span class="diff-add">+- Favor element refs from snapshots (`@e1`, `@e2`) over brittle CSS selectors.</span> | |
| <span class="diff-add">+- Use `steel browser wait --load networkidle --session <name>` for synchronization.</span> | |
| <span class="diff-add">+- Use `steel browser fill <element> "<value>" --session <name>`—value is positional, NOT a flag.</span> | |
| <span class="diff-add">+- Use `steel browser screenshot --session <name>` (not `steel screenshot`).</span> | |
| <span class="diff-neutral"> </span> | |
| <span class="diff-neutral"> ## Output discipline</span> | |
| <span class="diff-del">-- Commands must include session lifecycle (`start` and `stop`).</span> | |
| <span class="diff-add">+</span> | |
| <span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span> | |
| <span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span> | |
| <span class="diff-add">+- Include the final Booking.com URL for traceability.</span></pre> | |
| </details> | |
| <details class="journey-step"> | |
| <summary>Final champion to promoted overlay</summary> | |
| <div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/promoted.overlay.md</div> | |
| <div class="sub">artifact diff</div> | |
| <div class="sub">lines: 41 -> 41 | +0 / -0</div> | |
| <pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre> | |
| </details> | |
| </section> | |
| <div class="foot">Report schema v1 | Generated: 2026-03-10T22:05:03.027188+00:00</div> | |
| </div> | |
| </body> | |
| </html> |