@nibzard
Created March 10, 2026 22:05
SkillPA optimization report: opt-booking-20260310-152921
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Optimization Report - booking</title>
<style>
:root {
--bg: #f6f7f3;
--surface: #ffffff;
--ink: #14212e;
--muted: #5b6673;
--accent: #126b79;
--line: #d6dbe2;
--ok: #2c7a42;
--bad: #9a2e2e;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: 'IBM Plex Sans', 'Segoe UI', sans-serif;
color: var(--ink);
background: linear-gradient(180deg, #f8f9f6 0%, #edf2ee 100%);
}
.wrap { max-width: 1180px; margin: 0 auto; padding: 24px; }
h1 { margin: 0 0 8px; font-size: 30px; }
.sub { color: var(--muted); margin-bottom: 18px; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 10px; margin-bottom: 18px; }
.card { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 12px; }
.k { color: var(--muted); font-size: 12px; text-transform: uppercase; letter-spacing: .03em; }
.v { font-size: 24px; font-weight: 650; margin-top: 4px; }
section { background: var(--surface); border: 1px solid var(--line); border-radius: 10px; padding: 14px; margin-bottom: 14px; overflow-x: auto; }
h3 { margin: 0 0 6px; font-size: 15px; }
h2 { margin: 0 0 10px; font-size: 18px; }
table { width: 100%; border-collapse: collapse; font-size: 13px; }
th, td { border-bottom: 1px solid var(--line); padding: 8px 6px; text-align: left; vertical-align: top; }
th { color: var(--muted); font-weight: 600; }
ul { margin: 0; padding-left: 18px; }
.lineage { display: grid; grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); gap: 10px; }
.lineage-node { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: linear-gradient(180deg, #ffffff 0%, #f6faf8 100%); }
.lineage-iteration { background: linear-gradient(180deg, #ffffff 0%, #f7fbff 100%); }
.lineage-promoted { background: linear-gradient(180deg, #ffffff 0%, #f6fcf6 100%); }
.lineage-title { font-weight: 700; margin-bottom: 2px; }
.lineage-sub, .lineage-path, .lineage-state, .lineage-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.iteration-block { margin-top: 14px; }
.iteration-head { margin-bottom: 8px; }
.candidate-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 10px; }
.candidate-card { border: 1px solid var(--line); border-radius: 10px; padding: 12px; background: #fcfcfa; }
.candidate-head { display: flex; justify-content: space-between; gap: 10px; align-items: flex-start; margin-bottom: 8px; }
.candidate-title { font-weight: 700; }
.badge-row { display: flex; flex-wrap: wrap; gap: 6px; justify-content: flex-end; }
.badge { border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; font-size: 11px; color: var(--muted); background: #f5f7f9; }
.badge-mode { background: #eef7fb; color: #25556f; }
.badge-validated { background: #eef7ef; color: #2b6540; }
.badge-selected { background: #fff4dc; color: #8b5a00; }
.badge-promoted { background: #e8f7e9; color: #21643a; }
.badge-missing { background: #fdecec; color: #8e3434; }
.candidate-metrics { display: grid; grid-template-columns: repeat(3, minmax(0, 1fr)); gap: 6px; margin-bottom: 8px; }
.metric-k { display: block; font-size: 11px; color: var(--muted); text-transform: uppercase; }
.metric-v { display: block; font-size: 15px; font-weight: 650; margin-top: 2px; }
.candidate-detail { font-size: 12px; color: var(--muted); margin-top: 4px; }
.journey-step { border: 1px solid var(--line); border-radius: 10px; padding: 10px 12px; background: #fafcfb; margin-bottom: 10px; }
.journey-step summary { cursor: pointer; font-weight: 650; }
.grid-line { stroke: #e6ebf0; stroke-width: 1; }
.axis-line { stroke: #8b97a5; stroke-width: 1; }
.axis-label { fill: #647182; font-size: 11px; }
.score-line { stroke: var(--accent); stroke-width: 2.3; }
.line-point { fill: #126b79; }
.cost-bar { fill: #2d6b9f; opacity: .82; }
.diff-file { color: #9ec1ff; display: block; }
.diff-hunk { color: #ffd479; display: block; }
.diff-add { color: #65d28d; background: #123321; display: block; }
.diff-del { color: #ff9da1; background: #3b1818; display: block; }
.diff-neutral { color: #d9e4ef; display: block; }
.foot { color: var(--muted); font-size: 12px; margin-top: 16px; }
</style>
</head>
<body>
<div class="wrap">
<h1>Optimization Report</h1>
<div class="sub">Domain: <strong>booking</strong> | Run generated: 2026-03-10T21:44:06.229944+00:00</div>
<div class="cards">
<div class="card"><div class="k">Baseline score (val)</div><div class="v">0.5850</div></div>
<div class="card"><div class="k">Final test score</div><div class="v">0.8700</div></div>
<div class="card"><div class="k">Delta (final test vs. baseline val)</div><div class="v">0.2850</div></div>
<div class="card"><div class="k">Total cost</div><div class="v">$91.3182</div></div>
<div class="card"><div class="k">Wall clock</div><div class="v">6h 14m 45s</div></div>
<div class="card"><div class="k">Total invocations</div><div class="v">314</div></div>
<div class="card"><div class="k">Input tokens</div><div class="v">3,724,902</div></div>
<div class="card"><div class="k">Output tokens</div><div class="v">715,516</div></div>
<div class="card"><div class="k">Cache read tokens</div><div class="v">167,222,656</div></div>
<div class="card"><div class="k">Total tokens (incl cache)</div><div class="v">171,663,074</div></div>
<div class="card"><div class="k">Input from cache</div><div class="v">97.82%</div></div>
</div>
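<div class="sub">The derived cards above can be re-checked from the raw ones. The following is a minimal Python sketch, not the report generator's code. It assumes (the report does not state its formulas) that Delta is the final test score minus the baseline score, that the total is input plus output plus cache-read tokens, and that "Input from cache" is cache-read tokens as a share of cache-read plus input tokens. The variable names are illustrative only.</div>

```python
# Raw figures copied from the summary cards.
baseline_score = 0.5850
final_test_score = 0.8700
input_tokens = 3_724_902
output_tokens = 715_516
cache_read_tokens = 167_222_656

# Assumed formulas (not stated in the report).
delta = round(final_test_score - baseline_score, 4)
total_tokens = input_tokens + output_tokens + cache_read_tokens
cache_share_pct = round(
    100 * cache_read_tokens / (input_tokens + cache_read_tokens), 2
)

print(f"delta={delta:.4f} total={total_tokens:,} cache={cache_share_pct:.2f}%")
```

<div class="sub">Under these assumptions the three computed values agree with the Delta, Total tokens, and Input from cache cards.</div>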
<section>
<h2>Score Over Iterations</h2>
<svg width="760" height="240" role="img" aria-label="Score trend chart"><line x1="46" y1="20.0" x2="742" y2="20.0" class="grid-line" /><line x1="46" y1="65.5" x2="742" y2="65.5" class="grid-line" /><line x1="46" y1="111.0" x2="742" y2="111.0" class="grid-line" /><line x1="46" y1="156.5" x2="742" y2="156.5" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="grid-line" /><line x1="46" y1="202.0" x2="742" y2="202.0" class="axis-line" /><line x1="46" y1="20" x2="46" y2="202.0" class="axis-line" /><polyline fill="none" points="46.0,202.0 162.0,46.2 278.0,90.5 394.0,77.9 510.0,103.0 626.0,20.0 742.0,37.6" class="score-line" /><circle cx="46.0" cy="202.0" r="3.5" class="line-point"><title>baseline-val: 0.5850</title></circle><circle cx="162.0" cy="46.2" r="3.5" class="line-point"><title>iter-1-selected-val: 0.8550</title></circle><circle cx="278.0" cy="90.5" r="3.5" class="line-point"><title>iter-2-selected-val: 0.7783</title></circle><circle cx="394.0" cy="77.9" r="3.5" class="line-point"><title>iter-3-selected-val: 0.8002</title></circle><circle cx="510.0" cy="103.0" r="3.5" class="line-point"><title>iter-4-selected-val: 0.7566</title></circle><circle cx="626.0" cy="20.0" r="3.5" class="line-point"><title>iter-5-selected-val: 0.9005</title></circle><circle cx="742.0" cy="37.6" r="3.5" class="line-point"><title>final-test: 0.8700</title></circle><text x="40" y="24.0" text-anchor="end" class="axis-label">0.90</text><text x="40" y="69.5" text-anchor="end" class="axis-label">0.82</text><text x="40" y="115.0" text-anchor="end" class="axis-label">0.74</text><text x="40" y="160.5" text-anchor="end" class="axis-label">0.66</text><text x="40" y="206.0" text-anchor="end" class="axis-label">0.58</text><text x="46.0" y="228" text-anchor="middle" class="axis-label">baseline-val</text><text x="162.0" y="228" text-anchor="middle" class="axis-label">iter-1-selected-val</text><text x="278.0" y="228" text-anchor="middle" 
class="axis-label">iter-2-selected-val</text><text x="394.0" y="228" text-anchor="middle" class="axis-label">iter-3-selected-val</text><text x="510.0" y="228" text-anchor="middle" class="axis-label">iter-4-selected-val</text><text x="626.0" y="228" text-anchor="middle" class="axis-label">iter-5-selected-val</text><text x="742.0" y="228" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Cost Over Phases (USD)</h2>
<svg width="760" height="220" role="img" aria-label="Cost chart"><line x1="46" y1="174.0" x2="742" y2="174.0" class="axis-line" /><rect x="47.2" y="130.4" width="12.0" height="43.6" class="cost-bar"><title>baseline-val: 1.6705</title></rect><rect x="61.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>baseline-core: 0.0262</title></rect><rect x="76.2" y="172.5" width="12.0" height="1.5" class="cost-bar"><title>iter-1-mutation: 0.0557</title></rect><rect x="90.8" y="97.8" width="12.0" height="76.2" class="cost-bar"><title>iter-1-train-candidate-00: 2.9211</title></rect><rect x="105.2" y="160.7" width="12.0" height="13.3" class="cost-bar"><title>iter-1-train-candidate-01: 0.5091</title></rect><rect x="119.8" y="76.8" width="12.0" height="97.2" class="cost-bar"><title>iter-1-train-candidate-02: 3.7284</title></rect><rect x="134.2" y="122.3" width="12.0" height="51.7" class="cost-bar"><title>iter-1-train-candidate-03: 1.9807</title></rect><rect x="148.8" y="76.6" width="12.0" height="97.4" class="cost-bar"><title>iter-1-val-candidate-00: 3.7353</title></rect><rect x="163.2" y="76.4" width="12.0" height="97.6" class="cost-bar"><title>iter-1-val-candidate-01: 3.7440</title></rect><rect x="177.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>iter-1-core-candidate-00: 0.0276</title></rect><rect x="192.2" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-1-core-candidate-01: 0.0813</title></rect><rect x="206.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-mutation: 0.0811</title></rect><rect x="221.2" y="121.2" width="12.0" height="52.8" class="cost-bar"><title>iter-2-train-candidate-00: 2.0257</title></rect><rect x="235.8" y="103.3" width="12.0" height="70.7" class="cost-bar"><title>iter-2-train-candidate-01: 2.7098</title></rect><rect x="250.2" y="129.9" width="12.0" height="44.1" class="cost-bar"><title>iter-2-train-candidate-02: 1.6896</title></rect><rect x="264.8" y="113.7" width="12.0" height="60.3" 
class="cost-bar"><title>iter-2-train-candidate-03: 2.3128</title></rect><rect x="279.2" y="96.0" width="12.0" height="78.0" class="cost-bar"><title>iter-2-val-candidate-00: 2.9921</title></rect><rect x="293.8" y="107.6" width="12.0" height="66.4" class="cost-bar"><title>iter-2-val-candidate-01: 2.5477</title></rect><rect x="308.2" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-core-candidate-00: 0.0817</title></rect><rect x="322.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-2-core-candidate-01: 0.0798</title></rect><rect x="337.2" y="171.8" width="12.0" height="2.2" class="cost-bar"><title>iter-3-mutation: 0.0836</title></rect><rect x="351.8" y="103.2" width="12.0" height="70.8" class="cost-bar"><title>iter-3-train-candidate-00: 2.7148</title></rect><rect x="366.2" y="124.9" width="12.0" height="49.1" class="cost-bar"><title>iter-3-train-candidate-01: 1.8820</title></rect><rect x="380.8" y="110.3" width="12.0" height="63.7" class="cost-bar"><title>iter-3-train-candidate-02: 2.4433</title></rect><rect x="395.2" y="109.1" width="12.0" height="64.9" class="cost-bar"><title>iter-3-train-candidate-03: 2.4877</title></rect><rect x="409.8" y="72.1" width="12.0" height="101.9" class="cost-bar"><title>iter-3-val-candidate-00: 3.9088</title></rect><rect x="424.2" y="49.3" width="12.0" height="124.7" class="cost-bar"><title>iter-3-val-candidate-03: 4.7799</title></rect><rect x="438.8" y="172.9" width="12.0" height="1.1" class="cost-bar"><title>iter-3-core-candidate-00: 0.0414</title></rect><rect x="453.2" y="172.0" width="12.0" height="2.0" class="cost-bar"><title>iter-3-core-candidate-03: 0.0766</title></rect><rect x="467.8" y="171.9" width="12.0" height="2.1" class="cost-bar"><title>iter-4-mutation: 0.0806</title></rect><rect x="482.2" y="127.5" width="12.0" height="46.5" class="cost-bar"><title>iter-4-train-candidate-00: 1.7849</title></rect><rect x="496.8" y="106.0" width="12.0" height="68.0" 
class="cost-bar"><title>iter-4-train-candidate-01: 2.6091</title></rect><rect x="511.2" y="125.6" width="12.0" height="48.4" class="cost-bar"><title>iter-4-train-candidate-02: 1.8548</title></rect><rect x="525.8" y="115.3" width="12.0" height="58.7" class="cost-bar"><title>iter-4-train-candidate-03: 2.2514</title></rect><rect x="540.2" y="63.4" width="12.0" height="110.6" class="cost-bar"><title>iter-4-val-candidate-00: 4.2398</title></rect><rect x="554.8" y="65.4" width="12.0" height="108.6" class="cost-bar"><title>iter-4-val-candidate-03: 4.1638</title></rect><rect x="569.2" y="173.2" width="12.0" height="0.8" class="cost-bar"><title>iter-4-core-candidate-00: 0.0312</title></rect><rect x="583.8" y="173.3" width="12.0" height="0.7" class="cost-bar"><title>iter-4-core-candidate-03: 0.0275</title></rect><rect x="598.2" y="171.7" width="12.0" height="2.3" class="cost-bar"><title>iter-5-mutation: 0.0901</title></rect><rect x="612.8" y="118.2" width="12.0" height="55.8" class="cost-bar"><title>iter-5-train-candidate-00: 2.1393</title></rect><rect x="627.2" y="94.3" width="12.0" height="79.7" class="cost-bar"><title>iter-5-train-candidate-01: 3.0575</title></rect><rect x="641.8" y="91.7" width="12.0" height="82.3" class="cost-bar"><title>iter-5-train-candidate-02: 3.1568</title></rect><rect x="656.2" y="135.7" width="12.0" height="38.3" class="cost-bar"><title>iter-5-train-candidate-03: 1.4685</title></rect><rect x="670.8" y="16.0" width="12.0" height="158.0" class="cost-bar"><title>iter-5-val-candidate-00: 6.0584</title></rect><rect x="685.2" y="20.0" width="12.0" height="154.0" class="cost-bar"><title>iter-5-val-candidate-02: 5.9045</title></rect><rect x="699.8" y="173.4" width="12.0" height="0.6" class="cost-bar"><title>iter-5-core-candidate-00: 0.0232</title></rect><rect x="714.2" y="173.2" width="12.0" height="0.8" class="cost-bar"><title>iter-5-core-candidate-02: 0.0292</title></rect><rect x="728.8" y="149.8" width="12.0" height="24.2" 
class="cost-bar"><title>final-test: 0.9295</title></rect><text x="53.2" y="204" text-anchor="middle" class="axis-label">baseline-val</text><text x="67.8" y="204" text-anchor="middle" class="axis-label">baseline-core</text><text x="82.2" y="204" text-anchor="middle" class="axis-label">iter-1-mutation</text><text x="96.8" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-00</text><text x="111.2" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-01</text><text x="125.8" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-02</text><text x="140.2" y="204" text-anchor="middle" class="axis-label">iter-1-train-candidate-03</text><text x="154.8" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-00</text><text x="169.2" y="204" text-anchor="middle" class="axis-label">iter-1-val-candidate-01</text><text x="183.8" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-00</text><text x="198.2" y="204" text-anchor="middle" class="axis-label">iter-1-core-candidate-01</text><text x="212.8" y="204" text-anchor="middle" class="axis-label">iter-2-mutation</text><text x="227.2" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-00</text><text x="241.8" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-01</text><text x="256.2" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-02</text><text x="270.8" y="204" text-anchor="middle" class="axis-label">iter-2-train-candidate-03</text><text x="285.2" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-00</text><text x="299.8" y="204" text-anchor="middle" class="axis-label">iter-2-val-candidate-01</text><text x="314.2" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-00</text><text x="328.8" y="204" text-anchor="middle" class="axis-label">iter-2-core-candidate-01</text><text x="343.2" y="204" text-anchor="middle" 
class="axis-label">iter-3-mutation</text><text x="357.8" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-00</text><text x="372.2" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-01</text><text x="386.8" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-02</text><text x="401.2" y="204" text-anchor="middle" class="axis-label">iter-3-train-candidate-03</text><text x="415.8" y="204" text-anchor="middle" class="axis-label">iter-3-val-candidate-00</text><text x="430.2" y="204" text-anchor="middle" class="axis-label">iter-3-val-candidate-03</text><text x="444.8" y="204" text-anchor="middle" class="axis-label">iter-3-core-candidate-00</text><text x="459.2" y="204" text-anchor="middle" class="axis-label">iter-3-core-candidate-03</text><text x="473.8" y="204" text-anchor="middle" class="axis-label">iter-4-mutation</text><text x="488.2" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-00</text><text x="502.8" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-01</text><text x="517.2" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-02</text><text x="531.8" y="204" text-anchor="middle" class="axis-label">iter-4-train-candidate-03</text><text x="546.2" y="204" text-anchor="middle" class="axis-label">iter-4-val-candidate-00</text><text x="560.8" y="204" text-anchor="middle" class="axis-label">iter-4-val-candidate-03</text><text x="575.2" y="204" text-anchor="middle" class="axis-label">iter-4-core-candidate-00</text><text x="589.8" y="204" text-anchor="middle" class="axis-label">iter-4-core-candidate-03</text><text x="604.2" y="204" text-anchor="middle" class="axis-label">iter-5-mutation</text><text x="618.8" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-00</text><text x="633.2" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-01</text><text x="647.8" y="204" text-anchor="middle" 
class="axis-label">iter-5-train-candidate-02</text><text x="662.2" y="204" text-anchor="middle" class="axis-label">iter-5-train-candidate-03</text><text x="676.8" y="204" text-anchor="middle" class="axis-label">iter-5-val-candidate-00</text><text x="691.2" y="204" text-anchor="middle" class="axis-label">iter-5-val-candidate-02</text><text x="705.8" y="204" text-anchor="middle" class="axis-label">iter-5-core-candidate-00</text><text x="720.2" y="204" text-anchor="middle" class="axis-label">iter-5-core-candidate-02</text><text x="734.8" y="204" text-anchor="middle" class="axis-label">final-test</text></svg>
</section>
<section>
<h2>Overlay Lineage</h2>
<div class="sub">How the champion evolved across iterations, including held-champion steps when gates rejected the selected candidate.</div>
<div class="lineage">
<div class="lineage-node lineage-baseline">
<div class="lineage-title">Initial overlay</div>
<div class="lineage-sub">starting champion</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 1</div>
<div class="lineage-sub">selected candidate-01</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-01</div><div class="lineage-detail">selected val: 0.8550</div><div class="lineage-detail">improvement: 0.2346</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 2</div>
<div class="lineage-sub">selected candidate-00</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.7783</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 3</div>
<div class="lineage-sub">selected candidate-00</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.8002</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 4</div>
<div class="lineage-sub">selected candidate-00</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-00</div><div class="lineage-detail">selected val: 0.7566</div><div class="lineage-detail">improvement: 0.0000</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-iteration">
<div class="lineage-title">Iteration 5</div>
<div class="lineage-sub">selected candidate-02</div>
<div class="lineage-path">inputs/initial.overlay.md</div>
<div class="lineage-state">present</div>
<div class="lineage-detail">selected: candidate-02</div><div class="lineage-detail">selected val: 0.9005</div><div class="lineage-detail">improvement: 0.0425</div><div class="lineage-detail">result: held previous champion</div>
</div>
<div class="lineage-node lineage-promoted">
<div class="lineage-title">Promoted overlay</div>
<div class="lineage-sub">final published artifact</div>
<div class="lineage-path">promoted.overlay.md</div>
<div class="lineage-state">present</div>
</div>
</div>
</section>
<section>
<h2>Iteration Candidates</h2>
<div class="sub">Per-iteration view of the explored candidate overlays, their scores, and whether they survived validation and gates.</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 1</h3>
<div class="sub">selected candidate: 1 | result: held previous champion</div>
<div class="sub">core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7598</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.6204</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.8662</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 2 | val 4 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.9542</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.8550</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.8900</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 3 | val 5 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.5209</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">fallback_copy</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.5197</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-1/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
</div>
</div>
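<div class="sub">The hold decisions in this section are reported as named gate failures. The following is a minimal Python sketch of how such gates could be evaluated. It is an assumption, not the tool's implementation: the report shows only the failure strings, so the function names and the inclusive comparisons are hypothetical. Scores are passed as strings and compared with Decimal so that 0.9550 minus 0.0200 is exact.</div>

```python
from decimal import Decimal


def core_gate_passes(candidate_core, baseline_core, tolerance):
    """Assumed rule: core score may drop by at most `tolerance` below baseline."""
    floor = Decimal(baseline_core) - Decimal(tolerance)
    return Decimal(candidate_core) >= floor


def val_gate_passes(improvement, threshold):
    """Assumed rule: validation improvement must reach the threshold."""
    return Decimal(improvement) >= Decimal(threshold)


def repeat_gate_passes(win_count, threshold):
    """Assumed rule: the candidate must win at least `threshold` repeat runs."""
    return win_count >= threshold


# Iteration 1: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200.
iter1_core_ok = core_gate_passes("0.8900", "0.9550", "0.0200")

# Iteration 2: improvement=0.0000 vs threshold=0.0100, win_count=0 vs 2,
# candidate_core=0.9100 against the same baseline and tolerance.
iter2_results = {
    "val_improvement": val_gate_passes("0.0000", "0.0100"),
    "validation_repeat": repeat_gate_passes(0, 2),
    "core_regression": core_gate_passes("0.9100", "0.9550", "0.0200"),
}

print("iteration 1 core gate passed:", iter1_core_ok)
print("iteration 2 gates passed:", iter2_results)
```

<div class="sub">With these assumed rules, the iteration 1 core check and all three iteration 2 checks evaluate to failed, which is consistent with the failure strings reported for those iterations.</div>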
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 2</h3>
<div class="sub">selected candidate: 0 | result: held previous champion</div>
<div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6761</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.7783</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 2 | val 5 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7525</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.5783</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 2 | val 3 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6400</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6739</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-2/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
</div>
</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 3</h3>
<div class="sub">selected candidate: 0 | result: held previous champion</div>
<div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">1.0000</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.8002</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9550</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-3/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 3 | val 5 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.4933</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-3/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 1 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6367</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-3/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a/n/a | core n/a/n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.9153</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.6200</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9100</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-3/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 3 | val 4 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
</div>
</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 4</h3>
<div class="sub">selected candidate: 0 | result: held previous champion</div>
<div class="sub">val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.5908</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.7566</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.9350</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-4/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 2 | val 5 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7450</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-4/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 3 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.5908</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-4/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.8592</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.7270</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.8900</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-4/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
</div>
</div>
<div class="iteration-block">
<div class="iteration-head">
<h3>Iteration 5</h3>
<div class="sub">selected candidate: 2 | result: held previous champion</div>
<div class="sub">core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</div>
</div>
<div class="candidate-grid">
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-00</div>
<div class="badge-row"><span class="badge">train #1</span><span class="badge badge-mode">carry_forward</span><span class="badge badge-validated">validated</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.8867</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.8579</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">1.0000</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-5/candidate-00.md</div>
<div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-01</div>
<div class="badge-row"><span class="badge">train #3</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.7542</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-5/candidate-01.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-02</div>
<div class="badge-row"><span class="badge">train #2</span><span class="badge badge-mode">claude</span><span class="badge badge-validated">validated</span><span class="badge badge-selected">selected</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.8737</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">0.9005</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">0.8808</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-5/candidate-02.md</div>
<div class="candidate-detail">exec counts: train 3 | val 6 | core 2</div>
<div class="candidate-detail">errors: train 0/0 | val 0/0 | core 0/0</div>
</div>
<div class="candidate-card">
<div class="candidate-head">
<div class="candidate-title">candidate-03</div>
<div class="badge-row"><span class="badge">train #4</span><span class="badge badge-mode">claude</span></div>
</div>
<div class="candidate-metrics">
<div><span class="metric-k">Train</span><span class="metric-v">0.6833</span></div>
<div><span class="metric-k">Val</span><span class="metric-v">n/a</span></div>
<div><span class="metric-k">Core</span><span class="metric-v">n/a</span></div>
</div>
<div class="candidate-detail">overlay: candidates/iter-5/candidate-03.md</div>
<div class="candidate-detail">exec counts: train 2 | val n/a | core n/a</div>
<div class="candidate-detail">errors: train 0/0 | val n/a | core n/a</div>
</div>
</div>
</div>
</section>
<section>
<h2>Promotion Decisions</h2>
<table>
<thead>
<tr>
<th>Iteration</th><th>Selected candidate</th><th>Selected val score</th><th>Improvement</th><th>Promoted</th><th>Reasons</th>
</tr>
</thead>
<tbody>
<tr><td>1</td><td>1</td><td>0.8550</td><td>0.2346</td><td>no</td><td>core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</td></tr>
<tr><td>2</td><td>0</td><td>0.7783</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</td></tr>
<tr><td>3</td><td>0</td><td>0.8002</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</td></tr>
<tr><td>4</td><td>0</td><td>0.7566</td><td>0.0000</td><td>no</td><td>val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</td></tr>
<tr><td>5</td><td>2</td><td>0.9005</td><td>0.0425</td><td>no</td><td>core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</td></tr>
</tbody>
</table>
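<div class="sub">The reason strings in the table above can be read as a set of independent promotion gates. The following is a minimal Python sketch of that reading, not the optimizer's actual implementation: the names GateInputs and failed_gates are hypothetical, the thresholds (0.0100, 0.0200, 2) are the ones printed in the reasons, and the comparison directions and inclusive boundaries are assumptions. The win_count used for iteration 5 is also assumed, since the report only prints it when the repeat gate fails.</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">
```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical container for the values quoted in the reason strings."""
    val_improvement: float   # selected val score minus champion val score
    candidate_core: float    # core score of the selected candidate
    baseline_core: float     # core score of the current champion
    win_count: int           # repeated validation wins over the champion
    hard_fail_count: int = 0 # validation tasks that hard-failed


def failed_gates(x: GateInputs,
                 min_improvement: float = 0.01,
                 core_tolerance: float = 0.02,
                 min_wins: int = 2) -> list[str]:
    """Return the names of the gates that block promotion (empty list = promote).

    Assumption: every gate is inclusive at its boundary.
    """
    reasons = []
    if not x.val_improvement >= min_improvement:
        reasons.append("val_improvement_below_threshold")
    if x.hard_fail_count > 0:
        reasons.append("validation_hard_fail_count_nonzero")
    if not x.win_count >= min_wins:
        reasons.append("validation_repeat_gate_failed")
    # Core may drop by at most `core_tolerance` relative to the champion.
    if not x.candidate_core >= x.baseline_core - core_tolerance:
        reasons.append("core_regression_gate_failed")
    return reasons


# Iteration 5 values from the table; win_count=2 is an assumed value.
iter5 = GateInputs(val_improvement=0.0425, candidate_core=0.8808,
                   baseline_core=0.9550, win_count=2)
print(failed_gates(iter5))  # ['core_regression_gate_failed']
```
</pre>
<div class="sub">Under this reading, iteration 5 clears the validation gates but its core score of 0.8808 is below 0.9550 - 0.0200 = 0.9350, so the candidate is held back on core regression alone.</div>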
</section>
<section>
<h2>Token and Cost Breakdown</h2>
<table>
<thead>
<tr>
<th>Role</th><th>Provider</th><th>Model</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cache create</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>executor</td><td>claude</td><td>glm-4.5-air</td><td>5,517,828</td><td>146,840</td><td>1,626,342</td><td>0</td><td>$19.2440</td></tr>
<tr><td>executor</td><td>claude</td><td>glm-4.7</td><td>3,568,119</td><td>631,729</td><td>165,255,424</td><td>0</td><td>$69.7569</td></tr>
<tr><td>judge</td><td>claude</td><td>glm-4.7</td><td>146,566</td><td>64,830</td><td>1,713,408</td><td>0</td><td>$1.9262</td></tr>
<tr><td>judge</td><td>claude</td><td>sonnet</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td><td>n/a</td></tr>
<tr><td>mutator</td><td>claude</td><td>glm-4.7</td><td>10,217</td><td>18,957</td><td>253,824</td><td>0</td><td>$0.3912</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Provider Invocation Breakdown</h2>
<table>
<thead>
<tr>
<th>Provider</th><th>Total</th><th>Mutator</th><th>Executor</th><th>Judge</th><th>Input</th><th>Output</th><th>Cache read</th><th>Cost</th>
</tr>
</thead>
<tbody>
<tr><td>claude</td><td>314</td><td>14</td><td>150</td><td>150</td><td>3,724,902</td><td>715,516</td><td>167,222,656</td><td>$91.3182</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Run Diagnostics</h2>
<table>
<thead>
<tr>
<th>Metric</th><th>Value</th>
</tr>
</thead>
<tbody>
<tr><td>Run started at (UTC)</td><td>2026-03-10T15:29:21.555323+00:00</td></tr>
<tr><td>Run ended at (UTC)</td><td>2026-03-10T21:44:06.229944+00:00</td></tr>
<tr><td>Wall clock duration</td><td>6h 14m 45s</td></tr>
<tr><td>Active runtime (eval + mutator)</td><td>10h 0m 15s</td></tr>
<tr><td>Orchestration overhead</td><td>0s</td></tr>
<tr><td>Eval runtime total</td><td>9h 57m 3s</td></tr>
<tr><td>Mutator runtime total</td><td>3m 12s</td></tr>
<tr><td>Eval tasks with runtime</td><td>150</td></tr>
<tr><td>Mutator calls with duration</td><td>14</td></tr>
<tr><td>Candidate overlays generated</td><td>20</td></tr>
<tr><td>Candidate metadata records</td><td>20</td></tr>
<tr><td>Candidate overlays train-tested</td><td>20</td></tr>
<tr><td>Candidate overlays val/core-tested</td><td>10</td></tr>
<tr><td>Promotion events</td><td>0</td></tr>
</tbody>
</table>
</section>
<section>
<h2>Quality Flags</h2>
<ul><li>missing_cost_usage</li><li>missing_directional_token_usage</li></ul>
</section>
<section>
<h2>Overlay Diff (Initial vs Promoted)</h2>
<div class="sub">Initial: /home/agent/skillpa/reports/opt-booking-20260310-152921/inputs/initial.overlay.md<br/>Promoted: /home/agent/skillpa/reports/opt-booking-20260310-152921/promoted.overlay.md</div>
<div class="sub">Initial lines: 41 | Promoted lines: 41 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</section>
<section>
<h2>Stepwise Overlay Diffs</h2>
<div class="sub">Attempted changes per iteration, diffed against the current champion at that point in the run.</div>
<details class="journey-step">
<summary>Iteration 1: selected candidate-01</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-1/candidate-01.md</div>
<div class="sub">selected val 0.8550 | improvement 0.2346 | not promoted | core_regression_gate_failed: candidate_core=0.8900, baseline_core=0.9550, tolerance=0.0200</div>
<div class="sub">lines: 41 -> 69 | +38 / -10</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260310-152921/inputs/initial.overlay.md</span>
<span class="diff-file">+++ opt-booking-20260310-152921/candidates/iter-1/candidate-01.md</span>
<span class="diff-hunk">@@ -6,6 +6,14 @@</span>
<span class="diff-neutral"> - Return concrete hotel options that satisfy user constraints.</span>
<span class="diff-neutral"> - Preserve traceability: include constraints, chosen options, and final URL.</span>
<span class="diff-neutral"> - Keep command sequences executable under `steel browser` syntax.</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Execution contract</span>
<span class="diff-add">+All sessions MUST follow this lifecycle:</span>
<span class="diff-add">+1. `steel browser start --session &lt;name&gt;` - initialize session</span>
<span class="diff-add">+2. Execute search and filter commands using `--session &lt;name&gt;` consistently</span>
<span class="diff-add">+3. Extract results and capture final state</span>
<span class="diff-add">+4. `steel browser stop --session &lt;name&gt;` - ALWAYS call in both success and failure paths</span>
<span class="diff-add">+5. Never leave zombie sessions; use explicit cleanup on error</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Query shaping</span>
<span class="diff-neutral"> 1. Build searches using destination + check-in/check-out + adults/rooms first.</span>
<span class="diff-hunk">@@ -25,17 +33,37 @@</span>
<span class="diff-neutral"> - If exact numbers are missing, report as unavailable instead of hallucinating.</span>
<span class="diff-neutral"> - Return at least the minimum option count requested by the user.</span>
<span class="diff-neutral"> </span>
<span class="diff-del">-## Reliability tactics</span>
<span class="diff-del">-- After opening search results, use `snapshot -i` before selecting elements.</span>
<span class="diff-del">-- Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span>
<span class="diff-del">-- Use `wait --load networkidle` or `wait --text` for synchronization.</span>
<span class="diff-del">-- Always capture final URL via `steel browser get url`.</span>
<span class="diff-del">-- Use `fill` with the positional value argument, e.g. `steel browser fill e16 &quot;Rome&quot; --session booking-search`.</span>
<span class="diff-del">-- Do not use deprecated `--value`; use value as the second positional argument.</span>
<span class="diff-del">-- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span>
<span class="diff-del">-- Always call `steel browser stop` in success and failure paths.</span>
<span class="diff-add">+## Valid steel-browser patterns</span>
<span class="diff-add">+- Session lifecycle: `steel browser start --session s1` / `steel browser stop --session s1`</span>
<span class="diff-add">+- Navigation: `steel browser goto &quot;&lt;url&gt;&quot; --session s1`</span>
<span class="diff-add">+- Screenshots: `steel browser screenshot --session s1 --file out.png`</span>
<span class="diff-add">+- Fills: `steel browser fill @e16 &quot;Rome&quot; --session s1` (value as 2nd positional arg)</span>
<span class="diff-add">+- Clicks: `steel browser click @e5 --session s1`</span>
<span class="diff-add">+- Snapshots: `steel browser snapshot --session s1`</span>
<span class="diff-add">+- Waits: `steel browser wait --load networkidle --session s1` or `--text &quot;Search&quot;</span>
<span class="diff-add">+- URL capture: `steel browser get url --session s1`</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Failure policy</span>
<span class="diff-add">+- On any failure: immediately call `steel browser stop --session &lt;name&gt;`</span>
<span class="diff-add">+- Capture error state with screenshot before cleanup if possible</span>
<span class="diff-add">+- Report specific failure point (search load, filter apply, extraction)</span>
<span class="diff-add">+- If search returns no results: verify destination spelling, try broader search</span>
<span class="diff-add">+- If filters fail: apply incrementally, fallback to manual result extraction</span>
<span class="diff-add">+- If session hangs: use `--timeout` on wait commands, then force stop</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Known error map</span>
<span class="diff-add">+| Invalid Pattern | Correct Pattern |</span>
<span class="diff-add">+|----------------|----------------|</span>
<span class="diff-add">+| `steel screenshot` (no session) | `steel browser screenshot --session s1`</span>
<span class="diff-add">+| `fill --value &quot;text&quot;` (deprecated flag) | `fill @e5 &quot;text&quot;` (positional) |</span>
<span class="diff-add">+| `steel goto` (missing browser) | `steel browser goto &quot;url&quot; --session s1`</span>
<span class="diff-add">+| Missing `--session` on any command | Always include `--session &lt;name&gt;` |</span>
<span class="diff-add">+| Forgetting `stop` on error paths | Always cleanup in success AND failure |</span>
<span class="diff-add">+| Combining 5+ filters at once | Apply 2-3 filters, verify, continue |</span>
<span class="diff-add">+| CSS selectors without snapshot | Use `snapshot` then `@e1` refs |</span>
<span class="diff-add">+| Assuming page ready instantly | Use `wait --load networkidle` or `--text` |</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Output discipline</span>
<span class="diff-del">-- Commands must include session lifecycle (`start` and `stop`).</span>
<span class="diff-add">+- Commands must include session lifecycle (start and stop).</span>
<span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span>
<span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span></pre>
</details>
<details class="journey-step">
<summary>Iteration 2: selected candidate-00</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-2/candidate-00.md</div>
<div class="sub">selected val 0.7783 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2; core_regression_gate_failed: candidate_core=0.9100, baseline_core=0.9550, tolerance=0.0200</div>
<div class="sub">lines: 41 -> 41 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
<details class="journey-step">
<summary>Iteration 3: selected candidate-00</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-3/candidate-00.md</div>
<div class="sub">selected val 0.8002 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_repeat_gate_failed: win_count=0, threshold=2</div>
<div class="sub">lines: 41 -> 41 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
<details class="journey-step">
<summary>Iteration 4: selected candidate-00</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-4/candidate-00.md</div>
<div class="sub">selected val 0.7566 | improvement 0.0000 | not promoted | val_improvement_below_threshold: improvement=0.0000, threshold=0.0100; validation_hard_fail_count_nonzero; validation_repeat_gate_failed: win_count=0, threshold=2</div>
<div class="sub">lines: 41 -> 41 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
<details class="journey-step">
<summary>Iteration 5: selected candidate-02</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/candidates/iter-5/candidate-02.md</div>
<div class="sub">selected val 0.9005 | improvement 0.0425 | not promoted | core_regression_gate_failed: candidate_core=0.8808, baseline_core=0.9550, tolerance=0.0200</div>
<div class="sub">lines: 41 -> 85 | +64 / -20</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;"><span class="diff-file">--- opt-booking-20260310-152921/inputs/initial.overlay.md</span>
<span class="diff-file">+++ opt-booking-20260310-152921/candidates/iter-5/candidate-02.md</span>
<span class="diff-hunk">@@ -7,35 +7,79 @@</span>
<span class="diff-neutral"> - Preserve traceability: include constraints, chosen options, and final URL.</span>
<span class="diff-neutral"> - Keep command sequences executable under `steel browser` syntax.</span>
<span class="diff-neutral"> </span>
<span class="diff-add">+## Execution contract (REQUIRED)</span>
<span class="diff-add">+</span>
<span class="diff-add">+Every booking workflow MUST follow this lifecycle:</span>
<span class="diff-add">+</span>
<span class="diff-add">+```</span>
<span class="diff-add">+1. steel browser start --session &lt;session-name&gt;</span>
<span class="diff-add">+2. [navigation, fill, wait, snapshot, extract commands]</span>
<span class="diff-add">+3. steel browser get url --session &lt;session-name&gt;</span>
<span class="diff-add">+4. steel browser stop --session &lt;session-name&gt;</span>
<span class="diff-add">+```</span>
<span class="diff-add">+</span>
<span class="diff-add">+**Critical rules:**</span>
<span class="diff-add">+- ALWAYS start with `steel browser start --session &lt;name&gt;`</span>
<span class="diff-add">+- ALWAYS end with `steel browser stop --session &lt;name&gt;` in both success and failure paths</span>
<span class="diff-add">+- Include `--session &lt;name&gt;` in EVERY steel browser command</span>
<span class="diff-add">+- Capture final URL before stopping the session</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Failure policy</span>
<span class="diff-add">+</span>
<span class="diff-add">+If extraction or filtering fails:</span>
<span class="diff-add">+1. Attempt recovery with retry (max 2 retries) using `wait --load networkidle`</span>
<span class="diff-add">+2. If unrecoverable, still call `steel browser stop --session &lt;name&gt;` to clean up</span>
<span class="diff-add">+3. Report partial results with explicit gap description: &quot;Could not extract [field]; marked as unavailable&quot;</span>
<span class="diff-add">+4. Never omit `stop` command—session cleanup is mandatory even on error</span>
<span class="diff-add">+</span>
<span class="diff-add">+## Known error map / invalid-command prevention</span>
<span class="diff-add">+</span>
<span class="diff-add">+| Error pattern | Correct approach | Why |</span>
<span class="diff-add">+|--------------|------------------|-----|</span>
<span class="diff-add">+| `fill e1 --value &quot;text&quot;` | `fill e1 &quot;text&quot;` | `--value` flag is deprecated; use positional value |</span>
<span class="diff-add">+| `steel screenshot` | `steel browser screenshot --session X` | Must use browser subcommand with session |</span>
<span class="diff-add">+| `click @e1` | `click @e1 --session X` | All commands require `--session` flag |</span>
<span class="diff-add">+| `fill e16 &quot;Rome&quot;` | `steel browser fill e16 &quot;Rome&quot; --session X` | Must prefix with `steel browser` |</span>
<span class="diff-add">+| Missing `stop` command | Always append `steel browser stop --session X` | Prevents session leaks |</span>
<span class="diff-add">+| CSS selectors like `#button` | Use snapshot refs like `@e1` after `snapshot -i` | More reliable after dynamic rendering |</span>
<span class="diff-add">+</span>
<span class="diff-neutral"> ## Query shaping</span>
<span class="diff-add">+</span>
<span class="diff-neutral"> 1. Build searches using destination + check-in/check-out + adults/rooms first.</span>
<span class="diff-del">-2. Prefer URL query parameters for deterministic bootstrapping when possible.</span>
<span class="diff-add">+2. Prefer URL query parameters for deterministic bootstrapping when possible:</span>
<span class="diff-add">+ - `ss=destination&amp;checkin=YYYY-MM-DD&amp;checkout=YYYY-MM-DD&amp;group_adults=N`</span>
<span class="diff-neutral"> 3. Apply filters incrementally and snapshot after each major change.</span>
<span class="diff-neutral"> </span>
<span class="diff-del">-## Filter strategy</span>
<span class="diff-del">-- Prioritize these filters in order when relevant:</span>
<span class="diff-del">- 1. Price ceiling/range</span>
<span class="diff-del">- 2. Guest count and stay duration</span>
<span class="diff-del">- 3. Review score / star rating</span>
<span class="diff-del">- 4. Property type and distance preferences</span>
<span class="diff-del">-- Avoid combining too many filters in one step; verify list updates between steps.</span>
<span class="diff-add">+## Filter strategy (ordered priority)</span>
<span class="diff-add">+</span>
<span class="diff-add">+1. Price ceiling/range</span>
<span class="diff-add">+2. Guest count and stay duration </span>
<span class="diff-add">+3. Review score / star rating</span>
<span class="diff-add">+4. Property type and distance preferences</span>
<span class="diff-add">+</span>
<span class="diff-add">+Avoid combining too many filters in one step; verify list updates between steps.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Extraction strategy</span>
<span class="diff-del">-- Extract at least: property name, nightly/total price, rating/review count if visible.</span>
<span class="diff-del">-- If exact numbers are missing, report as unavailable instead of hallucinating.</span>
<span class="diff-del">-- Return at least the minimum option count requested by the user.</span>
<span class="diff-add">+</span>
<span class="diff-add">+Extract at minimum:</span>
<span class="diff-add">+- Property name</span>
<span class="diff-add">+- Nightly price and total price</span>
<span class="diff-add">+- Rating and review count if visible</span>
<span class="diff-add">+</span>
<span class="diff-add">+If exact numbers are missing, report as &quot;unavailable&quot; instead of guessing.</span>
<span class="diff-add">+</span>
<span class="diff-add">+Return at least the minimum option count requested by the user.</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Reliability tactics</span>
<span class="diff-del">-- After opening search results, use `snapshot -i` before selecting elements.</span>
<span class="diff-del">-- Favor element refs from snapshots (`@e1`) over brittle CSS when possible.</span>
<span class="diff-del">-- Use `wait --load networkidle` or `wait --text` for synchronization.</span>
<span class="diff-del">-- Always capture final URL via `steel browser get url`.</span>
<span class="diff-del">-- Use `fill` with the positional value argument, e.g. `steel browser fill e16 &quot;Rome&quot; --session booking-search`.</span>
<span class="diff-del">-- Do not use deprecated `--value`; use value as the second positional argument.</span>
<span class="diff-del">-- Use `steel browser screenshot` (not `steel screenshot`) and keep `--session` with each command.</span>
<span class="diff-del">-- Always call `steel browser stop` in success and failure paths.</span>
<span class="diff-add">+</span>
<span class="diff-add">+- After opening search results, use `steel browser snapshot -i --session &lt;name&gt;` before selecting elements.</span>
<span class="diff-add">+- Favor element refs from snapshots (`@e1`, `@e2`) over brittle CSS selectors.</span>
<span class="diff-add">+- Use `steel browser wait --load networkidle --session &lt;name&gt;` for synchronization.</span>
<span class="diff-add">+- Use `steel browser fill &lt;element&gt; &quot;&lt;value&gt;&quot; --session &lt;name&gt;`—value is positional, NOT a flag.</span>
<span class="diff-add">+- Use `steel browser screenshot --session &lt;name&gt;` (not `steel screenshot`).</span>
<span class="diff-neutral"> </span>
<span class="diff-neutral"> ## Output discipline</span>
<span class="diff-del">-- Commands must include session lifecycle (`start` and `stop`).</span>
<span class="diff-add">+</span>
<span class="diff-neutral"> - Final answer must explicitly confirm whether all user constraints were met.</span>
<span class="diff-neutral"> - If constraints cannot be fully met, provide best available alternatives and note the gap.</span>
<span class="diff-add">+- Include the final Booking.com URL for traceability.</span></pre>
</details>
<details class="journey-step">
<summary>Final champion to promoted overlay</summary>
<div class="sub">opt-booking-20260310-152921/inputs/initial.overlay.md -> opt-booking-20260310-152921/promoted.overlay.md</div>
<div class="sub">artifact diff</div>
<div class="sub">lines: 41 -> 41 | +0 / -0</div>
<pre style="white-space: pre-wrap; overflow-x: auto; background: #0f1720; color: #d9e4ef; border-radius: 8px; padding: 10px; font-size: 12px;">No textual changes detected.</pre>
</details>
</section>
<div class="foot">Report schema v1 | Generated: 2026-03-10T22:05:03.027188+00:00</div>
</div>
</body>
</html>