@willkelly
Created January 8, 2026 00:30
Flaky Test Analysis - commontoolsinc/labs (2026-01-07)

Flaky Test Analysis Report

Generated: 2026-01-07
Analyzed: 7 workflow reruns from recent PRs
Repository: commontoolsinc/labs

Executive Summary

This report covers workflow runs that failed on the first attempt but passed on a rerun with no code changes. It identifies 4 distinct flaky tests across 3 integration test suites. The shell charm test is the most frequent offender, with 3 occurrences.

Flaky Tests by Frequency

1. 🔴 Shell Charm Test (3 occurrences) - MOST FLAKY

Test: shell charm tests > can view and interact with a charm
File: packages/shell/integration/charm.test.ts:81
Job: Package Integration Tests
Affected PRs: Gideon/battleship rebased, Gideon/superstition rationalization, perf(storage): reduce hash calls

Error Pattern:

AssertionError
  at assert (packages/shell/integration/charm.test.ts:81:5)

Root Cause: Race condition in UI interaction. The test clicks a button twice and expects the counter to show "-2", but the UI has not always updated by the time the assertion runs. The fixed sleep(1000) after each click and sleep(1500) before the assertion are not reliably long enough.

The problematic code:

// Click twice
for (let i = 0; i < 2; i++) {
  const handle = await page.waitForSelector("#counter-decrement", { strategy: "pierce" });
  const _ = await handle.innerText();
  handle.click();
  await sleep(1000);  // ⚠️ Fixed delay, not waiting for state change
}
await sleep(1500);  // ⚠️ Fixed delay before assertion
const text = await handle.innerText();
assert(text === "Counter is the -2th number");  // ❌ Fails intermittently

Suggested Fix: Use waitFor() pattern to poll for expected state instead of fixed delays.
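
The suggested fix can be sketched as follows. This is a minimal illustration, not the repository's own waitFor(): the helper name pollUntil, its options, and the makeFakeCounter stand-in for the UI (including its initial label) are all assumptions made so the example runs on its own.

```typescript
// Hypothetical polling helper; the name and options are assumptions,
// not the repository's own waitFor() API.
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 }: PollOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Return as soon as the condition holds, however quickly that happens.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for the UI (illustrative only): the label catches up with the
// clicked value some time after each click, like a deferred render.
function makeFakeCounter(renderDelayMs: number) {
  let value = 0;
  let label = "Counter is the 0th number";
  return {
    click(): void {
      value -= 1;
      const next = `Counter is the ${value}th number`;
      setTimeout(() => {
        label = next;
      }, renderDelayMs);
    },
    text: (): string => label,
  };
}
```

In the real test, the condition would re-query the element and compare its innerText() with "Counter is the -2th number". The timeout then bounds only the worst case, so a fast run is not slowed down and a slow run is not cut off by a fixed delay.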


2. 🟠 Nested Counter Test (1 occurrence)

Test: nested counter integration test > should load the nested counter charm and verify initial state
File: packages/patterns/integration/nested-counter.test.ts:66-70
Job: Pattern Integration Tests
Affected PRs: fix(runner): prevent shell hang

Error Pattern:

TypeError: Cannot destructure property 'object' of '(intermediate value)' as it is undefined.
  at ElementHandle.evaluate (packages/vendor-astral/src/element_handle.ts:448:13)

Root Cause: Stale element handle. waitForSelector finds the element, but a page re-render detaches it from the DOM before evaluate() is called on the handle.

Note: The test already has try/catch handling for this, but the waitFor timeout (500ms) may be too short.
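
One way to tolerate a detached handle is to repeat the lookup and the read together, so a stale handle is never reused. The sketch below shows that shape and where the retry budget lives. The helper readTextWithRequery and the TextHandle interface are hypothetical; only innerText() is taken from the code quoted earlier in this report.

```typescript
// Minimal handle shape assumed for illustration; a real ElementHandle has more.
interface TextHandle {
  innerText(): Promise<string>;
}

// Hypothetical helper: looks up a fresh handle on every attempt and reads
// from it immediately, retrying if either step throws.
async function readTextWithRequery(
  query: () => Promise<TextHandle>,
  attempts = 5,
  delayMs = 100,
): Promise<string> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const handle = await query();
      return await handle.innerText();
    } catch (error) {
      // The handle may have been detached by a re-render; remember the
      // error and try again with a new lookup.
      lastError = error;
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

With this shape, the total wait is roughly attempts × delayMs plus the lookup time, so widening the budget is a matter of raising either parameter rather than lengthening a single timeout.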


3. 🟠 Counter Search Term Filter Test (1 occurrence)

Test: counter-search-term-filter > counter filters entries by search term updates
File: packages/generated-patterns/integration/patterns/counter-search-term-filter.test.ts
Job: Generated Patterns Integration Tests
Affected PRs: fix: prevent shell disconnect

Error Pattern:

AssertionError: counterWithSearchTermFilter:5:filtered: Values are not equal.
  Actual:   value: 1
  Expected: value: 5

Root Cause: Race condition in test harness. Step 5 expects value: 5 from step 4's updateCounter event, but the state update from step 4 hasn't propagated yet. The pattern harness isn't waiting for reactivity to settle.
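
The difference between asserting immediately and asserting after the state has settled can be illustrated with a toy model. Everything here is hypothetical: DeferredStore, idle(), Step, and runSteps are stand-ins invented for this sketch and do not describe the real pattern harness or runtime. The values 1 and 5 simply mirror the actual and expected values in the error above.

```typescript
// Hypothetical stand-in for a reactive runtime: writes are applied on a
// later timer tick, and idle() resolves once no writes are pending.
class DeferredStore {
  private value = 1;
  private pending = 0;
  private waiters: Array<() => void> = [];

  get(): number {
    return this.value;
  }

  send(next: number): void {
    this.pending += 1;
    setTimeout(() => {
      this.value = next;
      this.pending -= 1;
      if (this.pending === 0) {
        const waiters = this.waiters;
        this.waiters = [];
        waiters.forEach((resolve) => resolve());
      }
    }, 10);
  }

  idle(): Promise<void> {
    if (this.pending === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}

interface Step {
  event?: number; // value to send before asserting, if any
  expected: number; // value the store should hold once settled
}

async function runSteps(store: DeferredStore, steps: Step[]): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    if (step.event !== undefined) store.send(step.event);
    // Wait for propagation to finish instead of asserting right away.
    await store.idle();
    const actual = store.get();
    if (actual !== step.expected) {
      throw new Error(`step ${i + 1}: expected ${step.expected}, got ${actual}`);
    }
  }
}
```

Without the await store.idle() line, a step that sends 5 and then asserts would still observe the old value 1, which is the same symptom as the reported failure.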


4. 🟡 Procurement Request Test (1 occurrence)

Test: procurement-request
File: packages/generated-patterns/integration/patterns/procurement-request.test.ts:186
Job: Generated Patterns Integration Tests
Affected PRs: Fix scheduler dependency handling

Error Pattern:

Leaks detected:
  - A timer was started in this test, but never completed.
    at Scheduler.scheduleWithDebounce (packages/runner/src/scheduler.ts:1756:19)

Root Cause: Test cleanup issue. The scheduler's debounce timer (setTimeout at scheduler.ts:1756) is not cleared when the test ends. The failure is raised by Deno's test sanitizer, which reports the leaked timer; the test's own assertions did not fail.
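
A debounce timer that is tracked and cancelled on disposal might look like the sketch below. The Debouncer class, its keyed schedule() method, and pendingCount are hypothetical and are not the repository's Scheduler; the example only shows the cleanup pattern.

```typescript
// Hypothetical debouncer, not the repository's Scheduler.
class Debouncer {
  private timers = new Map<string, ReturnType<typeof setTimeout>>();
  private disposed = false;

  // Schedule an action under a key; a newer call with the same key replaces
  // the older pending one.
  schedule(key: string, action: () => void, delayMs: number): void {
    if (this.disposed) return; // never start timers after disposal
    const existing = this.timers.get(key);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(key);
      action();
    }, delayMs);
    this.timers.set(key, timer);
  }

  // Number of timers that have been started but have not yet fired.
  get pendingCount(): number {
    return this.timers.size;
  }

  // Cancel every pending timer so nothing outlives the owner.
  dispose(): void {
    this.disposed = true;
    this.timers.forEach((timer) => clearTimeout(timer));
    this.timers.clear();
  }
}
```

A test that owns such an object would call dispose() in its teardown (for example in a finally block), so no timer is still pending when the sanitizer checks for leaks.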


Summary by Test Suite

Test Suite | Flaky Tests | Occurrences | Primary Issue
Package Integration Tests | 1 | 3 | UI timing race condition
Pattern Integration Tests | 1 | 1 | Stale element handle
Generated Patterns Integration Tests | 2 | 2 | State propagation races, timer leaks

Recommendations

  1. High Priority: Fix the shell charm test, which accounts for 3 of the 6 reruns listed under Raw Data (50%).
    • Replace sleep() with a waitFor() that polls for the expected counter value.

  2. Medium Priority: Investigate pattern harness timing.
    • Add settling time or an explicit wait after events in runPatternScenario().

  3. Low Priority: Fix scheduler timer cleanup.
    • Ensure scheduleWithDebounce timers are cleared on dispose.

Raw Data

Failed Runs Analyzed:

Run ID | Job | PR | Attempt → Result
20794339856 | Package Integration Tests | Gideon/battleship rebased | 1 fail → 2 pass
20792621745 | Package Integration Tests | Gideon/superstition rationalization | 1 fail → 2 pass
20760559756 | Package Integration Tests | perf(storage): reduce hash calls | 1 fail → 2 pass
20732131557 | Pattern Integration Tests | fix(runner): prevent shell hang | 1 fail → 3 pass
20726759519 | Generated Patterns Integration Tests | fix: prevent shell disconnect | 1 fail → 2 pass
20663576078 | Generated Patterns Integration Tests | Fix scheduler dependency handling | 1 fail → 2 pass