@AOrobator
Last active February 18, 2026 04:02
name: eagle-eye-screenshot-review
description: Deep screenshot QA orchestrator for visual + a11y regressions. Use after recording Paparazzi screenshots; launches subagents per test class and requires a11y parity.

Eagle-Eye Screenshot Review

Run a structured post-screenshot review across all newly generated Paparazzi images and enforce accessibility screenshot parity.

When to Use

  • Immediately after recordPaparazziRelease
  • Before opening a PR with screenshot changes
  • When UI semantics/content descriptions changed
  • When visual regressions are reported in CI

Non-Negotiable Rules

  • Always run after screenshot recording.
  • Fail if a11y screenshots are missing for any new/updated visual screenshot test.
  • Do not sign off until visual + a11y checks are complete.
  • If a11y tests are missing, implement them, record snapshots, and re-run the review.
  • When a11y semantics issues are found, invoke a11y-review before final recommendations.

Inputs

  • Newly generated screenshots in src/test/snapshots/images/
  • Corresponding screenshot test classes in src/test/java/.../*ScreenshotTest.kt
  • A11y screenshot classes in src/test/java/.../*A11yScreenshotTest.kt
  • UI state sources (view states/sealed classes/feature flags/boolean branches) used by the composable under test

Orchestration (Required)

Act as an orchestrator and launch subagents to review screenshot groups.

  1. Gather changed screenshot PNGs (focus on files generated this run).
  2. Group by screenshot test class prefix in filename.
  3. Launch up to 4 subagents in parallel (one group per subagent).
  4. Each subagent must inspect:
    • Visual screenshots (LIGHT/DARK, device variants)
    • Matching a11y screenshots (if present)
    • Related test/composable source when needed to validate semantics intent
    • Reachable UI state definitions to validate screenshot state coverage
  5. Merge subagent findings into one severity-sorted report.
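
Steps 2 and 3 can be sketched in plain Kotlin. This is a minimal sketch, not part of the skill itself: it assumes Paparazzi-style file names of the form `<package>_<TestClass>_<method...>.png` with no underscore inside the package segment, and it assumes an a11y class named `FooA11yScreenshotTest` belongs in the same review group as `FooScreenshotTest`. The function names `reviewGroupOf`, `groupByTestClass`, and `planWaves` are hypothetical.

```kotlin
// Assumption: file names look like <package>_<TestClass>_<method...>.png
// and the package segment contains no underscore.
fun reviewGroupOf(fileName: String): String {
    val parts = fileName.removeSuffix(".png").split("_")
    val testClass = if (parts.size >= 2) parts[1] else parts[0]
    // Assumption: FooA11yScreenshotTest is reviewed together with FooScreenshotTest.
    return testClass.replace("A11yScreenshotTest", "ScreenshotTest")
}

// Step 2: group changed screenshots by test class prefix.
fun groupByTestClass(fileNames: List<String>): Map<String, List<String>> =
    fileNames.groupBy { reviewGroupOf(it) }

// Step 3: split the group names into waves of at most `maxParallel` subagents.
fun planWaves(
    groups: Map<String, List<String>>,
    maxParallel: Int = 4,
): List<List<String>> = groups.keys.sorted().chunked(maxParallel)
```

Each inner list returned by `planWaves` is one batch of groups that can be handed to subagents in parallel; the next batch starts once the previous one has reported.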

Timeout/Retry Strategy (Required)

If a subagent times out (for example during image captioning), the orchestrator must not stop or skip coverage.

  1. Detect timeout/hang/failure from subagent output.
  2. Re-run that group in smaller chunks:
    • split by state first (e.g., withImageSrc, withCreatorKitRemoved)
    • if needed split again by theme/device variant
    • if still needed, process one screenshot per subagent task
  3. Continue retries until every targeted screenshot has been reviewed or an explicit blocker is documented.
  4. Merge partial results from all retries into one final report and clearly mark retries performed.
  5. Do not mark PASS while any intended screenshot remains unreviewed.
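
The progressive splitting above could look roughly like the following. It is a sketch under stated assumptions: the `Shot` type, its `state` and `variant` fields, and the `review` callback (which returns `false` on timeout, hang, or failure) are hypothetical stand-ins for however the orchestrator tracks screenshots and launches subagents.

```kotlin
// Hypothetical description of one screenshot; field names are illustrative.
data class Shot(val file: String, val state: String, val variant: String)

// Returns smaller chunks for a group that timed out.
// attempt 1: split by state; attempt 2: by state and theme/device variant;
// attempt 3 and later: one screenshot per subagent task.
fun splitForRetry(chunk: List<Shot>, attempt: Int): List<List<Shot>> = when (attempt) {
    1 -> chunk.groupBy { it.state }.values.toList()
    2 -> chunk.groupBy { it.state to it.variant }.values.toList()
    else -> chunk.map { listOf(it) }
}

// Runs `review` on a chunk and retries with smaller chunks when it fails.
// Returns the files that still failed when reviewed alone; each of these
// needs an explicitly documented blocker and prevents a PASS.
fun reviewWithRetries(
    chunk: List<Shot>,
    attempt: Int = 0,
    review: (List<Shot>) -> Boolean,
): List<String> {
    if (review(chunk)) return emptyList()
    if (chunk.size == 1 && attempt >= 3) return listOf(chunk.single().file)
    return splitForRetry(chunk, attempt + 1)
        .flatMap { reviewWithRetries(it, attempt + 1, review) }
}
```

An empty return value means every screenshot in the group was reviewed; anything else is the list of screenshots to report as blocked.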

Subagent output format:

  • severity: critical | major | minor
  • screenshot: filename
  • issue: short description
  • evidence: what is visible in screenshot/a11y panel
  • recommended_fix: actionable patch direction

Final orchestrator report format (required):

  • For every finding, include clickable markdown links to:
    • problematic screenshot file
    • related screenshot test file
    • related source/composable file (when applicable)
  • Use this format:
    • severity | [screenshot](absolute-path-to-png) | [test](absolute-path-to-test-file) | issue | evidence | recommended_fix
  • If multiple screenshots are affected, include all as separate markdown links in the same row.
  • If no findings, still include links to reviewed test files and representative screenshots in the PASS summary.
  • Include a state coverage table per screenshot group:
    • state | represented_in_visual (yes/no) | represented_in_a11y (yes/no) | evidence
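
To make the row format concrete, here is a small sketch of how merged findings might be sorted by severity and rendered. The `Finding` type and the function names are assumptions made for illustration; only the field names and the row layout come from the formats above.

```kotlin
// Declaration order doubles as sort order: critical first, minor last.
enum class Severity { CRITICAL, MAJOR, MINOR }

data class Finding(
    val severity: Severity,
    val screenshots: List<String>, // absolute paths to the affected PNGs
    val testFile: String,          // absolute path to the screenshot test file
    val issue: String,
    val evidence: String,
    val recommendedFix: String,
)

// Renders one row; several affected screenshots become separate links in the same row.
fun renderRow(f: Finding): String {
    val shots = f.screenshots.joinToString(" ") { "[screenshot]($it)" }
    return listOf(
        f.severity.name.lowercase(),
        shots,
        "[test](${f.testFile})",
        f.issue,
        f.evidence,
        f.recommendedFix,
    ).joinToString(" | ")
}

// Merges findings from all subagents into one severity-sorted report body.
fun renderReport(findings: List<Finding>): String =
    findings.sortedBy { it.severity }.joinToString("\n") { renderRow(it) }
```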

Required Checks (Visual)

  • Contrast issues (text/icon/background contrast)
  • Touch target sizing (aim for >= 48dp interactive targets)
  • Clipped text, clipped icons, clipped containers
  • Unexpected truncation/overlap/wrapping artifacts
  • Insufficient spacing or inconsistent padding rhythm
  • Misalignment and layout jitter between states/themes
  • Incorrect disabled-state affordances (looks enabled when disabled, or vice versa)
  • RPL/design-system consistency (prefer existing RPL components/patterns)
  • Theme parity (LIGHT vs DARK visual correctness)
  • Empty/placeholder/image-loading artifacts
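
Two of these checks can be quantified when pixel colors and element bounds are available. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas and the usual `px = dp * density` conversion. The skill itself does not state a contrast threshold; 4.5:1 is the WCAG AA value for normal-size text and is used here only as an assumed reference point. How colors and bounds are sampled from a screenshot is left out.

```kotlin
// WCAG 2.x relative luminance of an sRGB color given as 0xRRGGBB.
fun relativeLuminance(rgb: Int): Double {
    fun channel(c: Int): Double {
        val s = c / 255.0
        return if (s <= 0.03928) s / 12.92 else Math.pow((s + 0.055) / 1.055, 2.4)
    }
    val r = channel((rgb shr 16) and 0xFF)
    val g = channel((rgb shr 8) and 0xFF)
    val b = channel(rgb and 0xFF)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b
}

// Contrast ratio between two colors; argument order does not matter.
fun contrastRatio(a: Int, b: Int): Double {
    val la = relativeLuminance(a)
    val lb = relativeLuminance(b)
    return (maxOf(la, lb) + 0.05) / (minOf(la, lb) + 0.05)
}

// Converts a dp size to pixels for a given screen density.
fun dpToPx(dp: Int, density: Double): Double = dp * density

// True when both sides of a measured target meet the minimum (48dp by default).
fun meetsTouchTarget(widthPx: Int, heightPx: Int, density: Double, minDp: Int = 48): Boolean =
    minOf(widthPx, heightPx) >= dpToPx(minDp, density)
```

For example, `contrastRatio(0x777777, 0xFFFFFF)` falls just under 4.5, which is the kind of borderline gray-on-white text a reviewer should flag for a closer look.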

Required Checks (State Coverage)

  • Build a state matrix from component inputs and branching logic (sealed states, enums, booleans, feature flags, and key combinations).
  • Verify every reachable/important state has screenshot coverage in visual tests.
  • Verify every reachable/important visual state has corresponding a11y screenshot coverage.
  • Treat unrepresented states as coverage defects even if no visual regression is currently visible.
  • If a state is intentionally excluded, require explicit rationale in the report.
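
A state matrix of this kind can be built as a Cartesian product of the component's inputs and then compared against what the tests actually cover. This is an illustrative sketch: the dimension names, the string-valued states, and the `CoverageRow` type are assumptions, and deciding which combinations are truly reachable still requires reading the composable.

```kotlin
// Cartesian product of input dimensions, e.g. "image" x "enabled".
fun stateMatrix(dimensions: Map<String, List<String>>): List<Map<String, String>> =
    dimensions.entries.fold(listOf(emptyMap<String, String>())) { acc, (name, values) ->
        acc.flatMap { partial -> values.map { partial + (name to it) } }
    }

// One row of the state coverage table.
data class CoverageRow(
    val state: Map<String, String>,
    val inVisual: Boolean,
    val inA11y: Boolean,
)

// Marks each state in the matrix as covered or not by visual and a11y tests.
fun coverage(
    matrix: List<Map<String, String>>,
    visual: Set<Map<String, String>>,
    a11y: Set<Map<String, String>>,
): List<CoverageRow> = matrix.map { CoverageRow(it, it in visual, it in a11y) }

// States missing from either suite; each needs a test or an explicit rationale.
fun defects(rows: List<CoverageRow>): List<CoverageRow> =
    rows.filter { !it.inVisual || !it.inA11y }
```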

Required Checks (A11y)

  • Missing a11y screenshots for changed visual tests (hard failure)
  • Focus-target correctness: label/role must be on the node TalkBack actually focuses (container vs child chosen intentionally)
  • A11y panel entries match actually enabled actions
  • Action-availability parity: contentDescription, semantics actions, and clickable(enabled=...) must be gated by the same condition
  • Disabled actions are not exposed as actionable semantics
  • Content descriptions are meaningful for action icons
  • Decorative icons are hidden (contentDescription = null)
  • Role/action labels are coherent with interaction behavior
  • Interactive role correctness: interactive elements must use an interactive role (for example Role.Button) and must not be exposed with non-interactive roles such as role=Image when acting like a button
  • No contradictory semantics (e.g., shows "Edit" but action disabled)
  • No nested parent+child clickables that represent the same action
  • Semantics modifier intent is correct: prefer redditClearAndSetSemantics in app code; use mergeDescendants when child actions must remain focusable
  • If semantics clearing/grouping hides child actions, required actions are re-exposed via custom actions
  • Aggregate announcements include meaningful visual status indicators (badges/verification/state)
  • Icon-only action labels describe user action (e.g., "Remove selfie image"), not glyph names (e.g., "Close icon")
  • Account for useUnmergedSemanticsTree experiment behavior when validating semantics recommendations
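
The action-availability parity rule is easiest to satisfy when every semantics field is derived from one condition. The sketch below models that idea without any UI framework; `ActionSemantics`, `editSelfieSemantics`, and `isConsistent` are hypothetical names. In Compose, the three fields would correspond to `contentDescription`, a role such as `Role.Button`, and the `enabled` argument of `Modifier.clickable`.

```kotlin
// Hypothetical, framework-free model of what an action node exposes.
data class ActionSemantics(
    val contentDescription: String?, // null = not announced
    val role: String?,               // e.g. "Button"
    val clickEnabled: Boolean,
)

// Every field is derived from the same `canEdit` flag, so they cannot drift apart.
fun editSelfieSemantics(canEdit: Boolean): ActionSemantics =
    if (canEdit) {
        ActionSemantics("Edit selfie image", "Button", clickEnabled = true)
    } else {
        ActionSemantics(contentDescription = null, role = null, clickEnabled = false)
    }

// Parity check for an action node: label, role, and click handling agree.
fun isConsistent(s: ActionSemantics): Boolean =
    (s.contentDescription != null) == s.clickEnabled &&
        (s.role != null) == s.clickEnabled
```

A node that still announces "Edit selfie image" while `clickEnabled` is false fails `isConsistent`, which is the contradiction the a11y panel review is meant to surface.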

Skill Handshake: a11y-review

Use a11y-review as the semantics source of truth when eagle-eye detects accessibility issues.

  1. During subagent review, if findings involve semantics/labels/roles/action exposure, run a11y-review.
  2. Validate fix direction against a11y-review focus-target and icon-label rules.
  3. Include in the final report:
    • what a11y-review rule was applied
    • whether screenshot/a11y output now matches that rule
  4. If a new recurring pattern is discovered, update both skills:
    • add detection rule in eagle-eye-screenshot-review
    • add implementation guidance in a11y-review

Critical pattern to catch

If a screenshot/a11y panel shows an action label (e.g., "Edit selfie image") while that action is disabled or unavailable in that state, report it as major and require a code fix.

If an interactive/focusable action is announced with an incorrect non-interactive role (for example Edit selfie image {role=Image}), report it as major and require a role semantics fix.
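
Both patterns can be expressed as simple rules over entries transcribed from the a11y panel. This is a sketch only: `PanelEntry` and its fields are hypothetical, and whether a label "announces an action" is a judgment the reviewer supplies rather than something the code infers.

```kotlin
// Hypothetical shape of one a11y-panel entry transcribed from a screenshot.
data class PanelEntry(
    val label: String,
    val role: String?,            // e.g. "Button", "Image", or null
    val announcesAction: Boolean, // label reads as an action, e.g. "Edit selfie image"
    val actionEnabled: Boolean,   // action is actually available in this state
)

// Roles treated as non-interactive for this check (assumed list).
val nonInteractiveRoles = setOf("Image")

// Returns major-severity issue descriptions for the two critical patterns.
fun criticalPatternIssues(e: PanelEntry): List<String> {
    val issues = mutableListOf<String>()
    if (e.announcesAction && !e.actionEnabled) {
        issues += "major: '${e.label}' is announced while the action is disabled"
    }
    if (e.announcesAction && e.role in nonInteractiveRoles) {
        issues += "major: '${e.label}' is exposed with non-interactive role=${e.role}"
    }
    return issues
}
```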

Missing A11y Coverage Flow (Required)

When visual screenshots exist but matching a11y screenshots do not:

  1. Create *A11yScreenshotTest using PaparazziA11yTest + A11yConfig.
  2. Mirror key states from the visual screenshot test.
  3. For image-loading composables (rememberGlidePainter), use MockPainterProvider + SizedBrushPainter and call:
    • paparazzi.redditSnapshot(mockPainterProvider = sizedMockPainterProvider)
  4. Record snapshots again.
  5. Re-run screenshot tests.
  6. Re-run eagle-eye review.
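
Detecting the gap that triggers this flow can be automated from class names alone. The sketch assumes the naming convention implied by the inputs above, where `FooScreenshotTest` pairs with `FooA11yScreenshotTest`; the function names are hypothetical.

```kotlin
// Assumption: FooScreenshotTest pairs with FooA11yScreenshotTest.
fun expectedA11yClass(visualClass: String): String =
    visualClass.removeSuffix("ScreenshotTest") + "A11yScreenshotTest"

// Returns the a11y test classes that should exist but do not.
fun missingA11yClasses(testClasses: Set<String>): List<String> =
    testClasses
        .filter { it.endsWith("ScreenshotTest") && !it.endsWith("A11yScreenshotTest") }
        .map { expectedA11yClass(it) }
        .filter { it !in testClasses }
        .sorted()
```

Any non-empty result is a hard failure for the review and the starting point for step 1.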

Missing UI State Coverage Flow (Required)

When eagle-eye detects uncovered UI states:

  1. Add missing visual screenshot test case(s) for the uncovered state(s).
  2. Add matching a11y screenshot test case(s) for those same state(s).
  3. Record snapshots again.
  4. Re-run screenshot tests.
  5. Re-run eagle-eye review and confirm state coverage table is complete.

Completion Criteria

Only pass when all are true:

  • Visual screenshots reviewed with no unresolved critical/major issues
  • Every intended screenshot was actually reviewed (including timeout retries in smaller chunks)
  • All reachable/important UI states are represented in visual screenshot tests (or explicitly justified)
  • A11y screenshots exist for all changed visual screenshot tests
  • All represented visual states have corresponding a11y state coverage (or explicitly justified)
  • Any missing a11y tests were implemented and recorded
  • Any missing visual/a11y state coverage was implemented and recorded
  • Updated screenshot tests pass
  • Findings summary is documented with explicit pass/fail per screenshot group
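
Read as a gate, the criteria are a conjunction evaluated per screenshot group. The following sketch shows one way to encode that; the `GroupResult` fields are hypothetical counters that an orchestrator would fill in from the earlier checks.

```kotlin
// Hypothetical per-group tallies collected during the review.
data class GroupResult(
    val group: String,
    val unresolvedCriticalOrMajor: Int,
    val unreviewedScreenshots: Int,
    val uncoveredStates: Int,  // states lacking visual or a11y coverage and a rationale
    val missingA11yTests: Int,
    val testsPass: Boolean,
)

// A group passes only when every criterion holds.
fun passes(r: GroupResult): Boolean =
    r.unresolvedCriticalOrMajor == 0 &&
        r.unreviewedScreenshots == 0 &&
        r.uncoveredStates == 0 &&
        r.missingA11yTests == 0 &&
        r.testsPass

// Explicit pass/fail line per screenshot group.
fun summary(results: List<GroupResult>): String =
    results.joinToString("\n") { "${it.group}: ${if (passes(it)) "PASS" else "FAIL"}" }

// The review as a whole passes only if there is at least one group and all pass.
fun overallPass(results: List<GroupResult>): Boolean =
    results.isNotEmpty() && results.all { passes(it) }
```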

Quick Reference

# Record
./gradlew :module:impl:recordPaparazziRelease

# Test
./gradlew :module:impl:testReleaseUnitTest