AOrobator/eagle-eye-screenshot-review.md

## eagle-eye-screenshot-review.md

      
name: eagle-eye-screenshot-review
description: Deep screenshot QA orchestrator for visual + a11y regressions. Use after recording Paparazzi screenshots; launches subagents per test class and requires a11y parity.

Eagle-Eye Screenshot Review

Run a structured post-screenshot review across all newly generated Paparazzi images and enforce accessibility screenshot parity.

When to Use


Immediately after recordPaparazziRelease
Before opening a PR with screenshot changes
When UI semantics/content descriptions changed
When visual regressions are reported in CI

Non-Negotiable Rules


Always run after screenshot recording.
Fail if a11y screenshots are missing for any new/updated visual screenshot test.
Do not sign off until visual + a11y checks are complete.
If a11y tests are missing, implement them, record snapshots, and re-run the review.
When a11y semantics issues are found, invoke a11y-review before final recommendations.

Inputs


Newly generated screenshots in src/test/snapshots/images/
Corresponding screenshot test classes in src/test/java/.../*ScreenshotTest.kt
A11y screenshot classes in src/test/java/.../*A11yScreenshotTest.kt
UI state sources (view states/sealed classes/feature flags/boolean branches) used by the composable under test

Orchestration (Required)

Act as an orchestrator and launch subagents to review screenshot groups.

1. Gather changed screenshot PNGs (focus on files generated this run).
2. Group them by the screenshot test class prefix in the filename.
3. Launch up to 4 subagents in parallel (one group per subagent).
4. Each subagent must inspect:
   - Visual screenshots (LIGHT/DARK, device variants)
   - Matching a11y screenshots (if present)
   - Related test/composable source when needed to validate semantics intent
   - Reachable UI state definitions to validate screenshot state coverage
5. Merge subagent findings into one severity-sorted report.
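
The grouping and batching steps above could be sketched roughly as follows. This is a minimal Kotlin sketch, not part of the skill itself: `testClassOf`, `groupByTestClass`, and `planWaves` are hypothetical helpers, and the filename layout `<package>_<TestClass>_<method>[_<variant>].png` is an assumption about how snapshots are named in the project (package names containing underscores would need different parsing).

```kotlin
// Hypothetical helper. ASSUMPTION: snapshot names look like
// "<package>_<TestClass>_<method>[_<variant>].png".
fun testClassOf(fileName: String): String {
    val parts = fileName.removeSuffix(".png").split("_")
    // Under the assumed scheme, parts[0] is the package and parts[1] the test class.
    return parts.getOrNull(1) ?: "unknown"
}

// Groups changed PNGs by the test class embedded in the filename; non-PNG files are ignored.
fun groupByTestClass(fileNames: List<String>): Map<String, List<String>> =
    fileNames
        .filter { it.endsWith(".png") }
        .groupBy { testClassOf(it) }

// Splits the groups into waves so that at most `maxParallel` subagents run at once.
// Each inner list holds the test class names assigned to one wave.
fun planWaves(
    groups: Map<String, List<String>>,
    maxParallel: Int = 4,
): List<List<String>> = groups.keys.sorted().chunked(maxParallel)
```

Each wave can then be handed to the subagent launcher, one test class group per subagent, with the next wave starting once the current one has reported back.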

Timeout/Retry Strategy (Required)

If a subagent times out (for example during image captioning), the orchestrator must not stop or skip coverage.

1. Detect timeout/hang/failure from subagent output.
2. Re-run that group in smaller chunks:
   - split by state first (e.g., withImageSrc, withCreatorKitRemoved)
   - if needed, split again by theme/device variant
   - if still needed, process one screenshot per subagent task
3. Continue retries until every targeted screenshot has been reviewed or an explicit blocker is documented.
4. Merge partial results from all retries into one final report and clearly mark retries performed.
5. Do not mark PASS while any intended screenshot remains unreviewed.
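
One way to picture the progressive chunking is the Kotlin sketch below. It is illustrative only: the `Shot` type, its field names, and the `attempt` counter are assumptions introduced here, not something the skill defines.

```kotlin
// Hypothetical model of one screenshot awaiting review; field names are illustrative.
data class Shot(val file: String, val state: String, val variant: String)

// Returns progressively smaller chunks for a group whose subagent timed out.
// attempt 1: one chunk per state
// attempt 2: one chunk per (state, theme/device variant) pair
// attempt 3 and later: one screenshot per chunk
fun splitForRetry(group: List<Shot>, attempt: Int): List<List<Shot>> = when {
    attempt <= 1 -> group.groupBy { it.state }.values.toList()
    attempt == 2 -> group.groupBy { it.state to it.variant }.values.toList()
    else -> group.map { listOf(it) }
}

// Lists intended screenshots that no subagent has reviewed yet.
// PASS is only permissible when this returns an empty list.
fun unreviewed(intended: List<Shot>, reviewed: Set<String>): List<String> =
    intended.map { it.file }.filterNot { it in reviewed }
```

Keeping the set of reviewed filenames across retries makes the final merge straightforward: the report can list which chunks needed a retry, and the `unreviewed` check guards the PASS decision.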

Subagent output format:

severity: critical | major | minor
screenshot: filename
issue: short description
evidence: what is visible in screenshot/a11y panel
recommended_fix: actionable patch direction

Final orchestrator report format (required):

For every finding, include clickable markdown links to:

problematic screenshot file
related screenshot test file
related source/composable file (when applicable)


Use this format:

severity | [screenshot](absolute-path-to-png) | [test](absolute-path-to-test-file) | issue | evidence | recommended_fix


If multiple screenshots are affected, include all as separate markdown links in the same row.
If no findings, still include links to reviewed test files and representative screenshots in the PASS summary.
Include a state coverage table per screenshot group:

state | represented_in_visual (yes/no) | represented_in_a11y (yes/no) | evidence
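
As a rough illustration of the finding row format and the severity sort, the Kotlin sketch below renders merged findings into report rows. The `Finding` type, `severityOrder`, `toRow`, and `report` are hypothetical names chosen for this example; only the column layout comes from the format above.

```kotlin
// Severity values in report order, most severe first.
val severityOrder = listOf("critical", "major", "minor")

// Hypothetical container mirroring the subagent output fields.
data class Finding(
    val severity: String,
    val screenshots: List<String>, // absolute paths to the affected PNGs
    val testFile: String,          // absolute path to the screenshot test file
    val issue: String,
    val evidence: String,
    val recommendedFix: String,
)

// Renders one finding as a pipe-separated row; every affected
// screenshot becomes its own markdown link in the same row.
fun Finding.toRow(): String {
    val shots = screenshots.joinToString(" ") { "[screenshot]($it)" }
    return listOf(severity, shots, "[test]($testFile)", issue, evidence, recommendedFix)
        .joinToString(" | ")
}

// Sorts findings by severity (unknown severities go last) and joins the rows.
fun report(findings: List<Finding>): String =
    findings
        .sortedBy { f ->
            val rank = severityOrder.indexOf(f.severity)
            if (rank < 0) severityOrder.size else rank
        }
        .joinToString("\n") { it.toRow() }
```

Because `sortedBy` is stable, findings of equal severity keep the order in which the subagents reported them.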


Required Checks (Visual)


Contrast issues (text/icon/background contrast)
Touch target sizing (aim for >= 48dp interactive targets)
Clipped text, clipped icons, clipped containers
Unexpected truncation/overlap/wrapping artifacts
Insufficient spacing or inconsistent padding rhythm
Misalignment and layout jitter between states/themes
Incorrect disabled-state affordances (looks enabled when disabled, or vice versa)
RPL/design-system consistency (prefer existing RPL components/patterns)
Theme parity (LIGHT vs DARK visual correctness)
Empty/placeholder/image-loading artifacts

Required Checks (State Coverage)


Build a state matrix from component inputs and branching logic (sealed states, enums, booleans, feature flags, and key combinations).
Verify every reachable/important state has screenshot coverage in visual tests.
Verify every reachable/important visual state has corresponding a11y screenshot coverage.
Treat unrepresented states as coverage defects even if no visual regression is currently visible.
If a state is intentionally excluded, require explicit rationale in the report.
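
A state matrix of this kind can be sketched as a Cartesian product of the component's input dimensions, then compared against the states that actually have screenshots. The Kotlin below is a minimal sketch under assumptions: `stateMatrix`, `CoverageRow`, and `coverage` are hypothetical names, states are modelled as plain string maps, and the full product may include unreachable combinations that still need to be pruned (with a rationale) by the reviewer.

```kotlin
// Builds every combination of the given input dimensions (a Cartesian product).
// Example dimension: "editEnabled" to listOf("true", "false").
fun stateMatrix(dimensions: Map<String, List<String>>): List<Map<String, String>> =
    dimensions.entries.fold(listOf(emptyMap<String, String>())) { acc, (name, values) ->
        acc.flatMap { partial -> values.map { value -> partial + (name to value) } }
    }

// One row of the state coverage table (hypothetical type).
data class CoverageRow(
    val state: Map<String, String>,
    val inVisual: Boolean,
    val inA11y: Boolean,
)

// Marks, for each state in the matrix, whether visual and a11y screenshots cover it.
fun coverage(
    matrix: List<Map<String, String>>,
    visual: Set<Map<String, String>>,
    a11y: Set<Map<String, String>>,
): List<CoverageRow> = matrix.map { CoverageRow(it, it in visual, it in a11y) }

// Renders a row as "state | yes/no | yes/no"; the evidence column is left to the reviewer.
fun CoverageRow.toLine(): String {
    val label = state.entries.joinToString(", ") { "${it.key}=${it.value}" }
    val visual = if (inVisual) "yes" else "no"
    val a11y = if (inA11y) "yes" else "no"
    return "$label | $visual | $a11y"
}
```

Any row with a "no" in either column is a coverage defect unless the report carries an explicit rationale for excluding that state.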

Required Checks (A11y)


Missing a11y screenshots for changed visual tests (hard failure)
Focus-target correctness: label/role must be on the node TalkBack actually focuses (container vs child chosen intentionally)
A11y panel entries match actually enabled actions
Action-availability parity: contentDescription, semantics actions, and clickable(enabled=...) must be gated by the same condition
Disabled actions are not exposed as actionable semantics
Content descriptions are meaningful for action icons
Decorative icons are hidden (contentDescription = null)
Role/action labels are coherent with interaction behavior
Interactive role correctness: interactive elements must use an interactive role (for example Role.Button) and must not be exposed with non-interactive roles such as role=Image when acting like a button
No contradictory semantics (e.g., shows "Edit" but action disabled)
No nested parent+child clickables that represent the same action
Semantics modifier intent is correct: prefer redditClearAndSetSemantics in app code; use mergeDescendants when child actions must remain focusable
If semantics clearing/grouping hides child actions, required actions are re-exposed via custom actions
Aggregate announcements include meaningful visual status indicators (badges/verification/state)
Icon-only action labels describe user action (e.g., "Remove selfie image"), not glyph names (e.g., "Close icon")
Account for useUnmergedSemanticsTree experiment behavior when validating semantics recommendations

Skill Handshake: a11y-review

Use a11y-review as the semantics source of truth when eagle-eye detects accessibility issues.

During subagent review, if findings involve semantics/labels/roles/action exposure, run a11y-review.
Validate fix direction against a11y-review focus-target and icon-label rules.
Include in the final report:

which a11y-review rule was applied
whether screenshot/a11y output now matches that rule


If a new recurring pattern is discovered, update both skills:

add detection rule in eagle-eye-screenshot-review
add implementation guidance in a11y-review


Critical pattern to catch

If a screenshot/a11y panel shows an action label (e.g., "Edit selfie image") while that action is disabled/unavailable in that state, report it as major and require a code fix.
If an interactive/focusable action is announced with an incorrect non-interactive role (for example, Edit selfie image {role=Image}), report it as major and require a role semantics fix.

Missing A11y Coverage Flow (Required)

When visual screenshots exist but matching a11y screenshots do not:

Create *A11yScreenshotTest using PaparazziA11yTest + A11yConfig.
Mirror key states from the visual screenshot test.
For image-loading composables (rememberGlidePainter), use MockPainterProvider + SizedBrushPainter and call:

paparazzi.redditSnapshot(mockPainterProvider = sizedMockPainterProvider)


Record snapshots again.
Re-run screenshot tests.
Re-run eagle-eye review.
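
Detecting the condition that triggers this flow can be approximated by comparing test class names. The Kotlin sketch below assumes a naming convention in which `FooScreenshotTest` pairs with `FooA11yScreenshotTest`; that pairing rule and the `missingA11yTests` helper are assumptions for illustration, so adjust them if the project names its a11y classes differently.

```kotlin
// Returns visual screenshot test classes that have no a11y counterpart.
// ASSUMPTION: "FooScreenshotTest" is paired with "FooA11yScreenshotTest".
fun missingA11yTests(testClasses: Set<String>): List<String> =
    testClasses
        // Keep only visual test classes (exclude the a11y classes themselves).
        .filter { it.endsWith("ScreenshotTest") && !it.endsWith("A11yScreenshotTest") }
        // Keep those whose expected a11y class name is absent.
        .filter { (it.removeSuffix("ScreenshotTest") + "A11yScreenshotTest") !in testClasses }
        .sorted()
```

A non-empty result is a hard failure for the review, and each returned class is a candidate for the steps listed above.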

Missing UI State Coverage Flow (Required)

When eagle-eye detects uncovered UI states:

Add missing visual screenshot test case(s) for the uncovered state(s).
Add matching a11y screenshot test case(s) for those same state(s).
Record snapshots again.
Re-run screenshot tests.
Re-run eagle-eye review and confirm state coverage table is complete.

Completion Criteria

Pass only when all of the following are true:

Visual screenshots reviewed with no unresolved critical/major issues
Every intended screenshot was actually reviewed (including timeout retries in smaller chunks)
All reachable/important UI states are represented in visual screenshot tests (or explicitly justified)
A11y screenshots exist for all changed visual screenshot tests
All represented visual states have corresponding a11y state coverage (or explicitly justified)
Any missing a11y tests were implemented and recorded
Any missing visual/a11y state coverage was implemented and recorded
Updated screenshot tests pass
Findings summary is documented with explicit pass/fail per screenshot group

Quick Reference

# Record
./gradlew :module:impl:recordPaparazziRelease

# Test
./gradlew :module:impl:testReleaseUnitTest