arubis/ephemeral-debug-review-feedback.md

## ephemeral-debug-review-feedback.md

      
Suggestion: Distribute the Gitea specification across multiple issues

Task: Ephemeral Debug Containers (a9b57469-d16d-4430-9d32-dcb2caea6be4)

The problem

Reverting task.yaml to v11 style will help (agreed), but the Gitea issue itself is also a factor. Right now it's a complete specification in a single document -- exact image destination, exact tool list, exact role name, exact ServiceAccount, exact permissions, exact LimitRange values, and exact guidance on how to handle legacy RBAC. Once the agent reads it, the task becomes a checklist with nothing left to discover or infer.
All 8 v13 eval runs follow an identical arc with zero strategic divergence: read task.yaml -> find Gitea issue -> ls /opt/apk-cache -> find Kaniko in Harbor -> build -> RBAC -> done.

Suggestion: split the single Gitea issue into multiple discoverable resources

Distribute requirements across 2-3 Gitea issues (and optionally a wiki page) that the agent must discover and synthesize. This is more realistic -- in real organizations, requirements come from multiple teams and the engineer has to piece them together.

Issue #1: "Move to ephemeral debug containers for Bleater"

Voice: Team lead / engineering manager. Problem statement, not a solution spec.
Contains:

The backstory: dev team can't debug production pods, images are stripped down, current exec workaround is a security problem
The high-level goal: adopt Kubernetes ephemeral debug containers so devs can troubleshoot without exec access
A mention that the debug image needs to go into Harbor, including the specific names (library/debug-tools for the image, developer-debugger for the role), since the grader requires these exact names
A note that this is air-gapped, everything must use what's already in the cluster
Cross-references: "Security filed their requirements in #2" and "The dev team's tool wishlist is in #3"

Does NOT contain: exact tool list, exact LimitRange values, APK cache location, or step-by-step RBAC instructions.

Issue #2: "Security review: excessive kubectl exec access in bleater namespace"

Voice: Security engineer. Audit finding.
Contains:

Flags that developer-test ServiceAccount currently has exec access through legacy RBAC
States the security requirements: no exec, no pod deletion, no deployment modifications
Mentions that existing bindings can't just be deleted because other teams depend on them -- the permissions need to be tightened instead
May reference issue #1 as the remediation path

Does NOT contain: the role name, exact RBAC manifests, or anything about the debug image.
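To make the three security requirements concrete for whoever reviews the reworked task, here is a minimal sketch of an RBAC audit helper. It is hypothetical and not taken from grader.py. The mapping of "no exec, no pod deletion, no deployment modifications" to specific resource/verb pairs is an assumption, as is the `pods/ephemeralcontainers` rule in the tightened example. The helper ignores `apiGroups` and patterns such as `pods/*`.

```python
# Hypothetical audit helper; the forbidden pairs are an assumed reading of
# "no exec, no pod deletion, no deployment modifications".
FORBIDDEN = {
    ("pods/exec", "create"),    # kubectl exec
    ("pods", "delete"),         # pod deletion
    ("deployments", "update"),  # deployment modifications
    ("deployments", "patch"),
    ("deployments", "delete"),
}


def violations(rules):
    """Return the forbidden (resource, verb) pairs granted by a list of RBAC rules."""
    found = set()
    for rule in rules:
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        for resource, verb in FORBIDDEN:
            resource_hit = resource in resources or "*" in resources
            verb_hit = verb in verbs or "*" in verbs
            if resource_hit and verb_hit:
                found.add((resource, verb))
    return sorted(found)


# Illustrative legacy rule set (invented for this sketch).
legacy = [
    {"apiGroups": [""], "resources": ["pods", "pods/exec"],
     "verbs": ["get", "list", "create", "delete"]},
]

# Illustrative tightened rule set: read pods, patch ephemeral containers only.
tightened = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["pods/ephemeralcontainers"], "verbs": ["patch"]},
]

print(violations(legacy))
print(violations(tightened))
```

The legacy example reports the exec and pod-deletion grants, while the tightened example reports nothing. This reflects the "tighten, don't delete" constraint from the audit finding.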

Issue #3: "Debug tooling request from dev team"

Voice: Developer. Feature request.
Contains:

The list of tools: curl, wget, netcat, dig, jq, strace, tcpdump, vim, procps, bash
Context about why they need these (network debugging, process inspection, etc.)
A note that they tried installing tools at runtime but it doesn't work because the environment is air-gapped

Does NOT contain: where to put the image, how to build it, RBAC details, or resource limits.
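As a rough illustration of how the wishlist could be verified inside a built debug image, the sketch below checks that each requested tool resolves to a binary on PATH. The package-to-binary mapping (for example netcat to `nc`, procps to `ps`) is an assumption, and `missing_tools` is a hypothetical helper, not part of the existing grader.

```python
import shutil

# Wishlist entry -> binary expected on PATH (assumed mapping).
TOOL_BINARIES = {
    "curl": "curl",
    "wget": "wget",
    "netcat": "nc",
    "dig": "dig",
    "jq": "jq",
    "strace": "strace",
    "tcpdump": "tcpdump",
    "vim": "vim",
    "procps": "ps",
    "bash": "bash",
}


def missing_tools(which=shutil.which):
    """Return wishlist entries whose binary cannot be found on PATH."""
    return sorted(
        pkg for pkg, binary in TOOL_BINARIES.items() if which(binary) is None
    )


if __name__ == "__main__":
    # Run inside the debug container; an empty list means every tool is present.
    print(missing_tools())
```

The `which` parameter is injectable so the check can be exercised without the actual image.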

Wiki page or issue comment: resource limits policy

Contains:

The platform team's standard resource defaults for debug/ephemeral workloads: CPU request 100m / limit 200m, memory request 128Mi / limit 256Mi
Could live on the Bleater wiki, as a comment on issue #1 from a platform engineer, or as a separate "Platform standards" issue
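For reference, the values above map onto a Kubernetes LimitRange roughly as follows. This is a sketch only: the object name `debug-defaults` is hypothetical, the namespace is assumed to be `bleater`, and the actual manifest expected by the grader may differ. In a LimitRange, `defaultRequest` holds the default requests and `default` holds the default limits.

```python
import json


def debug_limit_range(namespace="bleater", name="debug-defaults"):
    """Build a LimitRange manifest with the platform defaults quoted above.

    The name and namespace are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied when a container declares no requests.
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    # Applied when a container declares no limits.
                    "default": {"cpu": "200m", "memory": "256Mi"},
                }
            ]
        },
    }


limit_range = debug_limit_range()
print(json.dumps(limit_range, indent=2))
```

The JSON output can be saved to a file and passed to `kubectl apply -f`, which accepts JSON as well as YAML.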


The idea is that each source gives the agent a piece of the puzzle, but no single source gives the complete checklist. The agent has to find them all and synthesize.
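One way to sanity-check the split before publishing the issues is a small coverage matrix: every requirement must appear in some source, and no single source may carry all of them. The sketch below uses the requirement facets named in the problem statement. The assignment of facets to sources follows the proposal above, and `check_split` is a hypothetical helper.

```python
# Requirement facets currently packed into the single Gitea issue.
REQUIREMENTS = {
    "image destination",
    "tool list",
    "role name",
    "ServiceAccount",
    "permissions",
    "LimitRange values",
    "legacy RBAC handling",
}

# Proposed distribution of facets across discoverable sources.
SOURCES = {
    "issue #1": {"image destination", "role name"},
    "issue #2": {"ServiceAccount", "permissions", "legacy RBAC handling"},
    "issue #3": {"tool list"},
    "resource limits policy": {"LimitRange values"},
}


def check_split(sources, requirements):
    """Report requirements no source covers and sources that cover everything."""
    covered = set().union(*sources.values())
    complete_sources = [
        name for name, facts in sources.items() if facts >= requirements
    ]
    return {
        "uncovered": sorted(requirements - covered),
        "complete_sources": complete_sources,
    }


result = check_split(SOURCES, REQUIREMENTS)
print(result)
```

Both lists come back empty for the proposed split: nothing is lost, and no single source reads as a complete checklist.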

grader.py, solution.sh, and Dockerfile shouldn't need changes. After updating, run 8 biggie-nebula evals to confirm the pass rate is below 70%.