@arubis
Last active February 24, 2026 22:12
Review feedback: Ephemeral Debug Containers (E1DF2) - 2nd review

Suggestion: Distribute the Gitea specification across multiple issues

Task: Ephemeral Debug Containers (a9b57469-d16d-4430-9d32-dcb2caea6be4)

The problem

Reverting task.yaml to v11 style will help (agreed), but the Gitea issue itself is also a factor. Right now it's a complete specification in a single document -- exact image destination, exact tool list, exact role name, exact ServiceAccount, exact permissions, exact LimitRange values, and exact guidance on how to handle legacy RBAC. Once the agent reads it, the task becomes a checklist with nothing left to discover or infer.

All 8 v13 eval runs follow an identical arc with zero strategic divergence: read task.yaml -> find Gitea issue -> ls /opt/apk-cache -> find Kaniko in Harbor -> build -> RBAC -> done.

Suggestion: Split the single Gitea issue into multiple discoverable resources

Distribute requirements across 2-3 Gitea issues (and optionally a wiki page) that the agent must discover and synthesize. This is more realistic -- in real organizations, requirements come from multiple teams and the engineer has to piece them together.


Issue #1: "Move to ephemeral debug containers for Bleater"

Voice: Team lead / engineering manager. Problem statement, not a solution spec.

Contains:

  • The backstory: dev team can't debug production pods, images are stripped down, current exec workaround is a security problem
  • The high-level goal: adopt Kubernetes ephemeral debug containers so devs can troubleshoot without exec access
  • A mention that the debug image needs to go into Harbor (and the specific names -- library/debug-tools for the image, developer-debugger for the role -- since the grader requires these exact names)
  • A note that this is air-gapped, everything must use what's already in the cluster
  • Cross-references: "Security filed their requirements in #2" and "The dev team's tool wishlist is in #3"

Does NOT contain: exact tool list, exact LimitRange values, APK cache location, or step-by-step RBAC instructions.
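For concreteness, the workflow the issue is pointing the agent toward is the standard `kubectl debug` flow (pod, container, and registry hostname below are placeholders; only the `library/debug-tools` image path comes from the spec):

```
# Attach an ephemeral debug container to a running pod without exec access
# to the app container itself. <pod>, <app-container>, and the Harbor
# hostname are hypothetical; library/debug-tools is the required image path.
kubectl debug -it <pod> \
  --image=<harbor-host>/library/debug-tools:latest \
  --target=<app-container> \
  -n bleater
```

Keeping this command out of the issue text is the point: the agent should have to infer it from "ephemeral debug containers" rather than copy it.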


Issue #2: "Security review: excessive kubectl exec access in bleater namespace"

Voice: Security engineer. Audit finding.

Contains:

  • Flags that developer-test ServiceAccount currently has exec access through legacy RBAC
  • States the security requirements: no exec, no pod deletion, no deployment modifications
  • Mentions that existing bindings can't just be deleted because other teams depend on them -- the permissions need to be tightened instead
  • May reference issue #1 as the remediation path

Does NOT contain: the role name, exact RBAC manifests, or anything about the debug image.
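To illustrate what "tightened instead of deleted" means in practice, a remediation satisfying these requirements might look roughly like the Role below. This is a sketch, not part of the issue text: the rules shown (read-only pod access plus the `pods/ephemeralcontainers` subresource, with no `pods/exec`, no `delete`, and no deployment write verbs) are my assumption about a reasonable shape; the grader's required name `developer-debugger` comes from issue #1, not this one.

```yaml
# Hypothetical remediation sketch -- not issue #2 content.
# Grants ephemeral-container debugging without exec, pod deletion,
# or deployment modification.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer-debugger   # exact name required by the grader (per issue #1)
  namespace: bleater
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["get", "update", "patch"]
```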


Issue #3: "Debug tooling request from dev team"

Voice: Developer. Feature request.

Contains:

  • The list of tools: curl, wget, netcat, dig, jq, strace, tcpdump, vim, procps, bash
  • Context about why they need these (network debugging, process inspection, etc.)
  • A note that they tried installing tools at runtime but it doesn't work because the environment is air-gapped

Does NOT contain: where to put the image, how to build it, RBAC details, or resource limits.
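The air-gap note in this issue is the breadcrumb that should lead the agent to the APK cache. For reference, the build the agent is expected to converge on is something like the Dockerfile sketch below -- the base image tag, cache mount path inside the build context, and Alpine package-name mappings (`netcat-openbsd` for netcat, `bind-tools` for dig) are my assumptions, not issue content:

```
# Sketch only: assumes an Alpine base and a local APK repository
# available at /opt/apk-cache inside the build context.
FROM alpine:3.19
RUN apk add --no-network --allow-untrusted \
      --repository /opt/apk-cache \
      curl wget netcat-openbsd bind-tools jq \
      strace tcpdump vim procps bash
```

None of this belongs in issue #3; discovering the cache location and the offline `apk` flags is exactly the inference work we want to preserve.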


Wiki page or issue comment: resource limits policy

Contains:

  • The platform team's standard resource defaults for debug/ephemeral workloads: CPU request 100m / limit 200m, memory request 128Mi / limit 256Mi
  • Could live on the Bleater wiki, as a comment on issue #1 from a platform engineer, or as a separate "Platform standards" issue
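Expressed as a manifest, the policy amounts to a LimitRange like the following (the resource values are from the policy above; the object name and its scoping to the bleater namespace are my assumptions):

```yaml
# Sketch of the platform team's debug-workload defaults.
apiVersion: v1
kind: LimitRange
metadata:
  name: debug-defaults   # hypothetical name
  namespace: bleater
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 200m
        memory: 256Mi
```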

The idea is that each source gives the agent a piece of the puzzle, but no single source gives the complete checklist. The agent has to find them all and synthesize.

grader.py, solution.sh, and Dockerfile shouldn't need changes. After updating, run 8 biggie-nebula evals to confirm <70%.
