@arubis
Last active February 18, 2026 22:44
Where to Write New Nebula Tasks — advisory for task authors on saturated vs. open opportunity areas

Where to Write New Nebula Tasks

Advisory for task authors. Helps you find ideas that will pass overlap review on the first try.


What to Target, What to Avoid

| Closed | Mostly Saturated | Partially Explored | Wide Open |
|---|---|---|---|
| CI/CD Pipeline Flow | Prometheus + Grafana | Istio Service Mesh | KEDA Autoscaling |
| ArgoCD Sync + Drift | Loki + Fluent Bit | MongoDB | GlitchTip |
| PostgreSQL | Gitea + Actions | Jaeger Tracing | Maddy Mail Server |
| | Harbor Registry | MinIO Lifecycle | Statping-ng |
| | Keycloak / Auth Services | Grafana OnCall | CronJobs / DaemonSets / StatefulSets |
| | ResourceQuotas + LimitRanges | | ConfigMap + Secret Propagation |
| | Node Pressure + Eviction | | |

Closed = your idea will almost certainly fail overlap review, regardless of how you frame it. 10+ existing tasks cover every major angle. Don't submit new tasks here.

Mostly saturated = heavy existing coverage (5-14 tasks), but narrow openings remain. You'll need a clearly distinct angle — see Remaining Angles for what's left, and the Framing Guide for strategies to survive overlap review in these areas.

Partially explored = room exists, but check existing ideas first to make sure your specific angle is distinct.

Wide open = strong opportunities with zero or near-zero existing coverage. Start here.


Self-Screening Checklist

Before proposing an idea, run through these questions:

  1. Is it a PostgreSQL task?

    • Yes → Will not pass. 16-20 existing tasks cover every angle (WAL, HA, split-brain, DR, migrations, pooling, credential rotation, operator management). See Appendix A.
    • No → Continue.
  2. Does the primary challenge involve CI/CD pipeline flow? (Gitea Actions → Harbor push/pull → ArgoCD sync → image promotion)

    • Yes → Will not pass. 13 tasks cover every pipeline stage. See Appendix A.
    • No → Continue.
  3. Does it involve ArgoCD sync, drift, or reconciliation?

    • Yes → Will not pass. 12 tasks cover sync loops, persistent drift, wave deadlocks, AppProject RBAC, and image updater.
    • No → Continue.
  4. Does it involve resource quotas, limits, or eviction? (ResourceQuota, LimitRange, node pressure, PriorityClasses)

    • Yes → Almost certainly will not pass. 8 tasks cover memory, CPU, and PID pressure plus quota deadlocks (see Appendix B); the only exceptions are the very narrow gaps listed in Remaining Angles.
    • No → Continue.

  5. Does it involve Keycloak, SSO, OIDC, or authentication services?

    • Yes → Mostly saturated. 6+ tasks cover IAM deployment, SSO integration across dev tools, auth chain drift, key rotation, and API gateway auth. Check Remaining Angles for what's left.
    • No → Continue.
  6. Does it target Prometheus/Grafana, Loki/Fluent Bit, or another "Mostly Saturated" component?

    • Yes → Narrow openings exist but you need a clearly distinct angle. Check Remaining Angles before proposing.
    • No → Continue.
  7. Does it target a "Wide Open" component from the table above?

    • Yes → Strong opportunity. Propose it.
    • No → Check the forum for existing ideas in that area before proposing.

Tip

If you landed on "will not pass" but still want to use those components, read The Escape Pattern — it's possible to write viable tasks that touch closed components as long as the primary challenge operates at a different layer.


Remaining Angles in Mostly Saturated Areas

These components have heavy existing coverage but specific narrow openings remain. If you want to write a task here, it must target one of these gaps — generic ideas in these areas will fail overlap review.

| Component | Tasks | What's Left |
|---|---|---|
| Prometheus + Grafana | ~14 | SLO/SLI burn-rate methodology, Alertmanager routing trees + inhibition rules, remote write / federation, Grafana-as-Code provisioning |
| Loki + Fluent Bit | ~8 | LogQL-based alerting rules (Loki ruler), Fluent Bit parser/filter chain debugging (not throughput/backpressure) |
| Gitea + Actions | ~10 | Repository governance (branch protection, merge policies), workflow YAML authoring/debugging, runner resource management. Not the pipeline flow. |
| Harbor Registry | ~8 | Robot account management, per-project storage quotas, retention policies, replication configuration. Not push/pull/GC/auth in a pipeline context. |
| ResourceQuotas + LimitRanges | ~5 | LimitRange default injection failures (implicit limits causing non-obvious OOMKills). Very narrow. |
| Node Pressure + Eviction | ~6 | Disk pressure eviction specifically (ephemeral storage, imagefs vs nodefs). Memory/CPU/PID paths are covered. |
| Keycloak / Auth Services | ~6 | OIDC federation failures across multiple realms, Keycloak upgrade/migration scenarios, auth audit/compliance reporting. Core SSO integration (single-realm OIDC clients, role mapping, key rotation, token validation) is thoroughly covered. |
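As an illustration of the "LimitRange default injection" gap, a manifest like the following injects limits into containers that never declared any (a hypothetical sketch — names, namespace, and values are illustrative):

```yaml
# Hypothetical example: a LimitRange silently injects default memory limits
# into any container that doesn't set its own. Workloads that ran fine in
# other namespaces can start OOMKilling here with no change to their manifests.
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults         # illustrative name
  namespace: team-a          # illustrative namespace
spec:
  limits:
    - type: Container
      default:               # applied as the *limit* when none is specified
        memory: 128Mi
      defaultRequest:        # applied as the *request* when none is specified
        memory: 64Mi
```

The non-obvious part for a task: `kubectl get pod -o yaml` shows the injected limit as if the author wrote it, so the agent has to trace the value back to the LimitRange rather than the Deployment spec.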

Warning

Even for these remaining angles, check the forum first. Ideas here are harder to get right, and reviewers will scrutinize overlap carefully. See the Framing Guide for how to frame ideas that survive review in saturated areas.


Open Opportunity Areas

Ranked by how much clean surface area exists. Each includes concrete task concepts ready to propose.

Tier 1: Wide Open

1. KEDA (Event-Driven Autoscaling)

KEDA is deployed in the cluster but has limited task coverage targeting it directly. The debugging surface is distinct from HPA: misconfigured triggers that silently don't fire, TriggerAuthentication failures against event sources (RabbitMQ, Prometheus), conflicts when both HPA and KEDA target the same deployment, and scaling-to-zero edge cases.

Best fit: Cloud Ops or Platform Engineering

Starter concepts:

  • KEDA Trigger Authentication Failure Blocks Event-Driven Autoscaling
  • HPA/KEDA Scaling Conflict Causes Pod Count Oscillation
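As a sketch of the wiring the first concept exercises (hedged — all names and the RabbitMQ trigger parameters below are illustrative, not taken from the Nebula cluster):

```yaml
# Hypothetical example: a ScaledObject whose trigger depends on a
# TriggerAuthentication. If the referenced Secret key is wrong or the
# credentials are stale, the trigger never fires -- and with scale-to-zero,
# that means no pods at all, with no crash to point at.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth        # illustrative name
spec:
  secretTargetRef:
    - parameter: host        # the RabbitMQ scaler takes a full AMQP URI here
      name: rabbitmq-creds   # Secret must exist in the same namespace
      key: amqp-uri
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker             # must NOT also be targeted by a separate HPA
  minReplicaCount: 0         # scale-to-zero: a dead trigger = zero replicas
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs
        mode: QueueLength
        value: "10"
      authenticationRef:
        name: rabbitmq-auth
```

The diagnostic surface is distinct from HPA: `kubectl describe scaledobject` conditions and the KEDA operator logs carry the auth errors, not the workload's own events.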

2. GlitchTip (Error Tracking)

GlitchTip is Nebula's Sentry-compatible error tracking platform. It is a genuinely distinct observability dimension from metrics (Prometheus), logs (Loki), and traces (Jaeger). Tasks could target DSN misconfiguration causing silent event loss, ingestion pipeline failures, or alert routing that masks critical exceptions. Some adjacent coverage exists, so frame ideas around GlitchTip-specific failure modes rather than generic observability.

Best fit: SRE or Cloud Ops

Starter concepts:

  • GlitchTip Error Ingestion Pipeline Failure — Services Silently Dropping Exceptions
  • GlitchTip Alert Routing Misconfiguration Masks Production Errors
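Because GlitchTip is Sentry-compatible, application SDKs are typically pointed at it through a Sentry-style DSN. A hedged sketch of the silent-loss failure mode (the service name, image, env var, and hostname are all hypothetical):

```yaml
# Hypothetical example: a DSN with the wrong host or a stale project key
# fails silently -- most Sentry-compatible SDKs drop events rather than
# crash the app, so exceptions simply stop arriving at GlitchTip.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service               # illustrative name
spec:
  selector:
    matchLabels: {app: checkout}
  template:
    metadata:
      labels: {app: checkout}
    spec:
      containers:
        - name: app
          image: checkout:latest       # illustrative image
          env:
            - name: SENTRY_DSN         # Sentry-style DSN, accepted by GlitchTip
              # wrong project ID or host => silent event loss:
              value: "https://<key>@glitchtip.example.local/3"
```

A task built on this would have the agent notice the *absence* of events (no errors for a service that is demonstrably throwing) rather than an error message, which is a different diagnostic shape from metrics or log debugging.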

3. Statping-ng (Status Page)

Zero approved ideas. Statping-ng is a standalone status page with its own health checks and user-facing availability dashboard. Distinct from Blackbox Exporter synthetic monitoring (which feeds into Prometheus/Grafana).

Best fit: SRE

Starter concepts:

  • Status Page Reports All-Green While Services Are Down
  • Statping-ng Flapping Monitors Flood Notification Channels

4. Maddy (Mail Server)

Zero approved ideas. Maddy handles SMTP relay for the platform (Grafana notifications, OnCall alerts, etc.). Runs as a StatefulSet with three mailboxes (devops@, operator@, opsmanager@nebula.local). SMTP misconfiguration, TLS negotiation failures, and the downstream impact of alert emails never arriving are all clean territory.

Best fit: SRE or DevOps

Starter concepts:

  • Maddy SMTP Relay Failure Silently Drops Alert Notification Emails

5. K8s Workload Primitives (CronJobs, DaemonSets, StatefulSets)

Zero approved ideas targeting these workload types specifically. CronJob failure chains (missed schedules, concurrency policy deadlocks), DaemonSet rolling updates creating gaps, and StatefulSet ordered scaling with PVC lifecycle issues are all untouched.

Best fit: Cloud Ops

Starter concepts:

  • CronJob Concurrency Policy Deadlock Causes Backup Job Backlog
  • StatefulSet Scale-Down Orphans Persistent Volumes
  • DaemonSet Rolling Update Creates Logging Gap
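The concurrency-policy deadlock in the first concept can be sketched in a few lines (hypothetical manifest — names, schedule, and image are illustrative):

```yaml
# Hypothetical example: with concurrencyPolicy: Forbid, one hung backup job
# blocks every subsequent schedule. Runs missed for longer than
# startingDeadlineSeconds are dropped entirely, so the backlog fails silently.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup                     # illustrative name
spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid           # new runs are skipped while an old job lives
  startingDeadlineSeconds: 300        # missed runs older than 5m are discarded
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600     # without this, a hung job blocks forever
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: backup-tool:latest          # illustrative image
              command: ["/bin/sh", "-c", "run-backup"]
```

Omit `activeDeadlineSeconds` and hang the job, and you get exactly the "backup job backlog" scenario: no failures reported, just an ever-growing gap since the last successful run.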

6. ConfigMap/Secret Propagation Mechanics

No approved ideas target the K8s propagation problem. Distinct from credential rotation tasks (which are about the values) — this is about the delivery mechanism: ConfigMap updated but pods serve stale config, immutable ConfigMap blocks emergency fixes, Secret rotation leaves pods split across old/new values.

Best fit: Cloud Ops or Platform Engineering

Starter concepts:

  • ConfigMap Update Propagation Failure — Pods Serve Stale Configuration
  • Immutable ConfigMap Trap Blocks Emergency Configuration Fix
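The immutable trap in the second concept comes down to one field (hypothetical manifest — name and data are illustrative):

```yaml
# Hypothetical example: an immutable ConfigMap cannot be edited in place.
# An emergency fix requires creating a NEW ConfigMap under a new name and
# rolling every workload that references the old one.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v1    # illustrative; versioned names are the usual workaround
immutable: true          # kubectl edit/apply against .data will be rejected
data:
  feature_flags: |
    payments=enabled
```

Even without `immutable: true` there is propagation territory here: volume-mounted ConfigMaps update only after the kubelet sync delay, and env-var consumption never updates without a pod restart — both are delivery-mechanism failures, not value failures.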

Tier 2: Partially Explored

7. Grafana OnCall (Incident Response)

One vague approved idea exists (just a title, no description). Room for clearly scoped tasks around escalation chain failures, schedule rotation bugs, or integration breakdowns between OnCall and Mattermost/Maddy.

Best fit: SRE

Starter concepts:

  • Grafana OnCall Escalation Chain Broken — Incidents Route to Nobody
  • On-Call Schedule Rotation Failure During Handoff Window

8. Istio Service Mesh

Some coverage exists, but tasks focused on traffic management (as opposed to resource pressure from sidecars) have room. mTLS policy failures, VirtualService routing misconfigurations, and sidecar injection issues in specific namespaces are potential angles.

Best fit: Platform Engineering or SRE

9. MinIO (Object Storage)

Distinct from Harbor registry operations. Lifecycle policies, bucket versioning, cross-service storage access patterns.

Best fit: Cloud Ops

Not Listed Above?

The platform also includes RabbitMQ, Redis, CoreDNS, Mattermost, and Chaos Mesh — among others. If your idea targets a component not in any column, check the forum for existing coverage. A component absent from this table simply hasn't been categorized yet; absence doesn't mean it's open or closed.


The Escape Pattern

If your idea touches saturated components but the primary challenge operates at a different Kubernetes layer, it can still work.

The stack has distinct operational layers. Existing tasks saturate the middle layers; the edges are less covered:

┌─────────────────────────────────────────────────────┐
│  API Admission          (webhooks, CRD validation)  │  ← less covered
├─────────────────────────────────────────────────────┤
│  Scheduling + Resources (quotas, eviction, priority)│  ← SATURATED
├─────────────────────────────────────────────────────┤
│  Workload Orchestration (GitOps, deploys, rollouts) │  ← SATURATED
├─────────────────────────────────────────────────────┤
│  Application Runtime    (pods, services, networking)│  ← partially covered
├─────────────────────────────────────────────────────┤
│  Data + Storage         (databases, queues, object) │  ← less covered
└─────────────────────────────────────────────────────┘

Examples of the escape pattern working:

  • Admission Webhook Cascade Failure — touches KEDA, Istio, and ArgoCD (all "saturated" components) but the actual challenge is API admission control, CRD versioning, and webhook lifecycle. Same components, different layer. Approved.

  • The Operator Takeover — originally framed as "deploy CloudNativePG" (overlaps with PostgreSQL HA). Reframed to "live-migrate production databases under traffic" — a distinct operation category (migration execution vs. greenfield build). Approved after reframing.
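The admission layer the first example operates at is visible in a single kind of object (a hypothetical manifest — names and rules are illustrative):

```yaml
# Hypothetical example of the admission-control layer: a webhook with
# failurePolicy: Fail blocks every matching resource cluster-wide the moment
# its backing Service loses its endpoints. KEDA, Istio, and ArgoCD become
# victims, but the challenge lives at the API admission layer.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: policy-gate                     # illustrative name
webhooks:
  - name: validate.policy.example.local # illustrative, must be fully qualified
    failurePolicy: Fail                 # webhook outage = outage of all matched admissions
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: policy-webhook            # no ready endpoints here => applies time out
        namespace: platform             # illustrative namespace
        path: /validate
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
```

This is why "same components, different layer" works: the diagnostic path runs through `kubectl get validatingwebhookconfigurations` and apiserver admission errors, not through sync status or pod logs.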

Note

The key question: Is the primary challenge about the same operation as an existing task (build, troubleshoot, configure), or about a fundamentally different operation (migrate, audit, enforce, orchestrate) that happens to involve the same components?

For more reframing strategies with real before/after examples, see the Framing Guide.


Quick Reference: Category Spec IDs

| Category | Spec ID |
|---|---|
| DevOps | b407a435-9dc1-4cc3-950c-3194a8f08fde |
| SRE | 46394e31-2a74-47c1-8359-51e1b678146d |
| Platform Engineering | 9e4d158e-96ff-4435-ab39-4d1e389f4b47 |
| Cloud Ops | 450f2e9c-ba04-429c-bf80-e22be0065313 |


Appendices: Supporting Evidence

Everything above is actionable guidance. Everything below is the proof.


Appendix A: Why CI/CD Is Closed

13 approved tasks and ideas cover the full pipeline from code push to deployment:

  Gitea Actions ──→ Docker Build ──→ Harbor Push ──→ ArgoCD Sync ──→ K8s Deploy
       │                                  │               │              │
       ▼                                  ▼               ▼              ▼
  Cascading CI/CD              Harbor GC Deadlock    Sync Wave      Deployment
  Breaking CI/CD               GitOps Image Update   Deadlock       Rollout
  Webhook Amplif.              Broken Promotion      GitOps Drift   Failures
  The Broken Delivery                                Sync Loop
                                                     Canary Rollouts

Every stage of the pipeline has at least two tasks covering its failure modes. The full inventory:

| # | Task/Idea | Component Focus | Status |
|---|---|---|---|
| 1 | Bleater GitOps Pipeline Repair | Gitea Actions, ArgoCD Image Updater, Harbor | Implemented |
| 2 | Harbor Registry GC Deadlock | Harbor storage, GC jobs | Implemented |
| 3 | ArgoCD Sync Wave Deadlock | ArgoCD sync waves, PreSync hooks | Implemented |
| 4 | Cascading CI/CD Pipeline Failures | Gitea Runner, Harbor creds, ArgoCD, disk space | Implemented |
| 5 | Deployment Rollout Failures | Deployments, security contexts, quotas | Implemented |
| 6 | Breaking CI/CD Pipeline | Gitea Actions tagging, Harbor permissions, ArgoCD updater | Approved |
| 7 | GitOps Image Update + Harbor Auth | Image Updater, Harbor tokens, Helm values | Approved |
| 8 | Broken GitOps Image Promotion | Harbor webhooks, Image Updater auth | Yellow |
| 9 | ArgoCD GitOps Sync Loop | Mutating webhook, KEDA conflict, Helm values | Approved |
| 10 | GitOps Drift That Survives Every Sync | Admission controllers, image automation | Approved |
| 11 | Gitea Webhook Amplification | Gitea webhooks, ArgoCD, Harbor jobservice | Yellow |
| 12 | The Broken Delivery | regcred secret, Helm registry override, CI error masking | Pending |
| 13 | GitOps Canary Rollouts Migration | ArgoCD ApplicationSets, Argo Rollouts, Istio | Pending |

Appendix B: Why Resource Exhaustion Is Closed

8 approved tasks and ideas cover Kubernetes resource management:

| # | Task/Idea | Component Focus | Status |
|---|---|---|---|
| 1 | Single-Node Chaos Hardening | Node memory pressure, eviction, scheduling | Implemented |
| 2 | Chaos Engineering Resilience | Chaos Mesh, pod-kill, network latency, CPU stress | Implemented |
| 3 | Resource Quota Deadlocks | ResourceQuotas, LimitRanges, PVC quotas | Approved |
| 4 | Deployment Rollout Failures | Resource quotas, security contexts | Implemented |
| 5 | Zombie Process PID Exhaustion | PID limits, init system, process reaping | Yellow |
| 6 | Node Operations — Eviction Mirage | Node drain, PDB, readiness timing | Yellow/rejected |
| 7 | Admission Webhook Cascade | Webhooks, CRD versioning, KEDA finalizers | Approved |
| 8 | Autoscaler Quota Spiral | KEDA, HPA, ResourceQuotas | Implemented |

Appendix C: Why Connecting CI/CD + Resources Doesn't Work

Warning

A coherent causal chain connecting CI/CD to resource exhaustion (e.g., "CI storm causes node pressure which evicts critical services") still fails overlap review because each link in the chain is individually claimed by an existing task. Reviewers evaluate overlap at the component × failure-mode level, not at the narrative level.

Six specific constructions were tested:

| # | Proposed Chain | Why It Fails |
|---|---|---|
| 1 | CI storm → node pressure → critical service eviction | CI storm = Cascading CI/CD (#4). Node pressure + eviction = Single-Node Chaos (#1). |
| 2 | Harbor GC → storage exhaustion → CI blockage | Harbor GC = Harbor GC Deadlock (#2). CI blockage from registry = GitOps Pipeline Repair (#1). |
| 3 | Webhook amplification → ArgoCD CPU spike → reconciliation failure | Webhooks = Gitea Webhook Amplification (#11). ArgoCD failure = ArgoCD Sync Loop (#9). |
| 4 | ResourceQuota too tight → deploys fail → CI hangs | Quotas = Resource Quota Deadlocks (#3 resource) + Deployment Rollout Failures (#5 CI/CD). |
| 5 | KEDA autoscaling → quota ceiling → cascade | KEDA + quota = Autoscaler Quota Spiral (#8 resource). |
| 6 | Image pull failures → pod churn → memory pressure → eviction | Image pulls = GitOps Pipeline Repair (#1 CI/CD). Eviction = Single-Node Chaos (#1 resource). |

Components that appear unclaimed (Docker daemon, containerd, etcd) require root access that agents don't have. Components that are unclaimed but narrow (Trivy scanning, Harbor replication, inode exhaustion) can't sustain a 4-hour horizon.


Appendix D: Recent Examples

These illustrate the overlap problem in practice:

  • "The Repository Knot" — A well-constructed four-layer Gitea failure scenario (default branch switch, connection pool exhaustion, mirror sync overwrites, webhook deadlock). The nebula-reviewer bot flagged 86-88% overlap across three different framings. Rejected.

  • "Harbor CI/CD Pipeline Resource Cascade Failure" — A seven-issue cascade across CI/CD and resource exhaustion. Despite multiple attempts to narrow scope and create a "coherent causal chain," every construction overlapped with 2-5 existing tasks. The author was redirected to explore alternative domains.

  • "The Operator Takeover" — Originally "deploy CloudNativePG + PgBouncer + WAL archiving." Overlapped with PostgreSQL HA + PgBouncer (Patroni). Successfully reframed to focus on live migration execution — a distinct operation category. Approved after reframing.


Methodology

This advisory is based on a comprehensive analysis of all approved tasks, implemented tasks, and pending ideas across both #task-idea-feedback and #task-feedback channels, cross-referenced against the full Nebula infrastructure inventory.

Detailed supporting analysis:


Companion Documents

  • Framing Guide — How to frame ideas that survive overlap review, with real before/after case studies
  • Overlap Review Calibration Guide — For reviewers: how to evaluate ideas consistently, interpret bot output, and give constructive feedback

Last updated: 2026-02-18. If you're reading this more than a few weeks after this date, check with reviewers — new tasks may have filled some of these gaps.

Framing Task Ideas to Survive Overlap Review

Companion to the Task Authoring Advisory. The advisory tells you where to write tasks. This guide helps you frame ideas — especially when working near saturated areas.

See also: Overlap Review Calibration Guide — so you understand how reviewers think.


What Reviewers Actually Evaluate

Reviewers assess overlap on a hierarchy of criteria, not just surface-level component matching. Understanding this hierarchy is the difference between a first-try approval and three rounds of iteration.

| Priority | What Reviewers Check | Weight |
|---|---|---|
| 1 | Investigation topology / skills tested | ██████████ Highest |
| 2 | Components touched + verification end-state | ███████░░░ |
| 3 | Operation category (troubleshoot vs greenfield vs migration) | █████░░░░░ |
| 4 | Root cause / failure mechanism | ██░░░░░░░░ Lowest |

Investigation topology is the diagnostic path the agent walks: what it checks, in what order, and what tools it uses to verify. Two tasks can have completely different root causes but still overlap if the agent walks the same pipeline, checks the same components, and verifies the same end state.

"While the root cause domain differs, the investigation topology is nearly identical — the agent walks the same pipeline, checks the same components, and verifies the same end state."

The auto-complete test: If completing Task A would auto-complete part of Task B, they overlap too much — regardless of how different the narratives sound.

What This Means in Practice

  • "Different root cause, same pipeline" fails. A CI/CD task broken by a DNS issue and a CI/CD task broken by a credential issue both have the agent walking the same pipeline stages. Different root cause, same topology = overlap.
  • "Same components, different K8s layer" can pass. A task involving ArgoCD at the admission control layer is fundamentally different from one involving ArgoCD at the sync/reconciliation layer — even though the component name appears in both.
  • Renaming alone never helps. The reviewer bot compares descriptions, not titles. Changing "Harbor CI/CD Cascade" to "Registry Resource Cascade" changes nothing if the described investigation is the same.

Task Types Matter

Not all task types receive equal scrutiny. Understanding this helps you frame ideas that land well.

Greenfield ◄──────────────────────────────────► Troubleshooting
  ⚠️ Extra scrutiny       Hybrid 🟢              ✅ Preferred
                        Migration 🟢

Troubleshooting (Preferred)

Something is broken. The agent must diagnose the problem, identify the root cause, and fix it. This is the most accepted category because it tests diagnostic reasoning — the core skill being evaluated.

Hybrid: Troubleshooting + Implementation (Sweet Spot)

Something was attempted but is broken or incomplete. The agent must both diagnose what went wrong AND complete the implementation correctly. This is the sweet spot because it tests diagnostic skills while also requiring domain knowledge to finish the work.

Migration (Accepted)

Move from state A to state B under production constraints (live traffic, data integrity, zero downtime). Migrations are a distinct operation category — the Operator Takeover is the canonical example.

Greenfield (Extra Scrutiny)

Build something from scratch. Greenfield tasks face the most scrutiny because reviewers question whether pure build tasks test the right skills. If you're proposing a greenfield task, expect pushback.

The preferred reframe for greenfield: "It was attempted but is broken." Instead of "deploy X from scratch," describe a scenario where someone started deploying X, got partway through, and left behind a broken state the agent must diagnose and complete.

Example — the Coordinated Backup reframe:

| Before | After |
|---|---|
| "Build a coordinated backup pipeline across PostgreSQL, MongoDB, and MinIO" | "Discover that existing CronJobs meant to coordinate backups across data services are broken — timing races, credential staleness, and missing lifecycle hooks cause silent backup failures" |

The reframe changes the operation from greenfield (build) to troubleshooting hybrid (diagnose existing broken coordination + fix it). Same domain knowledge required, but the task now tests diagnostic skills.


The Reframing Playbook

Real case studies from Discord, each showing how a reframe changed the outcome.

Successful Reframes

| Case | Original Framing | Reframed To | Why It Worked |
|---|---|---|---|
| Operator Takeover | "Deploy CloudNativePG + PgBouncer + WAL archiving" | "Live-migrate production databases from manual PostgreSQL to operator-managed under traffic" | Changed operation category from greenfield deploy to live migration — distinct skills tested |
| Admission Webhook Cascade | Originally pivoted from a Harbor cascade idea | "API admission control failures cascade through KEDA, Istio, and ArgoCD CRDs" | Changed K8s layer — same components but the challenge is at the admission control layer, not the workload layer |
| Shadow Shard Protocol | "5-layer adversarial attack compromising cluster networking" | "A forgotten diagnostics tool left behind by a previous admin is corrupting networking" | Added operational realism, changed the K8s subsystem from abstract adversarial to concrete networking diagnostics |
| Partial Canary Disaster | "10+ independent broken things across the deployment pipeline" | "6-issue causal cascade where each failure triggers the next" | Made issues chain causally instead of being independent — a causal cascade is a single investigation, not 10 separate tasks |
| Coordinated Backup | "Build a backup pipeline across data services" | "Discover broken coordination in existing CronJobs causing silent backup failures" | Greenfield → troubleshooting hybrid — same domain, completely different skills tested |

Anti-Patterns: Reframes That Failed

| Case | What Was Tried | Why It Failed |
|---|---|---|
| Harbor CI/CD Cascade | Three iterations narrowing scope from 7 issues to 4, renaming from "Harbor" to "Registry", trying different component combinations | Every iteration still had the agent walking the same CI/CD pipeline. Narrowing scope doesn't help when the investigation topology is unchanged. |
| Database Partition Recovery | Renaming "Split-Brain Recovery" to "Database Partition Recovery" and arguing different skills (distributed systems vs. security/GitOps) | Renaming without changing the underlying investigation. The bot continued flagging 83-84% overlap. Reviewer: "The idea reviewer is practically telling you there are other similar ideas." |
| Progressive Canary | Implementing Argo Rollouts canary as a distinct operation from existing rollout tasks | Bot scored 89% overlap. Reviewer: "Both implement Argo Rollouts canary. Bharat's is a superset with additional complexity." A strict subset of an existing approved task. |

The Pattern

Successful reframes change what the agent does, not what it encounters. Each success above changed either:

  • The operation category (greenfield → migration, greenfield → troubleshooting)
  • The K8s layer (workload orchestration → admission control)
  • The investigation topology (independent issues → causal chain, adversarial → operational)

Failed reframes changed the narrative while leaving the investigation path unchanged.


Working in Mostly Saturated Areas

The advisory's Remaining Angles table lists narrow openings in saturated areas. If you're targeting one of these, here's how to claim it convincingly.

Lead with What's Distinct, Not What's Similar

When proposing an idea in a saturated area, don't start with the shared components — start with what makes your investigation path different.

Bad opener: "This task involves Prometheus and Grafana, but focuses on SLO burn rates." → Reviewer immediately thinks "another Prometheus task" and looks for overlap.

Good opener: "The agent must implement SLO burn-rate alerting methodology — multi-window calculations, error budget policies, and alert routing based on burn velocity. This requires Prometheus but the investigation is mathematical/methodological, not infrastructure troubleshooting." → Reviewer sees a distinct skill being tested before checking component overlap.
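For context, the multi-window burn-rate methodology named in the good opener looks roughly like this as a Prometheus rule (a sketch only — it assumes the common 14.4× fast-burn factor for a 99.9% availability SLO, and the metric names are illustrative):

```yaml
# Hypothetical sketch of multi-window burn-rate alerting: page only when BOTH
# a fast (5m) and a slow (1h) window burn the error budget at 14.4x the
# sustainable rate. The skill tested is the methodology, not infrastructure.
groups:
  - name: slo-burn                       # illustrative rule group
    rules:
      - alert: HighErrorBudgetBurn
        expr: |
          (
            sum(rate(http_requests_total{code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m]))
          ) > (14.4 * 0.001)
          and
          (
            sum(rate(http_requests_total{code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h]))
          ) > (14.4 * 0.001)
        for: 2m
        labels:
          severity: page
```

The dual-window condition is what makes this a distinct investigation: getting the factors, windows, and error-budget arithmetic right is mathematical work, not pipeline troubleshooting.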

Component Mentions vs. Component Focus

Reviewers assess component focus, not component mentions. A task that mentions Harbor in passing (because images need to exist somewhere) is different from a task where Harbor operations are the primary challenge.

When your task touches a saturated component incidentally:

  • Make clear in the description that it's a dependency, not the challenge
  • Emphasize what the agent actually spends time investigating
  • If most of the diagnostic work involves the saturated component, the mention-vs-focus distinction won't save you

When to Reframe vs. When to Pivot

Bot flags overlap
       │
       ▼
What did it match against?
       │
  ┌────┴────┐
  │         │
Same      Same components,
skills?   different skills?
  │         │
  ▼         ▼
Pivot     Reframe
  • 70%+ overlap at the topology level → Start fresh. The Harbor CI/CD Cascade went through three iterations without escape because the fundamental investigation path was claimed.
  • 50-70% overlap, different skills → Reframe. The Operator Takeover started at high overlap but escaped by changing the operation category.
  • Component keyword overlap only → Adjust your description to emphasize the distinct investigation. The Copa Airgap Patching idea scored 82% on the bot but was fast-tracked because the tool AND approach were genuinely novel.

Interpreting the nebula-reviewer Bot

The bot is a useful first filter, but it's not the final word. Understanding what it does — and doesn't — evaluate helps you respond effectively.

What the Bot Does

  • Compares your idea description against all existing task and idea descriptions
  • Uses semantic similarity, not keyword matching (but keywords heavily influence the score)
  • Returns a percentage overlap score against the closest matching existing tasks

What the Bot Doesn't Do

  • Evaluate investigation topology (the most important criterion)
  • Distinguish between component mentions and component focus
  • Assess whether the skills tested are genuinely different
  • Consider operation category differences

Reading Bot Scores

| Score Range | What It Means | What to Do |
|---|---|---|
| 85%+ | Strong keyword/description overlap | Check what it matched against. If the matched tasks test the same skills, pivot. If they're at a different layer, explain the distinction. |
| 70-85% | Moderate overlap detected | Look at the specific tasks flagged. Is the overlap structural (same investigation) or superficial (shared components)? |
| Below 70% | Low bot concern | Don't relax — human reviewers catch structural similarity the bot misses. Still verify your investigation topology is distinct. |

Real Bot Score Outcomes

  • Copa Airgap Patching: 82% → Approved. High bot score driven by container/registry keywords, but the actual investigation (Copa tooling, air-gap patching workflow) was genuinely novel. Bot couldn't distinguish.
  • Repository Knot: 86-88% → Rejected. Bot was right. Three different framings all had the agent walking the same Gitea investigation path.
  • Progressive Canary: 89% → Rejected. Bot correctly identified this as a strict subset of an existing task.

When the bot flags you: Don't panic, and don't just rename things. Check what the bot matched your idea against, then assess whether the overlap is at the component level (potentially salvageable) or the investigation topology level (pivot needed).


Summary

  1. Topology over narrative. What the agent does matters more than the story around it.
  2. Reframe the operation, not the description. Change the category of work, not just the words.
  3. Lead with distinction. Tell reviewers what's unique before they find what's similar.
  4. Trust the hierarchy. Skills tested > components touched > operation type > root cause.
  5. Use the bot wisely. It catches keyword overlap. You need to assess skill overlap.

This guide is part of a three-document set:

Last updated: 2026-02-17

Overlap Review Calibration Guide

For human reviewers evaluating task ideas in Discord. The bot gives you candidates; this guide helps you evaluate them consistently.

See also: Task Authoring Advisory — the landscape overview authors use, and Framing Guide — what authors are told about framing.


The Assessment Framework

Overlap review evaluates whether two tasks test the same skills via the same diagnostic path — not whether they mention the same components or tell a similar story.

Investigation Topology Is the Primary Signal

The most important question: Does the agent walk the same diagnostic path?

Two tasks that look different on the surface can have identical investigation topology:

Task A: "CI storm breaks pipeline"     Task B: "Harbor GC breaks pipeline"
  1. Check Gitea Actions logs             1. Check Harbor storage
  2. Check Harbor push/pull    ◄─same──►  2. Check Harbor GC/blobs
  3. Check ArgoCD sync status  ◄─same──►  3. Check ArgoCD sync status
  4. Verify deployment works   ◄─same──►  4. Verify deployment works
                    │
          Same investigation topology
          = blocking overlap

The root causes are different (CI overload vs. garbage collection), but the agent exercises the same skills: reading CI logs, checking registry state, verifying GitOps sync, and confirming deployment. This is blocking overlap.

The Auto-Complete Test

If completing Task A would auto-complete part of Task B, they overlap too much. This applies even when:

  • The tasks have different root causes
  • The tasks target different components
  • The narrative framing is distinct

Causal Chains Don't Defeat Individual Overlap

A "coherent causal chain" connecting multiple individually-claimed components still fails. If link 1 is claimed by Task X and link 2 is claimed by Task Y, chaining them together as "link 1 causes link 2" doesn't create new territory. See Appendix C of the advisory for six specific constructions that were tested and failed.


When Structural Similarity Is Acceptable vs. Blocking

| Scenario | Verdict | Reasoning |
|---|---|---|
| Same pattern, different technology (MongoDB HA vs PostgreSQL HA) | ✅ Generally acceptable | Different tools, failure modes, and domain knowledge required |
| Different tool, same operation (CloudNativePG deploy vs Patroni deploy) | ❌ Blocking | Same skills tested, same verification end-state |
| Same components, different K8s layer (admission control vs workload scheduling) | ✅ Acceptable | Fundamentally different diagnostic reasoning required |
| Novel narrative connecting individually claimed components | ❌ Blocking | Each link overlaps individually — the chain doesn't create new skills |
| Genuinely novel paradigm (Copa air-gap patching) | ✅ Acceptable | Both the tool AND the approach are new — no existing task exercises these skills |
| Same investigation, different root cause | ❌ Blocking | Root cause has the lowest weight in the evaluation hierarchy |
| Same components, different operation category (troubleshoot vs migrate) | ✅ Generally acceptable | Different skills: diagnostic reasoning vs. migration execution under constraints |

Interpreting Bot Output

The nebula-reviewer bot compares description text semantically. It's a useful filter but not a decision-maker.

When to Trust the Bot

Trust high scores (80%+) when the matched tasks share investigation topology — there the bot is confirming what you'd find on manual review. Examples:

  • Repository Knot (86-88%): Bot correctly flagged that all three framings walked the same Gitea investigation path.
  • Progressive Canary (89%): Bot correctly identified a strict subset of an existing task.

When to Override the Bot

Override high scores that are driven by component keyword matching rather than skill overlap. The bot can't distinguish between "this task mentions Harbor" and "this task's primary challenge involves Harbor operations."

  • Copa Airgap Patching (82%): High score from container/registry keywords, but the actual investigation (Copa tooling, vulnerability patching in air-gapped environments) was genuinely novel. Approved and fast-tracked.
  • GDPR Data Erasure (82%): High score from database keywords, but the operation category (compliance-driven data lifecycle) was novel. Approved.

When the Bot Misses Things

Low bot scores don't guarantee no overlap. The bot misses structural similarity — two tasks with different descriptions that have the agent walk the same diagnostic path. Always check:

  • Would the same debugging skills solve both tasks?
  • Does the agent check the same components in the same order?
  • Would completing one task teach you everything needed for the other?

Your role as reviewer: The bot identifies candidates. You evaluate whether the overlap is at the skill/topology level or just the component/keyword level.


Calibration Examples

Reference cases showing how bot scores, verdicts, and reasoning align (or don't).

| Idea | Bot Score | Verdict | Key Factor |
| --- | --- | --- | --- |
| Copa Airgap Patching | 82% | ✅ Approved | Genuinely novel paradigm — tool AND approach are both new to the task set |
| GDPR Data Erasure | 82% | ✅ Approved | Novel operation category — compliance-driven data lifecycle vs. database troubleshooting |
| Admission Webhook Cascade | 82% | ✅ Approved | Different K8s layer — admission control vs. workload orchestration, despite touching same components |
| ProxySQL | 84% | ✅ Approved (borderline) | Different tool with genuinely different capabilities — connection routing vs. simple pooling |
| Repository Knot (Gitea) | 86-88% | ❌ Rejected | Same investigation topology across three different framings — agent walks the same Gitea path |
| Harbor CI/CD Cascade | ~85% | ❌ Rejected | Same pipeline, same end-state — three iterations couldn't escape the overlap |
| Database Partition Recovery | 83-84% | ❌ Rejected | Renaming without substance change — investigation was still PostgreSQL HA troubleshooting |
| Progressive Canary | 89% | ❌ Rejected | Strict subset of existing GitOps Canary Rollouts task — every skill tested was already covered |

Pattern: Bot scores in the 82-89% range can go either way. The deciding factor is never the score itself — it's whether the investigation topology is genuinely distinct.
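Read as a decision rule, the calibration cases reduce to a sketch like the one below. The 80% filter threshold comes from the "When to Trust the Bot" section; `topology_distinct` stands in for the manual judgment the reviewer still has to make:

```python
def review_verdict(bot_score, topology_distinct):
    """The bot score nominates candidates; topology distinctness decides them."""
    if bot_score < 0.80:
        # A low score is not a free pass -- still spot-check structural similarity.
        return "manual spot-check"
    return "approve" if topology_distinct else "reject"

print(review_verdict(0.82, True))   # approve  (cf. Copa Airgap Patching)
print(review_verdict(0.89, False))  # reject   (cf. Progressive Canary)
```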


Giving Constructive Feedback

Five named patterns from effective reviewer behavior. Match your approach to the situation.

Choosing Your Approach

Overlap severity?
       │
  ┌────┼────────┐
  │    │        │
>70%  50-70%   <50%
  │    │        │
  ▼    ▼        ▼
"Start  Detailed  Incremental
fresh"  analysis  stacking
        + open
        areas
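
The triage above can be sketched as a direct mapping from estimated topology overlap to a feedback pattern; the boundary percentages are the tree's, while the function and label names are illustrative:

```python
def feedback_approach(overlap_pct):
    """Pick a feedback pattern from this guide based on estimated topology overlap (%)."""
    if overlap_pct > 70:
        return "pragmatic start fresh"            # pattern 2
    if overlap_pct >= 50:
        return "detailed analysis + open areas"   # pattern 1
    return "incremental stacking"                 # pattern 5

print(feedback_approach(85))  # pragmatic start fresh
```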

1. Detailed Analysis + Open Area Guidance

When to use: Author has a reasonable idea that overlaps in a fixable way.

Check all existing threads in the matched area, explain specifically what the topology overlap is, then proactively list viable alternative angles with concrete title suggestions. Don't just say "this overlaps" — show the author where the open space is.

Example pattern: "Your idea overlaps with [specific tasks] because the agent walks the same path: check logs → verify sync → confirm deployment. However, if you shifted the challenge to the admission control layer — something like 'Webhook validation failures cascade through CRD updates' — that's a genuinely different investigation."

2. Pragmatic "Start Fresh"

When to use: Overlap is 70%+ at the topology level. Iteration won't help.

Be direct and efficient. When the overlap is deep and structural, extended analysis wastes both your time and the author's. A clear "this area is fully claimed, here are open areas instead" is more helpful than a detailed breakdown of why each component overlaps.

Example pattern: "This is deep in claimed territory — the investigation path matches [existing tasks]. Rather than iterating, I'd suggest looking at [open areas from the advisory]. Those have clean surface area."

3. Teaching Self-Assessment

When to use: Author would benefit from learning to evaluate overlap themselves.

Instead of providing the answer, ask the author to search the forum for their key components and assess what they find. This builds the author's calibration for future ideas.

Example pattern: "Before I review this, search the ideas forum for [key component]. Look at what's already there and assess whether your investigation path is distinct from those. What do you think?"

4. Concrete Reframe Offer

When to use: The core idea has potential but the framing is wrong (typically greenfield that should be troubleshooting).

Don't just say "make it troubleshooting" — show what that looks like. Provide a complete alternative framing so the author sees the shape of a viable version.

Example pattern: "Instead of 'deploy X from scratch,' consider: 'A previous admin attempted to set up X but left it in a broken state — the agent finds misconfigured [specific things] and must diagnose the failures while completing the implementation.' Same domain knowledge, but now it tests diagnostic skills."

5. Incremental Stacking

When to use: The idea is close but needs more differentiation to clear the bar.

Ask the author to stack additional functionality or constraints that push the investigation into unclaimed territory. Works when the base idea is 50-70% distinct.

Example pattern: "The core is close but overlaps with [task] on the [specific] steps. Can you add [specific additional challenge] that forces a different investigation path? That would push this into distinct territory."


Task Type Guidance for Reviewers

When to Suggest Troubleshooting Mode

Suggest it when:

  • The idea is a pure greenfield build with no diagnostic component
  • The skills tested would be identical whether the thing exists or not
  • Adding "it was attempted but is broken" would genuinely change the investigation

Don't suggest it when:

  • The idea is a migration (migration is a valid non-troubleshooting category)
  • The greenfield aspect requires genuinely novel integration knowledge
  • The author has already incorporated diagnostic elements

How to Provide a Concrete Reframe

Bad: "Can you make this troubleshooting instead of greenfield?" → Author doesn't know what you mean. They'll rename it and resubmit.

Good: "Instead of 'set up KEDA autoscaling for Bleater services,' consider: 'KEDA ScaledObjects are deployed but triggers aren't firing — TriggerAuthentication is misconfigured for RabbitMQ, the HPA conflict causes oscillation, and scale-to-zero leaves services unresponsive.' Same KEDA knowledge required, but now the agent diagnoses failures instead of building from scratch."

Hybrid as a Path Forward

When an author is stuck between greenfield and troubleshooting, suggest hybrid: "Someone started this, got partway through, and left behind a partially-working-but-broken state." This preserves the implementation knowledge the author wants to test while adding the diagnostic component reviewers want to see.


Summary

  1. Topology first. Same diagnostic path = overlap, regardless of narrative differences.
  2. Bot scores are inputs, not decisions. 82% can mean "approve" or "reject" depending on whether the overlap is structural or superficial.
  3. Match feedback to severity. Deep overlap → direct redirect. Moderate overlap → detailed analysis. Light overlap → incremental stacking.
  4. Show, don't tell. Concrete reframes beat abstract advice. "Make it troubleshooting" is not actionable; a complete alternative framing is.
  5. Consider all operation categories. Troubleshooting is preferred, but migration, hybrid, and well-scoped greenfield are all valid.

This guide is part of a three-document set:

  • Task Authoring Advisory — landscape overview and opportunity areas
  • Framing Guide — what authors are told about framing ideas
  • Overlap Review Calibration Guide (this document) — how to evaluate ideas consistently

Last updated: 2026-02-17
