@arubis
Last active February 18, 2026 22:44
Where to Write New Nebula Tasks — advisory for task authors on saturated vs. open opportunity areas

Where to Write New Nebula Tasks

Advisory for task authors. Helps you find ideas that will pass overlap review on the first try.


What to Target, What to Avoid

| Closed | Mostly Saturated | Partially Explored | Wide Open |
|---|---|---|---|
| CI/CD Pipeline Flow | Prometheus + Grafana | Istio Service Mesh | KEDA Autoscaling |
| ArgoCD Sync + Drift | Loki + Fluent Bit | MongoDB | GlitchTip |
| PostgreSQL | Gitea + Actions | Jaeger Tracing | Maddy Mail Server |
| | Harbor Registry | MinIO Lifecycle | Statping-ng |
| | Keycloak / Auth Services | Grafana OnCall | CronJobs / DaemonSets / StatefulSets |
| | ResourceQuotas + LimitRanges | | ConfigMap + Secret Propagation |
| | Node Pressure + Eviction | | |

Closed = your idea will almost certainly fail overlap review, regardless of how you frame it. 10+ existing tasks cover every major angle. Don't submit new tasks here.

Mostly saturated = heavy existing coverage (5-14 tasks), but narrow openings remain. You'll need a clearly distinct angle — see Remaining Angles for what's left, and the Framing Guide for strategies to survive overlap review in these areas.

Partially explored = room exists, but check existing ideas first to make sure your specific angle is distinct.

Wide open = strong opportunities with zero or near-zero existing coverage. Start here.


Self-Screening Checklist

Before proposing an idea, run through these questions:

  1. Is it a PostgreSQL task?

    • Yes → Will not pass. 16-20 existing tasks cover every angle (WAL, HA, split-brain, DR, migrations, pooling, credential rotation, operator management). See Appendix A.
    • No → Continue.
  2. Does the primary challenge involve CI/CD pipeline flow? (Gitea Actions → Harbor push/pull → ArgoCD sync → image promotion)

    • Yes → Will not pass. 13 tasks cover every pipeline stage. See Appendix A.
    • No → Continue.
  3. Does it involve ArgoCD sync, drift, or reconciliation?

    • Yes → Will not pass. 12 tasks cover sync loops, persistent drift, wave deadlocks, AppProject RBAC, and image updater.
    • No → Continue.
  4. Does it involve resource quotas, limits, or eviction? (ResourceQuota, LimitRange, node pressure, PriorityClasses)

    • Yes → Almost certainly will not pass. 8 tasks cover memory, CPU, and PID pressure plus quota deadlocks (see Appendix B); the only exceptions are the very narrow gaps listed in Remaining Angles.
    • No → Continue.

  5. Does it involve Keycloak, SSO, OIDC, or authentication services?

    • Yes → Mostly saturated. 6+ tasks cover IAM deployment, SSO integration across dev tools, auth chain drift, key rotation, and API gateway auth. Check Remaining Angles for what's left.
    • No → Continue.
  6. Does it target Prometheus/Grafana, Loki/Fluent Bit, or another "Mostly Saturated" component?

    • Yes → Narrow openings exist but you need a clearly distinct angle. Check Remaining Angles before proposing.
    • No → Continue.
  7. Does it target a "Wide Open" component from the table above?

    • Yes → Strong opportunity. Propose it.
    • No → Check the forum for existing ideas in that area before proposing.

Tip

If you landed on "will not pass" but still want to use those components, read The Escape Pattern — it's possible to write viable tasks that touch closed components as long as the primary challenge operates at a different layer.


Remaining Angles in Mostly Saturated Areas

These components have heavy existing coverage but specific narrow openings remain. If you want to write a task here, it must target one of these gaps — generic ideas in these areas will fail overlap review.

| Component | Tasks | What's Left |
|---|---|---|
| Prometheus + Grafana | ~14 | SLO/SLI burn-rate methodology, Alertmanager routing trees + inhibition rules, remote write / federation, Grafana-as-Code provisioning |
| Loki + Fluent Bit | ~8 | LogQL-based alerting rules (Loki ruler), Fluent Bit parser/filter chain debugging (not throughput/backpressure) |
| Gitea + Actions | ~10 | Repository governance (branch protection, merge policies), workflow YAML authoring/debugging, runner resource management. Not the pipeline flow. |
| Harbor Registry | ~8 | Robot account management, per-project storage quotas, retention policies, replication configuration. Not push/pull/GC/auth in a pipeline context. |
| ResourceQuotas + LimitRanges | ~5 | LimitRange default injection failures (implicit limits causing non-obvious OOMKills). Very narrow. |
| Node Pressure + Eviction | ~6 | Disk pressure eviction specifically (ephemeral storage, imagefs vs nodefs). Memory/CPU/PID paths are covered. |
| Keycloak / Auth Services | ~6 | OIDC federation failures across multiple realms, Keycloak upgrade/migration scenarios, auth audit/compliance reporting. Core SSO integration (single-realm OIDC clients, role mapping, key rotation, token validation) is thoroughly covered. |
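As an illustration of the "LimitRange default injection" gap, a manifest like the following injects limits into containers that never declared any (a hypothetical sketch — names, namespace, and values are illustrative):

```yaml
# Hypothetical example: a LimitRange silently injects default memory limits
# into any container that doesn't set its own. Workloads that ran fine in
# other namespaces can start OOMKilling here with no change to their manifests.
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults         # illustrative name
  namespace: team-a          # illustrative namespace
spec:
  limits:
    - type: Container
      default:               # applied as the *limit* when none is specified
        memory: 128Mi
      defaultRequest:        # applied as the *request* when none is specified
        memory: 64Mi
```

The non-obvious part for a task: `kubectl get pod -o yaml` shows the injected limit as if the author wrote it, so the agent has to trace the value back to the LimitRange rather than the Deployment spec.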

Warning

Even for these remaining angles, check the forum first. Ideas here are harder to get right, and reviewers will scrutinize overlap carefully. See the Framing Guide for how to frame ideas that survive review in saturated areas.


Open Opportunity Areas

Ranked by how much clean surface area exists. Each includes concrete task concepts ready to propose.

Tier 1: Wide Open

1. KEDA (Event-Driven Autoscaling)

KEDA is deployed in the cluster but has limited task coverage targeting it directly. The debugging surface is distinct from HPA: misconfigured triggers that silently don't fire, TriggerAuthentication failures against event sources (RabbitMQ, Prometheus), conflicts when both HPA and KEDA target the same deployment, and scaling-to-zero edge cases.

Best fit: Cloud Ops or Platform Engineering

Starter concepts:

  • KEDA Trigger Authentication Failure Blocks Event-Driven Autoscaling
  • HPA/KEDA Scaling Conflict Causes Pod Count Oscillation
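As a sketch of the wiring the first concept exercises (hedged — all names and the RabbitMQ trigger parameters below are illustrative, not taken from the Nebula cluster):

```yaml
# Hypothetical example: a ScaledObject whose trigger depends on a
# TriggerAuthentication. If the referenced Secret key is wrong or the
# credentials are stale, the trigger never fires -- and with scale-to-zero,
# that means no pods at all, with no crash to point at.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth        # illustrative name
spec:
  secretTargetRef:
    - parameter: host        # the RabbitMQ scaler takes a full AMQP URI here
      name: rabbitmq-creds   # Secret must exist in the same namespace
      key: amqp-uri
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker             # must NOT also be targeted by a separate HPA
  minReplicaCount: 0         # scale-to-zero: a dead trigger = zero replicas
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs
        mode: QueueLength
        value: "10"
      authenticationRef:
        name: rabbitmq-auth
```

The diagnostic surface is distinct from HPA: `kubectl describe scaledobject` conditions and the KEDA operator logs carry the auth errors, not the workload's own events.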

2. GlitchTip (Error Tracking)

GlitchTip is Nebula's Sentry-compatible error tracking platform. It is a genuinely distinct observability dimension from metrics (Prometheus), logs (Loki), and traces (Jaeger). Tasks could target DSN misconfiguration causing silent event loss, ingestion pipeline failures, or alert routing that masks critical exceptions. Some adjacent coverage exists, so frame ideas around GlitchTip-specific failure modes rather than generic observability.

Best fit: SRE or Cloud Ops

Starter concepts:

  • GlitchTip Error Ingestion Pipeline Failure — Services Silently Dropping Exceptions
  • GlitchTip Alert Routing Misconfiguration Masks Production Errors
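Because GlitchTip is Sentry-compatible, application SDKs are typically pointed at it through a Sentry-style DSN. A hedged sketch of the silent-loss failure mode (the service name, image, env var, and hostname are all hypothetical):

```yaml
# Hypothetical example: a DSN with the wrong host or a stale project key
# fails silently -- most Sentry-compatible SDKs drop events rather than
# crash the app, so exceptions simply stop arriving at GlitchTip.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service               # illustrative name
spec:
  selector:
    matchLabels: {app: checkout}
  template:
    metadata:
      labels: {app: checkout}
    spec:
      containers:
        - name: app
          image: checkout:latest       # illustrative image
          env:
            - name: SENTRY_DSN         # Sentry-style DSN, accepted by GlitchTip
              # wrong project ID or host => silent event loss:
              value: "https://<key>@glitchtip.example.local/3"
```

A task built on this would have the agent notice the *absence* of events (no errors for a service that is demonstrably throwing) rather than an error message, which is a different diagnostic shape from metrics or log debugging.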

3. Statping-ng (Status Page)

Zero approved ideas. Statping-ng is a standalone status page with its own health checks and user-facing availability dashboard. Distinct from Blackbox Exporter synthetic monitoring (which feeds into Prometheus/Grafana).

Best fit: SRE

Starter concepts:

  • Status Page Reports All-Green While Services Are Down
  • Statping-ng Flapping Monitors Flood Notification Channels

4. Maddy (Mail Server)

Zero approved ideas. Maddy handles SMTP relay for the platform (Grafana notifications, OnCall alerts, etc.). Runs as a StatefulSet with three mailboxes (devops@, operator@, opsmanager@nebula.local). SMTP misconfiguration, TLS negotiation failures, and the downstream impact of alert emails never arriving are all clean territory.

Best fit: SRE or DevOps

Starter concepts:

  • Maddy SMTP Relay Failure Silently Drops Alert Notification Emails

5. K8s Workload Primitives (CronJobs, DaemonSets, StatefulSets)

Zero approved ideas targeting these workload types specifically. CronJob failure chains (missed schedules, concurrency policy deadlocks), DaemonSet rolling updates creating gaps, and StatefulSet ordered scaling with PVC lifecycle issues are all untouched.

Best fit: Cloud Ops

Starter concepts:

  • CronJob Concurrency Policy Deadlock Causes Backup Job Backlog
  • StatefulSet Scale-Down Orphans Persistent Volumes
  • DaemonSet Rolling Update Creates Logging Gap
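The concurrency-policy deadlock in the first concept can be sketched in a few lines (hypothetical manifest — names, schedule, and image are illustrative):

```yaml
# Hypothetical example: with concurrencyPolicy: Forbid, one hung backup job
# blocks every subsequent schedule. Runs missed for longer than
# startingDeadlineSeconds are dropped entirely, so the backlog fails silently.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup                     # illustrative name
spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid           # new runs are skipped while an old job lives
  startingDeadlineSeconds: 300        # missed runs older than 5m are discarded
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600     # without this, a hung job blocks forever
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: backup-tool:latest          # illustrative image
              command: ["/bin/sh", "-c", "run-backup"]
```

Omit `activeDeadlineSeconds` and hang the job, and you get exactly the "backup job backlog" scenario: no failures reported, just an ever-growing gap since the last successful run.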

6. ConfigMap/Secret Propagation Mechanics

No approved ideas target the K8s propagation problem. Distinct from credential rotation tasks (which are about the values) — this is about the delivery mechanism: ConfigMap updated but pods serve stale config, immutable ConfigMap blocks emergency fixes, Secret rotation leaves pods split across old/new values.

Best fit: Cloud Ops or Platform Engineering

Starter concepts:

  • ConfigMap Update Propagation Failure — Pods Serve Stale Configuration
  • Immutable ConfigMap Trap Blocks Emergency Configuration Fix
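The immutable trap in the second concept comes down to one field (hypothetical manifest — name and data are illustrative):

```yaml
# Hypothetical example: an immutable ConfigMap cannot be edited in place.
# An emergency fix requires creating a NEW ConfigMap under a new name and
# rolling every workload that references the old one.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v1    # illustrative; versioned names are the usual workaround
immutable: true          # kubectl edit/apply against .data will be rejected
data:
  feature_flags: |
    payments=enabled
```

Even without `immutable: true` there is propagation territory here: volume-mounted ConfigMaps update only after the kubelet sync delay, and env-var consumption never updates without a pod restart — both are delivery-mechanism failures, not value failures.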

Tier 2: Partially Explored

7. Grafana OnCall (Incident Response)

One vague approved idea exists (just a title, no description). Room for clearly scoped tasks around escalation chain failures, schedule rotation bugs, or integration breakdowns between OnCall and Mattermost/Maddy.

Best fit: SRE

Starter concepts:

  • Grafana OnCall Escalation Chain Broken — Incidents Route to Nobody
  • On-Call Schedule Rotation Failure During Handoff Window

8. Istio Service Mesh

Some coverage exists, but tasks focused on traffic management (as opposed to resource pressure from sidecars) have room. mTLS policy failures, VirtualService routing misconfigurations, and sidecar injection issues in specific namespaces are potential angles.

Best fit: Platform Engineering or SRE

9. MinIO (Object Storage)

Distinct from Harbor registry operations. Lifecycle policies, bucket versioning, cross-service storage access patterns.

Best fit: Cloud Ops

Not Listed Above?

The platform also includes RabbitMQ, Redis, CoreDNS, Mattermost, and Chaos Mesh — among others. If your idea targets a component not in any column, check the forum for existing coverage. A component absent from this table simply hasn't been categorized yet; absence doesn't mean it's open or closed.


The Escape Pattern

If your idea touches saturated components but the primary challenge operates at a different Kubernetes layer, it can still work.

The stack has distinct operational layers. Existing tasks saturate the middle layers; the edges are less covered:

┌─────────────────────────────────────────────────────┐
│  API Admission          (webhooks, CRD validation)  │  ← less covered
├─────────────────────────────────────────────────────┤
│  Scheduling + Resources (quotas, eviction, priority)│  ← SATURATED
├─────────────────────────────────────────────────────┤
│  Workload Orchestration (GitOps, deploys, rollouts) │  ← SATURATED
├─────────────────────────────────────────────────────┤
│  Application Runtime    (pods, services, networking)│  ← partially covered
├─────────────────────────────────────────────────────┤
│  Data + Storage         (databases, queues, object) │  ← less covered
└─────────────────────────────────────────────────────┘

Examples of the escape pattern working:

  • Admission Webhook Cascade Failure — touches KEDA, Istio, and ArgoCD (all "saturated" components) but the actual challenge is API admission control, CRD versioning, and webhook lifecycle. Same components, different layer. Approved.

  • The Operator Takeover — originally framed as "deploy CloudNativePG" (overlaps with PostgreSQL HA). Reframed to "live-migrate production databases under traffic" — a distinct operation category (migration execution vs. greenfield build). Approved after reframing.
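The admission layer the first example operates at is visible in a single kind of object (a hypothetical manifest — names and rules are illustrative):

```yaml
# Hypothetical example of the admission-control layer: a webhook with
# failurePolicy: Fail blocks every matching resource cluster-wide the moment
# its backing Service loses its endpoints. KEDA, Istio, and ArgoCD become
# victims, but the challenge lives at the API admission layer.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: policy-gate                     # illustrative name
webhooks:
  - name: validate.policy.example.local # illustrative, must be fully qualified
    failurePolicy: Fail                 # webhook outage = outage of all matched admissions
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: policy-webhook            # no ready endpoints here => applies time out
        namespace: platform             # illustrative namespace
        path: /validate
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
```

This is why "same components, different layer" works: the diagnostic path runs through `kubectl get validatingwebhookconfigurations` and apiserver admission errors, not through sync status or pod logs.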

Note

The key question: Is the primary challenge about the same operation as an existing task (build, troubleshoot, configure), or about a fundamentally different operation (migrate, audit, enforce, orchestrate) that happens to involve the same components?

For more reframing strategies with real before/after examples, see the Framing Guide.


Quick Reference: Category Spec IDs

| Category | Spec ID |
|---|---|
| DevOps | b407a435-9dc1-4cc3-950c-3194a8f08fde |
| SRE | 46394e31-2a74-47c1-8359-51e1b678146d |
| Platform Engineering | 9e4d158e-96ff-4435-ab39-4d1e389f4b47 |
| Cloud Ops | 450f2e9c-ba04-429c-bf80-e22be0065313 |


Appendices: Supporting Evidence

Everything above is actionable guidance. Everything below is the proof.


Appendix A: Why CI/CD Is Closed

13 approved tasks and ideas cover the full pipeline from code push to deployment:

  Gitea Actions ──→ Docker Build ──→ Harbor Push ──→ ArgoCD Sync ──→ K8s Deploy
       │                                  │               │              │
       ▼                                  ▼               ▼              ▼
  Cascading CI/CD              Harbor GC Deadlock    Sync Wave      Deployment
  Breaking CI/CD               GitOps Image Update   Deadlock       Rollout
  Webhook Amplif.              Broken Promotion      GitOps Drift   Failures
  The Broken Delivery                                Sync Loop
                                                     Canary Rollouts

Every stage of the pipeline has at least two tasks covering its failure modes. The full inventory:

| # | Task/Idea | Component Focus | Status |
|---|---|---|---|
| 1 | Bleater GitOps Pipeline Repair | Gitea Actions, ArgoCD Image Updater, Harbor | Implemented |
| 2 | Harbor Registry GC Deadlock | Harbor storage, GC jobs | Implemented |
| 3 | ArgoCD Sync Wave Deadlock | ArgoCD sync waves, PreSync hooks | Implemented |
| 4 | Cascading CI/CD Pipeline Failures | Gitea Runner, Harbor creds, ArgoCD, disk space | Implemented |
| 5 | Deployment Rollout Failures | Deployments, security contexts, quotas | Implemented |
| 6 | Breaking CI/CD Pipeline | Gitea Actions tagging, Harbor permissions, ArgoCD updater | Approved |
| 7 | GitOps Image Update + Harbor Auth | Image Updater, Harbor tokens, Helm values | Approved |
| 8 | Broken GitOps Image Promotion | Harbor webhooks, Image Updater auth | Yellow |
| 9 | ArgoCD GitOps Sync Loop | Mutating webhook, KEDA conflict, Helm values | Approved |
| 10 | GitOps Drift That Survives Every Sync | Admission controllers, image automation | Approved |
| 11 | Gitea Webhook Amplification | Gitea webhooks, ArgoCD, Harbor jobservice | Yellow |
| 12 | The Broken Delivery | regcred secret, Helm registry override, CI error masking | Pending |
| 13 | GitOps Canary Rollouts Migration | ArgoCD ApplicationSets, Argo Rollouts, Istio | Pending |

Appendix B: Why Resource Exhaustion Is Closed

8 approved tasks and ideas cover Kubernetes resource management:

| # | Task/Idea | Component Focus | Status |
|---|---|---|---|
| 1 | Single-Node Chaos Hardening | Node memory pressure, eviction, scheduling | Implemented |
| 2 | Chaos Engineering Resilience | Chaos Mesh, pod-kill, network latency, CPU stress | Implemented |
| 3 | Resource Quota Deadlocks | ResourceQuotas, LimitRanges, PVC quotas | Approved |
| 4 | Deployment Rollout Failures | Resource quotas, security contexts | Implemented |
| 5 | Zombie Process PID Exhaustion | PID limits, init system, process reaping | Yellow |
| 6 | Node Operations — Eviction Mirage | Node drain, PDB, readiness timing | Yellow/rejected |
| 7 | Admission Webhook Cascade | Webhooks, CRD versioning, KEDA finalizers | Approved |
| 8 | Autoscaler Quota Spiral | KEDA, HPA, ResourceQuotas | Implemented |

Appendix C: Why Connecting CI/CD + Resources Doesn't Work

Warning

A coherent causal chain connecting CI/CD to resource exhaustion (e.g., "CI storm causes node pressure which evicts critical services") still fails overlap review because each link in the chain is individually claimed by an existing task. Reviewers evaluate overlap at the component × failure-mode level, not at the narrative level.

Six specific constructions were tested:

| # | Proposed Chain | Why It Fails |
|---|---|---|
| 1 | CI storm → node pressure → critical service eviction | CI storm = Cascading CI/CD (#4). Node pressure + eviction = Single-Node Chaos (#1). |
| 2 | Harbor GC → storage exhaustion → CI blockage | Harbor GC = Harbor GC Deadlock (#2). CI blockage from registry = GitOps Pipeline Repair (#1). |
| 3 | Webhook amplification → ArgoCD CPU spike → reconciliation failure | Webhooks = Gitea Webhook Amplification (#11). ArgoCD failure = ArgoCD Sync Loop (#9). |
| 4 | ResourceQuota too tight → deploys fail → CI hangs | Quotas = Resource Quota Deadlocks (#3 resource) + Deployment Rollout Failures (#5 CI/CD). |
| 5 | KEDA autoscaling → quota ceiling → cascade | KEDA + quota = Autoscaler Quota Spiral (#8 resource). |
| 6 | Image pull failures → pod churn → memory pressure → eviction | Image pulls = GitOps Pipeline Repair (#1 CI/CD). Eviction = Single-Node Chaos (#1 resource). |

Components that appear unclaimed (Docker daemon, containerd, etcd) require root access that agents don't have. Components that are unclaimed but narrow (Trivy scanning, Harbor replication, inode exhaustion) can't sustain a 4-hour horizon.


Appendix D: Recent Examples

These illustrate the overlap problem in practice:

  • "The Repository Knot" — A well-constructed four-layer Gitea failure scenario (default branch switch, connection pool exhaustion, mirror sync overwrites, webhook deadlock). The nebula-reviewer bot flagged 86-88% overlap across three different framings. Rejected.

  • "Harbor CI/CD Pipeline Resource Cascade Failure" — A seven-issue cascade across CI/CD and resource exhaustion. Despite multiple attempts to narrow scope and create a "coherent causal chain," every construction overlapped with 2-5 existing tasks. The author was redirected to explore alternative domains.

  • "The Operator Takeover" — Originally "deploy CloudNativePG + PgBouncer + WAL archiving." Overlapped with PostgreSQL HA + PgBouncer (Patroni). Successfully reframed to focus on live migration execution — a distinct operation category. Approved after reframing.


Methodology

This advisory is based on a comprehensive analysis of all approved tasks, implemented tasks, and pending ideas across both #task-idea-feedback and #task-feedback channels, cross-referenced against the full Nebula infrastructure inventory.

Detailed supporting analysis:


Companion Documents

  • Framing Guide — How to frame ideas that survive overlap review, with real before/after case studies
  • Overlap Review Calibration Guide — For reviewers: how to evaluate ideas consistently, interpret bot output, and give constructive feedback

Last updated: 2026-02-18. If you're reading this more than a few weeks after this date, check with reviewers — new tasks may have filled some of these gaps.

Framing Task Ideas to Survive Overlap Review

Companion to the Task Authoring Advisory. The advisory tells you where to write tasks. This guide helps you frame ideas — especially when working near saturated areas.

See also: Overlap Review Calibration Guide — so you understand how reviewers think.


What Reviewers Actually Evaluate

Reviewers assess overlap on a hierarchy of criteria, not just surface-level component matching. Understanding this hierarchy is the difference between a first-try approval and three rounds of iteration.

| Priority | What Reviewers Check | Weight |
|---|---|---|
| 1 | Investigation topology / skills tested | ██████████ Highest |
| 2 | Components touched + verification end-state | ███████░░░ |
| 3 | Operation category (troubleshoot vs greenfield vs migration) | █████░░░░░ |
| 4 | Root cause / failure mechanism | ██░░░░░░░░ Lowest |

Investigation topology is the diagnostic path the agent walks: what it checks, in what order, and what tools it uses to verify. Two tasks can have completely different root causes but still overlap if the agent walks the same pipeline, checks the same components, and verifies the same end state.

"While the root cause domain differs, the investigation topology is nearly identical — the agent walks the same pipeline, checks the same components, and verifies the same end state."

The auto-complete test: If completing Task A would auto-complete part of Task B, they overlap too much — regardless of how different the narratives sound.

What This Means in Practice

  • "Different root cause, same pipeline" fails. A CI/CD task broken by a DNS issue and a CI/CD task broken by a credential issue both have the agent walking the same pipeline stages. Different root cause, same topology = overlap.
  • "Same components, different K8s layer" can pass. A task involving ArgoCD at the admission control layer is fundamentally different from one involving ArgoCD at the sync/reconciliation layer — even though the component name appears in both.
  • Renaming alone never helps. The reviewer bot compares descriptions, not titles. Changing "Harbor CI/CD Cascade" to "Registry Resource Cascade" changes nothing if the described investigation is the same.

Task Types Matter

Not all task types receive equal scrutiny. Understanding this helps you frame ideas that land well.

Greenfield ◄──────────────────────────────────► Troubleshooting
  ⚠️ Extra scrutiny       Hybrid 🟢              ✅ Preferred
                        Migration 🟢

Troubleshooting (Preferred)

Something is broken. The agent must diagnose the problem, identify the root cause, and fix it. This is the most accepted category because it tests diagnostic reasoning — the core skill being evaluated.

Hybrid: Troubleshooting + Implementation (Sweet Spot)

Something was attempted but is broken or incomplete. The agent must both diagnose what went wrong AND complete the implementation correctly. This is the sweet spot because it tests diagnostic skills while also requiring domain knowledge to finish the work.

Migration (Accepted)

Move from state A to state B under production constraints (live traffic, data integrity, zero downtime). Migrations are a distinct operation category — the Operator Takeover is the canonical example.

Greenfield (Extra Scrutiny)

Build something from scratch. Greenfield tasks face the most scrutiny because reviewers question whether pure build tasks test the right skills. If you're proposing a greenfield task, expect pushback.

The preferred reframe for greenfield: "It was attempted but is broken." Instead of "deploy X from scratch," describe a scenario where someone started deploying X, got partway through, and left behind a broken state the agent must diagnose and complete.

Example — the Coordinated Backup reframe:

| Before | After |
|---|---|
| "Build a coordinated backup pipeline across PostgreSQL, MongoDB, and MinIO" | "Discover that existing CronJobs meant to coordinate backups across data services are broken — timing races, credential staleness, and missing lifecycle hooks cause silent backup failures" |

The reframe changes the operation from greenfield (build) to troubleshooting hybrid (diagnose existing broken coordination + fix it). Same domain knowledge required, but the task now tests diagnostic skills.


The Reframing Playbook

Real case studies from Discord, each showing how a reframe changed the outcome.

Successful Reframes

| Case | Original Framing | Reframed To | Why It Worked |
|---|---|---|---|
| Operator Takeover | "Deploy CloudNativePG + PgBouncer + WAL archiving" | "Live-migrate production databases from manual PostgreSQL to operator-managed under traffic" | Changed operation category from greenfield deploy to live migration — distinct skills tested |
| Admission Webhook Cascade | Originally pivoted from a Harbor cascade idea | "API admission control failures cascade through KEDA, Istio, and ArgoCD CRDs" | Changed K8s layer — same components but the challenge is at the admission control layer, not the workload layer |
| Shadow Shard Protocol | "5-layer adversarial attack compromising cluster networking" | "A forgotten diagnostics tool left behind by a previous admin is corrupting networking" | Added operational realism, changed the K8s subsystem from abstract adversarial to concrete networking diagnostics |
| Partial Canary Disaster | "10+ independent broken things across the deployment pipeline" | "6-issue causal cascade where each failure triggers the next" | Made issues chain causally instead of being independent — a causal cascade is a single investigation, not 10 separate tasks |
| Coordinated Backup | "Build a backup pipeline across data services" | "Discover broken coordination in existing CronJobs causing silent backup failures" | Greenfield → troubleshooting hybrid — same domain, completely different skills tested |

Anti-Patterns: Reframes That Failed

| Case | What Was Tried | Why It Failed |
|---|---|---|
| Harbor CI/CD Cascade | Three iterations narrowing scope from 7 issues to 4, renaming from "Harbor" to "Registry", trying different component combinations | Every iteration still had the agent walking the same CI/CD pipeline. Narrowing scope doesn't help when the investigation topology is unchanged. |
| Database Partition Recovery | Renaming "Split-Brain Recovery" to "Database Partition Recovery" and arguing different skills (distributed systems vs. security/GitOps) | Renaming without changing the underlying investigation. The bot continued flagging 83-84% overlap. Reviewer: "The idea reviewer is practically telling you there are other similar ideas." |
| Progressive Canary | Implementing Argo Rollouts canary as a distinct operation from existing rollout tasks | Bot scored 89% overlap. Reviewer: "Both implement Argo Rollouts canary. Bharat's is a superset with additional complexity." A strict subset of an existing approved task. |

The Pattern

Successful reframes change what the agent does, not what it encounters. Each success above changed either:

  • The operation category (greenfield → migration, greenfield → troubleshooting)
  • The K8s layer (workload orchestration → admission control)
  • The investigation topology (independent issues → causal chain, adversarial → operational)

Failed reframes changed the narrative while leaving the investigation path unchanged.


Working in Mostly Saturated Areas

The advisory's Remaining Angles table lists narrow openings in saturated areas. If you're targeting one of these, here's how to claim it convincingly.

Lead with What's Distinct, Not What's Similar

When proposing an idea in a saturated area, don't start with the shared components — start with what makes your investigation path different.

Bad opener: "This task involves Prometheus and Grafana, but focuses on SLO burn rates." → Reviewer immediately thinks "another Prometheus task" and looks for overlap.

Good opener: "The agent must implement SLO burn-rate alerting methodology — multi-window calculations, error budget policies, and alert routing based on burn velocity. This requires Prometheus but the investigation is mathematical/methodological, not infrastructure troubleshooting." → Reviewer sees a distinct skill being tested before checking component overlap.
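For context, the multi-window burn-rate methodology named in the good opener looks roughly like this as a Prometheus rule (a sketch only — it assumes the common 14.4× fast-burn factor for a 99.9% availability SLO, and the metric names are illustrative):

```yaml
# Hypothetical sketch of multi-window burn-rate alerting: page only when BOTH
# a fast (5m) and a slow (1h) window burn the error budget at 14.4x the
# sustainable rate. The skill tested is the methodology, not infrastructure.
groups:
  - name: slo-burn                       # illustrative rule group
    rules:
      - alert: HighErrorBudgetBurn
        expr: |
          (
            sum(rate(http_requests_total{code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m]))
          ) > (14.4 * 0.001)
          and
          (
            sum(rate(http_requests_total{code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h]))
          ) > (14.4 * 0.001)
        for: 2m
        labels:
          severity: page
```

The dual-window condition is what makes this a distinct investigation: getting the factors, windows, and error-budget arithmetic right is mathematical work, not pipeline troubleshooting.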

Component Mentions vs. Component Focus

Reviewers assess component focus, not component mentions. A task that mentions Harbor in passing (because images need to exist somewhere) is different from a task where Harbor operations are the primary challenge.

When your task touches a saturated component incidentally:

  • Make clear in the description that it's a dependency, not the challenge
  • Emphasize what the agent actually spends time investigating
  • If most of the diagnostic work involves the saturated component, the mention-vs-focus distinction won't save you

When to Reframe vs. When to Pivot

Bot flags overlap
       │
       ▼
What did it match against?
       │
  ┌────┴────┐
  │         │
Same      Same components,
skills?   different skills?
  │         │
  ▼         ▼
Pivot     Reframe
  • 70%+ overlap at the topology level → Start fresh. The Harbor CI/CD Cascade went through three iterations without escape because the fundamental investigation path was claimed.
  • 50-70% overlap, different skills → Reframe. The Operator Takeover started at high overlap but escaped by changing the operation category.
  • Component keyword overlap only → Adjust your description to emphasize the distinct investigation. The Copa Airgap Patching idea scored 82% on the bot but was fast-tracked because the tool AND approach were genuinely novel.

Interpreting the nebula-reviewer Bot

The bot is a useful first filter, but it's not the final word. Understanding what it does — and doesn't — evaluate helps you respond effectively.

What the Bot Does

  • Compares your idea description against all existing task and idea descriptions
  • Uses semantic similarity, not keyword matching (but keywords heavily influence the score)
  • Returns a percentage overlap score against the closest matching existing tasks

What the Bot Doesn't Do

  • Evaluate investigation topology (the most important criterion)
  • Distinguish between component mentions and component focus
  • Assess whether the skills tested are genuinely different
  • Consider operation category differences

Reading Bot Scores

| Score Range | What It Means | What to Do |
|---|---|---|
| 85%+ | Strong keyword/description overlap | Check what it matched against. If the matched tasks test the same skills, pivot. If they're at a different layer, explain the distinction. |
| 70-85% | Moderate overlap detected | Look at the specific tasks flagged. Is the overlap structural (same investigation) or superficial (shared components)? |
| Below 70% | Low bot concern | Don't relax — human reviewers catch structural similarity the bot misses. Still verify your investigation topology is distinct. |

Real Bot Score Outcomes

  • Copa Airgap Patching: 82% → Approved. High bot score driven by container/registry keywords, but the actual investigation (Copa tooling, air-gap patching workflow) was genuinely novel. Bot couldn't distinguish.
  • Repository Knot: 86-88% → Rejected. Bot was right. Three different framings all had the agent walking the same Gitea investigation path.
  • Progressive Canary: 89% → Rejected. Bot correctly identified this as a strict subset of an existing task.

When the bot flags you: Don't panic, and don't just rename things. Check what the bot matched your idea against, then assess whether the overlap is at the component level (potentially salvageable) or the investigation topology level (pivot needed).


Summary

  1. Topology over narrative. What the agent does matters more than the story around it.
  2. Reframe the operation, not the description. Change the category of work, not just the words.
  3. Lead with distinction. Tell reviewers what's unique before they find what's similar.
  4. Trust the hierarchy. Skills tested > components touched > operation type > root cause.
  5. Use the bot wisely. It catches keyword overlap. You need to assess skill overlap.

This guide is part of a three-document set:

Last updated: 2026-02-17

Overlap Review Calibration Guide

For human reviewers evaluating task ideas in Discord. The bot gives you candidates; this guide helps you evaluate them consistently.

See also: Task Authoring Advisory — the landscape overview authors use, and Framing Guide — what authors are told about framing.


The Assessment Framework

Overlap review evaluates whether two tasks test the same skills via the same diagnostic path — not whether they mention the same components or tell a similar story.

Investigation Topology Is the Primary Signal

The most important question: Does the agent walk the same diagnostic path?

Two tasks that look different on the surface can have identical investigation topology:

Task A: "CI storm breaks pipeline"     Task B: "Harbor GC breaks pipeline"
  1. Check Gitea Actions logs             1. Check Harbor storage
  2. Check Harbor push/pull    ◄─same──►  2. Check Harbor GC/blobs
  3. Check ArgoCD sync status  ◄─same──►  3. Check ArgoCD sync status
  4. Verify deployment works   ◄─same──►  4. Verify deployment works
                    │
          Same investigation topology
          = blocking overlap

The root causes are different (CI overload vs. garbage collection), but the agent exercises the same skills: reading CI logs, checking registry state, verifying GitOps sync, and confirming deployment. This is blocking overlap.

The Auto-Complete Test

If completing Task A would auto-complete part of Task B, they overlap too much. This applies even when:

  • The tasks have different root causes
  • The tasks target different components
  • The narrative framing is distinct

Causal Chains Don't Defeat Individual Overlap

A "coherent causal chain" connecting multiple individually-claimed components still fails. If link 1 is claimed by Task X and link 2 is claimed by Task Y, chaining them together as "link 1 causes link 2" doesn't create new territory. See Appendix C of the advisory for six specific constructions that were tested and failed.


When Structural Similarity Is Acceptable vs. Blocking

| Scenario | Verdict | Reasoning |
|---|---|---|
| Same pattern, different technology (MongoDB HA vs PostgreSQL HA) | ✅ Generally acceptable | Different tools, failure modes, and domain knowledge required |
| Different tool, same operation (CloudNativePG deploy vs Patroni deploy) | ❌ Blocking | Same skills tested, same verification end-state |
| Same components, different K8s layer (admission control vs workload scheduling) | ✅ Acceptable | Fundamentally different diagnostic reasoning required |
| Novel narrative connecting individually claimed components | ❌ Blocking | Each link overlaps individually — the chain doesn't create new skills |
| Genuinely novel paradigm (Copa air-gap patching) | ✅ Acceptable | Both the tool AND the approach are new — no existing task exercises these skills |
| Same investigation, different root cause | ❌ Blocking | Root cause has the lowest weight in the evaluation hierarchy |
| Same components, different operation category (troubleshoot vs migrate) | ✅ Generally acceptable | Different skills: diagnostic reasoning vs. migration execution under constraints |

Interpreting Bot Output

The nebula-reviewer bot compares description text semantically. It's a useful filter but not a decision-maker.

When to Trust the Bot

Trust high scores (80%+) when the matched tasks share investigation topology — there the bot is confirming what you'd find on manual review. Examples:

  • Repository Knot (86-88%): Bot correctly flagged that all three framings walked the same Gitea investigation path.
  • Progressive Canary (89%): Bot correctly identified a strict subset of an existing task.

When to Override the Bot

Override high scores that are driven by component keyword matching rather than skill overlap. The bot can't distinguish between "this task mentions Harbor" and "this task's primary challenge involves Harbor operations."

  • Copa Airgap Patching (82%): High score from container/registry keywords, but the actual investigation (Copa tooling, vulnerability patching in air-gapped environments) was genuinely novel. Approved and fast-tracked.
  • GDPR Data Erasure (82%): High score from database keywords, but the operation category (compliance-driven data lifecycle) was novel. Approved.

When the Bot Misses Things

Low bot scores don't guarantee no overlap. The bot misses structural similarity — two tasks with different descriptions that have the agent walk the same diagnostic path. Always check:

  • Would the same debugging skills solve both tasks?
  • Does the agent check the same components in the same order?
  • Would completing one task teach you everything needed for the other?

Your role as reviewer: The bot identifies candidates. You evaluate whether the overlap is at the skill/topology level or just the component/keyword level.


Calibration Examples

Reference cases showing how bot scores, verdicts, and reasoning align (or don't).

| Idea | Bot Score | Verdict | Key Factor |
| --- | --- | --- | --- |
| Copa Airgap Patching | 82% | ✅ Approved | Genuinely novel paradigm — tool AND approach are both new to the task set |
| GDPR Data Erasure | 82% | ✅ Approved | Novel operation category — compliance-driven data lifecycle vs. database troubleshooting |
| Admission Webhook Cascade | 82% | ✅ Approved | Different K8s layer — admission control vs. workload orchestration, despite touching same components |
| ProxySQL | 84% | ✅ Approved (borderline) | Different tool with genuinely different capabilities — connection routing vs. simple pooling |
| Repository Knot (Gitea) | 86-88% | ❌ Rejected | Same investigation topology across three different framings — agent walks the same Gitea path |
| Harbor CI/CD Cascade | ~85% | ❌ Rejected | Same pipeline, same end-state — three iterations couldn't escape the overlap |
| Database Partition Recovery | 83-84% | ❌ Rejected | Renaming without substance change — investigation was still PostgreSQL HA troubleshooting |
| Progressive Canary | 89% | ❌ Rejected | Strict subset of existing GitOps Canary Rollouts task — every skill tested was already covered |

Pattern: Bot scores in the 82-89% range can go either way. The deciding factor is never the score itself — it's whether the investigation topology is genuinely distinct.
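Read as a decision rule, the calibration cases reduce to a sketch like the one below. The 80% filter threshold comes from the "When to Trust the Bot" section; `topology_distinct` stands in for the manual judgment the reviewer still has to make:

```python
def review_verdict(bot_score, topology_distinct):
    """The bot score nominates candidates; topology distinctness decides them."""
    if bot_score < 0.80:
        # A low score is not a free pass -- still spot-check structural similarity.
        return "manual spot-check"
    return "approve" if topology_distinct else "reject"

print(review_verdict(0.82, True))   # approve  (cf. Copa Airgap Patching)
print(review_verdict(0.89, False))  # reject   (cf. Progressive Canary)
```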


Giving Constructive Feedback

Five named patterns from effective reviewer behavior. Match your approach to the situation.

Choosing Your Approach

Overlap severity?
       │
  ┌────┼────────┐
  │    │        │
>70%  50-70%   <50%
  │    │        │
  ▼    ▼        ▼
"Start  Detailed  Incremental
fresh"  analysis  stacking
        + open
        areas
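
The triage above can be sketched as a direct mapping from estimated topology overlap to a feedback pattern; the boundary percentages are the tree's, while the function and label names are illustrative:

```python
def feedback_approach(overlap_pct):
    """Pick a feedback pattern from this guide based on estimated topology overlap (%)."""
    if overlap_pct > 70:
        return "pragmatic start fresh"            # pattern 2
    if overlap_pct >= 50:
        return "detailed analysis + open areas"   # pattern 1
    return "incremental stacking"                 # pattern 5

print(feedback_approach(85))  # pragmatic start fresh
```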

1. Detailed Analysis + Open Area Guidance

When to use: Author has a reasonable idea that overlaps in a fixable way.

Check all existing threads in the matched area, explain specifically what the topology overlap is, then proactively list viable alternative angles with concrete title suggestions. Don't just say "this overlaps" — show the author where the open space is.

Example pattern: "Your idea overlaps with [specific tasks] because the agent walks the same path: check logs → verify sync → confirm deployment. However, if you shifted the challenge to the admission control layer — something like 'Webhook validation failures cascade through CRD updates' — that's a genuinely different investigation."

2. Pragmatic "Start Fresh"

When to use: Overlap is 70%+ at the topology level. Iteration won't help.

Be direct and efficient. When the overlap is deep and structural, extended analysis wastes both your time and the author's. A clear "this area is fully claimed, here are open areas instead" is more helpful than a detailed breakdown of why each component overlaps.

Example pattern: "This is deep in claimed territory — the investigation path matches [existing tasks]. Rather than iterating, I'd suggest looking at [open areas from the advisory]. Those have clean surface area."

3. Teaching Self-Assessment

When to use: Author would benefit from learning to evaluate overlap themselves.

Instead of providing the answer, ask the author to search the forum for their key components and assess what they find. This builds the author's calibration for future ideas.

Example pattern: "Before I review this, search the ideas forum for [key component]. Look at what's already there and assess whether your investigation path is distinct from those. What do you think?"

4. Concrete Reframe Offer

When to use: The core idea has potential but the framing is wrong (typically greenfield that should be troubleshooting).

Don't just say "make it troubleshooting" — show what that looks like. Provide a complete alternative framing so the author sees the shape of a viable version.

Example pattern: "Instead of 'deploy X from scratch,' consider: 'A previous admin attempted to set up X but left it in a broken state — the agent finds misconfigured [specific things] and must diagnose the failures while completing the implementation.' Same domain knowledge, but now it tests diagnostic skills."

5. Incremental Stacking

When to use: The idea is close but needs more differentiation to clear the bar.

Ask the author to stack additional functionality or constraints that push the investigation into unclaimed territory. Works when the base idea is 50-70% distinct.

Example pattern: "The core is close but overlaps with [task] on the [specific] steps. Can you add [specific additional challenge] that forces a different investigation path? That would push this into distinct territory."


Task Type Guidance for Reviewers

When to Suggest Troubleshooting Mode

Suggest it when:

  • The idea is a pure greenfield build with no diagnostic component
  • The skills tested would be identical whether the thing exists or not
  • Adding "it was attempted but is broken" would genuinely change the investigation

Don't suggest it when:

  • The idea is a migration (migration is a valid non-troubleshooting category)
  • The greenfield aspect requires genuinely novel integration knowledge
  • The author has already incorporated diagnostic elements

How to Provide a Concrete Reframe

Bad: "Can you make this troubleshooting instead of greenfield?" → Author doesn't know what you mean. They'll rename it and resubmit.

Good: "Instead of 'set up KEDA autoscaling for Bleater services,' consider: 'KEDA ScaledObjects are deployed but triggers aren't firing — TriggerAuthentication is misconfigured for RabbitMQ, the HPA conflict causes oscillation, and scale-to-zero leaves services unresponsive.' Same KEDA knowledge required, but now the agent diagnoses failures instead of building from scratch."

Hybrid as a Path Forward

When an author is stuck between greenfield and troubleshooting, suggest hybrid: "Someone started this, got partway through, and left behind a partially-working-but-broken state." This preserves the implementation knowledge the author wants to test while adding the diagnostic component reviewers want to see.


Summary

  1. Topology first. Same diagnostic path = overlap, regardless of narrative differences.
  2. Bot scores are inputs, not decisions. 82% can mean "approve" or "reject" depending on whether the overlap is structural or superficial.
  3. Match feedback to severity. Deep overlap → direct redirect. Moderate overlap → detailed analysis. Light overlap → incremental stacking.
  4. Show, don't tell. Concrete reframes beat abstract advice. "Make it troubleshooting" is not actionable; a complete alternative framing is.
  5. Consider all operation categories. Troubleshooting is preferred, but migration, hybrid, and well-scoped greenfield are all valid.

This guide is part of a three-document set:

  • Task Authoring Advisory — landscape overview and opportunity areas
  • Framing Guide — what authors are told about framing ideas
  • Overlap Review Calibration Guide (this document) — how to evaluate ideas consistently

Last updated: 2026-02-17
