Skip to content

Instantly share code, notes, and snippets.

@dariuszparys
Created March 4, 2026 05:57
Show Gist options
  • Select an option

  • Save dariuszparys/c25004db7d3976bb7640a793a2f15dfd to your computer and use it in GitHub Desktop.

Select an option

Save dariuszparys/c25004db7d3976bb7640a793a2f15dfd to your computer and use it in GitHub Desktop.
Log Triage Playbook
name description
log-triage-playbook
Structured triage for long build, deployment, and runtime log threads. Use when users paste large logs and ask for root-cause analysis, recovery steps, or troubleshooting guidance. Produce a fixed response with ranked root causes, exact verify commands, minimal patch, and rollback.

Long Log Triage Playbook

Use this skill to turn noisy failure logs into a repeatable triage result.

Capture Inputs

Capture before analysis:

  • Failing command, pipeline/job/stage, and environment (dev/test/prod)
  • First failing timestamp and first hard error line
  • Relevant context window around the first hard error (about +/- 40 lines)
  • Recent change context (branch, commit, PR, or work item)

If key inputs are missing, state exactly what is missing and continue with best-effort analysis.

Triage Workflow

  1. Find the first hard failure.
    • Prioritize fatal markers (ERROR, Exception, non-zero exit) over downstream noise.
  2. Separate primary failure from cascade failures.
    • Treat repeated follow-up errors as symptoms unless they have independent evidence.
  3. Build and rank root-cause hypotheses.
    • Rank by evidence strength and blast radius.
  4. Add deterministic verification.
    • Give copy/paste commands for each top hypothesis.
  5. Propose the smallest safe fix.
    • Prefer one minimal patch over broad refactors.
  6. Provide rollback or containment.
    • Include the fastest safe fallback if the patch fails.

Output Contract

Return this structure in order:

Triage Summary

  • Failure point: first hard error (with line/time reference if available)
  • Likely scope: component/service affected
  • Risk: low/medium/high

Ranked Root Causes

Rank Hypothesis Evidence Confidence
1 ... ... High/Medium/Low
2 ... ... High/Medium/Low
3 ... ... High/Medium/Low

Verify Commands

Provide exact commands, one block per hypothesis:

# Verify hypothesis 1
...
# Verify hypothesis 2
...

Minimal Patch

  • Smallest code/config change to address the top verified hypothesis
  • Affected files/components
  • Why this is minimal

Rollback / Containment

  • Immediate fallback action
  • Command or deployment action to revert safely
  • Condition for triggering rollback

Decision Gate

  • If verify #1 is true -> apply minimal patch
  • If verify #1 is false and #2 is true -> apply fallback patch for #2
  • If all verifies are false -> request the next exact data slice needed

Guardrails

  • Do not claim certainty without log evidence.
  • Do not skip rollback guidance.
  • Do not recommend large refactors during incident triage.
  • Keep commands environment-aware and explicit (project, resource group, namespace, branch).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment