| name | description |
|---|---|
| Pipeline Investigation | Debug GitLab CI/CD pipeline failures using glab CLI. Investigate failed jobs, analyze error logs, trace child pipelines, and compare Node version differences. Use for pipeline failures, job errors, build issues, or when the user mentions GitLab pipelines, CI/CD problems, specific pipeline IDs, failed builds, or job logs. |
Use this skill when investigating GitLab CI/CD pipeline issues:
- User reports pipeline failures (e.g., "Pipeline #2961721 failed")
- Questions about job failures or CI/CD errors
- Investigating UI test failures
- Analyzing job logs or error messages
```bash
# Get current branch
BRANCH=$(git branch --show-current)

# Find latest failed pipelines for this branch
glab api "projects/2558/pipelines?ref=$BRANCH&status=failed&per_page=3" | jq '.[] | {id, status, created_at}'

# Or for a specific branch
glab api "projects/2558/pipelines?ref=feat/node20-migration&status=failed&per_page=3" | jq '.[0]'

# For merge request pipelines, use the MR ref format
glab api "projects/2558/pipelines?ref=refs/merge-requests/<MR_ID>/head&per_page=3" | jq '.[] | {id, status, created_at}'

# Find latest pipelines (any status) for current branch
glab api "projects/2558/pipelines?ref=$BRANCH&per_page=5" | jq '.[] | {id, status, created_at}'
```

Start with step 1 below.
Step 1: Verify the pipeline exists and has job data.

```bash
# Quick check if pipeline exists and get basic status
glab api "projects/2558/pipelines/<PIPELINE_ID>" | jq -r '.status // "Pipeline not found"'

# Get full pipeline status and metadata
glab api "projects/2558/pipelines/<PIPELINE_ID>" | jq '{status, ref, created_at, duration, web_url}'

# Verify pipeline has jobs (old pipelines may be cleaned up)
glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq '. | length'
# If this returns 0, pipeline data is unavailable - try a more recent one
```

Step 2: Get the failed jobs. ALWAYS use `--paginate` when getting jobs (pipelines have 80+ jobs):
```bash
# Get ALL failed jobs
glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq -r '.[] | select(.status == "failed") | "\(.name) - Job \(.id)"'
```

Step 3: Get the job logs.

```bash
# Get last 100 lines of job log (capture stderr with 2>&1)
glab ci trace <job-id> 2>&1 | tail -100

# Search for errors
glab ci trace <job-id> 2>&1 | grep -E "error|Error|failed|FAIL"
```

Jobs like UI Tests and Deploy trigger child pipelines. Always check bridges:
```bash
# Find child pipelines
glab api "projects/2558/pipelines/<PIPELINE_ID>/bridges" | jq '.[] | {name, status, child: .downstream_pipeline.id}'

# If child pipeline exists, get its jobs
glab api "projects/2558/pipelines/<CHILD_PIPELINE_ID>/jobs" --paginate | jq -r '.[] | "\(.name) | \(.status) | Job \(.id)"'
```

```bash
# Step 1: Find failed child pipeline
CHILD_ID=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/bridges" | jq -r '.[] | select(.status == "failed") | .downstream_pipeline.id')

# Step 2: Get failed jobs from child pipeline
glab api "projects/2558/pipelines/$CHILD_ID/jobs" --paginate | jq -r '.[] | select(.status == "failed") | "\(.name) - Job \(.id)"'

# Step 3: Get one job's log (they're usually identical)
glab ci trace <job-id> 2>&1 | tail -100
```

When many jobs fail (e.g., all Image builds), check ONE representative job first - they often have identical errors.
```bash
# Get first failed job
FIRST_FAILED=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq -r '.[] | select(.status == "failed") | .id' | head -1)

# Check its log
glab ci trace $FIRST_FAILED 2>&1 | tail -100

# If needed, check if the error is identical across all failed jobs
glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | \
  jq -r '.[] | select(.status == "failed") | .id' | head -3 | while read job_id; do
    echo "=== Job $job_id ==="
    glab ci trace $job_id 2>&1 | grep -E "ERROR|Error:|error:" | head -5
  done
```

Best practices:

- Always use `--paginate` for job queries (pipelines have 80+ jobs)
- Always capture stderr with `2>&1` when getting logs
- Always check for child pipelines via the bridges API
- Limit log output to avoid overwhelming context (use `tail -100` or `head -50`)
- Use project ID 2558 explicitly (never rely on context)
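A minimal end-to-end sketch that follows these practices in one pass (the `<PIPELINE_ID>` placeholder is yours to fill in; child-pipeline handling mirrors the bridges step shown earlier):

```bash
# Sketch: one pass over a parent pipeline, following the practices above.
PIPELINE_ID=<PIPELINE_ID>

# Confirm job data still exists before digging in
glab api "projects/2558/pipelines/$PIPELINE_ID/jobs" --paginate | jq '. | length'

# List failed jobs (paginated, so nothing beyond the first 20 jobs is missed)
glab api "projects/2558/pipelines/$PIPELINE_ID/jobs" --paginate \
  | jq -r '.[] | select(.status == "failed") | "\(.name) - Job \(.id)"'

# Check bridges so child-pipeline failures (UI Tests, Deploy) are not missed
glab api "projects/2558/pipelines/$PIPELINE_ID/bridges" \
  | jq -r '.[] | select(.status == "failed") | "\(.name) -> child pipeline \(.downstream_pipeline.id)"'
```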
Common mistakes to avoid:

- ❌ Forgetting `--paginate` (only gets the first 20 jobs)
- ❌ Not checking child pipelines (missing UI Test/Deploy jobs)
- ❌ Confusing Pipeline IDs (~2M) with Job IDs (~20M+) - see the sketch after this list
- ❌ Missing stderr output (forgetting `2>&1`)
- ❌ Dumping entire logs (use tail/head/grep)
- ❌ Investigating old pipelines with no jobs (check job count first)
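If it is unclear whether a number is a pipeline ID or a job ID, probing both endpoints is one way to tell (a sketch; it assumes `glab api` exits non-zero on a 404, and `<SOME_ID>` is a placeholder):

```bash
# Hypothetical helper: check which endpoint the ID resolves under.
# Assumption: glab api returns a non-zero exit code when the API responds with 404.
ID=<SOME_ID>
if glab api "projects/2558/pipelines/$ID" >/dev/null 2>&1; then
  echo "$ID is a pipeline ID"
elif glab api "projects/2558/jobs/$ID" >/dev/null 2>&1; then
  echo "$ID is a job ID"
else
  echo "$ID matched neither a pipeline nor a job in project 2558"
fi
```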
When analyzing logs, look for these signatures:
Missing Docker Image:

```
manifest for <image> not found: manifest unknown
```

→ Base runner image not available in ECR (common during Node version transitions)

BundleMon Credentials:

```
bad project credentials
{"message":"forbidden"}
```

→ BundleMon service access issue (doesn't fail the build, but shows in logs)

Build Timeout:

```
ERROR: Job failed: execution took longer than <time>
```

→ Checkout server builds can take 44+ minutes (known issue)

Test Failures:

```
FAIL <test-name>
  Expected: <value>
  Received: <value>
```

→ Unit test assertion failure (check test logs for specifics)
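These signatures can be checked in a single pass as a rough triage step (a sketch; the grep patterns mirror the messages listed above and may need tuning for your runner output):

```bash
# Sketch: pull a failed job's log once and test it against the known signatures.
JOB_ID=<job-id>
LOG=$(glab ci trace "$JOB_ID" 2>&1)

echo "$LOG" | grep -qE "manifest unknown" && echo "Likely cause: missing Docker image in ECR"
echo "$LOG" | grep -qE 'bad project credentials|"message":"forbidden"' && echo "Likely cause: BundleMon credentials (non-blocking)"
echo "$LOG" | grep -qE "execution took longer than" && echo "Likely cause: job timeout"
echo "$LOG" | grep -qE "^FAIL " && echo "Likely cause: unit test assertion failure"
```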
Load these files as needed for detailed information:
- `cli-reference.md` - Complete glab command syntax, API patterns, jq examples, and advanced queries
- `pipeline-stages.md` - Stage dependencies, timing, critical paths, and optimization strategies
- `job-catalog.md` - Full job descriptions, configurations, durations, and dependencies (all 80+ jobs)
- Project ID: 2558
- GitLab Instance:
- Repository:
- Typical Pipeline: 80+ jobs across 12 stages
- Common Child Pipelines: UI Tests, Deploy
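A quick sanity check that project ID 2558 points at the expected repository (a sketch using the standard projects endpoint):

```bash
# Sketch: confirm project 2558 resolves to the repository you expect.
glab api "projects/2558" | jq '{id, path_with_namespace, web_url}'
```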
Pipeline stages and their jobs:

- Install And Build: Install, WebRunner, Wiremock, Reportportal_Setup
- Static Analysis: ESlint, Typescript, Format, Stylelint
- Test: UnitTests:Main, UnitTests:App, UnitTests:Checkout, UnitTests:Utils, UnitTests:Miscellaneous, Sonar
- Image: Create:Image:{banner}:kits:bbm:app, Create:Image:{banner}:kits:checkout:server, Create:Image:{banner}:kits:pim
- Banners: bquk, bqie, tpuk, cafr, capl
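To see how failures map onto these stages, jobs can be grouped by their `stage` field (a sketch; `jq -s 'add | ...'` is used because `--paginate` may emit one JSON array per page):

```bash
# Sketch: count failed vs. total jobs per stage for a pipeline.
glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate \
  | jq -rs 'add | group_by(.stage)[] | "\(.[0].stage): \(map(select(.status == "failed")) | length) failed / \(length) total"'
```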