@cnolanminich
Last active March 12, 2026 20:08
comparing with dagster-demo skill vs. without

Dagster Demo Skill: Session Comparison

Comparison of five Claude Code sessions that received the same prompt, with varying skill configurations and prompt refinements.

The Prompt

All sessions received essentially the same base prompt: create a demo Dagster project with Fivetran → dbt → Snowflake → PowerBI, Alteryx, Domo (migrating off) → Census/Fivetran Activations, with event-driven sensors and observe/orchestrate modes.

skills-10 received an enhanced prompt with additional explicit instructions: "Make sure any component that connects to an external system is using a state-backed component, uses a local cache and writes a set of mock assets using that cache, and that when it executes it logs a sample message and metadata instead of connecting to the external system. When modifying a component that exists, ALWAYS subclass, do not create a custom component."

Projects at a Glance

| Aspect | skills-6 | skills-6-no-demo | skills-7 | skills-9 | skills-10 |
| --- | --- | --- | --- | --- | --- |
| Skills used | dagster-demo + dagster-expert | dagster-expert only | dagster-expert only | dagster-expert only | dagster-expert only |
| Prompt | Base | Base | Base | Base | Enhanced (explicit mock/subclass instructions) |
| Project name | data_stack_demo | analytics_orchestrator | data_platform | demo_data_stack | data_platform |
| Total assets | ~20 | ~39 | ~30+ | ~20-25 | ~28 |
| Custom components | 7 (3 subclass + 4 custom) | 5 (all custom) | 8 (2 subclass + 6 custom) | 6 (3 subclass + 3 custom) | 7 (3 subclass + 4 custom) |
| Jobs | 6 + 4 scheduled | 5 explicit | 7 | 3 | 4 |
| Sensors | 2 asset sensors | 4 run-status sensors | 5 asset sensors + 1 schedule | 5 asset sensors | 2 orchestration + 2 observe |
| Schedules | 4 via ScheduledJobComponent | 1 daily at 6 AM | 1 hourly (streaming checks) | 0 | 0 |
| Demo mode | Yes — subclass overrides | No — real state-backed | Hybrid — demo fallback | Yes — full mock | Yes — DEMO_MODE=True toggle |
| Defs folders | 3 | 12+ | 13 | 7 | 8 |
| Extra features | — | Failure alert sensor | GCP Dataflow (batch + streaming), asset checks | — | Fan-out sensor, explicit demo toggle |
| Python files | ~15 | ~20+ | 13 | 12 | 12 |
| YAML files | ~5 | ~12 | 14 | 12 | 7 |

Detailed Project Differences

skills-6 (with dagster-demo skill)

  • Approach: Subclassed FivetranAccountComponent, DbtProjectComponent, PowerBIWorkspaceComponent to inject mock data; wrote 4 custom components (Census, Alteryx, Domo, ScheduledJob)
  • Structure: 3 defs folders (data_pipeline, orchestration_sensors, pipeline_sensors) — most consolidated layout
  • Orchestration: Scheduled jobs as fallback + asset sensors as primary chain
  • Data sources: 2 (Salesforce, NetSuite), 4 staging models, 2 mart models
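The subclass-to-inject-mock-data approach described above can be sketched framework-free; `FivetranSync` and `DemoFivetranSync` are hypothetical stand-ins for the real Dagster component classes, not their actual APIs:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")


class FivetranSync:
    """Stand-in for an integration component that would call an external API."""

    def run(self) -> dict:
        raise NotImplementedError("would call the Fivetran API")


class DemoFivetranSync(FivetranSync):
    """Subclass overrides the network call: logs a mock message, returns canned rows."""

    def run(self) -> dict:
        log.info("[MOCK] Skipping Fivetran API call; returning cached sample data")
        return {"salesforce.accounts": 120, "netsuite.invoices": 340}


result = DemoFivetranSync().run()
```

Because the demo class is a subclass, everything wired against the base type (jobs, sensors, asset definitions) keeps working unchanged; only the external call is swapped out.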

skills-6-no-dagster-demo (no dagster-demo skill)

  • Approach: Used library components directly + 5 custom components from scratch (Alteryx, Census, Domo, FivetranActivation, PowerBIWithDbt)
  • Structure: 12+ defs folders (one per integration) — most granular layout
  • Orchestration: Single daily schedule at 6 AM + 4 run-status sensors chaining everything + failure alert sensor
  • Data sources: 3 (Salesforce, Stripe, HubSpot), 6 staging models, 4 mart models
  • Notable: Production-oriented, requires real credentials, most comprehensive data model

skills-7 (no dagster-demo skill, multi-session)

  • Approach: 2 subclassed components (Fivetran, dbt) + 6 custom (Dataflow Batch, Dataflow Streaming, Census, Alteryx, Domo, FivetranActivations)
  • Structure: 13 defs folders including 3 GCP Dataflow folders (batch, streaming observed, streaming orchestrated)
  • Orchestration: 5 asset sensors + 1 hourly schedule for streaming data checks
  • Unique: Added GCP Dataflow integration with bounded batch jobs, unbounded streaming (observed + orchestrated modes), and asset checks for streaming data quality (freshness, volume, schema drift)
  • Demo mode: Hybrid — demo fallback when credentials unavailable, real APIs when provided
  • Built across 2 main sessions (initial build + Dataflow addition)
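The streaming-quality checks listed above (freshness, volume) reduce to simple threshold tests; this is a framework-free sketch, and the thresholds and field names are illustrative, not the project's actual asset-check code:

```python
from datetime import datetime, timedelta, timezone


def check_stream_health(
    last_event_at: datetime,
    rows_last_hour: int,
    max_staleness: timedelta = timedelta(minutes=15),
    min_rows: int = 100,
) -> dict:
    """Return pass/fail results in the shape an asset check would report."""
    now = datetime.now(timezone.utc)
    return {
        "freshness": (now - last_event_at) <= max_staleness,  # data arrived recently?
        "volume": rows_last_hour >= min_rows,                 # enough rows flowing?
    }


recent = datetime.now(timezone.utc) - timedelta(minutes=5)
healthy = check_stream_health(recent, rows_last_hour=250)
```

A schema-drift check would follow the same pattern: compare the observed column set against an expected schema and fail on any difference.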

skills-9 (no dagster-demo skill)

  • Approach: 3 subclassed components (FivetranIngest, PowerBI, dbt) + 3 custom (Alteryx, Domo, Census)
  • Structure: 7 defs folders with clean domain grouping (ingestion, transform, consumption, reverse_etl, orchestration)
  • Orchestration: Purely sensor-driven (0 schedules), 5 asset sensors chain the full pipeline
  • Demo mode: Full mock — all components hardcode demo credentials, [MOCK] log prefix on API calls
  • Notable: Cleanest folder structure, all StateBackedComponent subclasses with write_state_to_path() overrides
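The write_state_to_path() override pattern can be sketched with the standard library only; `StateBackedComponent` here is a hypothetical minimal base class standing in for Dagster's, and the Census payload is invented sample data:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


class StateBackedComponent:
    """Hypothetical base: subclasses persist external-system state to a local cache."""

    def write_state_to_path(self, path: Path) -> None:
        raise NotImplementedError


class MockCensusComponent(StateBackedComponent):
    """Demo subclass writes canned sync metadata instead of calling the Census API."""

    def write_state_to_path(self, path: Path) -> None:
        state = {"syncs": [{"id": "sync_1", "status": "completed", "records": 500}]}
        path.write_text(json.dumps(state, indent=2))


with TemporaryDirectory() as tmp:
    cache = Path(tmp) / "census_state.json"
    MockCensusComponent().write_state_to_path(cache)
    loaded = json.loads(cache.read_text())  # downstream assets read from this cache
```

Asset definitions then read from the cached JSON rather than the live API, which is what lets the whole project run with hardcoded demo credentials.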

skills-10 (enhanced prompt, no dagster-demo skill)

  • Approach: 3 subclassed components (DemoFivetran, DemoDbt, DemoPowerBI) + 4 custom StateBackedComponent subclasses (Census, Alteryx, Domo, FivetranActivations)
  • Structure: 8 defs folders (fivetran_ingestion, dbt_transforms, powerbi, census, alteryx, domo, fivetran_activations, orchestration)
  • Orchestration: 2 orchestration sensors — fivetran_to_dbt_sensor (asset sensor) + dbt_complete_fan_out_sensor (run-status sensor that triggers 3 downstream jobs in parallel). 0 schedules.
  • Demo mode: Explicit DEMO_MODE: bool = True toggle per component; each component overrides write_state_to_path() with mock JSON and execute() with [MOCK] log messages
  • Data sources: 2 (Salesforce, Stripe), 6 staging + 3 mart dbt models
  • Notable: Most consistent component pattern — every component follows identical structure (write_state_to_path override + DEMO_MODE toggle). Fan-out sensor is the most elegant orchestration pattern across all projects. Highest edit count (17) — iterative refinement approach.
  • Prompt effect: The enhanced prompt's explicit instructions about state-backed components and subclassing produced results very similar to what the dagster-demo skill achieved in skills-6, suggesting the skill's guidance can be replicated with prompt engineering.
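Stripped of the Dagster sensor machinery, the fan-out logic above is a mapping from one run-status event to several run requests; the job names below are taken from the project's domains but the function itself is a hypothetical sketch:

```python
DOWNSTREAM_JOBS = ["powerbi_refresh_job", "census_sync_job", "domo_refresh_job"]


def dbt_complete_fan_out(run_status: str, job_name: str) -> list[str]:
    """On a successful dbt run, request one run per downstream job; otherwise none."""
    if job_name == "dbt_transform_job" and run_status == "SUCCESS":
        return list(DOWNSTREAM_JOBS)  # all three consumers kick off in parallel
    return []


requested = dbt_complete_fan_out("SUCCESS", "dbt_transform_job")
```

In Dagster terms this is a run-status sensor monitoring the dbt job and yielding one RunRequest per downstream job, which is why a single sensor suffices where other projects used one sensor per edge.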

Session Metrics

| Metric | skills-6 (with skill) | skills-6-no-demo | skills-7 (main) | skills-7 (cont.) | skills-9 | skills-10 |
| --- | --- | --- | --- | --- | --- | --- |
| Wall clock | ~110 min | ~80 min | ~20 min | ~15 min | ~5 hrs (breaks) | ~37 min |
| Total turn duration | 19.1 min | 33.6 min | 5.9 min | 9.2 min | 6.9 min | 13.5 min |
| User messages | 169 | 269 | 95 | 104 | 133 | 103 |
| Assistant turns | 237 | 375 | 117 | 141 | 178 | 143 |
| Output tokens | 56,781 | 81,910 | 18,708 | 36,006 | 40,737 | 47,357 |
| Cache read tokens | 21.8M | 30.9M | 5.9M | 8.4M | 13.6M | 10.3M |
| Cache write tokens | 696K | 1.08M | 275K | 307K | 1.03M | 848K |
| Total tool calls | 165 | 261 | 90 | 100 | 130 | 100 |
| File writes | 33 | 64 | 21 | 24 | 41 | 24 |
| File reads | 34 | 74 | 32 | 36 | 38 | 36 |
| Bash commands | 79 | 101 | 28 | 28 | 29 | 21 |
| Edits | 15 | 5 | 7 | 5 | 6 | 17 |
| Agents spawned | 0 | 0 | 0 | 1 | 9 | 1 |
| Skills invoked | dagster-demo, dagster-expert | dagster-expert | dagster-expert | dagster-expert | dagster-expert | dagster-expert |

Aggregate Totals (combining multi-session projects)

| Metric | skills-6 | skills-6-no-demo | skills-7 (2 sessions) | skills-9 | skills-10 |
| --- | --- | --- | --- | --- | --- |
| Output tokens | 56,781 | 81,910 | 54,714 | 40,737 | 47,357 |
| Cache read tokens | 21.8M | 30.9M | 14.3M | 13.6M | 10.3M |
| Total tool calls | 165 | 261 | 190 | 130 | 100 |
| File writes | 33 | 64 | 45 | 41 | 24 |

Path Taken

skills-6 (with dagster-demo skill)

  • Invoked the dagster-demo skill which provided a structured recipe for creating demo projects
  • Also used dagster-expert for Dagster-specific guidance
  • Focused on subclassing existing components to inject mock data
  • Fewer files written (33) — the skill guided a more consolidated structure
  • More edits (15) — iterative refinement of fewer files
  • Had 3 prior sessions exploring skill behavior before the main build

skills-6-no-dagster-demo (no skill)

  • Only used dagster-expert for general Dagster guidance
  • Built everything from scratch without a demo-mode recipe
  • Created nearly 2x the files (64) with more granular folder structure
  • Used 74 reads — extensive exploration/reference needed
  • Most total tokens consumed (~82K output)
  • Needed the most user interactions (269)

skills-7 (no skill, multi-session)

  • Split across 2 main sessions: initial pipeline build (20 min), then continuation adding more components (15 min)
  • First session built the core Fivetran → dbt → downstream pipeline
  • Second session added GCP Dataflow integration (batch + streaming) with asset checks — a unique feature not in other projects
  • Moderate token usage (~55K total output across both sessions)
  • Most efficient per-session turn duration (5.9 min + 9.2 min)

skills-9 (no skill)

  • Single long session with breaks (wall clock ~5 hours, but only 6.9 min of active turn time)
  • Heaviest use of subagents (9 Agent calls) — delegated research and exploration
  • Lowest output tokens of any project (40.7K) with 130 total tool calls — a highly efficient single-session build
  • Used WebFetch (3) and WebSearch (2) for documentation lookup
  • Produced the cleanest folder structure with the most consistent mock patterns

skills-10 (enhanced prompt, no skill)

  • Single session, 37 min wall clock, 13.5 min active turn time
  • Fewest tool calls overall (100) and fewest file writes (24) — most efficient build
  • Highest edit count (17) — wrote fewer files but refined them more iteratively
  • Only 21 bash commands — least shell usage of any session
  • The enhanced prompt's explicit instructions eliminated exploration overhead: no need to discover the right patterns through trial and error
  • Produced the most consistent component pattern across all 7 components
  • Fan-out orchestration sensor (1 sensor triggers 3 parallel jobs) is the most elegant design

Key Architectural Differences

  1. Demo vs Production: skills-6, skills-9, and skills-10 all created full mock/demo projects. skills-6-no-demo created production-oriented code. skills-7 was a hybrid with demo fallbacks.

  2. Component granularity: skills-6 most consolidated (3 folders), skills-6-no-demo most granular (12+), skills-7 expanded scope (13 including Dataflow), skills-9 and skills-10 cleanest middle ground (7-8 folders).

  3. Sensor approach: skills-6, skills-7, and skills-9 used asset materialization sensors. skills-6-no-demo used run status sensors. skills-10 uniquely combined both — an asset sensor for Fivetran→dbt plus a run-status fan-out sensor for dbt→downstream.

  4. Unique features: skills-7 was the only project to include GCP Dataflow integration and asset checks for streaming data quality. skills-6 was the only one with a ScheduledJobComponent. skills-6-no-demo was the only one with a failure alert sensor. skills-10 had the cleanest fan-out orchestration pattern.

Summary

| Project | Efficiency | Scope | Quality |
| --- | --- | --- | --- |
| skills-6 (with skill) | Good — skill reduced exploration overhead | Standard pipeline | Clean but consolidated |
| skills-6-no-demo | Worst — 82K tokens, 261 tool calls | Most comprehensive (39 assets, 4 marts) | Production-ready but verbose |
| skills-7 | Good — 55K tokens across 2 sessions | Extended scope (+ Dataflow, asset checks) | Hybrid demo/production |
| skills-9 | Good — 41K tokens, 130 tool calls | Standard pipeline | Cleanest structure and patterns |
| skills-10 (enhanced prompt) | Best — 47K tokens, 100 tool calls, 24 writes | Standard pipeline (~28 assets) | Most consistent component pattern |

The most efficient session was skills-10, which achieved the fewest tool calls (100) and file writes (24) through an enhanced prompt that explicitly specified architectural requirements. This suggests that embedding key design decisions directly in the prompt (state-backed components, subclassing, mock patterns) is more effective than relying on either skills or the model's own exploration. The dagster-demo skill (skills-6) achieved similar architectural outcomes but required more overhead to load and apply the skill's guidance. All five projects produced working Dagster code from the same base prompt, demonstrating that prompt specificity has the highest impact on both efficiency and output consistency.
