@cnolanminich
Created March 9, 2026 15:23
skills evaluation with and without SE skills

Project & Session Comparison

Project Comparison: testing-new-skills vs testing-new-skills-2

Structural Differences

| Aspect | Project 1 (testing-new-skills) | Project 2 (testing-new-skills-2) |
| --- | --- | --- |
| dbt project location | Inside src/.../defs/dbt_project/ | Top-level dbt_project/ |
| dbt mart models | account_360, pipeline_summary, lead_conversion_funnel | fct_sales_pipeline, dim_account_health, fct_lead_conversion |
| Custom components | 5 (incl. ScheduledJobComponent) | 4 (no scheduling component) |
| dlt approach | Subclasses DltLoadCollectionComponent (built-in) | Fully custom component with inline dlt.pipeline() |
| Demo mode | All components have demo_mode: bool toggle | No demo mode; generates data directly |
| GCloud Function | 3 assets per function (execution/status/result) + sensor triggers jobs | 2 asset groups (orchestrated/observed) + observation sensor |
| Google Drive | Supports transform queries in config | Direct DuckDB reads, no transform layer |
| Schedules | 3 scheduled jobs (daily ingestion, daily Snowflake, weekly Drive) | None |
| Salesforce pipeline | Separate salesforce_pipeline.py with dlt source definition | Data generation embedded in component class |
| Snowflake share | Has ShareTarget model, demo mode logging | Direct implementation, no demo mode |

dbt Model Differences

The mart models take different analytical angles:

  • Project 1: account_360 (full account view), pipeline_summary (stage/fiscal aggregation), lead_conversion_funnel (source/industry conversion rates)
  • Project 2: fct_sales_pipeline (opportunity-level with deal tiering), dim_account_health (health scoring with Platinum/Gold/Silver/Bronze tiers), fct_lead_conversion (individual lead funnel tracking)

Project 2's models are more granular (row-per-entity), while Project 1's are more aggregated (summary metrics).
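
To make the granular vs. aggregated distinction concrete, here is a minimal Python sketch. It is not taken from either project: the rows, column names, and tier thresholds are all hypothetical, since the gist does not show the dbt SQL or the underlying schema. It only illustrates the two output shapes (one row per opportunity vs. one row per stage).

```python
from collections import defaultdict

# Hypothetical opportunity rows; the real data and schema are not shown in the gist.
opportunities = [
    {"id": "opp-1", "stage": "Prospecting", "amount": 12_000},
    {"id": "opp-2", "stage": "Negotiation", "amount": 150_000},
    {"id": "opp-3", "stage": "Negotiation", "amount": 40_000},
]


def deal_tier(amount):
    """Assumed tier thresholds, chosen for illustration only."""
    if amount >= 100_000:
        return "large"
    if amount >= 25_000:
        return "medium"
    return "small"


# Row-per-entity shape (Project 2 style): one output row per opportunity.
granular = [{**opp, "deal_tier": deal_tier(opp["amount"])} for opp in opportunities]

# Aggregated shape (Project 1 style): one output row per stage.
totals = defaultdict(lambda: {"deal_count": 0, "total_amount": 0})
for opp in opportunities:
    totals[opp["stage"]]["deal_count"] += 1
    totals[opp["stage"]]["total_amount"] += opp["amount"]
summary = dict(totals)
```

The aggregated shape can always be derived from the granular one, but not the reverse, which is one reason a row-per-entity mart tends to be the more flexible starting point.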


Session Comparison

| Metric | Project 1 Session | Project 2 Session |
| --- | --- | --- |
| Session ID | 66a81ae4 | ee3d6fea |
| Total tokens | ~15.9M | ~5.0M (3.2x fewer) |
| Output tokens | 46,745 | 28,117 |
| Cache read tokens | 15.2M | 4.8M |
| Messages | 210 | 104 |
| Duration | 3 days (Mar 6-9, with revisits) | ~21 minutes (Mar 9) |
| Tool calls | ~140 | ~60 |
| Errors/retries | ~13 (import issues, pip not found, dlt API mismatches) | 3 (path resolution, env vars, zsh glob) |
| Skill used | dagster-demo → dagster-expert (orchestrator chain) | dagster-expert directly |
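
The ratios behind the table can be checked with a few lines of Python. The figures below are copied from the table; the token totals are approximate in the source, so the results are rounded to one decimal place. The dictionary keys are labels chosen here, not field names from any session log.

```python
# Figures copied from the session table (token totals are approximate).
project_1 = {
    "total_tokens": 15.9e6,
    "output_tokens": 46_745,
    "messages": 210,
    "tool_calls": 140,
    "errors": 13,
}
project_2 = {
    "total_tokens": 5.0e6,
    "output_tokens": 28_117,
    "messages": 104,
    "tool_calls": 60,
    "errors": 3,
}

# How many times larger each Project 1 figure is than its Project 2 counterpart.
ratios = {key: round(project_1[key] / project_2[key], 1) for key in project_1}

# Share of total tokens that were cache reads in each session.
cache_share_1 = round(15.2e6 / 15.9e6, 2)
cache_share_2 = round(4.8e6 / 5.0e6, 2)

for key, value in ratios.items():
    print(f"{key}: {value}x")
print(f"cache read share: {cache_share_1} vs {cache_share_2}")
```

Total tokens differ by about 3.2x, while output tokens differ by only about 1.7x. In both sessions roughly 96% of the total is cache reads, so the gap is driven mostly by how much context was re-read across messages rather than by how much was generated.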

Approach Differences

| Aspect | Project 1 | Project 2 |
| --- | --- | --- |
| Strategy | Orchestrator skill (dagster-demo) provided a 5-step workflow; heavy upfront reference reading (~10+ docs), then systematic execution | Direct dagster-expert skill invocation; targeted reference reads, then dove into building |
| API exploration | Spent time introspecting dagster-dlt Python APIs at runtime (uv run python -c "from dagster_dlt import ...") to find correct imports | Avoided the built-in dlt component entirely; wrote a custom component with the raw dlt library |
| Component design | Tried to subclass existing components (DltLoadCollectionComponent) and hit import/API issues | Built all custom components from scratch using the dg.Component base class |
| Error recovery | Multiple cycles of dg check defs → fix imports → re-check | Fewer errors; mostly env var / path issues, resolved quickly |
| Parallelism | More sequential tool calls | Aggressive parallel tool calls (scaffold 4 components simultaneously, read 5 files at once) |
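
The parallelism row can be illustrated by analogy. The sketch below does not use the agent's actual tool runtime, which the gist does not describe; it treats each "read file" tool call as an independent function call and compares issuing five of them one at a time with issuing them as a single batch. File names and contents are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path: Path) -> str:
    """Stand-in for one independent 'read file' tool call."""
    return path.read_text()


def read_sequential(paths):
    # One call at a time: each read waits for the previous one to finish.
    return [read_file(p) for p in paths]


def read_parallel(paths):
    # All calls issued together; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(read_file, paths))


# Create five hypothetical component files and read them both ways.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = Path(tmp) / f"component_{i}.py"
        path.write_text(f"# component {i}\n")
        paths.append(path)
    sequential = read_sequential(paths)
    parallel = read_parallel(paths)
```

Both approaches return the same content. For an agent session the saving is less about wall-clock time per read and more about round trips: five reads batched into one message add one turn to the conversation instead of five.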

Key Takeaway

Project 2 used about 3.2x fewer tokens (~5.0M vs. ~15.9M) and finished in a single ~21-minute session, whereas Project 1's session spanned 3 days including revisits. The main factors:

  1. Avoided complex subclassing — writing fully custom components with raw libraries (dlt, snowflake-connector, google APIs) was simpler than trying to extend Dagster's built-in integration components
  2. Skipped the orchestrator skill — going directly to dagster-expert cut overhead
  3. More parallel tool calls — scaffolding 4 components in one shot vs. sequentially
  4. Fewer errors — simpler component architecture meant fewer import/API compatibility issues
  5. No demo mode abstraction — less code surface to debug
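
As a rough illustration of the last point, the sketch below contrasts a component that carries a demo_mode toggle with one that does not. These are plain Python stand-ins, not the projects' actual Dagster components: the class names, the share_name field, and the returned strings are all hypothetical, and the real Snowflake call is replaced by a placeholder.

```python
from dataclasses import dataclass


@dataclass
class ShareWithDemoMode:
    """Project 1 style: one flag, two code paths that both need to work."""

    share_name: str
    demo_mode: bool = True

    def run(self) -> str:
        if self.demo_mode:
            # Demo path: only report what would happen.
            return f"[demo] would create share {self.share_name}"
        return self._create_share()

    def _create_share(self) -> str:
        # Placeholder for the real call, which is not shown in the gist.
        return f"created share {self.share_name}"


@dataclass
class ShareDirect:
    """Project 2 style: a single code path, no toggle."""

    share_name: str

    def run(self) -> str:
        # Placeholder for the real call, which is not shown in the gist.
        return f"created share {self.share_name}"


demo_result = ShareWithDemoMode("sales_share").run()
live_result = ShareWithDemoMode("sales_share", demo_mode=False).run()
direct_result = ShareDirect("sales_share").run()
```

The toggle doubles the number of branches per component that have to be exercised, which is the extra surface the takeaway refers to. The trade-off is that the direct version cannot be run without real credentials or generated data.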