| Aspect | Project 1 (testing-new-skills) | Project 2 (testing-new-skills-2) |
|---|---|---|
| dbt project location | Inside `src/.../defs/dbt_project/` | Top-level `dbt_project/` |
| dbt mart models | `account_360`, `pipeline_summary`, `lead_conversion_funnel` | `fct_sales_pipeline`, `dim_account_health`, `fct_lead_conversion` |
| Custom components | 5 (incl. `ScheduledJobComponent`) | 4 (no scheduling component) |
| dlt approach | Subclasses `DltLoadCollectionComponent` (built-in) | Fully custom component with inline `dlt.pipeline()` |
| Demo mode | All components have a `demo_mode: bool` toggle | No demo mode; generates data directly |
| GCloud Function | 3 assets per function (execution/status/result) + sensor triggers jobs | 2 asset groups (orchestrated/observed) + observation sensor |
| Google Drive | Supports transform queries in config | Direct DuckDB reads, no transform layer |
| Schedules | 3 scheduled jobs (daily ingestion, daily Snowflake, weekly Drive) | None |
| Salesforce pipeline | Separate `salesforce_pipeline.py` with dlt source definition | Data generation embedded in the component class |
| Snowflake share | Has `ShareTarget` model, demo-mode logging | Direct implementation, no demo mode |
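
Project 1's `demo_mode: bool` toggle works roughly like this. The sketch below is illustrative plain Python, not the actual component code; the class, method, and data shapes are hypothetical stand-ins for the pattern of swapping a live API call for deterministic synthetic data:

```python
from dataclasses import dataclass


@dataclass
class SalesforceIngestComponent:
    """Hypothetical component config: when demo_mode is on, synthetic
    records are returned instead of hitting the live Salesforce API."""

    demo_mode: bool = False

    def fetch_accounts(self) -> list[dict]:
        if self.demo_mode:
            # Demo path: deterministic synthetic data, no credentials needed
            return [
                {"id": f"demo-{i}", "name": f"Demo Account {i}"}
                for i in range(3)
            ]
        # Live path would call the real Salesforce API here (omitted)
        raise NotImplementedError("live mode requires Salesforce credentials")


component = SalesforceIngestComponent(demo_mode=True)
print(len(component.fetch_accounts()))  # → 3
```

The appeal of the pattern is a demo that runs anywhere; the cost, as the summary below notes, is extra code surface in every component.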
The mart models take different analytical angles:

- Project 1: `account_360` (full account view), `pipeline_summary` (stage/fiscal aggregation), `lead_conversion_funnel` (source/industry conversion rates)
- Project 2: `fct_sales_pipeline` (opportunity-level with deal tiering), `dim_account_health` (health scoring with Platinum/Gold/Silver/Bronze tiers), `fct_lead_conversion` (individual lead funnel tracking)

Project 2's models are more granular (one row per entity), while Project 1's are more aggregated (summary metrics).
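
As a concrete illustration of the tiering that `dim_account_health` implies, here is a hypothetical scoring function. The threshold values are invented for the example, not taken from the project's dbt model:

```python
def health_tier(score: float) -> str:
    """Bucket a 0-100 account health score into the four tiers.

    Thresholds are illustrative assumptions, not the project's actual cutoffs.
    """
    if score >= 90:
        return "Platinum"
    if score >= 75:
        return "Gold"
    if score >= 50:
        return "Silver"
    return "Bronze"


print([health_tier(s) for s in (95, 80, 60, 20)])
# → ['Platinum', 'Gold', 'Silver', 'Bronze']
```

In the dbt model itself this would be a `CASE` expression over the computed score, producing one row per account, which is what makes Project 2's marts row-per-entity rather than pre-aggregated.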
| Metric | Project 1 Session | Project 2 Session |
|---|---|---|
| Session ID | 66a81ae4 | ee3d6fea |
| Total tokens | ~15.9M | ~5.0M (3.2x fewer) |
| Output tokens | 46,745 | 28,117 |
| Cache read tokens | 15.2M | 4.8M |
| Messages | 210 | 104 |
| Duration | 3 days (Mar 6-9, with revisits) | ~21 minutes (Mar 9) |
| Tool calls | ~140 | ~60 |
| Errors/retries | ~13 (import issues, pip not found, dlt API mismatches) | 3 (path resolution, env vars, zsh glob) |
| Skill used | dagster-demo → dagster-expert (orchestrator chain) | dagster-expert directly |
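
The "3.2x fewer" figure in the table above follows directly from the token totals, as a quick check:

```python
# Token totals from the session metrics table (approximate)
project_1_tokens = 15.9e6
project_2_tokens = 5.0e6

ratio = project_1_tokens / project_2_tokens
print(round(ratio, 1))  # → 3.2
```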
| Aspect | Project 1 | Project 2 |
|---|---|---|
| Strategy | Orchestrator skill (dagster-demo) provided a 5-step workflow; heavy upfront reference reading (~10+ docs), then systematic execution | Direct dagster-expert skill invocation; targeted reference reads, then dove into building |
| API exploration | Spent time introspecting dagster-dlt Python APIs at runtime (`uv run python -c "from dagster_dlt import ..."`) to find correct imports | Avoided the built-in dlt component entirely; wrote a custom component with the raw dlt library |
| Component design | Tried to subclass existing components (`DltLoadCollectionComponent`); hit import/API issues | Built all custom components from scratch using the `dg.Component` base class |
| Error recovery | Multiple cycles of `dg check defs` → fix imports → re-check | Fewer errors; mostly env var/path issues, resolved quickly |
| Parallelism | More sequential tool calls | Aggressive parallel tool calls (scaffold 4 components simultaneously, read 5 files at once) |
Project 2 was 3.2x more token-efficient and completed in ~21 minutes, versus spanning 3 days. The main factors:

- Avoided complex subclassing: writing fully custom components with raw libraries (dlt, snowflake-connector, Google APIs) was simpler than trying to extend Dagster's built-in integration components
- Skipped the orchestrator skill: going directly to dagster-expert cut overhead
- More parallel tool calls: scaffolding 4 components in one shot vs. sequentially
- Fewer errors: a simpler component architecture meant fewer import/API compatibility issues
- No demo-mode abstraction: less code surface to debug
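
The parallelism point generalizes beyond this session: independent setup steps can be launched concurrently instead of one at a time. A minimal sketch, using a stub in place of the real scaffolding CLI (the component names are drawn from the comparison above, but the `scaffold` function here is simulated, not an actual `dg` invocation):

```python
from concurrent.futures import ThreadPoolExecutor


def scaffold(component: str) -> str:
    # Stub: in the real session this step shelled out to the scaffolding CLI
    return f"scaffolded {component}"


components = ["dlt_ingest", "gcloud_function", "google_drive", "snowflake_share"]

# Launch all four scaffolds at once rather than sequentially;
# map() preserves input order in its results.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scaffold, components))

print(results)
```

For I/O-bound steps like spawning CLI processes, this turns four round-trips into roughly one, which is the shape of the tool-call savings described above.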