This document provides comprehensive instructions for GitHub Copilot to replicate the Dagster demo generation workflow implemented in this repository.
The Dagster Demo Generator automates the creation of demonstration Dagster projects using a multi-tiered workflow:
- **Command Entry Point**: the `/create-dagster-demo` slash command (`.claude/commands/create-dagster-demo-2.md`)
- **Skills System**: three specialized skills that handle different aspects of Dagster project creation
- **Validation Pipeline**: automated checks using `dg check defs` and `dg list defs`
### Skill: dagster-init (`.claude/skills/dagster-init/SKILL.md`)

**Purpose**: Create a new Dagster project with the Components architecture, using the `create-dagster` CLI and the `uv` package manager.

**Key Commands**:
```bash
# Check uv is installed
uv --version

# Create project (auto-confirm with printf)
printf "y\n" | uvx create-dagster@latest project <project-name>

# Navigate and sync dependencies
cd <project-name>
uv sync

# Validate setup
uv run dg check defs
```

**Important Notes**:
- The `printf "y\n"` automatically answers the interactive prompt to avoid hanging
- Creates a Components-compatible structure with `definitions.py` and a `defs/` directory
- Must validate with `dg check defs` before proceeding
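For orientation, the generated layout typically looks something like the sketch below (an assumption, not the tool's guaranteed output — exact file names vary by `create-dagster` version):

```
<project-name>/
├── pyproject.toml
├── src/
│   └── <project_name>/
│       ├── definitions.py
│       └── defs/
└── tests/
```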
**Skills**:
- `use-or-subclass-existing-component` for existing Dagster integrations
- `create-custom-dagster-component` when no integration exists

### Skill: use-or-subclass-existing-component

**When to use**: Dagster has 70+ existing integrations (dbt, Fivetran, PowerBI, Looker, Sling, etc.)
**Discovery**:

```bash
# List available integrations
uv run dg docs integrations --json

# Or browse: https://docs.dagster.io/integrations/libraries
```

**Critical Decision Tree**:
1. **Is this a configuration-file-based component?**
   - Examples: dbt, Sling
   - These read from local files (`dbt_project.yml`, `replication.yaml`)
   - DO: Use directly, or subclass for custom behavior
   - DO NOT: Implement `demo_mode` (the files are already local)
2. **Is this an API-based component?**
   - Examples: Fivetran, PowerBI, Looker, Census, Airbyte
   - These call external APIs requiring credentials
   - DO: Implement `demo_mode` to mock API calls
   - DO: Provide dummy credentials in YAML (for schema validation)
   - The dummy credentials remain in YAML but are ignored when `demo_mode: true`
**Implementation Pattern for API-Based Components**:

```python
# Example: CustomFivetranComponent
from dataclasses import dataclass

import dagster as dg
from dagster_fivetran import FivetranComponent


@dataclass
class CustomFivetranComponent(FivetranComponent):
    """Fivetran component with demo mode support."""

    # New field - automatically becomes a YAML schema field via Resolvable
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build definitions, using demo mode if enabled.

        Note: Parent class fields (API credentials) are still set from YAML,
        but when demo_mode is True we bypass the parent's build_defs()
        and return mocked assets, so the credentials are never used.
        """
        if self.demo_mode:
            return self._build_demo_defs(context)
        else:
            # Use the real integration with actual credentials
            return super().build_defs(context)

    def _build_demo_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build demo mode definitions with mocked assets."""

        @dg.asset(
            key=dg.AssetKey(["fivetran", "demo_sync"]),
            kinds={"fivetran"},  # CRITICAL: Always add kinds
            description="Demo Fivetran sync of data",
        )
        def fivetran_demo_sync(context: dg.AssetExecutionContext):
            context.log.info("Demo mode: Simulating Fivetran sync")
            return {"status": "demo_success", "records_synced": 1000}

        return dg.Definitions(assets=[fivetran_demo_sync])
```

**Component YAML with Dummy Credentials**:
```yaml
# defs/fivetran_sync/defs.yaml
type: my_project.defs.fivetran_sync.component.CustomFivetranComponent
attributes:
  demo_mode: true
  # Dummy credentials - REQUIRED for schema validation but ignored in demo mode
  # MUST be present and uncommented or the YAML will fail to load
  fivetran_resource:
    api_key: "demo_api_key"
    api_secret: "demo_api_secret"
    connector_id: "demo_connector_id"
```

**Key Points**:
- Use the `@dataclass` decorator on the subclass (most Dagster components use dataclasses, not Pydantic)
- Add fields with type annotations - they automatically become YAML schema fields
- Always call `super().build_defs(context)` when NOT in demo mode
- Dummy credentials MUST be present and uncommented in YAML for schema validation
- To switch to production: set `demo_mode: false` and replace the dummy values with real credentials
**Configuration-File-Based Pattern (dbt example)**:

```yaml
# defs/analytics_dbt/defs.yaml
type: dagster_dbt.DbtProjectComponent
attributes:
  # No demo_mode needed - dbt reads local project files
  project:
    project_dir: analytics_dbt  # Path to local dbt project
```

### Skill: create-custom-dagster-component

**When to use**: When Dagster doesn't have an integration for the technology
**CRITICAL**: For custom components built from scratch, you have two options.

**Option 1: Manual Creation (Recommended for simple components)**

Create the component directory manually and use `@dataclass` with `Resolvable`:
```python
from dataclasses import dataclass, field

import dagster as dg


@dataclass
class MyCustomComponent(dg.Component, dg.Resolvable):
    """CRITICAL: Must inherit from BOTH Component AND Resolvable when creating from scratch."""

    demo_mode: bool = False
    api_key: str = ""
    other_param: str = "default_value"
    list_param: list[str] = field(default_factory=list)  # Use field() for mutable defaults

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build definitions with demo mode support."""
        if self.demo_mode:
            return self._build_demo_defs(context)
        else:
            return self._build_real_defs(context)

    def _build_demo_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Demo implementation."""

        @dg.asset(
            kinds={"technology_name"},  # REQUIRED: Add the integration kind
            description="Raw data ingestion",
        )
        def raw_data_ingest(context: dg.AssetExecutionContext):
            # Demo implementation - local/mocked data
            context.log.info("Running in demo mode with local data")
            return {"data": "mocked_data"}

        @dg.asset(
            deps=[raw_data_ingest],
            kinds={"technology_name"},  # REQUIRED
            description="Data transformation",
        )
        def data_transformation(context: dg.AssetExecutionContext):
            context.log.info("Processing demo data")
            return {"transformed": "demo_result"}

        # Create 3-5 assets total with proper dependencies
        return dg.Definitions(assets=[raw_data_ingest, data_transformation])

    def _build_real_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Real implementation."""
        # Real connection logic goes here,
        # using self.api_key, self.other_param, etc.
        ...
```

**Why `Resolvable` is Required**:
- When you provide `attributes` in YAML, Dagster needs to know how to deserialize them into your component
- The `Resolvable` interface tells Dagster: "This class can be instantiated from YAML attributes"
- Without it, you'll get: `Component is not resolvable from YAML, but attributes were provided`
- When subclassing existing integrations (like `FivetranComponent`), they already inherit `Resolvable`, so you don't need to add it
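For reference, a minimal `defs.yaml` that instantiates the custom component above might look like this (the module path and attribute values are hypothetical and must match where `component.py` actually lives):

```yaml
# defs/my_component/defs.yaml (hypothetical location)
type: my_project.defs.my_component.component.MyCustomComponent
attributes:
  demo_mode: true
  api_key: "demo_key"        # ignored when demo_mode is true
  other_param: "custom_value"
  list_param:
    - "item_one"
```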
**Option 2: Using `dg scaffold component` (If needed)**

```bash
# Create component (this generates boilerplate - may be overly complex)
uv run dg scaffold component <ComponentName>

# Instantiate component
uv run dg scaffold defs my_module.components.<ComponentName> my_component
```

Note: For most API-based components with demo mode, Option 1 (manual creation with `@dataclass` and `Resolvable`) is simpler and clearer.
**Asset Design Guidelines**:
- Create 3-5 realistic assets based on chosen technologies
- Establish proper dependencies between assets
- Use descriptive names reflecting business purpose
- Always add `kinds` parameter to categorize by technology
- Implement both real and demo mode logic paths
**Example Asset Flow**:
1. Raw data ingestion asset
2. Data transformation/cleaning asset
3. Business logic/aggregation asset
4. ML model or analytics asset (if applicable)
5. Output/export asset
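As a concrete sketch of that flow (asset names, `kinds` values, and payloads are illustrative, not from the source repo):

```python
import dagster as dg


@dg.asset(kinds={"api"}, description="Raw event ingestion")
def raw_events(context: dg.AssetExecutionContext):
    context.log.info("Demo mode: loading local sample events")
    return [{"user_id": 1, "amount": 42.0}]


@dg.asset(deps=[raw_events], kinds={"duckdb"}, description="Cleaned and deduplicated events")
def cleaned_events(context: dg.AssetExecutionContext):
    context.log.info("Demo mode: dropping malformed rows")


@dg.asset(deps=[cleaned_events], kinds={"duckdb"}, description="Daily revenue aggregate")
def daily_revenue(context: dg.AssetExecutionContext):
    context.log.info("Demo mode: aggregating revenue by day")


@dg.asset(deps=[daily_revenue], kinds={"python"}, description="Simple revenue forecast")
def revenue_forecast(context: dg.AssetExecutionContext):
    context.log.info("Demo mode: fitting a naive forecast")


@dg.asset(deps=[revenue_forecast], kinds={"csv"}, description="Exported forecast report")
def forecast_export(context: dg.AssetExecutionContext):
    context.log.info("Demo mode: writing forecast to CSV")
```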
### Step 3: Validate Setup
**Critical Validation Commands**:
```bash
# Check definitions load without errors
uv run dg check defs

# List all assets to verify creation and dependencies
uv run dg list defs
```

**What to verify**:
- ✅ All expected assets are listed
- ✅ Asset dependencies are correctly connected
- ✅ No errors or warnings
- ✅ `kinds` column shows integration types
- ✅ Demo mode works without external connections
- ✅ Non-demo mode has realistic database/API connections
**Special attention for dbt and Sling**:
- Verify asset lineage matches expected flow
- For dbt: Check staging → intermediate → mart asset dependencies
- For Sling: Check source → destination asset mapping
### Component Quick Reference

**dbt (`DbtProjectComponent`)**:

```yaml
type: dagster_dbt.DbtProjectComponent
attributes:
  project:
    project_dir: my_dbt_project
```

**Key Points**:
- NO demo_mode needed (reads local files)
- Ensure `profiles.yml` is configured properly
- Use DuckDB for local dev, Snowflake/BigQuery for prod
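A minimal `profiles.yml` for the local-DuckDB setup might look like this (profile name and path are illustrative; the profile name must match the `profile` key in `dbt_project.yml`):

```yaml
# my_dbt_project/profiles.yml (illustrative)
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: duckdb               # dbt-duckdb adapter for local development
      path: ../data/demo.duckdb  # local database file
```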
**Sling (`SlingReplicationCollectionComponent`)**:

```yaml
type: dagster_sling.SlingReplicationCollectionComponent
attributes:
  replications:
    - path: replication.yaml
```

**Key Points**:
- NO demo_mode needed (reads local config)
- The replication YAML defines source → destination mappings
- Sling handles the data movement automatically
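For reference, a minimal `replication.yaml` might look like the following hedged sketch (connection names, file paths, and object names are illustrative):

```yaml
# replication.yaml (illustrative)
source: LOCAL                     # source connection name
target: DUCKDB                    # target connection name
defaults:
  mode: full-refresh
streams:
  file://data/raw_customers.csv:
    object: target.raw_customers  # destination schema.table
```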
**Fivetran (`FivetranComponent` subclass)**:

```python
from dataclasses import dataclass

import dagster as dg
from dagster_fivetran import FivetranComponent


@dataclass
class CustomFivetranComponent(FivetranComponent):
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        if self.demo_mode:
            # _build_demo_defs defined as in the full example above
            return self._build_demo_defs(context)
        return super().build_defs(context)
```

```yaml
attributes:
  demo_mode: true
  fivetran_resource:
    api_key: "demo_api_key"  # Dummy value required
    api_secret: "demo_api_secret"
    connector_id: "demo_connector_id"
```

**PowerBI (`PowerBIWorkspaceComponent` subclass)**:

```python
from dataclasses import dataclass

import dagster as dg
from dagster_powerbi import PowerBIWorkspaceComponent


@dataclass
class CustomPowerBIComponent(PowerBIWorkspaceComponent):
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        if self.demo_mode:
            # _build_demo_defs defined as in the Fivetran example above
            return self._build_demo_defs(context)
        return super().build_defs(context)
```

```yaml
attributes:
  demo_mode: true
  powerbi_resource:
    client_id: "demo_client_id"  # Dummy values required
    client_secret: "demo_client_secret"
    tenant_id: "demo_tenant_id"
    workspace_id: "demo_workspace_id"
```

### Troubleshooting

**Symptom**: YAML fails to load with missing required field errors
**Cause**: The parent component requires fields that weren't provided

**Solution**: Provide dummy values for ALL required fields, and keep them uncommented:
```yaml
attributes:
  demo_mode: true
  # These MUST be present even in demo mode
  api_resource:
    api_key: "demo_key"
    api_secret: "demo_secret"
```

**Symptom**: `dg check defs` fails with an import error
**Cause**: The fully qualified `type` path in the YAML is incorrect

**Solution**: Verify the path matches the Python module structure:

```yaml
# Correct format
type: my_project.defs.component_instance.component.CustomComponentClass
```

**Symptom**: `dg list defs` shows assets, but dependencies are wrong or missing
**Common Causes and Solutions**:

1. **Custom Component Asset Dependencies**
   - Issue: Hightouch/Census/custom component assets are not downstream of dbt
   - Cause: Using the wrong asset key format in the `deps` parameter
   - Solution: Match the exact asset key shown in `dg list defs`:

   ```python
   # ❌ WRONG - includes component prefix
   deps=[dg.AssetKey(["analytics_dbt", "customer_metrics"])]

   # ✅ CORRECT - use the actual asset key
   deps=[dg.AssetKey(["customer_metrics"])]
   ```
2. **dbt Source References Not Matching Sling Targets**
   - Issue: dbt staging models don't depend on Sling ingestion assets
   - Cause: The dbt `source()` schema doesn't match Sling's target schema
   - Solution:
     - Check Sling's `replication.yaml` to see what schema/tables it creates
     - Sling with `target: DUCKDB` creates tables in a schema based on the target name
     - Update dbt's `models/sources.yml` to match:

     ```yaml
     # Match Sling's actual output schema
     sources:
       - name: target  # ← Must match Sling's target schema
         schema: target
         tables:
           - name: raw_customers  # ← Must match Sling's object names
     ```
     - Update dbt models to use the correct source:

     ```sql
     -- ✅ CORRECT
     FROM {{ source('target', 'raw_customers') }}

     -- ❌ WRONG if Sling uses a different schema
     FROM {{ source('raw', 'raw_customers') }}
     ```
3. **Verifying Asset Keys**
   - Run `uv run dg list defs` and check the "Key" column
   - Use EXACTLY those keys in your `deps` parameters
   - Asset keys from dbt are usually just the model name (e.g., `customer_metrics`, not `analytics_dbt/customer_metrics`)
   - Asset keys from Sling follow the pattern `{target_schema}/{object_name}`
4. **Other Dependency Issues**
   - For custom components: use `deps=[AssetKey(...)]` in the `@asset` decorator (see the sketch after this list)
   - For dbt: verify that the model's `ref()` and `source()` calls are correct in the SQL
   - For Sling: check the `replication.yaml` source → destination mappings
**Symptom**: Assets show up, but the "Kinds" column is empty in `dg list defs`

**Solution**: Always add the `kinds` parameter to the asset decorator:

```python
@dg.asset(
    key=dg.AssetKey(["my_asset"]),
    kinds={"technology_name"},  # REQUIRED
)
```

### Final Checklist

Before considering a demo complete:
- Run `uv run dg check defs` - passes without errors
- Run `uv run dg list defs` - shows all expected assets
- Verify asset dependencies in the output
- Check that the `kinds` column shows integration types
- Test with `demo_mode: true` - works without external connections
- Verify with `demo_mode: false` - has a realistic implementation (even if credentials are dummy)
- Confirm dummy credentials remain uncommented in the YAML
- Documentation explains how to switch between demo and production
When GitHub Copilot is asked to create a Dagster demo:
1. **Start with project initialization**:
   - Use `printf "y\n" | uvx create-dagster@latest project <name>`
   - Navigate to the project directory and run `uv sync`
2. **Determine the component strategy**:
   - For existing integrations: use the `use-or-subclass-existing-component` pattern
   - For custom integrations: use the `create-custom-dagster-component` pattern
3. **Implement demo_mode correctly**:
   - Configuration-file-based (dbt, Sling): NO demo_mode
   - API-based (Fivetran, PowerBI, Looker): YES demo_mode, with dummy credentials
4. **Create realistic assets**:
   - 3-5 assets with proper dependencies
   - Always include the `kinds` parameter
   - Implement both demo and real logic paths
5. **Validate thoroughly**:
   - Run `dg check defs`
   - Run `dg list defs`
   - Verify asset lineage
   - Test that demo mode works locally
6. **Document the demo**:
   - Explain the asset flow
   - Show how to toggle demo_mode
   - Provide instructions for production deployment
**Resources**:
- Dagster Documentation
- Components Guide
- Subclassing Components
- Integration Libraries
- uv Documentation
- Full API Reference
- Concise API Reference
**Complete Example**:

```bash
# 1. Create project
printf "y\n" | uvx create-dagster@latest project my-demo
cd my-demo
uv sync

# 2. Add integration packages
uv add dagster-fivetran dagster-dbt

# 3. Create Fivetran component with demo mode
uv run dg scaffold defs dagster_fivetran.FivetranComponent fivetran_sync
# Edit component.py to add the demo_mode subclass
# Edit defs.yaml to set demo_mode: true and add dummy credentials

# 4. Add dbt component (no demo mode needed)
uv run dg scaffold defs dagster_dbt.DbtProjectComponent analytics_dbt
# Edit defs.yaml to point to the dbt project directory

# 5. Validate
uv run dg check defs
uv run dg list defs

# 6. Test demo mode
uv run dg dev
# Open http://localhost:3000
# Materialize assets to test demo mode execution
```

This workflow creates a complete demo with both API-based (Fivetran with demo_mode) and configuration-based (dbt without demo_mode) components.