
@cnolanminich
Created November 25, 2025 15:47
copilot_instructions_for_custom_demos

Dagster Demo Generator - GitHub Copilot Guide

This document provides comprehensive instructions for GitHub Copilot to replicate the Dagster demo generation workflow implemented in this repository.

Architecture Overview

The Dagster Demo Generator automates the creation of demonstration Dagster projects using a multi-tiered workflow:

  1. Command Entry Point: /create-dagster-demo slash command (.claude/commands/create-dagster-demo-2.md)
  2. Skills System: Three specialized skills that handle different aspects of Dagster project creation
  3. Validation Pipeline: Automated checks using dg check defs and dg list defs

Core Workflow

Step 1: Initialize Dagster Project

Skill: dagster-init (.claude/skills/dagster-init/SKILL.md)

Purpose: Create a new Dagster project with Components architecture using create-dagster CLI and uv package manager.

Key Commands:

# Check uv is installed
uv --version

# Create project (auto-confirm with printf)
printf "y\n" | uvx create-dagster@latest project <project-name>

# Navigate and sync dependencies
cd <project-name>
uv sync

# Validate setup
uv run dg check defs

Important Notes:

  • The printf "y\n" automatically answers the interactive prompt to avoid hanging
  • Creates a Components-compatible structure with definitions.py and definitions/defs/ directory
  • Must validate with dg check defs before proceeding

Step 2: Add Integration Components

Skills:

  • use-or-subclass-existing-component for existing Dagster integrations
  • create-custom-dagster-component when no integration exists

2a. Using Existing Integrations

When to use: when Dagster already provides an integration; 70+ exist (dbt, Fivetran, PowerBI, Looker, Sling, etc.)

Discovery:

# List available integrations
uv run dg docs integrations --json

# Or browse: https://docs.dagster.io/integrations/libraries

Critical Decision Tree:

  1. Is this a configuration-file-based component?

    • Examples: dbt, Sling
    • These read from local files (dbt_project.yml, replication.yaml)
    • DO: Use directly or subclass for custom behavior
    • DO NOT: Implement demo_mode (files are already local)
  2. Is this an API-based component?

    • Examples: Fivetran, PowerBI, Looker, Census, Airbyte
    • These call external APIs requiring credentials
    • DO: Implement demo_mode to mock API calls
    • DO: Provide dummy credentials in YAML (for schema validation)
    • The dummy credentials remain in YAML but are ignored when demo_mode=true

Implementation Pattern for API-Based Components:

# Example: CustomFivetranComponent
from dataclasses import dataclass
import dagster as dg
from dagster_fivetran import FivetranComponent

@dataclass
class CustomFivetranComponent(FivetranComponent):
    """Fivetran component with demo mode support."""

    # New field - automatically becomes YAML schema field via Resolvable
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build definitions, using demo mode if enabled.

        Note: Parent class fields (API credentials) are still set from YAML,
        but when demo_mode is True, we bypass the parent's build_defs()
        and return mocked assets, so credentials are never used.
        """
        if self.demo_mode:
            return self._build_demo_defs(context)
        else:
            # Use real integration with actual credentials
            return super().build_defs(context)

    def _build_demo_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build demo mode definitions with mocked assets."""
        @dg.asset(
            key=dg.AssetKey(["fivetran", "demo_sync"]),
            kinds={"fivetran"},  # CRITICAL: Always add kinds
            description="Demo Fivetran sync of data",
        )
        def fivetran_demo_sync(context: dg.AssetExecutionContext):
            context.log.info("Demo mode: Simulating Fivetran sync")
            return {"status": "demo_success", "records_synced": 1000}

        return dg.Definitions(assets=[fivetran_demo_sync])

Component YAML with Dummy Credentials:

# defs/fivetran_sync/defs.yaml
type: my_project.defs.fivetran_sync.component.CustomFivetranComponent

attributes:
  demo_mode: true

  # Dummy credentials - REQUIRED for schema validation but ignored in demo mode
  # MUST be present and uncommented or YAML will fail to load
  fivetran_resource:
    api_key: "demo_api_key"
    api_secret: "demo_api_secret"
  connector_id: "demo_connector_id"

Key Points:

  • Use @dataclass decorator on subclass (most Dagster components use dataclass, not Pydantic)
  • Add fields with type annotations - they automatically become YAML schema fields
  • Always call super().build_defs(context) when NOT in demo mode
  • Dummy credentials MUST be present and uncommented in YAML for schema validation
  • To switch to production: Set demo_mode: false and replace dummy values with real credentials

Configuration-File-Based Pattern (dbt example):

# defs/analytics_dbt/defs.yaml
type: dagster_dbt.DbtProjectComponent

attributes:
  # No demo_mode needed - dbt reads local project files
  project:
    project_dir: analytics_dbt  # Path to local dbt project

2b. Creating Custom Components

When to use: When Dagster doesn't have an integration for the technology

CRITICAL: For custom components from scratch, you have two options:

Option 1: Manual Creation (Recommended for simple components)

Create the component directory manually and use @dataclass with Resolvable:

from dataclasses import dataclass, field
from typing import Any
import dagster as dg

@dataclass
class MyCustomComponent(dg.Component, dg.Resolvable):
    """CRITICAL: Must inherit from BOTH Component AND Resolvable when creating from scratch."""
    
    demo_mode: bool = False
    api_key: str = ""
    other_param: str = "default_value"
    list_param: list[str] = field(default_factory=list)  # Use field() for mutable defaults

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Build definitions with demo mode support."""
        if self.demo_mode:
            return self._build_demo_defs(context)
        else:
            return self._build_real_defs(context)
    
    def _build_demo_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Demo implementation."""
        
        @dg.asset(
            kinds={"technology_name"},  # REQUIRED: Add integration kind
            description="Raw data ingestion",
        )
        def raw_data_ingest(context: dg.AssetExecutionContext):
            # Demo implementation - local/mocked data
            context.log.info("Running in demo mode with local data")
            return {"data": "mocked_data"}

        @dg.asset(
            deps=[raw_data_ingest],
            kinds={"technology_name"},  # REQUIRED
            description="Data transformation",
        )
        def data_transformation(context: dg.AssetExecutionContext):
            context.log.info("Processing demo data")
            return {"transformed": "demo_result"}

        # Create 3-5 assets total with proper dependencies
        return dg.Definitions(assets=[raw_data_ingest, data_transformation])
    
    def _build_real_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        """Real implementation."""
        # Real connection logic here
        # Use self.api_key, self.other_param, etc.
        raise NotImplementedError("Add real connection logic before setting demo_mode: false")

Why Resolvable is Required:

  • When you provide attributes in YAML, Dagster needs to know how to deserialize them into your component
  • Resolvable interface tells Dagster: "This class can be instantiated from YAML attributes"
  • Without it, you'll get: Component is not resolvable from YAML, but attributes were provided
  • When subclassing existing integrations (like FivetranComponent), they already inherit Resolvable, so you don't need to add it
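
The Resolvable contract can be illustrated without Dagster at all. The following is a Dagster-free sketch of the concept: a dataclass's typed fields act as the accepted YAML attribute schema, and a parsed attributes dict is used to instantiate the class. The helper name and validation logic are illustrative, not Dagster's actual machinery.

```python
from dataclasses import dataclass, fields

@dataclass
class MyCustomComponent:
    demo_mode: bool = False
    api_key: str = ""

def instantiate_from_yaml_attributes(cls, attributes: dict):
    """Build a component instance from a parsed YAML `attributes` mapping,
    rejecting keys the dataclass does not declare (illustrative stand-in
    for what Resolvable enables)."""
    allowed = {f.name for f in fields(cls)}
    unknown = set(attributes) - allowed
    if unknown:
        raise ValueError(f"Unknown attributes: {sorted(unknown)}")
    return cls(**attributes)

component = instantiate_from_yaml_attributes(
    MyCustomComponent, {"demo_mode": True, "api_key": "demo_key"}
)
print(component.demo_mode)  # → True
```

Without the Resolvable interface there is no declared mapping from YAML attributes to constructor fields, which is why Dagster refuses to load the component.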

Option 2: Using dg scaffold component (If needed)

# Create component (this generates boilerplate - may be overly complex)
uv run dg scaffold component <ComponentName>

# Instantiate component
uv run dg scaffold defs my_module.components.<ComponentName> my_component

Note: For most API-based components with demo mode, Option 1 (manual creation with @dataclass and Resolvable) is simpler and clearer.


Asset Design Guidelines:

  • Create 3-5 realistic assets based on chosen technologies
  • Establish proper dependencies between assets
  • Use descriptive names reflecting business purpose
  • Always add kinds parameter to categorize by technology
  • Implement both real and demo mode logic paths

Example Asset Flow:

  1. Raw data ingestion asset
  2. Data transformation/cleaning asset
  3. Business logic/aggregation asset
  4. ML model or analytics asset (if applicable)
  5. Output/export asset
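
The five-step flow above can be sketched without Dagster as a name-to-dependencies mapping, mirroring the `deps` wiring you would declare with @dg.asset. The asset names are hypothetical; a topological sort confirms the lineage is acyclic and runs ingestion to export.

```python
from graphlib import TopologicalSorter

# Each key is an asset, each value its upstream dependencies,
# matching the deps=[...] you would pass to @dg.asset.
asset_deps = {
    "raw_data_ingest": [],
    "cleaned_data": ["raw_data_ingest"],
    "business_aggregates": ["cleaned_data"],
    "churn_model": ["business_aggregates"],
    "export_to_warehouse": ["churn_model"],
}

# static_order() raises CycleError if the dependencies loop.
order = list(TopologicalSorter(asset_deps).static_order())
print(order[0], "->", order[-1])  # → raw_data_ingest -> export_to_warehouse
```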

Step 3: Validate Setup

Critical Validation Commands:

# Check definitions load without errors
uv run dg check defs

# List all assets to verify creation and dependencies
uv run dg list defs

What to verify:

  • ✅ All expected assets are listed
  • ✅ Asset dependencies are correctly connected
  • ✅ No errors or warnings
  • ✅ kinds column shows integration types
  • ✅ Demo mode works without external connections
  • ✅ Non-demo mode has realistic database/API connections

Special attention for dbt and Sling:

  • Verify asset lineage matches expected flow
  • For dbt: Check staging → intermediate → mart asset dependencies
  • For Sling: Check source → destination asset mapping
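
One way to make the lineage check concrete: list the dependency edges you expect (copied from your component code or dbt DAG) and compare them against the assets and deps reported by dg list defs. Parsing real dg output is out of scope here; `listed` below is a hand-written stand-in with hypothetical keys.

```python
# Stand-in for parsed `dg list defs` output: asset key -> upstream keys.
listed = {
    "target/raw_customers": set(),
    "stg_customers": {"target/raw_customers"},
    "customer_metrics": {"stg_customers"},
}

# Edges you expect, as (upstream, downstream) pairs.
expected_edges = [
    ("target/raw_customers", "stg_customers"),
    ("stg_customers", "customer_metrics"),
]

missing = [
    (up, down)
    for up, down in expected_edges
    if down not in listed or up not in listed[down]
]
print("lineage ok" if not missing else f"missing edges: {missing}")
```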

Integration-Specific Patterns

dbt (Configuration-File-Based)

type: dagster_dbt.DbtProjectComponent

attributes:
  project:
    project_dir: my_dbt_project

Key Points:

  • NO demo_mode needed (reads local files)
  • Ensure profiles.yml is configured properly
  • Use DuckDB for local dev, Snowflake/BigQuery for prod

Sling (Configuration-File-Based)

type: dagster_sling.SlingReplicationCollectionComponent

attributes:
  replications:
    - path: replication.yaml

Key Points:

  • NO demo_mode needed (reads local config)
  • Replication YAML defines source → destination mappings
  • Sling handles data movement automatically

Fivetran (API-Based)

@dataclass
class CustomFivetranComponent(FivetranComponent):
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        if self.demo_mode:
            return self._build_demo_defs(context)
        return super().build_defs(context)

Component YAML:

attributes:
  demo_mode: true
  fivetran_resource:
    api_key: "demo_api_key"  # Dummy value required
    api_secret: "demo_api_secret"
  connector_id: "demo_connector_id"

PowerBI (API-Based)

@dataclass
class CustomPowerBIComponent(PowerBIWorkspaceComponent):
    demo_mode: bool = False

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        if self.demo_mode:
            return self._build_demo_defs(context)
        return super().build_defs(context)

Component YAML:

attributes:
  demo_mode: true
  powerbi_resource:
    client_id: "demo_client_id"  # Dummy values required
    client_secret: "demo_client_secret"
    tenant_id: "demo_tenant_id"
  workspace_id: "demo_workspace_id"

Common Pitfalls and Solutions

Problem: Schema Validation Errors

Symptom: YAML fails to load with missing required field errors

Cause: Parent component requires fields, but they weren't provided

Solution: Provide dummy values for ALL required fields, keep them uncommented:

attributes:
  demo_mode: true
  # These MUST be present even in demo mode
  api_resource:
    api_key: "demo_key"
    api_secret: "demo_secret"

Problem: Component Not Found

Symptom: dg check defs fails with import error

Cause: Fully qualified type path is incorrect in YAML

Solution: Verify path matches Python module structure:

# Correct format
type: my_project.defs.component_instance.component.CustomComponentClass

Problem: Assets Not Showing Expected Dependencies

Symptom: dg list defs shows assets but dependencies are wrong or missing

Common Causes and Solutions:

1. Custom Component Asset Dependencies

  • Issue: Hightouch/Census/custom component assets not downstream of dbt
  • Cause: Using wrong asset key format in deps parameter
  • Solution: Match the exact asset key shown in dg list defs
    # ❌ WRONG - includes component prefix
    deps=[dg.AssetKey(["analytics_dbt", "customer_metrics"])]
    
    # ✅ CORRECT - use actual asset key
    deps=[dg.AssetKey(["customer_metrics"])]

2. dbt Source References Not Matching Sling Targets

  • Issue: dbt staging models don't depend on Sling ingestion assets
  • Cause: dbt source() schema doesn't match Sling's target schema
  • Solution:
    • Check Sling replication.yaml to see what schema/tables it creates
    • A Sling replication with a DuckDB target creates tables in a schema derived from the target name
    • Update dbt models/sources.yml to match:
      # Match Sling's actual output schema
      sources:
        - name: target  # ← Must match Sling's target schema
          schema: target
          tables:
            - name: raw_customers  # ← Must match Sling's object names
    • Update dbt models to use correct source:
      -- ✅ CORRECT
      FROM {{ source('target', 'raw_customers') }}
      
      -- ❌ WRONG if Sling uses different schema
      FROM {{ source('raw', 'raw_customers') }}

3. Verifying Asset Keys

  • Run uv run dg list defs and check the "Key" column
  • Use EXACTLY those keys in your deps parameters
  • Asset keys from dbt are usually just the model name (e.g., customer_metrics not analytics_dbt/customer_metrics)
  • Asset keys from Sling follow pattern: {target_schema}/{object_name}

4. Other Dependency Issues

  • For custom components: Use deps=[AssetKey(...)] in @asset decorator
  • For dbt: Verify dbt model ref() and source() calls are correct in SQL
  • For Sling: Check replication.yaml source→destination mappings
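
The key conventions above can be captured as two tiny helpers (assumptions; always verify against your own dg list defs output): dbt assets use the bare model name, while Sling assets follow the {target_schema}/{object_name} pattern.

```python
def sling_asset_key(target_schema: str, object_name: str) -> list[str]:
    """Key path for a Sling-produced asset: {target_schema}/{object_name}."""
    return [target_schema, object_name]

def dbt_asset_key(model_name: str) -> list[str]:
    """Key path for a dbt model asset: bare model name, no project prefix."""
    return [model_name]

print(sling_asset_key("target", "raw_customers"))  # → ['target', 'raw_customers']
print(dbt_asset_key("customer_metrics"))           # → ['customer_metrics']
```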

Problem: Missing kinds Metadata

Symptom: Assets show up but "Kinds" column is empty in dg list defs

Solution: Always add kinds parameter to asset decorator:

@dg.asset(
    key=dg.AssetKey(["my_asset"]),
    kinds={"technology_name"},  # REQUIRED
)

Testing Checklist

Before considering a demo complete:

  • Run uv run dg check defs - passes without errors
  • Run uv run dg list defs - shows all expected assets
  • Verify asset dependencies in output
  • Check kinds column shows integration types
  • Test with demo_mode: true - works without external connections
  • Verify with demo_mode: false - has realistic implementation (even if credentials are dummy)
  • Confirm dummy credentials remain uncommented in YAML
  • Documentation explains how to switch between demo/production
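
The first two checklist items can be wrapped in a small script; a sketch is below. Actually running these commands requires a Dagster project on disk, so the runner accepts an injectable `run` callable for dry testing.

```python
import subprocess

# The two validation commands from the checklist, as argv lists.
VALIDATION_COMMANDS = [
    ["uv", "run", "dg", "check", "defs"],
    ["uv", "run", "dg", "list", "defs"],
]

def validate(run=subprocess.run) -> bool:
    """Run each validation command, stopping at the first failure."""
    for argv in VALIDATION_COMMANDS:
        result = run(argv)
        if result.returncode != 0:
            return False
    return True
```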

GitHub Copilot Instructions

When GitHub Copilot is asked to create a Dagster demo:

  1. Start with project initialization:

    • Use printf "y\n" | uvx create-dagster@latest project <name>
    • Navigate to project directory and run uv sync
  2. Determine component strategy:

    • For existing integrations: Use use-or-subclass-existing-component pattern
    • For custom integrations: Use create-custom-dagster-component pattern
  3. Implement demo_mode correctly:

    • Configuration-file-based (dbt, Sling): NO demo_mode
    • API-based (Fivetran, PowerBI, Looker): YES demo_mode with dummy credentials
  4. Create realistic assets:

    • 3-5 assets with proper dependencies
    • Always include kinds parameter
    • Implement both demo and real logic paths
  5. Validate thoroughly:

    • Run dg check defs
    • Run dg list defs
    • Verify asset lineage
    • Test demo mode works locally
  6. Document the demo:

    • Explain the asset flow
    • Show how to toggle demo_mode
    • Provide instructions for production deployment

Resources

Example Complete Workflow

# 1. Create project
printf "y\n" | uvx create-dagster@latest project my-demo
cd my-demo
uv sync

# 2. Add integration packages
uv add dagster-fivetran dagster-dbt

# 3. Create Fivetran component with demo mode
uv run dg scaffold defs dagster_fivetran.FivetranComponent fivetran_sync
# Edit component.py to add demo_mode subclass
# Edit defs.yaml to set demo_mode: true and add dummy credentials

# 4. Add dbt component (no demo mode needed)
uv run dg scaffold defs dagster_dbt.DbtProjectComponent analytics_dbt
# Edit defs.yaml to point to dbt project directory

# 5. Validate
uv run dg check defs
uv run dg list defs

# 6. Test demo mode
uv run dg dev
# Open http://localhost:3000
# Materialize assets to test demo mode execution

This workflow creates a complete demo with both API-based (Fivetran with demo_mode) and configuration-based (dbt without demo_mode) components.
