Analysis Date: March 2, 2026
Between Dagster 1.6 and the current release (1.12.17, as of February 27, 2026), the documentation has undergone a fundamental transformation — not just a cosmetic refresh, but a philosophical and structural overhaul that reflects Dagster's evolution from a flexible orchestration framework into a highly opinionated data platform with prescribed workflows, new abstractions (Components, dg CLI), and a dramatically narrower "happy path" for new users.
The key shifts:
| Dimension | Dagster 1.6 Docs | Current Docs |
|---|---|---|
| Core abstraction | Software-Defined Assets (`@asset`) | Assets (`@dg.asset`) + Components |
| Project creation | `dagster project scaffold` | `create-dagster project` |
| Development CLI | `dagster dev` | `dg dev` |
| Configuration | Pure Python (`Definitions(...)`) | YAML (`defs.yaml`) + auto-discovery |
| Project structure | Recommended but flexible | Prescribed and auto-generated |
| Onboarding philosophy | "Start with assets, explore from there" | "Follow this exact path" |
| Automation model | Schedules + Sensors + Auto-Materialize | Schedules + Sensors + Declarative Automation (GA) |
| Ops/Graphs positioning | "Advanced topic, not needed to start" | Virtually absent from onboarding |
| Package management | `pip install` | `uv` (recommended) or `pip` |
| Overall opinionation | Moderate-high (asset-first but options shown) | Very high in onboarding (single prescribed path); moderate in mini-examples (trade-offs shown) |
Dagster 1.6 organized docs into these top-level sections:
- Getting Started (Hello Dagster, Create New Project)
- Tutorial (7-part asset-focused walkthrough)
- Concepts (Assets, Ops/Jobs/Graphs, Resources, Scheduling, etc.)
- Guides (including best practices embedded within)
- Integrations
- Deployment
- API Reference
The architecture was a classic hub-and-spoke model: a Getting Started hub let users self-select their path with four call-to-action buttons ("Run Hello, Dagster!", "View Tutorials", "Explore Concepts", "Enroll in Dagster University"). Users had agency in choosing their learning journey.
Current docs reorganize around:
- Getting Started (Installation, Quickstart, Create New Project)
- Dagster Basics Tutorial (assets, dependencies, projects, schedules, components)
- ETL Pipeline Tutorial (Components-driven real-world pipeline)
- Concepts (Assets, Automation, etc.)
- Guides (Build, Automate, Deploy, Components)
- API Reference (including `dg` CLI and `create-dagster` CLI references)
The architecture is now a linear funnel: Install → Quickstart → Basics Tutorial → ETL Tutorial → Concepts/Guides. Users are guided through a prescribed sequence rather than offered a menu of options.
Several major abstractions were added between 1.6 and now:
- **Components** (GA as of 1.11.10) — Higher-level building blocks that generate assets and other definitions from YAML configuration. Built-in components exist for dbt, Sling, Fivetran, Airbyte, dlt, Power BI, Looker, and more. Components are now the headline abstraction in the ETL tutorial and are positioned as the recommended way to build integration-heavy pipelines.
- **`dg` CLI** (GA as of 1.11.10) — A new CLI that replaces/supplements the `dagster` CLI for development workflows. Commands include `dg dev`, `dg scaffold defs`, `dg check defs`, `dg launch`, `dg scaffold build-artifacts`, and `dg list component`.
- **`create-dagster` CLI** — Replaces `dagster project scaffold` as the primary project scaffolding tool. Generates a modern project structure with `pyproject.toml`, `src/`, `defs/`, and auto-discovery via `definitions.py`.
- **Declarative Automation** (GA as of 1.9) — Replaces Auto-Materialize Policies with `AutomationCondition` primitives (`on_cron`, `eager`, `on_missing`). Conditions can be combined and customized. The `@multi_asset_sensor` is now deprecated in favor of this approach.
- **`defs.yaml`** — YAML configuration files (with Jinja2 templating) that configure component instances and set directory-level attributes (owner, tags, etc.). This represents a shift from "everything in Python" to "configuration in YAML, logic in Python."
- **Auto-discovery via `definitions.py`** — The new project structure features a `definitions.py` entry point that automatically discovers all definitions in the `defs/` directory. The docs explicitly state: "You should not need to modify this file."
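Conceptually, folder-based auto-discovery is "walk the `defs/` directory, import every module, collect the decorated definitions." The sketch below illustrates that idea with the standard library only — it is NOT Dagster's actual implementation, and the `_is_asset` marker attribute and `discover_defs` function are invented for illustration:

```python
# Illustrative sketch of folder-based auto-discovery (NOT Dagster's real
# implementation): walk a defs/ folder, import each module, and collect
# any object a decorator has tagged with a marker attribute.
import importlib.util
import pathlib
import tempfile


def discover_defs(defs_dir: pathlib.Path) -> list[str]:
    found = []
    for py_file in sorted(defs_dir.rglob("*.py")):
        spec = importlib.util.spec_from_file_location(py_file.stem, py_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for obj in vars(module).values():
            # "_is_asset" is a hypothetical marker a decorator would set.
            if getattr(obj, "_is_asset", False):
                found.append(obj.__name__)
    return found


# Demo: fake a defs/ folder containing one module with one tagged function.
with tempfile.TemporaryDirectory() as tmp:
    defs = pathlib.Path(tmp) / "defs"
    defs.mkdir()
    (defs / "assets.py").write_text(
        "def processed_data():\n    pass\nprocessed_data._is_asset = True\n"
    )
    print(discover_defs(defs))  # -> ['processed_data']
```

The design point is that the user never registers definitions by hand — placing a module under `defs/` is the registration.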
- **Ops and Graphs**: In 1.6, the ops page stated: "An op is the core unit of computation in Dagster." In the current docs, the concepts page states: "Ops have largely been replaced by assets." Every ops guide page now carries a banner: "If you are just getting started with Dagster, we strongly recommend you use assets rather than ops to build your data pipelines."
- **I/O Managers**: The 1.6 tutorial had a dedicated section (Part 6) on "Saving data with I/O managers." Current onboarding materials do not emphasize I/O managers.
- **`setup.py`**: Legacy project scaffolding used `setup.py`. Current projects use `pyproject.toml` exclusively.
- **Auto-Materialize Policies**: Deprecated in favor of Declarative Automation. The `AutoMaterializePolicy` and `AutoMaterializeRule` interfaces are marked as deprecated.
- **`SourceAsset`**: While still supported, the concept is less prominent in current docs. External assets and `AssetSpec` (declarative description without materialization logic) have taken its place.
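To make the Auto-Materialize → Declarative Automation shift concrete: the new model treats automation as small, composable predicates. The sketch below is a toy model of that composability — the `Condition` class is invented here, and the real `AutomationCondition` API is considerably richer:

```python
# Toy model of composable automation conditions (NOT Dagster's actual
# AutomationCondition API): each condition is a named predicate over an
# evaluation context, combinable with & (and) and | (or).
class Condition:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def __and__(self, other):
        return Condition(
            f"({self.name} & {other.name})",
            lambda ctx: self.predicate(ctx) and other.predicate(ctx),
        )

    def __or__(self, other):
        return Condition(
            f"({self.name} | {other.name})",
            lambda ctx: self.predicate(ctx) or other.predicate(ctx),
        )

    def evaluate(self, ctx):
        return self.predicate(ctx)


# Primitives loosely modeled on the documented eager/on_missing conditions.
eager = Condition("eager", lambda ctx: ctx["upstream_updated"])
on_missing = Condition("on_missing", lambda ctx: ctx["never_materialized"])

combined = eager | on_missing
print(combined.name)  # -> (eager | on_missing)
print(combined.evaluate({"upstream_updated": False, "never_materialized": True}))
```

The composability is the upgrade over Auto-Materialize Policies: conditions nest and combine rather than being fixed rule sets.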
| Action | Dagster 1.6 | Current |
|---|---|---|
| Create project | `dagster project scaffold --name my-project` | `uvx create-dagster@latest project my-project` |
| Start from example | `dagster project from-example --example quickstart_etl` | N/A (tutorials are self-contained) |
| Scaffold definitions | Manual Python file creation | `dg scaffold defs dagster.asset <path>` |
| Scaffold components | N/A | `dg scaffold defs dagster_dbt.DbtProjectComponent <path>` |
| Local dev server | `dagster dev` | `dg dev` |
| Validate definitions | N/A (runtime errors) | `dg check defs` |
| Run pipeline | UI or `dagster job launch` | `dg launch --assets` or UI |
| Build for deploy | Manual Dockerfile | `dg scaffold build-artifacts` |
| Install packages | `pip install dagster dagster-webserver` | `uv add dagster` (recommended) |
Dagster 1.6 docs were moderately to highly opinionated — they clearly favored the asset-first paradigm, but they acknowledged alternatives and gave users some freedom to explore.
Key evidence:
- **Hello Dagster (1.6)** was built entirely around `@asset`, but explicitly stated at the bottom: "Dagster also offers ops and jobs, but we recommend starting with assets." This is a clear recommendation paired with an acknowledgment that alternatives exist.
- **The Tutorial (1.6)** was titled "Building a pipeline using Software-defined Assets" and was 100% asset-focused across all 7 parts. However, the concepts section still devoted full pages to ops, jobs, and graphs as legitimate patterns.
- **The SDA Concepts page (1.6)** contained the key framing statement: "Behind-the-scenes, the Python function is an op. Ops are an advanced topic that isn't required to get started with Dagster." This positions ops as real but not necessary — a nuanced middle ground.
- **The Ops page (1.6)** still defined ops as "the core unit of computation" and provided extensive documentation on their use. Ops were not hidden — they were fully documented but clearly labeled as a lower-level abstraction.
- **The "How Assets Relate to Ops and Graphs" guide (1.6)** provided explicit guidance on when ops are appropriate: (1) you're not building a data pipeline, (2) you want to break an asset into multiple steps (graph-backed assets), (3) you're anchored in task-based workflows (migration/legacy path).
- **The Getting Started page (1.6)** was a hub with four buttons letting users self-select. Users could go to "Hello Dagster," "Tutorials," "Concepts," or "Dagster University" — implying that multiple learning paths were valid.
Verbatim examples of 1.6's "recommend but allow" tone:
> "An asset is the easiest way to get started with Dagster, and can be used to model assets that should be materialized by Dagster." — `/concepts/assets/software-defined-assets`

> "You can group assets using `load_assets_from_package_module` (recommended), or by using the `group_name` argument on each individual asset." — `/concepts/assets/software-defined-assets` (note the "(recommended)" alongside the alternative)

> "Dagster also offers ops and jobs, but we recommend starting with assets." — `/getting-started/hello-dagster`

> "Behind-the-scenes, the Python function is an op and the asset is modeled on top of it. Ops are an advanced topic that isn't required to get started with Dagster." — `/concepts/assets/software-defined-assets`

> "When you define an asset, the same function also produces the op that computes it... In this case, there's no reason to split the logic up into multiple steps. But sometimes you may want to... in this case you may use a graph-backed asset." — `/guides/dagster/how-assets-relate-to-ops-and-graphs`
Summary (1.6): The docs said "here's the best way, and here's why, but we understand if you need other approaches." The opinion was strong but not absolute.
Current docs are very to extremely opinionated — they present a single prescribed workflow and rarely acknowledge alternatives in onboarding materials.
Key evidence:
- The **Quickstart** is a step-by-step command sequence with zero branching: `create-dagster` → `dg scaffold defs` → `dg check defs` → `dg dev`. There is exactly one path. The only optionality is `uv` vs `pip` for package management (and even there, `uv` is marked "recommended").
- The **Dagster Basics Tutorial** follows a single prescribed path from project creation through assets, dependencies, resources, checks, automation, and components. There is no "you could also do it this way" branching.
- The **ETL Pipeline Tutorial** is built entirely around Components (`DbtProjectComponent`, `SlingReplicationCollectionComponent`). There is no mention of building the same pipeline without Components. DuckDB, dbt, and Sling are prescribed technology choices.
- **Components** are positioned as THE way to build integration-heavy pipelines: "Dagster comes with ready-made Components for common integrations... These plug-and-play Components let you spin up new pipelines in minutes by filling out a few config fields, rather than hand-coding new assets from scratch."
- The **`definitions.py`** file is explicitly hands-off: "You should not need to modify this file." This removes a decision point — the user doesn't choose how definitions are organized, the framework handles it.
- **Ops and graphs** are virtually invisible in the onboarding flow. You would have to navigate to deep reference pages to find them.
- **Declarative Automation** is clearly favored over other automation methods. The `@multi_asset_sensor` is deprecated. While schedules and sensors are still documented, the framing suggests Declarative Automation is the forward-looking choice.
Verbatim examples of the current docs' prescriptive tone:
> "If you are just getting started with Dagster, we strongly recommend you use assets rather than ops to build your data pipelines." — Banner on every ops-related guide page (`/guides/build/ops`, `/guides/build/ops/graphs`, etc.)

> "Ops have largely been replaced by assets." — `/getting-started/concepts`

> "For situations where you are automating execution of assets only, Dagster recommends using Declarative Automation instead." — `/guides/automate/asset-sensors`

> "The Components framework and the `dg` CLI are now marked as GA... The APIs are fully supported throughout all parts of the product and remain the recommended defaults for new Dagster projects." — `/about/changelog`

> "Unlike the assets file, which was in Python, components provide a low-code interface in YAML." — `/etl-pipeline-tutorial/transform`

> "As you develop your Dagster project, it is a good habit to run `dg check` to ensure everything works as expected." — `/tutorial/assets`
| Topic | 1.6 Level | Current Level | Change |
|---|---|---|---|
| Asset-first vs ops/tasks | Strong recommendation | Absolute (ops invisible in onboarding) | ↑↑ |
| Project scaffolding | One recommended CLI | One prescribed CLI + structure | ↑ |
| Project structure | Recommended layout guide | Auto-generated, non-negotiable skeleton | ↑↑ |
| Configuration approach | Pure Python | YAML (defs.yaml) prescribed for components |
↑↑ (new) |
| Automation strategy | Multiple options, no clear winner | Declarative Automation clearly favored | ↑ |
| Integration patterns | Manual Python code | Components + dg scaffold prescribed |
↑↑ (new) |
| Package manager | pip (no opinion) |
uv recommended, pip as fallback |
↑ (new) |
| Learning path | Self-selected hub | Linear funnel | ↑↑ |
| CLI tooling | dagster CLI |
dg CLI (replaces dagster for dev workflows) |
↑ (new) |
Entry point: The Getting Started page was a hub with four paths. A new user would likely click "Run Hello, Dagster!"
Hello Dagster (1.6):
- Create a single Python file with two `@asset`-decorated functions
- `pip install dagster dagster-webserver pandas`
- `dagster dev -f hello-dagster.py`
- Click "Materialize All" in the UI
This was 40 lines of code, zero project structure, immediate feedback. The user went from nothing to a working asset graph in under 5 minutes. The framing was: "Look, this is what an asset is, and it already does useful things."
The exact Hello Dagster file (1.6) — the first code a new user saw:
```python
import json

import pandas as pd
import requests

from dagster import AssetExecutionContext, MetadataValue, asset


@asset
def hackernews_top_story_ids():
    """Get top stories from the HackerNews top stories endpoint."""
    top_story_ids = requests.get(
        "https://hacker-news.firebaseio.com/v0/topstories.json"
    ).json()
    with open("hackernews_top_story_ids.json", "w") as f:
        json.dump(top_story_ids[:10], f)


@asset(deps=[hackernews_top_story_ids])
def hackernews_top_stories(context: AssetExecutionContext):
    """Get items based on story ids from the HackerNews items endpoint."""
    with open("hackernews_top_story_ids.json", "r") as f:
        hackernews_top_story_ids = json.load(f)

    results = []
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

    df = pd.DataFrame(results)
    df.to_csv("hackernews_top_stories.csv")
    context.add_output_metadata(metadata={
        "num_records": len(df),
        "preview": MetadataValue.md(df[["title", "by", "url"]].to_markdown()),
    })
```

And the commands to run it:

```shell
pip install dagster dagster-webserver pandas
dagster dev -f hello-dagster.py
```

That's it. One file, two commands, immediate result.
Tutorial (1.6): A 7-part sequential tutorial ("New to Dagster? Start here!"):
- Intro to assets (what/why)
- Setup (scaffold from example)
- Writing your first asset
- Building an asset graph
- Scheduling
- Saving data with I/O managers
- Managing external services (resources)
The tutorial introduced concepts incrementally: start with one asset, then dependencies, then scheduling, then I/O patterns, then external integrations. Each part built on the previous one.
Concept introduction order (1.6):
1. `@asset` (first thing a user sees)
2. `deps=[]` / asset dependencies
3. `MetadataValue` / output metadata
4. `Definitions` (implied in project creation)
5. Schedules
6. I/O managers
7. Resources
8. `SourceAsset`, `AssetIn`, `multi_asset` (concepts layer)
9. `@op`, `@graph`, `@job` (concepts layer, clearly secondary)
Best practices (1.6): Best practices were embedded within the Guides section, not given a dedicated top-level page. The key best-practices content was:
- Recommended Project Structure — the most prescriptive structural guidance, with full file tree examples based on the `project_fully_featured` example
- Automating Your Pipelines
- Building/Managing ML Pipelines
- Exploring a Dagster Project

The project structure guide recommended: assets in `assets/` by business domain, resources in `resources/`, sensors in `sensors/`, and "we don't recommend over-abstracting too early; in most cases, one code location should be sufficient."
The 1.6 recommended project tree (from the fully-featured example):
```
project_fully_featured/
├── project_fully_featured/
│   ├── __init__.py
│   ├── assets/
│   │   ├── activity_analytics/
│   │   ├── core/
│   │   └── recommender/
│   ├── resources/
│   ├── sensors/
│   └── jobs.py
├── dbt_project/
├── setup.py
└── pyproject.toml
```
The current auto-generated project tree:
```
my-project/
├── src/
│   └── my_project/
│       ├── __init__.py
│       ├── definitions.py   ← auto-discovery
│       └── defs/            ← all definitions go here
│           └── __init__.py
├── tests/
├── pyproject.toml
└── uv.lock
```
The difference is structural: 1.6 showed you a recommended layout for an already-complex project. Current gives you an auto-generated skeleton for a new project, with the expectation that `dg scaffold` commands will populate it.
Entry point: The Getting Started section directs users to Installation → Quickstart in a linear sequence.
Quickstart (current):
1. `uvx create-dagster@latest project my-project`
2. Navigate into the project
3. `dg scaffold defs dagster.asset my_defs/my_asset.py`
4. Write an asset (CSV processing example)
5. `dg check defs` (validate)
6. `dg dev` (run)
This is more steps, more tooling, more structure from the start. The user gets a full project scaffold before writing any code. The `dg` CLI validates definitions before running, introducing a compile-check-run cycle.
The exact quickstart sequence (current) — the first thing a new user does:
```shell
uvx create-dagster@latest project dagster-quickstart
cd dagster-quickstart
source .venv/bin/activate
uv add pandas
dg scaffold defs dagster.asset assets.py
```

This generates a project tree:
```
dagster-quickstart/
├── pyproject.toml
├── src/
│   └── dagster_quickstart/
│       ├── __init__.py
│       ├── definitions.py   ← auto-discovery, "you should not need to modify"
│       └── defs/
│           └── assets.py    ← scaffolded by `dg scaffold`
├── tests/
└── uv.lock
```
Then the user fills in the asset:
```python
import pandas as pd

import dagster as dg


@dg.asset
def processed_data():
    df = pd.read_csv("src/dagster_quickstart/defs/data/sample_data.csv")
    df["age_group"] = pd.cut(
        df["age"], bins=[0, 30, 40, 100], labels=["Young", "Middle", "Senior"]
    )
    df.to_csv("src/dagster_quickstart/defs/data/processed_data.csv", index=False)
    return "Data loaded successfully"
```

And validates + runs:
```shell
dg check defs            # validates before running
dg dev                   # starts the development server
dg launch --assets "*"   # materializes all assets
```

Note: The `definitions.py` file uses auto-discovery and is never touched by the user:
```python
from pathlib import Path

from dagster import definitions, load_from_defs_folder


@definitions
def defs():
    return load_from_defs_folder(project_root=Path(__file__).parent.parent.parent)
```

The contrast is stark: 1.6 started with a concept (what is an asset?), current starts with a workflow (scaffold, write, check, run).
Dagster Basics Tutorial (current): A progressive tutorial covering:
- Creating a project with `create-dagster`
- Defining assets with `@dg.asset`
- Asset dependencies via `deps`
- Building the asset graph (DAG)
- Resources
- Asset checks (data quality)
- Automation (schedules)
- Custom components (introduced at the end)
ETL Pipeline Tutorial (current): A more advanced tutorial built entirely around Components:
- Extract data using Sling (via `SlingReplicationCollectionComponent`)
- Transform data using dbt (via `DbtProjectComponent`)
- Automate with schedules + declarative automation
- Visualize data
Concept introduction order (current):
1. `create-dagster` project scaffold (before any code!)
2. `@dg.asset` decorator
3. `deps` (dependencies)
4. Auto-discovery via `definitions.py`
5. Resources
6. Asset checks
7. Schedules
8. `AutomationCondition` (declarative automation)
9. Components (`DbtProjectComponent`, etc.)
10. `defs.yaml` configuration
11. Custom components (inheriting from `Component`)
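For readers who have not seen one, a `defs.yaml` pairs a component type with its configuration. The fragment below is a hedged sketch of the shape the docs describe — the field names and schema here are illustrative approximations, not a verified reference:

```yaml
# Illustrative defs.yaml sketch — field names approximate the documented
# pattern (component type + configuration) and should be checked against
# the component's own reference.
type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'   # Jinja2 templating, per the docs
```

The component reads this configuration at load time and generates the corresponding asset definitions, which is what "configuration in YAML, logic in Python" means in practice.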
Best practices (current): Best practices are now delivered in two ways — partially through an explicit Best Practices section under Guides, and partially baked into the prescribed workflows:
Explicit best practices section (`/guides/best-practices/`):
- Structuring your Dagster project
- Building ML pipelines
- Managing ML models
- Fully-featured project example
- Limiting concurrency
- Customizing run queue priority
- Validating data with Dagster Type factories
- Asset versioning and caching
Implicit best practices (embedded in tooling/workflows):
- **Project structure**: Automatically generated by `create-dagster`. Two organization strategies offered: by technology (`defs/dbt/`, `defs/sling/`) or by concept (`defs/ingestion/`, `defs/transformation/`).
- **Configuration**: Use `defs.yaml` for component configuration; use directory-level `defs.yaml` for attribute inheritance (owner, tags).
- **Automation**: Declarative Automation is the forward-looking choice; `@multi_asset_sensor` is deprecated.
- **Integrations**: Use Components rather than hand-coding asset definitions.
- **Validation**: Use `dg check defs` before running.
- **Scaling guidance**: Phase 1 (0-400 lines) keep everything in one file; Phase 2 (400+ lines) split into modules under `defs/`; Phase 3 (multiple teams) organize by technology or concept.
| Aspect | 1.6 Onboarding | Current Onboarding |
|---|---|---|
| First interaction | Single Python file, `dagster dev -f file.py` | Full project scaffold, multiple CLI commands |
| Time to first asset | ~2 minutes (copy-paste one file) | ~5 minutes (scaffold project, scaffold defs, write code) |
| Concept load at start | Low (just `@asset` and `deps`) | Higher (project structure, `dg` CLI, `definitions.py`, `defs/` folder) |
| "Magic" vs transparency | Transparent: you write Python, you see results | More magic: auto-discovery, YAML config, generated scaffolds |
| Minimum viable knowledge | What an `@asset` decorator does | What `create-dagster`, `dg scaffold`, `dg check`, and `dg dev` do |
| When Components appear | Never (didn't exist) | End of basics tutorial, central to ETL tutorial |
| When ops/graphs appear | Concepts layer, with "recommended for some cases" | Essentially never in onboarding |
| Learning philosophy | "Learn the primitive, then compose" | "Learn the workflow, then understand the internals" |
| Best practices delivery | Explicit guide with file tree examples | Implicit in prescribed project structure |
In 1.6, a new user learned:
"Dagster is about assets — things that exist in your data world. You describe them with Python functions. They have dependencies on each other. You can materialize them. Everything else builds on this."
In the current docs, a new user learns:
"Dagster has a CLI toolchain. You scaffold a project. You scaffold definitions. You validate them. You run them. Assets are the core thing you define, but there's a whole framework around how you define and organize them."
The 1.6 approach was concept-first: understand what an asset is, then learn the tooling. The current approach is workflow-first: learn the tooling, then understand what you're building.
- **Validation before execution**: The `dg check defs` step catches configuration errors before runtime, which is a significant developer-experience improvement.
- **Standardized project structure**: Auto-generated scaffolds reduce decision fatigue and ensure consistency across teams and projects.
- **Components as force multiplier**: For integration-heavy workloads (dbt, Sling, Fivetran, etc.), Components dramatically reduce boilerplate. A pipeline that might take hours of custom Python can be scaffolded in minutes.
- **Declarative Automation maturity**: Moving from experimental Auto-Materialize Policies to stable, composable `AutomationCondition` primitives is a significant upgrade for production use.
- **Deployment workflow**: `dg scaffold build-artifacts` generating Dockerfiles and build configs is a meaningful step toward deployment best practices being built into the toolchain.
- **Higher initial complexity**: The 1.6 "single file" quickstart was arguably more accessible. Current onboarding requires understanding `create-dagster`, `dg` CLI subcommands, project structure conventions, and auto-discovery before writing any data logic.
- **Reduced flexibility signaling**: By removing visible alternatives from onboarding, the docs may make users with non-standard use cases (non-asset workloads, task-based workflows, etc.) feel unsupported, even if those features still exist. Partial counterbalance (see Part 7): the 9 mini-examples explicitly present multiple approaches with trade-offs at the operational level (how to backfill, how to parallelize, partitions vs config). However, this flexibility lives outside the onboarding funnel — users must already be building before they discover it.
- **YAML configuration trade-offs**: The shift from pure Python (`Definitions(...)`) to YAML (`defs.yaml`) introduces a different kind of complexity. YAML with Jinja2 templating can become its own debugging challenge.
- **Component abstraction gap**: The jump from "here's what an asset is" (basics tutorial) to "here's a `DbtProjectComponent` that generates assets from YAML config" (ETL tutorial) is a large conceptual leap. Users may not understand what Components do under the hood.
- **Implicit best practices**: In 1.6, the recommended project structure guide was explicit about why each recommendation existed. In the current docs, best practices are embedded in auto-generated scaffolds, which users may follow without understanding the reasoning.
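One way to bridge the Component abstraction gap noted above: under the hood, a component is conceptually a factory that turns declarative config into asset definitions. The sketch below is a toy model of that idea — the class names are invented and this is NOT Dagster's real `Component` interface:

```python
# Toy sketch of the component idea (NOT Dagster's real Component interface):
# a component consumes declarative config and emits asset definitions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class AssetDef:
    name: str
    compute: Callable[[], str]


class TableCopyComponent:
    """Hypothetical component: emits one 'raw_<table>' asset per configured table."""

    def __init__(self, config: dict):
        self.tables = config["tables"]

    def build_defs(self) -> list[AssetDef]:
        return [
            AssetDef(name=f"raw_{t}", compute=lambda t=t: f"copied {t}")
            for t in self.tables
        ]


# What a defs.yaml would supply, parsed into a dict:
config = {"tables": ["orders", "customers"]}
defs = TableCopyComponent(config).build_defs()
print([d.name for d in defs])  # -> ['raw_orders', 'raw_customers']
```

Seen this way, the ETL tutorial's YAML files are just inputs to factories like this one, which may lessen the conceptual leap from hand-written assets.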
Dagster 1.6: "Assets are the right way. Here's why. But we support other patterns."
Opinionation: ████████░░ (8/10)
Current: Varies by doc layer:
Onboarding/Tutorials: ██████████ (10/10 — single prescribed path)
Full Pipeline Examples: █████████░ (9/10 — one way per tutorial)
Reference Architectures: ████████░░ (8/10 — prescribe tech stacks)
Mini Examples: █████░░░░░ (5/10 — multiple approaches with trade-offs)
Reference Docs: ███████░░░ (7/10 — options documented, defaults noted)
The docs have gone from strongly opinionated with acknowledged alternatives to a gradient: highly prescriptive in onboarding and tutorials, but with a deliberate "flexibility layer" in the mini-examples (see Part 7). The key difference from 1.6 is where flexibility appears. In 1.6, alternatives were presented at the foundational level (assets vs ops, different dependency patterns). In the current docs, *what to build with* is prescribed (Components, the `dg` CLI, `@dg.asset`), while *how to tune it* allows choice (backfill strategies, parallelism approaches, partitions vs config). This is a deliberate strategy — reducing foundational choices reduces confusion for new users — but it changes the character of the documentation from "here's how to think about data engineering" to "here's how to use Dagster (and here are some operational knobs once you're using it)."
To go beyond qualitative assessment, we systematically cataloged every instance of prescriptive guidance across comparable pages in both doc versions. This analysis examines Getting Started, Tutorials, key Concepts pages, and Guides — the pages a new or intermediate user would encounter.
For each page, we identified instances where the docs:
- Prescribe a specific approach: "you should", "we recommend", "the best way", "the easiest way"
- Offer a single path: Step-by-step instructions with no alternatives mentioned
- Deprecate/discourage: "deprecated", "legacy", "avoid", "instead use"
- Present a default choice: One option marked as default or recommended
- Offer alternatives (1.6 only): "you can also", "alternatively", "another approach"
- Present choices without strong opinion (1.6 only): Multiple options, no clear favorite
| Category | Dagster 1.6 (15 pages) | Current (18 pages) |
|---|---|---|
| Prescribe a specific approach | 37 | ~62 |
| Offer a single path (no alternatives) | 0 (alternatives always noted) | ~19 |
| Deprecate/discourage | 0 | ~5 |
| Present a default choice | 0 (implicit only) | ~14 |
| Offer alternatives | 13 | ~0 in onboarding; ~15-20 in mini-examples (see Part 7) |
| Present choices without strong opinion | 5 | ~0 in onboarding; ~9 in mini-examples (see Part 7) |
| TOTAL prescriptive instances | 37 | ~100 |
The current docs contain roughly 2.7x more prescriptive guidance instances across a comparable set of pages, even before normalizing for the fact that current docs also contain more pages overall.
Dagster 1.6 prescriptive language was concentrated in two areas:
- Project structure guide (11 of 37 instances) — the most opinionated single page, repeatedly using "we recommend" for directory layout
- Assets concept page (4 instances) — recommending `@asset` as "the easiest way" and `load_assets_from_package_module` as the "(recommended)" grouping approach
Current docs prescriptive language is spread across every page:
- Every onboarding page has 5-9 prescriptive instances
- The ETL tutorial pages each have 5-7 instances
- The guides carry 6-9 instances per page
- The best practices hub alone has 9 instances
1.6 balanced prescriptions with alternatives. For 37 prescriptive instances, there were 13 instances of "you can also" and 5 instances of neutral multi-option presentation. That's a 2.1:1 prescription-to-alternative ratio.
Current docs offer essentially zero alternatives in onboarding. For ~100 prescriptive instances in Getting Started, Basics Tutorial, and ETL Tutorial, alternatives appear only in deep reference pages (e.g., `pip` as a fallback for `uv`, `--format python` as an alternative to YAML components). In the tutorials and getting-started flow, the ratio is effectively infinite — prescription without acknowledged alternatives. However (see Part 7): the 9 mini-examples are a notable exception — they explicitly present 2-3 approaches with trade-offs for operational decisions (backfill strategies, parallelism, partitions vs config). This creates a two-tier prescription model: foundational choices (what tools, what abstractions) are prescribed; operational choices (how to tune, how to scale) allow flexibility. The combined picture is closer to a 5:1 prescription-to-alternative ratio across the full doc surface, though a new user following the onboarding funnel would never encounter the alternatives until reaching the mini-examples.
In 1.6, prescriptions were primarily conceptual:
- "Assets are the main way to create data pipelines"
- "We recommend starting with assets and not worrying about ops"
- "Resources are the recommended way to manage connections"
- "We don't recommend over-abstracting too early"
These tell users what to think about — which abstraction to favor, which pattern to adopt.
Verbatim 1.6 prescription examples:
> "We recommend placing your assets in your `assets/` directory, with subdirectories for different business-relevant groupings." — `/guides/dagster/recommended-project-structure`

> "Resources are the recommended way to manage connections to external services and configuration." — `/tutorial/connecting-to-external-services`

> "We don't recommend over-abstracting too early; in most cases, one code location should be sufficient." — `/guides/dagster/recommended-project-structure`
Each was paired with explanation of why. And the project structure guide offered an alternative:
> "If you'd prefer to keep things simple, you can start with a single module and refactor later."
In the current docs, prescriptions are primarily operational:
- "Use `uvx create-dagster@latest project`"
- "Use `dg scaffold defs dagster.asset`"
- "Use `dg check defs` before running"
- "Components provide a low-code interface in YAML"
- "Use `dg launch --assets '*'`"
These tell users what to do — which command to run, which tool to use, which format to write config in.
Verbatim current prescription examples:
> "Use the `dg scaffold defs` command to generate an assets file on the command line." — `/getting-started/quickstart` (no manual file creation alternative mentioned)

> "Open your terminal and scaffold a new Dagster project: `uvx create-dagster@latest project dagster-quickstart`" — `/getting-started/quickstart` (`create-dagster` is the only path shown)

> "In the terminal, navigate to your project's root directory and run: `dg dev`" — `/getting-started/quickstart` (older `dagster dev` not mentioned)

> "We recommend beginning new components by designing the interface." — `/guides/build/components`

> "We recommend using asset observations when reporting events from external systems in Dagster instead of asset materializations to avoid consuming credits." — `/guides/build/assets`
| | 1.6 | Current |
|---|---|---|
| Explicit deprecation notices | 0 | ~5 |
| APIs marked for removal | 0 | `SourceAsset` (→ `AssetSpec`), `AutoMaterializePolicy` (→ `AutomationCondition`), `AutoMaterializeRule`, `@multi_asset_sensor` |
| Discouraged patterns | 0 | Ops for new projects, asset sensors for automation, unit testing external system logic, hard-coding credentials |
The 1.6 docs contained zero deprecation or discouragement language. Even ops were described as "the core unit of computation" — just placed under an Advanced section. The current docs actively deprecate multiple APIs and discourage several patterns.
Verbatim deprecation examples from current docs:
"AutoMaterializePolicy, AutoMaterializeRule, and the auto_materialize_policy arguments to @asset and AssetSpec have been marked as deprecated, and the new AutomationCondition API and automation_condition argument should be used instead." —
/migration/upgrading(1.8.0 release notes)
"SourceAsset is deprecated, in favor of AssetSpec. You can now use AssetSpecs in any of the places you could previously use SourceAssets." —
/migration/upgrading(1.8.0 release notes)
"The experimental @multi_asset_sensor has been marked as deprecated, but will not be removed from the codebase until Dagster 2.0 is released." —
/migration/upgrading(1.9.0 release notes)
"FreshnessPolicy is now deprecated. For monitoring freshness, use freshness checks instead." —
/migration/upgrading(1.7.0 release notes)
Compare with how 1.6 described ops:
"An op is the core unit of computation in Dagster. Individual ops should perform relatively simple tasks." —
/concepts/ops-jobs-graphs/ops(no deprecation, just positioning as "advanced")
| Example Type | Dagster 1.6 (15 pages) | Current (18 pages) | Change |
|---|---|---|---|
| Full runnable examples (copy-paste and run) | 5 | ~27 | +440% |
| Code snippets (fragments showing a concept) | 49 | ~37 | -24% |
| CLI commands | 13 | ~42 | +223% |
| Configuration examples (YAML, directory trees) | 5 | ~20 | +300% |
| GRAND TOTAL (core pages) | 72 | ~126 | +75% |
Note (updated per Part 7): The counts above cover the 18 core pages (Getting Started, Basics Tutorial, ETL Tutorial, Concepts, Guides). They do not include the 9 full-pipeline tutorial examples, which collectively contain an estimated ~200+ additional code blocks (Python, YAML, CLI, config) across ~36 sub-pages. Including these brings the current docs' total hands-on code surface to ~326+ code blocks — roughly 4.5x the 1.6 total. The full pipeline examples are the single largest source of runnable code in the current docs and represent a category that had no equivalent in 1.6.
The numbers tell a clear story about how the docs' teaching philosophy changed:
1. Full runnable examples increased roughly 5x. The current docs invest heavily in code you can actually copy-paste and run. In 1.6, most tutorial code was fragments requiring prior context (49 snippets vs 5 full examples). The current docs nearly invert that ratio — complete examples (~27) now rival fragments (~37).
2. CLI commands tripled. This reflects the new dg CLI-centric workflow. In 1.6, you ran dagster dev and maybe dagster project scaffold. In the current docs, every tutorial step involves a dg subcommand: dg scaffold defs, dg check defs, dg dev, dg launch, dg scaffold build-artifacts, plus installation commands for uv, create-dagster, etc.
3. Configuration examples quadrupled. This reflects the shift to YAML-based Components. In 1.6, configuration was pure Python — no YAML files to show. The current docs have defs.yaml examples, pyproject.toml structures, and directory tree layouts on nearly every page.
4. Code snippets slightly decreased. Despite having more total code blocks, the current docs have fewer fragment snippets. This is because the docs moved toward showing complete, runnable code rather than illustrative fragments. This is a pedagogical improvement — users can actually run what they see.
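The percentage changes in the table above are plain deltas; a few lines of Python (taking the table's approximate current counts at face value) reproduce them:

```python
def pct_change(old: int, new: int) -> int:
    """Percentage change from old to new, rounded to the nearest whole percent."""
    return round((new - old) / old * 100)

# (1.6 count, approximate current count) per the comparison table
counts = {
    "full runnable examples": (5, 27),    # +440%
    "code snippets":          (49, 37),   # -24%
    "CLI commands":           (13, 42),   # +223%
    "configuration examples": (5, 20),    # +300%
    "grand total":            (72, 126),  # +75%
}

for name, (old, new) in counts.items():
    print(f"{name}: {pct_change(old, new):+d}%")
```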
Dagster 1.6 — code-heavy pages:
| Page | Total Code Blocks |
|---|---|
| /concepts/assets/software-defined-assets | 20 (highest) |
| /concepts/ops-jobs-graphs/ops | 11 |
| /tutorial/saving-your-data | 8 |
| All other pages | 0-5 each |
Current — code-heavy pages:
| Page | Total Code Blocks |
|---|---|
| /guides/build/components | ~12 (highest) |
| /getting-started/quickstart | ~11 |
| /guides/build/assets | ~11 |
| /tutorial/components | ~9 |
| /etl-pipeline-tutorial/extract | ~9 |
| All other pages | 4-8 each |
In 1.6, code density was concentrated in the concepts layer (the SDA page alone had 20 blocks). In the current docs, code is more evenly distributed across getting started, tutorials, and guides — reflecting the linear funnel approach where every step has executable code.
In 1.6, the highest-code-density page (concepts/assets/software-defined-assets) showed many ways to accomplish things:
- Basic deps, managed-loading deps, explicit `AssetIn` deps, `SourceAsset` deps
- Single output, multi-output, conditional materialization
- Group assignment via decorator vs `load_assets_from_package_module`
- Config via decorator, via run config, via factory
This was a reference encyclopedia — "here are all the patterns, pick what fits."
Concrete 1.6 example — four different ways to declare dependencies on the same SDA page:
Pattern A — deps list:

```python
@asset(deps=[sugary_cereals])
def shopping_list() -> None:
    execute_query("CREATE TABLE shopping_list AS SELECT * FROM sugary_cereals")
```

Pattern B — managed-loading (function argument):

```python
@asset
def downstream_asset(upstream_asset):
    return upstream_asset + [4]
```

Pattern C — explicit AssetIn:

```python
@asset(ins={"upstream": AssetIn("upstream_asset")})
def downstream_asset(upstream):
    return upstream + [4]
```

Pattern D — SourceAsset for external data:

```python
my_source_asset = SourceAsset(key=AssetKey("a_source_asset"))

@asset(deps=[my_source_asset])
def my_derived_asset():
    return execute_query("SELECT * from a_source_asset").as_list() + [4]
```

All four patterns were shown on the same page, giving users the power to choose.
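What all four declaration styles share is that they register dependency edges the framework can later resolve into an execution order. A toy sketch (plain Python, not Dagster's actual implementation; `asset` and `DEPS` here are illustrative stand-ins) of how a `deps`-style decorator records such a graph:

```python
from collections import defaultdict

# Toy registry: maps asset name -> names of its upstream assets (illustrative only)
DEPS: dict[str, list[str]] = defaultdict(list)

def asset(deps=()):
    """Tiny stand-in for @asset that only records dependency edges."""
    def decorator(fn):
        DEPS[fn.__name__] = [d if isinstance(d, str) else d.__name__ for d in deps]
        return fn
    return decorator

@asset()
def sugary_cereals():
    return ["frosted_os"]

@asset(deps=[sugary_cereals])
def shopping_list():
    return ["frosted_os", "milk"]

print(dict(DEPS))  # {'sugary_cereals': [], 'shopping_list': ['sugary_cereals']}
```

An orchestrator can topologically sort a registry like this to decide what runs first — which is the capability every one of the four 1.6 patterns ultimately feeds.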
In the current docs, the highest-code-density pages show one way to accomplish each thing:
- One way to scaffold (`dg scaffold defs`)
- One way to define assets (`@dg.asset`)
- One way to configure components (`defs.yaml`)
- One way to automate (`AutomationCondition`)
This is a recipe book — "here's the recipe, follow the steps."
Concrete current example — one way to define and configure a dbt component:
Step 1: scaffold with a CLI command:

```shell
dg scaffold defs dagster_dbt.DbtProjectComponent transform --project-path transform/jdbt
```

Step 2: configure via YAML (the only mechanism shown):

```yaml
type: dagster_dbt.DbtProjectComponent
attributes:
  project: '{{ context.project_root }}/transform/jdbt'
  translation:
    key: "target/main/{{ node.name }}"
```

Step 3: validate and run:

```shell
dg check defs
dg dev
```

The older `@dbt_assets` Python decorator approach is not mentioned on this tutorial page.
Concrete current example — the only automation pattern shown in the tutorial:
```python
import dagster as dg

@dg.asset(
    deps=["upstream"],
    automation_condition=dg.AutomationCondition.on_cron("@hourly"),
)
def hourly_asset() -> None: ...
```

Sensors, custom cron logic, and the older AutoMaterializePolicy are not mentioned. The recipe is singular.
Concrete current example — the Component class that replaces hand-coded assets:
In the basics tutorial, three nearly-identical hand-coded Python asset functions get replaced by one Component:
```python
class Tutorial(dg.Component, dg.Model, dg.Resolvable):
    duckdb_database: str
    etl_steps: list[ETL]

    def build_defs(self, context: dg.ComponentLoadContext) -> dg.Definitions:
        _etl_assets = []
        for etl in self.etl_steps:
            @dg.asset(name=etl.table)
            def _table(duckdb: DuckDBResource):
                with duckdb.get_connection() as conn:
                    conn.execute(f"create or replace table {etl.table} as ...")
            _etl_assets.append(_table)
        return dg.Definitions(assets=_etl_assets, resources={...})
```

Configured via YAML:
```yaml
type: dagster_tutorial.components.tutorial.Tutorial
attributes:
  duckdb_database: /tmp/jaffle_platform.duckdb
  etl_steps:
    - url_path: https://...raw_customers.csv
      table: customers
    - url_path: https://...raw_orders.csv
      table: orders
    - url_path: https://...raw_payments.csv
      table: payments
```

This is positioned as the natural evolution: "In this tutorial, you will learn about core Dagster features and use them to build a working data pipeline. We will start with the fundamental concepts and progress to higher level abstractions that showcase the power of Dagster." (Basics Tutorial intro)
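The core move the Tutorial component makes — turning a list of config entries into N generated asset functions — can be reduced to a plain-Python factory sketch with no Dagster dependency (all names below are hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class ETLStep:
    url_path: str
    table: str

def build_table_loaders(etl_steps):
    """Generate one loader function per configured step, mirroring the component's build_defs loop."""
    loaders = []
    for step in etl_steps:
        # Bind `step` via a default argument so each generated function
        # captures its own step rather than the loop variable's final value.
        def load(execute_sql, step=step):
            execute_sql(f"create or replace table {step.table} as select * from '{step.url_path}'")
            return step.table
        load.__name__ = f"load_{step.table}"
        loaders.append(load)
    return loaders

steps = [ETLStep("raw_customers.csv", "customers"), ETLStep("raw_orders.csv", "orders")]
print([fn.__name__ for fn in build_table_loaders(steps)])
# ['load_customers', 'load_orders']
```

The YAML file then becomes pure data that drives this factory — which is why three near-identical hand-coded assets collapse into one class plus config.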
The examples ecosystem is one of the most dramatically changed areas between 1.6 and the current docs. In 1.6, examples were CLI-extractable project templates in the monorepo. In the current docs, examples have been elevated to a first-class documentation tier with multi-page tutorials, reference architectures, and domain-specific full pipelines.
In 1.6, the primary mechanism was:
```shell
dagster project from-example --example <name> --name <my-project>
dagster project list-examples  # discover available examples
```

All examples lived in the dagster-io/dagster monorepo under /examples/. There were 27 directories (24 user-facing) organized into these categories:
| Category | Count | Key Examples |
|---|---|---|
| Quickstarts | 4 | quickstart_etl (local), quickstart_aws (S3), quickstart_gcp (BigQuery), quickstart_snowflake |
| Reference architecture | 1 | project_fully_featured (HN analytics + ML + dbt, [UNMAINTAINED]) |
| Tutorial companions | 3 | tutorial, tutorial_notebook_assets, project_dagster_university_start |
| Asset pattern examples | 5 | assets_dbt_python, assets_modern_data_stack, assets_dynamic_partitions, etc. |
| Feature demonstrations | 4 | assets_smoke_test, feature_graph_backed_assets, development_to_production, etc. |
| Integration examples | 5 | with_airflow, with_great_expectations, with_wandb, with_pyspark, with_pyspark_emr |
| Deployment examples | 3 | deploy_docker, deploy_ecs, deploy_k8s |
Key characteristics of 1.6 examples:
- All were self-contained project scaffolds — you ran `dagster project from-example` and got a working project directory with `setup.py`, assets, resources, schedules, and tests.
- The "fully featured" reference was UNMAINTAINED — `project_fully_featured`, the flagship reference architecture (HN activity data, ML recommender, dbt analytics, multi-environment deploy), was already marked `[UNMAINTAINED]` in 1.6, with users directed to the "Dagster Open Platform" repo.
- Examples showed diverse patterns — the collection included ops-based patterns (`with_pyspark`, `with_great_expectations`), asset-based patterns (`assets_dbt_python`), and mixed patterns. There was no single "right way" enforced.
- Quickstarts were interchangeable — all four quickstarts followed the same HackerNews ETL pattern but with different storage backends (local, S3, BigQuery, Snowflake), letting users pick their cloud.
Verbatim from the 1.6 "Create New Project" page:
"You can also generate a Dagster project from an official Dagster example, which is useful for learning:
dagster project from-example --name my-dagster-project --example quickstart_etl"
This positioned examples as a learning aid — something to copy from, not a reference to follow exactly.
The current docs organize examples into a three-tier hierarchy that didn't exist in 1.6:
These are the standout addition. Each is a guided, multi-page tutorial building a complete real-world pipeline:
| # | Tutorial | Domain | Tech Stack | Uses Components? | Pages |
|---|---|---|---|---|---|
| 1 | ETL Pipeline | Data engineering | DuckDB + Sling + dbt + Evidence | Yes (DbtProjectComponent, Sling component) | ~8 |
| 2 | Dagster + dbt | Analytics | dbt + DuckDB + DbtProjectComponent | Yes | ~5 |
| 3 | Bluesky Analytics | Social media analytics | Bluesky API + Cloudflare R2 + dbt + Power BI | Mixed (custom resource + BI integration) | ~4 |
| 4 | RAG with Pinecone | AI/ML | GitHub API + OpenAI + Pinecone | No (standard Python assets) | ~4 |
| 5 | Podcast Transcription | AI/ML | RSS + Modal (serverless) + Dagster Pipes | No (Pipes + factory pattern) | ~3 |
| 6 | Prompt Engineering | AI/ML | Anthropic Claude + NREL API + Pydantic | No (standard assets) | ~2 |
| 7 | LLM Fine-Tuning | AI/ML | Goodreads dataset + DuckDB + OpenAI | No (standard assets) | ~4 |
| 8 | DSPy Puzzle Solving | AI/ML | DSPy + MIPROv2 + custom DSPyModelBuilder | Yes (custom component authoring) | ~3 |
| 9 | ML Pipeline (PyTorch) | ML | MNIST + PyTorch CNN + batch inference | No (standard assets) | ~3 |
Key observation: 4 of 9 full pipelines are AI/ML focused, reflecting Dagster's strategic push into AI orchestration. In 1.6, there was only 1 ML-related example (project_fully_featured's recommender, which was unmaintained).
These are entirely new — conceptual architecture diagrams with integration patterns, not runnable code:
| # | Architecture | Pattern | Key Technologies |
|---|---|---|---|
| 1 | ETL / Reverse ETL | Salesforce → Fivetran → Snowflake → dbt → Hightouch → Salesforce | FivetranAccountComponent |
| 2 | BI (Business Intelligence) | Shopify/Postgres → Airbyte → warehouse → dbt → BI tools | AirbyteWorkspaceComponent |
| 3 | RAG | GitHub GraphQL → embeddings (OpenAI) → vector DB | Standard Python patterns |
| 4 | Real-Time System | dlt extraction → ClickHouse, Kafka events → materialized views | dlt integration |
Reference architectures had no equivalent in 1.6. The closest thing was project_fully_featured, which was a runnable project, not an architectural pattern. Additionally, the hooli-data-eng-pipelines GitHub repo serves as a more advanced reference architecture (multi-project workspace, Dagster+ Hybrid K8s deployment, CI/CD with GitHub Actions, RBAC), but it's not prominently linked from the new docs examples section.
This is a particularly interesting addition. Mini examples are the one area of the current docs that explicitly presents multiple approaches with trade-offs — a style much closer to the 1.6 "here are your options" philosophy. They are described as "focused, pattern-based guides for common Dagster use cases and architectural decisions."
| # | Mini Example | What It Explores | Approaches Compared |
|---|---|---|---|
| 1 | Dynamic Fanout | Processing variable numbers of items in parallel | Dynamic outputs vs static |
| 2 | Dynamic Outputs vs Python Parallelism | Parallelism strategies | Dagster dynamic outputs vs concurrent.futures — trade-offs in observability, performance, resource consumption |
| 3 | Asset Health Monitoring | Monitoring critical Tier-0 assets | Materialization tracking, data quality checks, freshness policies |
| 4 | Resource Caching | Caching expensive operations in resources | API calls, DB queries, heavy computations |
| 5 | Sharing Code Across Code Locations | DRY code across multiple code locations | Shared modules, factories, helpers |
| 6 | Partition Backfill Strategies | Three backfill strategies | One-run-per-partition vs batched vs single-run (BackfillPolicy) — trade-offs in overhead, fault isolation, resource utilization |
| 7 | Partitions vs Config | Parameterizing pipelines | Partitions vs run configuration — trade-offs in tracking, observability, workflow |
| 8 | PII Redaction in Compute Logs | Automatic PII redaction | Custom compute log manager approaches |
| 9 | Lambda Run Launcher | Serverless run execution | AWS Lambda instead of ECS/containers for lightweight jobs |
Why this matters for the analysis: Mini examples are the counterbalance to the prescriptive tutorials. While the Full Pipeline tutorials say "do it this way," mini examples say "here are 2-3 approaches, each with distinct trade-offs in X, Y, and Z." This is the closest the current docs get to the 1.6 "encyclopedia of patterns" style — but mini examples are scoped to operational decisions (how to backfill, how to parallelize), not to fundamental abstractions (assets vs ops). The what to build with is prescribed; the how to tune it allows choice.
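For example, the three backfill strategies compared in mini example #6 reduce to a grouping decision — how many partition keys go into each run. A hedged plain-Python sketch (illustrative, not the actual `BackfillPolicy` API) makes the trade-off concrete:

```python
def plan_backfill(partitions, batch_size=None):
    """Group partition keys into runs.

    batch_size=1    -> one run per partition (max fault isolation, max per-run overhead)
    batch_size=None -> single run over all partitions (min overhead, all-or-nothing)
    batch_size=k    -> batched runs (the middle ground)
    """
    if batch_size is None:
        return [partitions]
    return [partitions[i:i + batch_size] for i in range(0, len(partitions), batch_size)]

days = [f"2026-02-{d:02d}" for d in range(1, 8)]
print(len(plan_backfill(days, 1)))     # 7 runs
print(len(plan_backfill(days, 3)))     # 3 runs
print(len(plan_backfill(days, None)))  # 1 run
```

Each grouping trades per-run overhead against fault isolation — exactly the axis the mini example asks readers to weigh.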
Notably, mini examples do not use Components or the dg CLI — they focus on core Python-level Dagster patterns. This reinforces that Components are the prescribed path for building, while Python patterns are the flexible layer for optimization.
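That Python-level flexibility is visible in mini example #2's comparison: the `concurrent.futures` side of the trade-off is just stdlib code like the sketch below, which gains simplicity but gives up the per-item observability and retries that Dagster dynamic outputs provide:

```python
from concurrent.futures import ThreadPoolExecutor

def process(item: int) -> int:
    # Stand-in for per-item work (an API call, a file, a chunk of data)
    return item * item

items = range(8)  # in a dynamic-fanout scenario, this count is only known at runtime
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```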
The /examples/ directory in the monorepo now contains ~49 directories (up from 27 in 1.6):
| Category | Count | New Since 1.6 |
|---|---|---|
| Full pipeline companions (docs_projects/) | ~9 | All new |
| Airflow migration (airlift-*, starlift-*) | 4 | All new |
| Components-era examples | 3 | All new (components_yaml_checks_dsl, ingestion-patterns, data-quality-patterns) |
| Domain examples | 3 | snowflake_cortex, google_drive_factory, with_openai |
| Legacy/unmaintained | ~15 | Carried over from 1.6 |
| Deployment | 3 | Carried over |
| Internal (docs_snippets, experimental) | 3 | Expanded |
| Dimension | 1.6 Examples | Current Examples |
|---|---|---|
| Total count | 24 user-facing | ~56 (18 on docs site + ~33 on GitHub + ~5 experimental) |
| Discovery | dagster project list-examples CLI | Browsable on docs site at /examples/ |
| Delivery mechanism | dagster project from-example scaffold | Multi-page tutorial on docs site |
| Reference architectures | 1 (unmaintained) | 4 (new category, aspirational not runnable) |
| Full pipeline tutorials | 0 (tutorial was concept-focused) | 9 multi-page walkthroughs (~36 sub-pages) |
| Mini examples | 0 | 9 (pattern-focused, trade-off-oriented) |
| AI/ML coverage | 1 (recommender, unmaintained) | 5 full pipelines (RAG, Fine-Tuning, DSPy, PyTorch, Prompt Engineering) |
| Components-era examples | 0 | ~7 use Components/dg CLI |
| Airflow migration | 1 (with_airflow) | 4 dedicated examples (airlift-*, starlift-*) |
| Unmaintained examples | ~9 | ~15 (carried over + newly unmaintained) |
| Quickstart examples | 4 (one per cloud) | Replaced by create-dagster quickstart |
| Shows multiple approaches | Most examples showed one approach | Mini examples show 2-3 approaches with trade-offs; full pipelines show one |
The examples evolution reinforces the themes from earlier sections — but with one important nuance provided by the mini-examples:
1. From "copy and adapt" to "follow the tutorial."
In 1.6, examples were templates: you scaffolded a project from an example and modified it. The example itself was a starting point, and users were expected to understand and change the code. The quickstart_etl README said:
"The purpose of this project is to provide a starting point for your Dagster pipelines."
In the current docs, examples are guided walkthroughs: each has 3-8 pages of step-by-step instructions with exact commands, exact YAML, and exact code to paste. Users follow the tutorial as written. The ETL Pipeline tutorial introduction says:
"In this tutorial, you will build a data pipeline that extracts data from files into DuckDB, transforms it with dbt, and visualizes the results."
The 1.6 framing gives ownership ("your pipelines"). The current framing describes an outcome ("you will build").
2. Reference architectures are a new opinionation tool.
The four reference architectures (ETL/Reverse ETL, BI, RAG, Real-Time) are inherently opinionated — they prescribe which tools to use in combination (Fivetran + Snowflake + dbt + Hightouch, or Airbyte + dbt + BI tools). In 1.6, there was no equivalent guidance on technology combinations. The reference architectures answer a question 1.6 never addressed: "What should my overall data platform look like?"
3. The AI/ML pivot is visible in examples first.
5 of the 9 full-pipeline tutorials are AI/ML focused (RAG, Prompt Engineering, LLM Fine-Tuning, DSPy, ML Pipeline). This represents a major strategic shift that's more visible in the examples than in the core docs. In 1.6, AI/ML was an afterthought (one unmaintained recommender model). In the current docs, AI/ML is nearly half the showcase material.
The examples section fits into the analysis framework as follows:
| Analysis Category | How Examples Contribute |
|---|---|
| Opinionation (Part 2) | Reference architectures prescribe technology stacks. Full pipeline tutorials prescribe exact workflows. This is a significant increase in opinionation through examples. |
| Onboarding (Part 3) | The ETL Pipeline tutorial is now a core part of the onboarding funnel (it follows the Basics Tutorial). In 1.6, examples were optional side-resources. This makes examples a primary onboarding tool rather than supplementary material. |
| Prescriptive guidance (Part 5) | Every full-pipeline tutorial page has 5-7 prescriptive instances. With 9 tutorials averaging 4 pages each, that's ~36 pages of prescriptive example content that didn't exist in 1.6. |
| Hands-on code (Part 6) | The 9 full-pipeline tutorials collectively contain ~200+ code blocks (Python, YAML, CLI, config). This is the single largest source of runnable code in the current docs, bringing the total hands-on code surface to roughly 4.5x the 1.6 total. |
| Best practices (Part 3.2) | Reference architectures implicitly communicate "this is how a Dagster data platform should be structured" — a form of best-practice guidance that goes beyond individual project structure to platform architecture. |
| Counterpoint to opinionation | Mini examples are the one area where the current docs explicitly present multiple approaches with trade-offs. This creates a two-layer model: the tutorials/full pipelines say "build this way" while the mini examples say "once you're building, here are choices to consider." 1.6 had this flexibility at the foundational level (assets vs ops); the current docs push it up to the operational level (how to backfill, how to parallelize). |
The commercial offering's documentation underwent its own transformation — rebranded from "Dagster Cloud" to "Dagster+" in April 2024 (coinciding with Dagster 1.7), with significant structural, feature, and philosophical changes that both mirror and diverge from the OSS docs evolution analyzed in Parts 1-7.
Timeline:
- August 2022: Dagster Cloud originally launched
- April 17, 2024: Rebranded as Dagster+ ("the next evolution of Dagster Cloud"), announced alongside Dagster 1.7
- The company (Elementl) simultaneously renamed itself to Dagster Labs
The rebrand was not cosmetic — it accompanied four new capabilities: a new Asset Catalog (search-first with column lineage), expanded data quality features (anomaly detection, freshness, alerting), improved Branch Deployments with Change Tracking, and Insights GA for cost management across tools like Snowflake, Fivetran, and OpenAI.
Dagster Cloud in 1.6 had ~47-50 pages organized into 7 top-level sections:
```
Cloud
├── Getting Started
├── Deployment Types
│   ├── Serverless
│   └── Hybrid
│       ├── Agents Overview
│       ├── Local Agent
│       ├── Docker Agent (setup + config ref)
│       ├── Kubernetes Agent (setup + config ref)
│       └── Amazon ECS Agent (5 sub-pages: new VPC, existing VPC, manual, config ref, CloudFormation upgrade)
├── Organization Settings (tokens)
├── Authentication & Users
│   ├── User Management
│   ├── Teams (Enterprise)
│   ├── Roles & Permissions
│   └── SSO: Azure AD, Google Workspace, Okta, OneLogin, PingOne + SCIM
├── Managing Deployments
│   ├── Alerts (single page, 5 alert types)
│   ├── Environment Variables (UI + agent methods)
│   ├── Code Locations
│   ├── dagster_cloud.yaml Reference
│   ├── Deployment Settings
│   └── Branch Deployments (overview + GitHub, GitLab, CLI)
├── Insights (overview + Snowflake, Snowflake+dbt, export)
└── dagster-cloud CLI
```
Dagster+ in current docs has ~50-70 pages across a restructured, more distributed layout:
```
Dagster+ (also under Deployment → Dagster+)
├── About Dagster+
├── Getting Started
├── Deployment
│   ├── Serverless (runtime, CI/CD, settings)
│   ├── Hybrid
│   │   ├── Architecture Overview
│   │   ├── Amazon ECS Agent (setup + config ref + CloudFormation)
│   │   ├── Docker Agent (setup + config ref)
│   │   ├── Kubernetes Agent (setup + config ref)
│   │   ├── Microsoft Azure Agent ← NEW
│   │   └── Multiple Agents + Queue Routing
│   └── IP Addresses
├── Deploying Code
│   ├── Branch Deployments (GitHub, GitLab, CLI, multiple deployments)
│   ├── Full Deployments (settings, management)
│   └── CI/CD Configuration (GitHub Actions, GitLab CI, custom)
├── Management
│   ├── Environment Variables (UI, agent config, built-in vars)
│   ├── Tokens
│   └── Deployment Settings Reference
├── Authentication & Access Control
│   ├── SSO: Google, Okta, Azure AD/Entra ID, PingOne, GitHub ← GitHub SSO new
│   ├── SCIM: Okta, Microsoft Entra ID
│   ├── RBAC (overview, custom roles [Pro], per-deployment permissions)
│   ├── Teams (Pro)
│   ├── Service Users (Pro) ← NEW
│   └── Audit Logs (Pro) ← NEW
├── Features
│   ├── Insights (overview, BigQuery, Snowflake, custom metrics, export)
│   ├── Asset Catalog (catalog views, search) ← NEW
│   └── Asset Health Status ← NEW
├── Alerts (email, Slack, Microsoft Teams [NEW], PagerDuty [Pro])
│   ├── Alert Policy Types (expanded: asset health, metric monitors, code location errors)
│   └── YAML Alert Configuration ← NEW
├── Migration
│   ├── OSS → Dagster+ ← NEW
│   └── Serverless → Hybrid ← NEW
└── CLI
    ├── dg plus deploy
    ├── dg plus deploy configure ← NEW
    ├── dg plus login
    └── dg scaffold github-actions ← NEW
```
| Feature Area | 1.6 (Dagster Cloud) | Current (Dagster+) | Change |
|---|---|---|---|
| Page count | ~47-50 | ~50-70 | +30-40% |
| Deployment models | Serverless + Hybrid | Serverless + Hybrid (same) | Unchanged |
| Agent types | ECS, Docker, K8s, Local | ECS, Docker, K8s, Azure (Local demoted) | +1 agent |
| SSO providers | Azure AD, Google, Okta, OneLogin, PingOne | Azure AD/Entra ID, Google, Okta, PingOne, GitHub | +1, -1 (OneLogin removed) |
| SCIM | Okta only | Okta + Microsoft Entra ID | +1 |
| Alerts | Single page, 5 alert types, 2 channels (Slack, email) | Multi-page, 8+ alert types, 4 channels (+Microsoft Teams, +PagerDuty) | 3x expansion |
| Alert configuration | UI only | UI + YAML config + dagster-cloud CLI sync | +2 methods |
| Insights | 4 pages (Snowflake-focused) | 5-6 pages (+BigQuery, +custom metrics, cost tracking) | Expanded |
| Asset Catalog | Did not exist | First-class feature (catalog views, search, column lineage) | Entirely new |
| Asset Health | Did not exist | Combined health from materialization + freshness + checks | Entirely new |
| Branch deployments | 3 pages (overview, GitHub, GitLab) | 4-5 pages (+multiple deployments, +Change Tracking) | Enhanced |
| RBAC | Basic (5 default roles, Enterprise gating) | Enhanced (custom roles [Pro], per-code-location permissions) | Significant upgrade |
| Service users | Did not exist | Non-human API auth accounts (Pro) | Entirely new |
| Audit logs | Did not exist | Track and attribute changes (Pro) | Entirely new |
| Migration guides | None | OSS → Dagster+, Serverless → Hybrid | Entirely new |
| CI/CD tooling | dagster-cloud ci CLI + GitHub/GitLab guides | dg scaffold github-actions, dg plus deploy configure | More automated |
| Billing/pricing | Not documented | Referenced but still on dagster.io/pricing (not in docs) | Minimal change |
| EU region | Not available | dg plus login --region eu | New |
| Compass (AI) | Did not exist | AI-powered data answers for stakeholders | Entirely new |
This is where the Dagster Cloud/+ docs diverge notably from the OSS docs pattern described in Parts 1-7.
1.6 Dagster Cloud was moderately opinionated (similar to OSS):
- Serverless was clearly recommended as the default: "Ideal for those wanting quick adoption without computational demands or infrastructure focus."
- But Hybrid was presented as a legitimate alternative with clear criteria: "Consider Hybrid deployment instead if you need substantial computational resources, handle datasets too large for memory, require multi-node computation distribution, or prefer not adding Dagster Labs as a data processor."
- Agent type selection was genuinely flexible — ECS, Docker, K8s, and Local were presented as peers.
- Branch deployments were prescribed as THE CI/CD model, but with GitHub, GitLab, and CLI as three equal paths.
Current Dagster+ docs are more opinionated, but not as drastically as OSS:
- "If you're not sure which agent to use, Dagster recommends the Dagster+ Kubernetes agent in most cases." — This is a new recommendation that 1.6 did not have.
- "Dagster strongly recommends creating a dedicated branch deployment agent" — Stronger language than 1.6.
- Components and `dg` CLI are now described as "the recommended, production-ready defaults for new Dagster projects" — this flows from OSS into the cloud onboarding.
- GitHub Actions is now auto-scaffolded (`dg scaffold github-actions`), making it the de facto default over GitLab or custom CI.
However, the cloud docs maintain more genuine optionality than OSS tutorials because deployment infrastructure inherently requires choice — you can't prescribe a single agent type for all environments. The Serverless vs. Hybrid decision is still a real fork, and within Hybrid, each agent type has a legitimate use case tied to existing infrastructure.
Opinionation comparison:
```
                          1.6 Cloud    Current Dagster+
Deployment model choice:  ██████░░░░   ██████░░░░   (genuine choice, both well-documented)
Agent type selection:     █████░░░░░   ███████░░░   (K8s now recommended; was neutral)
CI/CD setup:              ██████░░░░   ████████░░   (GitHub Actions auto-scaffolded)
Project structure:        ██████░░░░   █████████░   (follows OSS prescriptive shift)
Monitoring/Alerting:      █████░░░░░   ███████░░░   (more alert types, more channels)
Overall:                  ██████░░░░   ███████░░░   (6/10 → 7/10)
```
The gap is smaller than in OSS (6→7 vs. 8→10 in onboarding). This is because infrastructure documentation must accommodate different environments — you can't tell a Docker-only shop to use Kubernetes.
1.6 Cloud onboarding followed a 5-step linear flow:
1. Create account and organization
2. Select deployment type (Serverless vs. Hybrid)
3. Deploy code (path diverges by deployment type)
4. Configure CI/CD (Hybrid only)
5. Set up environment variables
Next steps suggested: invite team, configure auth, set up monitoring, learn branch deployments. Users had a clear fork at step 2 — Serverless skipped step 4 entirely.
Current Dagster+ onboarding is more streamlined:
- Serverless: "follow the Dagster+ onboarding to add a new project. You will be guided through the steps to create a Git repository and set up the necessary CI/CD actions." Users can "select an example project or import an existing dbt project" if they don't have code.
- Hybrid: "follow the steps in the Dagster+ Hybrid documentation to install a Dagster+ Hybrid agent and set up CI/CD."
- A new Serverless Quickstart page provides a dedicated walkthrough: sign up → create project → make changes via branch deployments → merge to main for production deployment.
- The account is "automatically enrolled in a trial" — reducing friction.
Key onboarding difference: 1.6 had a single "Getting Started" page that covered both paths in one flow. The current docs have separated Serverless and Hybrid into distinct paths with dedicated pages, plus added a Serverless Quickstart that mirrors the OSS quickstart's prescriptive style.
Several entirely new product capabilities are documented in the current Dagster+ docs:
1. Asset Catalog — A search-first experience with catalog views, column lineage, and metadata search. "A powerful search-first experience that builds off of Dagster's best-in-class lineage graph." No equivalent in 1.6.
2. Asset Health Status — A unified health metric combining materialization state, freshness tracking, and asset check results into a single status indicator. No equivalent in 1.6.
3. Expanded Alert Types — 1.6 had 5 alert types (run failure, asset materialization, schedule/sensor, code location error, agent downtime). Current docs add: ASSET_HEALTH_DEGRADED, ASSET_HEALTH_WARNING, ASSET_HEALTH_HEALTHY, METRIC_MONITOR_ALERT, CODE_LOCATION_ERROR (expanded). Microsoft Teams is a new notification channel. YAML-based alert configuration is new.
4. Insights Expansion — 1.6 Insights focused on Snowflake cost tracking. Current Insights adds BigQuery, custom metrics from asset metadata, credit usage tracking per asset, and a Job Insights tab for per-job metric breakdowns.
5. Service Users — Non-human API authentication accounts for CI/CD and service-to-service integration. Unlimited on Pro. Did not exist in 1.6.
6. Audit Logs — Track and attribute all changes to the Dagster+ deployment. Pro only. Did not exist in 1.6.
7. Custom RBAC Roles — Beyond the 5 default roles (Viewer, Launcher, Editor, Admin, Org Admin), Pro users can now create custom roles with granular permissions.
8. Migration Guides — Two new guides: OSS → Dagster+ and Serverless → Hybrid. 1.6 had no migration documentation.
9. Compass (AI) — An AI-powered feature that "turns warehouse data into instant, trustworthy answers for every stakeholder." Entirely new product capability.
10. dg plus CLI — Replaces parts of dagster-cloud CLI with dg plus deploy, dg plus deploy configure, and dg plus login (with EU region support). dg scaffold github-actions auto-generates CI/CD workflows.
1.6 Cloud docs embedded best practices within individual pages with no dedicated best practices section:
- "Serverless works best for orchestration-heavy workloads, particularly those integrating third-party SaaS products like cloud data warehouses."
- "Only use multiple agents of the same type" (e.g., multiple K8s agents, not mixed types)
- "Server TTL" to reduce costs by auto-shutting down idle servers
- Recommended dedicated branch deployment agents separate from production
- No reference architecture for cloud deployments
Current Dagster+ docs are more explicit about best practices:
- Verbatim resource sizing guidance: "Agent container — start at 0.25 vCPU core and 1 GB RAM... Code server container — budget for imports, plus the definition graph... For runs: 4 vCPU cores, 8-16 GB of RAM."
- Security model documented: "SOC 2 Type II certified... encrypting data at rest (AES 256) and in transit (TLS 1.2+). All user code runs within your environment, in isolation from Dagster system code."
- Environment variable scoping guidance (Full vs. Branch vs. Local)
- K8s recommended as the default agent type (new recommendation)
- HIPAA, SOC 2 Type II, and GDPR compliance noted
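Translated into a Kubernetes manifest fragment, the quoted sizing guidance might look like the sketch below. The key names and structure here are illustrative assumptions, not the actual Dagster+ Helm chart schema; only the vCPU/RAM figures come from the docs.

```yaml
# Illustrative resource requests only -- field names are hypothetical;
# the CPU/memory figures are the ones quoted in the Dagster+ sizing guidance.
agent:
  resources:
    requests:
      cpu: 250m      # "start at 0.25 vCPU core"
      memory: 1Gi    # "...and 1 GB RAM"
runContainers:
  resources:
    requests:
      cpu: "4"       # "For runs: 4 vCPU cores"
      memory: 8Gi    # "8-16 GB of RAM"
    limits:
      memory: 16Gi
```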
The shift is from implicit best practices you'd derive from reading multiple pages to explicit sizing guides, security models, and compliance certifications that a platform team can reference directly.
| Dimension | 1.6 (Dagster Cloud) | Current (Dagster+) |
|---|---|---|
| Documented in docs? | No dedicated page | Referenced but detailed pricing lives on dagster.io/pricing |
| Tier names | Not documented (Enterprise referenced) | Solo, Starter (formerly Teams), Pro |
| Credit model | Not documented in detail | 1 credit per asset materialization and 1 per op execution, so a standard @asset (one op producing one materialization) costs ~2 credits |
| Free tier | Not documented | Solo: 7,500 free credits/month |
| Overage | Not documented | $0.03/credit; "Dagster+ will not shut down your pipelines due to usage" |
| Compute charges | Not documented | Serverless: per-minute compute; Hybrid: no compute charge |
| Enterprise features | Multiple deployments, Teams, Viewer/Launcher roles | Pro: Custom roles, SCIM, audit logs, PagerDuty, service users, per-code-location permissions |
The credit model documentation is notably not in the official docs — it's on the marketing site and in GitHub Discussions. This is one area where the docs are less transparent than users expect, and community feedback has noted the pricing as confusing.
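As a back-of-envelope illustration of the documented credit arithmetic (figures from the marketing site and GitHub Discussions, not the official docs, and subject to change):

```python
# Illustrative Dagster+ Solo-tier cost estimate. The constants below come
# from the pricing comparison above (marketing site, not official docs).
CREDITS_PER_ASSET_MATERIALIZATION = 2  # ~1 materialization + 1 op execution
SOLO_FREE_CREDITS_PER_MONTH = 7_500
OVERAGE_USD_PER_CREDIT = 0.03

def estimate_monthly_overage(materializations_per_day: int, days: int = 30) -> float:
    """Estimate the monthly overage bill on the Solo tier, in USD."""
    credits = materializations_per_day * days * CREDITS_PER_ASSET_MATERIALIZATION
    billable = max(0, credits - SOLO_FREE_CREDITS_PER_MONTH)
    return billable * OVERAGE_USD_PER_CREDIT

# 100 assets materialized daily -> 6,000 credits/month, within the free tier.
print(estimate_monthly_overage(100))  # 0.0
# 500 assets daily -> 30,000 credits, 22,500 over the free allowance.
print(estimate_monthly_overage(500))  # 675.0
```

At 500 daily materializations the bill is still modest, which supports the docs' framing that "Dagster+ will not shut down your pipelines due to usage" -- overages accrue rather than block.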
The central finding from this analysis is that the Dagster+ docs did not undergo the same dramatic philosophical shift as the OSS docs:
| Dimension | OSS Docs Change (Parts 1-7) | Dagster+ Docs Change |
|---|---|---|
| Opinionation increase | Dramatic (8/10 → 10/10 in onboarding) | Moderate (6/10 → 7/10) |
| Alternative elimination | Severe (near-zero in onboarding) | Mild (still presents Serverless vs. Hybrid, multiple agents) |
| New abstractions | Components, dg CLI, defs.yaml (paradigm shift) | dg plus CLI, auto-scaffolded CI/CD (tooling evolution) |
| Documentation restructuring | Complete rewrite (Docusaurus, linear funnel) | Restructured URL paths, but similar topic organization |
| Best practices delivery | Embedded in scaffolds (implicit) | More explicit (sizing guides, security docs) |
| Deprecation language | Aggressive (5+ APIs deprecated) | Minimal (CLI commands renamed, features added) |
| Page count change | More pages, narrower topics | Similar page count, broader coverage |
Why the difference? Infrastructure documentation inherently resists the "single path" approach because users have different environments. You can prescribe @dg.asset as the one way to define an asset, but you can't prescribe Kubernetes as the one way to run an agent — Docker shops, ECS shops, and Azure shops all need their path documented.
The result is that Dagster+ docs occupy a middle ground: they've adopted the OSS docs' new tooling (dg CLI, Components, project structure) and increased recommendations (K8s as default agent), but they preserve the 1.6-era pattern of presenting genuine alternatives where infrastructure forces it. A user reading only the Dagster+ docs would get a less prescriptive experience than someone reading only the OSS tutorials.
| Version | Codename | Date | Key Changes |
|---|---|---|---|
| 1.6 | — | Early 2024 | Baseline: SDA-first docs, dagster CLI, ops/graphs documented, MaterializeResult/AssetSpec/AssetDep stabilized |
| 1.7 | "Love Plus One" | ~April 2024 | Asset Checks GA, @multi_asset_check, freshness checks, column schema change checks, Asset Catalog UI |
| 1.8 | "Call Me Maybe" | August 2024 | Dagster Pipes GA (subprocess, K8s, Databricks, Lambda), AssetSpec directly in Definitions, SourceAsset deprecated, Auto-Materialize Policies deprecated |
| 1.9 | "Spooky" | November 2024 | Declarative Automation GA (AutomationCondition), BI integrations (Tableau, Power BI, Looker, Sigma), load_definitions_from_module, map_asset_specs |
| 1.10 | "Mambo No. 5" | February 2025 | Concurrency pools (unified run/op concurrency), Components preview, dg CLI preview, FreshnessPolicy preview |
| 1.11 | "Build Me Up Buttercup" | June 2025 | Components + dg CLI stable (1.11.10), create-dagster CLI, new project structure with defs/ folder, auto-discovery via load_from_defs_folder, docs rewrite on Docusaurus v3, partial retries, hooks in assets |
| 1.12 | "Monster Mash" | October 2025 | FreshnessPolicy GA, Components RC, partitioned asset checks, configurable backfills, Python 3.14 support, Definitions.map_asset_specs |
Latest patch: 1.12.17 (February 27, 2026).
| Deprecated API | Replacement | Deprecated In | Removal Target |
|---|---|---|---|
| `SourceAsset` | `AssetSpec` | 1.8 | 2.0 |
| `AutoMaterializePolicy` | `AutomationCondition` | 1.8 | 2.0 |
| `AutoMaterializeRule` | `AutomationCondition` composables | 1.8 | 2.0 |
| `@multi_asset_sensor` | `AutomationCondition` | ~1.9 | 2.0 |
| `dagster project scaffold` | `create-dagster project` | 1.11 | TBD |
| `external_assets_from_specs()` | `AssetSpec` directly in `Definitions` | 1.8 | 2.0 |
The docs themselves underwent a major rebuild between 1.6 and current:
- Migrated to Docusaurus v3 from a previous custom system
- Dual documentation system: Docusaurus for guides/tutorials, Sphinx for API reference
- Legacy docs preserved at legacy-docs.dagster.io (covering 1.9.9 and earlier)
- AI-powered "Ask AI" assistant backed by docs, GitHub issues, and discussions (answers 16,000+ community questions per month)
- Public announcement: GitHub Discussion #27332 for the new docs site going live
- Conference talk: "Write Less More: How Dagster Rebuilt Our Docs from the Ground Up" at Data Council 2025
```
Getting Started
├── What's Dagster?
├── Hello Dagster
├── Installation
├── Create New Project
└── Getting Help
Tutorial
├── Part 1: Intro to Assets
├── Part 2: Setup
├── Part 3: First Asset
├── Part 4: Asset Graph
├── Part 5: Scheduling
├── Part 6: I/O Managers
├── Part 7: Resources
└── Part 8: Next Steps
Concepts
├── Assets (9 sub-pages: SDAs, graph-backed, multi-assets, asset jobs,
│   observations, selection syntax, auto-materialize, checks, external)
├── Schedules & Sensors
├── Partitions & Backfills
├── Resources & Configuration
├── Code Locations
├── Dagster UI
├── Logging
├── Testing
└── Advanced ← Ops, Graphs, Jobs, I/O Management, Dagster Types, GraphQL
    (NOTE: deliberately placed under "Advanced" subsection)
Guides
├── Best Practices
│   ├── Project Structure (with full recommended file tree)
│   ├── Automating Pipelines
│   ├── Building ML Pipelines
│   ├── Managing ML Models
│   └── Exploring a Dagster Project (fully-featured example)
├── How Assets Relate to Ops and Graphs
└── (Various how-to guides)
Integrations (25+ tools)
Deployment (OSS + Cloud)
API Reference (50+ libraries)
About (Community, Releases, Changelog)
```
Key 1.6 structural signal: Ops, Graphs, and Jobs were deliberately placed under a "Concepts → Advanced" subsection. This was a strong editorial choice — while they were fully documented, their placement signaled to new users that these were not the primary path.
```
Getting Started
├── Installation
├── Quickstart
├── Create New Project
└── Concepts Overview
Dagster Basics Tutorial
├── Projects
├── Assets
├── Resources
├── Asset Dependencies
├── Asset Checks
├── Automation (Schedules)
└── Components
ETL Pipeline Tutorial
├── Extract (Sling)
├── Transform (dbt)
├── Data Quality
├── Automate
└── Visualize (Evidence)
Concepts
├── Assets
├── Automation
└── (other concept pages)
Guides
├── Build
│   ├── Assets (defining, factories, external)
│   ├── Components (building, creating, registering)
│   ├── Projects (creating, structure, organizing)
│   ├── Ops & Jobs
│   └── ML Pipelines
├── Automate
│   ├── Declarative Automation
│   ├── Schedules
│   ├── Sensors
│   └── Run-status Sensors
├── Deploy
├── Observe (alerts, catalog, freshness)
├── Operate (webserver, running locally)
├── Test (asset checks, unit testing)
├── Migrate (Airflow-to-Dagster)
└── Best Practices
Integrations
Deployment
Dagster+
API Reference
├── dg CLI Reference
├── create-dagster CLI Reference
└── Python API
About (Changelog, Releases)
```
Notable structural change: In 1.6, "Ops, Jobs & Graphs" was a first-class Concepts section. In the current docs, it is buried under Guides → Build → Ops & Jobs — several layers deep, signaling its demotion from a core concept to an advanced/legacy pattern.
This analysis was prepared by comparing the Dagster 1.6 release documentation (available at release-1-6-0.dagster.dagster-docs.io and in the GitHub repo at the 1.6.0 tag) with the current documentation (docs.dagster.io), examining page content, structure, CLI references, abstraction layers, and editorial tone across both versions.