@BryantAvey
Last active April 1, 2025 17:52
# Feature Comparison: Lakehouse vs. Warehouse (Post-March 2025)
| Capability | Microsoft Fabric Lakehouse | Microsoft Fabric Warehouse |
| --- | --- | --- |
| Supported data types | Structured tables plus semi-structured and unstructured data (files). A Lakehouse stores data as Delta tables (Parquet under the hood) and as raw files in OneLake (learn.microsoft.com). Suits both BI schemas and data-lake use cases. | Primarily structured data. Optimized for relational tables, with limited support for semi-structured JSON inside tables (learn.microsoft.com). Not designed for file storage or unstructured data. |
| Data storage & format | All Lakehouse tables are stored in OneLake in Delta Lake format (Parquet plus a transaction log) (red-gate.com), an open format accessible from both Spark and SQL. Also supports folders for files. | Warehouse tables are likewise stored as Delta/Parquet in OneLake under the covers (red-gate.com), but are exposed only through the SQL engine: no direct file/folder interface and no raw file storage in a Warehouse item. |
| Organization (schema) | Tables are organized into databases, and schemas within a Lakehouse are now supported (preview) (learn.microsoft.com). Folder hierarchies are available for files. | Tables are organized into databases and schemas (fully supported), following standard SQL structure (learn.microsoft.com). |
| Primary interfaces | Spark (notebooks in Python/Scala/Spark SQL for data engineering and ML) and SQL (T-SQL queries via the Lakehouse's SQL analytics endpoint) (learn.microsoft.com). The two can be used interchangeably: OneCopy lets a Warehouse query Lakehouse tables and vice versa (red-gate.com). | SQL only. The Warehouse is accessed via T-SQL (in Fabric's SQL editor or external tools). It does not natively integrate with Spark notebooks, though a Spark connector for reading and writing Warehouse tables exists in preview. |
| Data ingestion & processing | Flexible: ingest via Spark (ETL jobs, streaming), pipelines, or SQL (INSERT/COPY statements through the SQL endpoint). Because both engines can access a Lakehouse, it supports code-first data engineering as well as SQL-based ELT. | SQL-centric: data is loaded with T-SQL commands (INSERT, COPY INTO, CTAS, etc.) or pipelines that target the Warehouse. Geared toward structured ETL/ELT; there is no built-in Spark engine. |
| Transactions | ACID-compliant on single tables (Delta Lake guarantees ACID for writes to each table), but no multi-table transaction support: writes to multiple tables cannot be wrapped in one atomic transaction via Spark. | Full multi-table transactions. The Warehouse behaves like a traditional database, allowing BEGIN/COMMIT across multiple tables as one unit (learn.microsoft.com). |
| Performance & scale | Designed for big-data volumes: distributed Spark processing for heavy transformations, and SQL queries also execute on a distributed engine. Direct Lake integration with Power BI enables very fast queries over large data without refresh overhead (powerbi.microsoft.com). | Optimized for high-concurrency SQL workloads. As a managed SQL experience, it may be more straightforward to tune with indexes or query hints. |
| Security & governance | Unified OneLake security model (preview) with role-based access at the folder, table, row, and column level, enforced across Spark, SQL, and Power BI (microsoft.com). Also inherits workspace roles and item permissions; RLS/CLS are supported via SQL, and Purview integration for governance is in preview. | Traditional database security: workspace roles plus SQL GRANT/DENY permissions (table-level, RLS, CLS, etc.) (learn.microsoft.com). No concept of file or folder security. |
| BI model integration | Lakehouse tables can back Power BI Direct Lake models for star schemas. As of March 2025, Power BI Desktop can create a Direct Lake dataset using multiple Lakehouses and Warehouses as sources (powerbi.microsoft.com). | Warehouse tables are also usable in Power BI, via Direct Lake or DirectQuery, and have always suited star schemas. |
| Advanced analytics | Strong support: Spark MLlib, Pandas/Python integration, and in-place feature engineering on lake data make the Lakehouse a viable data science platform. It also integrates with Azure ML and other AI services. | Limited to what T-SQL can express (analytical functions). Spark ML or Python processing cannot run inside the Warehouse itself, so data must be exported for such workloads. |
| Use case fit | Unified platform: suits both data-lake scenarios (staging raw data, streaming, files, big-data processing) and classical data-warehouse scenarios (dimension/fact models for BI). | Traditional warehouse: best for structured BI where data is already relational and the team wants a managed SQL environment. |
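The SQL-centric ingestion path for the Warehouse (COPY INTO plus CTAS) can be sketched as follows. This is a minimal illustration, not a definitive pipeline: the storage URL placeholder, `staging_orders`, and `orders_clean` are hypothetical names.

```sql
-- Bulk-load Parquet files from external storage into a staging table.
-- The <account> placeholder and table names are hypothetical.
COPY INTO dbo.staging_orders
FROM 'https://<account>.blob.core.windows.net/ingest/orders/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

-- CTAS: derive a cleaned, typed table from the staged data in one statement.
CREATE TABLE dbo.orders_clean
AS
SELECT order_id,
       customer_id,
       CAST(order_ts AS DATE) AS order_date,
       amount
FROM dbo.staging_orders
WHERE amount > 0;
```

In a Lakehouse the equivalent work would typically be done in a Spark notebook; in the Warehouse it stays entirely in T-SQL, as the ingestion row above describes.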
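The multi-table transaction difference is easiest to see in T-SQL. A hedged sketch, assuming hypothetical `dim_customer` and `fact_sales` tables in a Warehouse (this would not work through a Lakehouse, where Delta ACID guarantees apply per table only):

```sql
-- Atomically insert a dimension row and its dependent fact row.
-- Table names are hypothetical; runs only against a Fabric Warehouse.
BEGIN TRANSACTION;

    INSERT INTO dbo.dim_customer (customer_id, customer_name)
    VALUES (42, 'Contoso Ltd');

    INSERT INTO dbo.fact_sales (customer_id, order_date, amount)
    VALUES (42, '2025-03-01', 199.99);

COMMIT TRANSACTION;
-- Either both inserts become visible together, or (on error/ROLLBACK) neither does.
```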
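The cross-engine interoperability noted in the Primary Interfaces row (OneCopy) surfaces in T-SQL as cross-database queries with three-part naming. A sketch, assuming a hypothetical Lakehouse named `SalesLakehouse` in the same workspace as the Warehouse:

```sql
-- Join a Warehouse table with a Lakehouse table in a single T-SQL query.
-- SalesLakehouse and all table names are hypothetical; the Lakehouse side
-- is reached via its SQL analytics endpoint, which is read-only from T-SQL.
SELECT r.region_name,
       SUM(s.amount) AS total_sales
FROM dbo.dim_region AS r
JOIN SalesLakehouse.dbo.fact_sales AS s
  ON s.region_id = r.region_id
GROUP BY r.region_name;
```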