pg_lake turns PostgreSQL into a lakehouse by integrating Apache Iceberg and data lake file formats (Parquet, CSV, JSON) directly into PostgreSQL. Users connect only to PostgreSQL — the extensions transparently delegate data scanning and computation to pgduck_server (a standalone process backed by DuckDB) while PostgreSQL handles query planning, transaction boundaries, and Iceberg metadata management.
```mermaid
graph LR
Client["Client (psql, app)"]
PG["PostgreSQL<br/>+ pg_lake extensions"]
PGDuck["pgduck_server<br/>(DuckDB engine)"]
Storage["Object Storage<br/>(S3 / GCS)"]
Client -->|"SQL (port 5432)"| PG
PG -->|"libpq (unix socket :5332)"| PGDuck
PG -->|"Iceberg metadata<br/>(Avro read/write)"| Storage
PGDuck -->|"Parquet/CSV/JSON<br/>read & write"| Storage
```
All extensions install into pg_catalog. The requires field in each .control file defines the dependency chain. Installing pg_lake with CASCADE pulls in the entire tree.
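In practice a single command resolves the whole chain — CREATE EXTENSION with CASCADE is standard PostgreSQL behavior and installs each dependency (emitting a NOTICE per extension) before the one that requires it:

```sql
-- PostgreSQL reads the requires list from each .control file and
-- installs dependencies first, so this one statement is enough:
CREATE EXTENSION pg_lake CASCADE;
```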
```mermaid
graph TD
pg_lake["<b>pg_lake</b><br/><i>Meta-extension (installs everything)</i>"]
pg_lake_copy["<b>pg_lake_copy</b><br/><i>COPY to/from data lake files</i>"]
pg_lake_table["<b>pg_lake_table</b><br/><i>FDW for lake & Iceberg tables</i>"]
pg_lake_iceberg["<b>pg_lake_iceberg</b><br/><i>Iceberg v2 protocol impl</i>"]
pg_lake_engine["<b>pg_lake_engine</b><br/><i>Shared query engine module</i>"]
pg_extension_base["<b>pg_extension_base</b><br/><i>Extension dev kit / foundation</i>"]
pg_map["<b>pg_map</b><br/><i>Generic map type for PG</i>"]
pg_extension_updater["<b>pg_extension_updater</b><br/><i>Auto extension updates</i>"]
btree_gist["<b>btree_gist</b><br/><i>PostgreSQL contrib</i>"]
pg_lake_spatial["<b>pg_lake_spatial</b><br/><i>Geospatial file format support</i>"]
postgis["<b>PostGIS</b><br/><i>(external)</i>"]
pg_lake_benchmark["<b>pg_lake_benchmark</b><br/><i>Benchmarking</i>"]
pg_lake --> pg_lake_table
pg_lake --> pg_lake_copy
pg_lake_copy --> pg_lake_engine
pg_lake_copy --> pg_lake_iceberg
pg_lake_copy --> pg_lake_table
pg_lake_table --> pg_lake_engine
pg_lake_table --> pg_lake_iceberg
pg_lake_table --> btree_gist
pg_lake_iceberg --> pg_lake_engine
pg_lake_engine --> pg_extension_base
pg_lake_engine --> pg_map
pg_extension_updater --> pg_extension_base
pg_lake_spatial --> pg_lake
pg_lake_spatial --> postgis
pg_lake_benchmark --> pg_lake
style pg_lake fill:#4a90d9,color:#fff
style pg_lake_table fill:#5ba85b,color:#fff
style pg_lake_iceberg fill:#d9a54a,color:#fff
style pg_lake_engine fill:#c75050,color:#fff
style pg_lake_copy fill:#7b68ae,color:#fff
style pg_extension_base fill:#888,color:#fff
style pg_map fill:#888,color:#fff
style pg_extension_updater fill:#888,color:#fff
style btree_gist fill:#aaa,color:#fff
style pg_lake_spatial fill:#4a90d9,color:#fff,stroke-dasharray: 5
style pg_lake_benchmark fill:#4a90d9,color:#fff,stroke-dasharray: 5
style postgis fill:#aaa,color:#fff,stroke-dasharray: 5
```
| Extension | Role |
|---|---|
| pg_extension_base | Foundation for all extensions. Provides background workers, library preloading, extension control file parsing, PG version compatibility shims. |
| pg_map | Generic key-value map type generator for semi-structured data in PostgreSQL. |
| pg_extension_updater | Automatically updates installed extensions on startup. |
| pg_lake_engine | Shared module across all pg_lake extensions. Houses the pgduck client (pgduck/client.c), query rewriting, type mapping between PG and DuckDB, read/write data generation, CSV/Parquet/JSON handling, S3/GCS utilities, and the utility hook infrastructure. |
| pg_lake_iceberg | Full Iceberg v2 specification implementation. Manages table metadata, snapshots, manifests, manifest entries, data files, schemas, and partitioning — all serialized via Avro. Supports object store catalog and REST catalog. |
| pg_lake_table | Foreign Data Wrapper (FDW) that exposes data lake files and Iceberg tables as PostgreSQL foreign tables. Implements the full FDW API (scan, insert, update, delete), DDL handling (CREATE TABLE ... USING iceberg), full query pushdown via a custom planner hook, snapshot management, file pruning, and transaction hooks for Iceberg metadata commits. Also provides a placeholder table access method (pg_lake_iceberg) for CREATE TABLE ... USING syntax. |
| pg_lake_copy | Extends PostgreSQL's COPY command to import/export data to data lake file formats. |
| pg_lake | Meta-extension. Installing it with CASCADE installs all required extensions. |
| pg_lake_spatial | Optional. Adds geospatial file format support (GeoParquet). Requires PostGIS. |
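After installation, the resulting tree can be inspected through the standard pg_extension catalog (a stock PostgreSQL catalog, so this works on any installation):

```sql
-- Shows which of the pg_lake family and helper extensions are installed,
-- with their versions:
SELECT extname, extversion
FROM pg_extension
WHERE extname LIKE 'pg_%' OR extname = 'btree_gist'
ORDER BY extname;
```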
```mermaid
graph TD
subgraph repo["pg_lake repository"]
direction TB
subgraph core_ext["Core Extensions (C, PGXS)"]
PEB["pg_extension_base/"]
PEU["pg_extension_updater/"]
PM["pg_map/"]
PLE["pg_lake_engine/"]
PLI["pg_lake_iceberg/"]
PLT["pg_lake_table/"]
PLC["pg_lake_copy/"]
PL["pg_lake/"]
end
subgraph optional_ext["Optional Extensions"]
PLS["pg_lake_spatial/"]
PLB["pg_lake_benchmark/"]
end
subgraph external["External Components"]
PDS["pgduck_server/<br/>(standalone C server)"]
DPL["duckdb_pglake/<br/>(DuckDB C++ extension)"]
AVRO["avro/<br/>(patched Apache Avro)"]
end
subgraph support["Supporting"]
TC["test_common/<br/>(shared pytest fixtures)"]
TOOLS["tools/<br/>(version bumping, etc.)"]
DOCS["docs/"]
DOCKER["docker/"]
DATA["data/<br/>(test data)"]
end
end
style core_ext fill:#e8f0fe,stroke:#4a90d9
style optional_ext fill:#fef7e0,stroke:#d9a54a
style external fill:#fde8e8,stroke:#c75050
style support fill:#e8fee8,stroke:#5ba85b
```
Key directories in pg_lake_engine:

| Directory | Purpose |
|---|---|
| pgduck/ | Client connection to pgduck_server (client.c), query rewriting (rewrite_query.c), read/write query generation (read_data.c, write_data.c), DuckDB type mapping (type.c), data serialization, and shippable function/operator lists. |
| parquet/ | Parquet schema field mapping and GeoParquet support. |
| csv/, json/ | Format-specific options and readers. |
| copy/ | Remote copy query generation. |
| cleanup/ | Deletion queue and in-progress file tracking. |
| storage/ | Local storage utilities. |
| ddl/ | Utility hook for DDL interception. |
| extensions/ | Per-extension version and dependency metadata. |
Key directories in pg_lake_table:

| Directory | Purpose |
|---|---|
| fdw/ | FDW callback implementations (pg_lake_table.c), snapshot management (snapshot.c), query deparsing, data file pruning, partitioning, writable table support, row ID management, and position delete handling. |
| planner/ | Full query pushdown via planner hook (query_pushdown.c), insert-select optimization, and EXPLAIN support. |
| ddl/ | CREATE TABLE (both CREATE FOREIGN TABLE and CREATE TABLE ... USING iceberg), ALTER TABLE, DROP TABLE, VACUUM. |
| transaction/ | Iceberg metadata change tracking within PG transactions (track_iceberg_metadata_changes.c) and transaction commit/abort hooks (transaction_hooks.c). |
| access_method/ | Placeholder table access method for USING iceberg syntax. |
| duckdb/ | Query transformation from PG parse tree to DuckDB-compatible SQL. |
Key directories in pg_lake_iceberg:

| Directory | Purpose |
|---|---|
| iceberg/api/ | Core Iceberg data structures: table_metadata.c, snapshot.c, manifest.c, manifest_list.c, manifest_entry.c, datafile.c, table_schema.c. |
| iceberg/ | Catalog management (catalog.c), metadata read/write, partitioning, manifest merge, and Iceberg type serialization (binary and JSON). |
| avro/ | Avro reader/writer for serializing Iceberg metadata files. |
| rest_catalog/ | REST catalog API client. |
| object_store_catalog/ | Object store (filesystem-style) catalog. |
| http/ | HTTP client for REST catalog communication. |
The following diagrams trace the complete lifecycle of creating an Iceberg table, inserting rows, and querying them.
```mermaid
sequenceDiagram
participant Client
participant PG as PostgreSQL
participant PLT as pg_lake_table<br/>(DDL hooks)
participant PLI as pg_lake_iceberg<br/>(metadata)
participant S3 as Object Storage
Client->>PG: CREATE TABLE t (id int, name text)<br/>USING iceberg
PG->>PLT: Utility hook intercepts CREATE stmt
Note over PLT: ProcessCreateIcebergTableFromCreateStmt()<br/>Converts to CREATE FOREIGN TABLE<br/>with SERVER pg_lake_iceberg
PLT->>PLT: Validate column types, set options<br/>(location from default_location_prefix)
PLT->>PG: Execute CREATE FOREIGN TABLE internally
PG-->>PLT: Foreign table created in pg_catalog
PLT->>PLI: InsertInternalIcebergCatalogTable()<br/>Register in lake_iceberg.tables
PLI->>PLI: Initialize Iceberg table metadata<br/>(schema, snapshot, partition spec)
PLI->>S3: Write initial metadata.json<br/>(via Avro serialization)
PLT->>PLT: Register field ID mappings<br/>Initialize data file catalogs
PG-->>Client: CREATE TABLE OK
```
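From the client's point of view, this entire flow is plain DDL. A minimal sketch of what the utility hook does with it (the rewritten form below is an approximation of the internal statement, not its exact text):

```sql
CREATE TABLE t (id int, name text) USING iceberg;
-- The utility hook rewrites this internally to roughly:
--   CREATE FOREIGN TABLE t (id int, name text)
--     SERVER pg_lake_iceberg;
-- registers the table in lake_iceberg.tables, and writes the initial
-- metadata.json to object storage before returning CREATE TABLE OK.
```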
```mermaid
sequenceDiagram
participant Client
participant PG as PostgreSQL
participant FDW as pg_lake_table<br/>(FDW callbacks)
participant Engine as pg_lake_engine<br/>(pgduck client)
participant PGDuck as pgduck_server<br/>(DuckDB)
participant S3 as Object Storage
participant TxTrack as Transaction<br/>Tracker
Client->>PG: INSERT INTO t VALUES (1,'alice'),(2,'bob')
PG->>FDW: BeginForeignModify()
FDW->>FDW: Create DestReceiver for inserts<br/>(partitioned if needed)
PG->>FDW: ExecForeignInsert() [per row]
FDW->>FDW: Buffer rows in CSV via COPY TO
PG->>FDW: EndForeignModify()
Note over FDW: Flush buffered CSV to local temp file
FDW->>Engine: ConvertCSVFileTo()<br/>Generate DuckDB COPY command
Engine->>PGDuck: COPY (SELECT ... FROM read_csv(...))<br/>TO 's3://bucket/.../data-xxx.parquet'<br/>(via libpq over unix socket)
PGDuck->>S3: Write Parquet file
PGDuck-->>Engine: Write complete + stats
FDW->>FDW: Record new data file in<br/>pg_lake catalogs (data_files, stats)
FDW->>TxTrack: TrackIcebergMetadataChangesInTx()<br/>Record pending ADD_FILE operation
Note over TxTrack: Changes held until COMMIT
Client->>PG: COMMIT
PG->>TxTrack: IcebergXactCallback(PRE_COMMIT)
TxTrack->>TxTrack: ConsumeTrackedIcebergMetadataChanges()<br/>Diff current vs. last-pushed metadata
TxTrack->>TxTrack: Build new Iceberg snapshot:<br/>• New manifest entry for data file<br/>• New manifest (Avro)<br/>• New manifest list (Avro)<br/>• New snapshot in table metadata
TxTrack->>S3: Write manifest Avro files
TxTrack->>S3: Write updated metadata.json
PG->>TxTrack: IcebergXactCallback(COMMIT)
TxTrack->>TxTrack: PostAllRestCatalogRequests()<br/>(if using REST catalog)
PG-->>Client: COMMIT OK
```
```mermaid
sequenceDiagram
participant Client
participant PG as PostgreSQL
participant Planner as pg_lake_table<br/>(planner hook)
participant FDW as pg_lake_table<br/>(FDW scan)
participant Engine as pg_lake_engine
participant PGDuck as pgduck_server<br/>(DuckDB)
participant S3 as Object Storage
Client->>PG: SELECT * FROM t WHERE id > 1
alt Full Query Pushdown (entire query is shippable)
PG->>Planner: LakeTablePlanner()
Planner->>Planner: Check all expressions are shippable<br/>(functions, operators, types)
Planner->>Planner: Generate CustomScan with<br/>rewritten DuckDB SQL
Planner->>Engine: Deparse query to DuckDB dialect
Note over Planner: Replaces table refs with<br/>read_parquet([file1, file2, ...])<br/>using current snapshot's file list
PG->>Engine: QueryPushdownExecScan()
Engine->>PGDuck: Send full SQL query<br/>(via libpq, extended protocol)
PGDuck->>S3: Read Parquet files
PGDuck-->>Engine: Stream result rows
Engine-->>PG: Convert DuckDB results to PG tuples
else FDW Path (partial pushdown)
PG->>FDW: GetForeignRelSize / GetForeignPaths
FDW->>FDW: CreatePgLakeScanSnapshot()<br/>Get file list from Iceberg metadata
FDW->>FDW: Data file pruning via min/max stats
PG->>FDW: GetForeignPlan — deparse SELECT
Note over FDW: Generates: SELECT cols FROM<br/>read_parquet([files]) WHERE ...
PG->>FDW: BeginForeignScan()
FDW->>Engine: SendQueryToPGDuck()
Engine->>PGDuck: Execute deparsed query
PGDuck->>S3: Read Parquet files
PGDuck-->>Engine: Stream rows via PG wire protocol
Engine-->>FDW: WaitForResult() / fetch tuples
FDW-->>PG: Return HeapTuples
end
PG-->>Client: Query result rows
```
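Which branch a query takes can be observed with EXPLAIN. A hedged sketch — the exact plan-node labels are an assumption and may vary by version:

```sql
EXPLAIN (COSTS OFF) SELECT name FROM t WHERE id > 1;
-- Full pushdown: expect a single custom scan node whose remote SQL
-- embeds read_parquet([...]) over the current snapshot's file list.
-- Partial pushdown: expect a Foreign Scan per pg_lake table instead,
-- with the WHERE clause deparsed into each per-table remote query.
```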
pgduck_server is a standalone multi-threaded process that implements the PostgreSQL wire protocol and delegates all computation to DuckDB. It is not a PostgreSQL extension — it is a separate binary.
```mermaid
graph TD
subgraph pgduck_server["pgduck_server process"]
Main["main.c<br/>Parse CLI options"]
DuckInit["duckdb_global_init()<br/>Initialize DuckDB instance"]
PGServer["pgserver.c<br/>Unix socket listener (:5332)"]
ThreadPool["client_threadpool.c<br/>Thread pool"]
subgraph per_thread["Per-connection thread"]
PGSession["pgsession.c<br/>PG wire protocol handler"]
PGSessionIO["pgsession_io.c<br/>Low-level socket I/O"]
DuckDB["duckdb.c<br/>Execute queries on DuckDB"]
TypeConv["type_conversion.c<br/>DuckDB → PG type mapping"]
end
DuckDBExt["duckdb_pglake extension<br/>(loaded into DuckDB)"]
end
subgraph duckdb_pglake["duckdb_pglake DuckDB extension"]
S3FS["pg_lake_s3fs<br/>Region-aware S3 filesystem"]
CacheFS["caching_file_system<br/>Local file caching"]
UtilFns["utility_functions<br/>PG-compatible functions"]
end
Main --> DuckInit
Main --> PGServer
PGServer --> ThreadPool
ThreadPool --> per_thread
PGSession --> DuckDB
DuckDB --> DuckDBExt
DuckDBExt --> duckdb_pglake
style pgduck_server fill:#fde8e8,stroke:#c75050
style per_thread fill:#fff0f0,stroke:#c75050
style duckdb_pglake fill:#e8f0fe,stroke:#4a90d9
```
PostgreSQL extensions communicate with pgduck_server using libpq (the standard PostgreSQL client library) over a Unix domain socket (default port 5332). This means pgduck_server looks like another PostgreSQL server to the extensions.
- Queries are sent via PQsendQueryParams() (extended query protocol) for streaming results, or PQsendQuery() for simple commands.
- For bulk data transfer, a TRANSMIT prefix triggers COPY-OUT mode — pgduck_server streams results in CSV batches over the PG wire protocol, avoiding per-row overhead.
- Connections are pooled per-backend in a hash table and released at transaction boundaries via XactCallback.
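Because pgduck_server speaks the PostgreSQL wire protocol, the traffic on the socket is ordinary SQL. A sketch of a bulk read as the engine might issue it — the placement of the TRANSMIT prefix and the file path are illustrative assumptions:

```sql
-- The TRANSMIT prefix switches pgduck_server into COPY-OUT mode, so
-- results come back as CSV batches rather than one DataRow per tuple:
TRANSMIT SELECT id, name
FROM read_parquet(['s3://bucket/t/data/data-xxx.parquet'])
WHERE id > 1;
```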
Iceberg metadata changes are tracked in-memory during a transaction and only committed atomically at PRE_COMMIT time:
```mermaid
stateDiagram-v2
[*] --> Active: BEGIN (implicit or explicit)
Active --> Active: INSERT / UPDATE / DELETE<br/>Data files written to S3<br/>Changes tracked in hash table
Active --> PreCommit: COMMIT issued
state PreCommit {
[*] --> DiffMetadata: ConsumeTrackedIcebergMetadataChanges()
DiffMetadata --> BuildSnapshot: Diff added/removed files<br/>vs. last-pushed metadata
BuildSnapshot --> WriteManifests: Create new manifests (Avro)
WriteManifests --> WriteMetadata: Write new metadata.json to S3
WriteMetadata --> [*]
}
PreCommit --> Committed: XACT_EVENT_COMMIT
Committed --> PostCommit: PostAllRestCatalogRequests()<br/>(REST catalog only)
PostCommit --> [*]
Active --> Aborted: ABORT / ERROR
Aborted --> Cleanup: ResetTrackedIcebergMetadataOperation()
Cleanup --> [*]
```
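A practical consequence: no Iceberg snapshot is published until COMMIT, so a rolled-back write leaves the table's metadata untouched (data files already uploaded to S3 are handled by the cleanup machinery's deletion queue):

```sql
BEGIN;
INSERT INTO t VALUES (3, 'carol');  -- a data file may be written to S3
ROLLBACK;                           -- no manifest or metadata.json update;
                                    -- tracked changes are reset
SELECT count(*) FROM t;             -- still reflects the last committed snapshot
```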
pg_lake_table installs a PostgreSQL planner hook that attempts to push down entire queries to DuckDB. If any part of the query is not shippable, it falls back to the standard FDW path where only per-table scans are pushed down.
```mermaid
flowchart TD
Q["Incoming Query"]
Q --> Check{"All tables are<br/>pg_lake tables?"}
Check -->|No| Standard["Standard PostgreSQL<br/>planner (no pushdown)"]
Check -->|Yes| Shippable{"All expressions<br/>shippable to DuckDB?"}
Shippable -->|Yes| FullPush["Full Query Pushdown<br/>(CustomScan node)"]
Shippable -->|No| FDW["FDW Path<br/>(per-table scan pushdown)"]
FullPush --> Rewrite["Rewrite entire query<br/>to DuckDB dialect"]
Rewrite --> Replace["Replace table refs with<br/>read_parquet(file_list)"]
Replace --> Send["Send to pgduck_server"]
FDW --> Deparse["Deparse per-table SELECT<br/>with pushed-down WHERE"]
Deparse --> Replace2["Replace table ref with<br/>read_parquet(file_list)"]
Replace2 --> Send2["Send to pgduck_server"]
style FullPush fill:#5ba85b,color:#fff
style FDW fill:#d9a54a,color:#fff
style Standard fill:#888,color:#fff
```
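Two illustrative queries against that decision tree, assuming t is a pg_lake table and local_t is an ordinary heap table (the names are hypothetical):

```sql
-- Full pushdown candidate: only pg_lake tables, and a simple comparison
-- is shippable to DuckDB.
SELECT count(*) FROM t WHERE id > 1;

-- Fails the first check -- not all tables are pg_lake tables -- so the
-- join runs in PostgreSQL, while the scan of t can still be pushed down
-- through the FDW path.
SELECT t.name FROM t JOIN local_t USING (id);
```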
When data is written to an Iceberg table, it flows through several stages:
```mermaid
flowchart LR
subgraph PostgreSQL
PG_COPY["COPY TO<br/>(PG internal)"]
CSV["Local CSV<br/>temp file"]
end
subgraph pg_lake_engine
GenQ["Generate DuckDB<br/>COPY command"]
end
subgraph pgduck_server
ReadCSV["read_csv()"]
WriteParquet["COPY TO<br/>Parquet"]
end
subgraph Object Storage
Parquet["data-xxx.parquet"]
end
PG_COPY -->|"Rows buffered<br/>as CSV"| CSV
CSV -->|"File path"| GenQ
GenQ -->|"COPY (SELECT ... FROM<br/>read_csv('/tmp/xx.csv'))<br/>TO 's3://...parquet'"| ReadCSV
ReadCSV --> WriteParquet
WriteParquet --> Parquet
style PostgreSQL fill:#e8f0fe,stroke:#4a90d9
style pg_lake_engine fill:#fde8e8,stroke:#c75050
style pgduck_server fill:#fef7e0,stroke:#d9a54a
```
Rows are first serialized to CSV using PostgreSQL's built-in COPY mechanism, then pgduck_server reads that CSV and writes it out as Parquet to object storage. This avoids implementing a Parquet writer inside PostgreSQL and leverages DuckDB's efficient columnar engine for the conversion.
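Under stated assumptions about paths and file names (both are generated internally), the conversion command pg_lake_engine sends to pgduck_server has this shape — read_csv and COPY ... TO with FORMAT parquet are standard DuckDB:

```sql
-- Issued by pg_lake_engine over the pgduck_server socket:
COPY (SELECT * FROM read_csv('/tmp/pglake-buffer.csv'))
TO 's3://bucket/t/data/data-xxx.parquet' (FORMAT parquet);
```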