- Host: macOS (Apple Silicon / arm64), Darwin 24.5.0
- Docker Desktop: v29.2.1, API v1.53 (minimum API v1.44)
- Sapporo WES: sapporo-wes-2.1.0 (`ghcr.io/sapporo-wes/sapporo-service:latest`)
- Nextflow: 25.10.4 (via the `nextflow/nextflow:25.10.4` container)
- Pipeline: nf-core/rnaseq (test profile)
- Date: 2026-02-22
```shell
cd ~/work/wes-test
git clone https://github.com/sapporo-wes/sapporo-service.git
cd sapporo-service
docker compose up -d
```

Verified with:

```shell
curl -s localhost:1122/service-info | jq .
```

Confirmed Nextflow (NFL / DSL2) is listed in both `workflow_type_versions` and `workflow_engine_versions`.
`workflow_params.json`:

```json
{
  "outdir": "results",
  "max_memory": "6.GB",
  "max_cpus": 2
}
```

`workflow_engine_parameters.json`:

```json
{
  "-profile": "test,docker"
}
```

```
Process requirement exceeds available memory -- req: 12 GB; avail: 7.7 GB
```
The default nf-core/rnaseq test profile requested 12 GB for FQ_LINT, exceeding Docker Desktop's memory allocation.
Fix: Added `"max_memory": "6.GB"` and `"max_cpus": 2` to `workflow_params.json`.
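With both parameter files in place, resubmission can be sketched as a multipart POST to `/runs`. The field names below are assumed from the GA4GH WES API that sapporo-service implements (older sapporo releases used `workflow_engine_name` rather than `workflow_engine`); verify against your deployment's `/service-info`:

```shell
# Sketch: submit nf-core/rnaseq to Sapporo WES. Field names assumed from
# the GA4GH WES spec; workflow_params / workflow_engine_parameters are the
# JSON files above, sent as field values via curl's <file syntax.
curl -s -X POST localhost:1122/runs \
  -F 'workflow_type=NFL' \
  -F 'workflow_type_version=DSL2' \
  -F 'workflow_engine=nextflow' \
  -F 'workflow_url=https://github.com/nf-core/rnaseq' \
  -F 'workflow_params=<workflow_params.json' \
  -F 'workflow_engine_parameters=<workflow_engine_parameters.json' \
  | jq -r '.run_id'
```

The returned `run_id` is what subsequent `/runs/{run_id}` status queries use.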
```
docker: Error response from daemon: client version 1.32 is too old.
Minimum supported API version is 1.44, please upgrade your client to a newer version.
```
The nextflow/nextflow:25.10.4 image bundles Docker client API v1.32, but Docker Desktop requires >= v1.44.
Fix: Added `-e DOCKER_API_VERSION=1.44` to the `run_nextflow()` function in `sapporo/run.sh`. Also added a Nextflow config with `docker.envWhitelist = 'DOCKER_API_VERSION'` to propagate the variable into child process containers.
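A sketch of that config (the whitelist line is the only assumed content; Nextflow forwards only whitelisted host variables into task containers):

```groovy
// sapporo.config (sketch): forward DOCKER_API_VERSION into every
// container that Nextflow spawns for pipeline processes.
docker.envWhitelist = 'DOCKER_API_VERSION'
```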
Note: the stock `run.sh` already had this fix for cwltool, toil, and ep3 — but not for Nextflow. Also, local edits to `run.sh` required bind-mounting the file into the container via `compose.yml`:
```yaml
volumes:
  - ${PWD}/sapporo/run.sh:/app/sapporo/run.sh:ro
```

```
docker: Error response from daemon: mounts denied:
The path /.nextflow/assets/nf-core/rnaseq/bin is not shared from the host
and is not known to Docker.
```
Nextflow stores cloned pipeline assets at `/.nextflow/assets/` inside its container. When spawning child containers, it tries to bind-mount that path — but Docker Desktop on macOS cannot access paths inside another container.
Fix: Set the `NXF_HOME` and `NXF_ASSETS` environment variables to point into the shared run directory (`${run_dir}/nxf_home`), which is host-mounted and accessible to child containers.
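The assumed shape of that change in `run_nextflow()` (exact paths are a sketch; the notes only say both variables point under `${run_dir}/nxf_home`):

```shell
# Sketch of the assumed addition to run_nextflow() in sapporo/run.sh.
# run_dir stands in for the per-run directory Sapporo passes to the function.
run_dir="/tmp/sapporo-demo-run"
export NXF_HOME="${run_dir}/nxf_home"    # Nextflow home (logs, history, plugins)
export NXF_ASSETS="${NXF_HOME}/assets"   # cloned pipeline code lands here
echo "NXF_ASSETS=${NXF_ASSETS}"
```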
Pipeline ran for ~16 minutes and completed successfully.
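Progress can be followed by polling the run endpoint until a terminal state is reached; a minimal loop of this shape (terminal state names per the GA4GH WES spec; requires the service from above on localhost:1122):

```shell
# Poll the WES run state every 30 s until it is terminal (sketch).
run_id=25fedfa2-9792-4f62-a391-6a2da2a72628
while :; do
  state=$(curl -s "localhost:1122/runs/${run_id}" | jq -r '.state')
  case "$state" in
    COMPLETE|EXECUTOR_ERROR|SYSTEM_ERROR|CANCELED) break ;;
  esac
  sleep 30
done
echo "$state"
```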
```shell
curl -s localhost:1122/runs/25fedfa2-9792-4f62-a391-6a2da2a72628 | jq '.state'
# "COMPLETE"
```

955 output files produced across 9 directories:
| Directory | Contents |
|---|---|
| `bbsplit` | Contamination screening stats |
| `custom` | Merged genome + GTF (with GFP spike-in) |
| `fastqc` | Raw read quality reports |
| `fq_lint` | FASTQ format validation |
| `multiqc` | Aggregated QC report |
| `pipeline_info` | Execution metadata and resource usage |
| `salmon` | Transcript-level quantification |
| `star_salmon` | STAR alignment + Salmon quantification |
| `trimgalore` | Adapter-trimmed reads and trim reports |
Changes relative to the upstream defaults:
- Added the `DOCKER_API_VERSION=1.44` env var
- Added `NXF_HOME`/`NXF_ASSETS` env vars pointing to the shared run directory
- Added a Nextflow config file (`sapporo.config`) with `docker.envWhitelist`
- Bind-mounted the local `sapporo/run.sh` into the container at `/app/sapporo/run.sh:ro`
- Study: GSE128274 — "Analyses of a panel of transcripts and construction of RNA networks in hepatocellular carcinoma"
- BioProject: PRJNA526922 | Organism: Homo sapiens
- Design: 4 HCC patients, paired tumor (P) + adjacent normal (C), paired-end Illumina (NextSeq 500)
| Accession | Sample | Type |
|---|---|---|
| SRR8723780 | P1 | Tumor |
| SRR8723781 | C1 | Normal |
| SRR8723782 | P2 | Tumor |
| SRR8723783 | C2 | Normal |
| SRR8723784 | P3 | Tumor |
| SRR8723785 | C3 | Normal |
| SRR8723786 | P5 | Tumor |
| SRR8723787 | C5 | Normal |
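The accession list fed to fetchngs (`shared/ids.csv`) can then be just these eight run accessions, one per line, assuming nf-core/fetchngs's plain accession-list input format:

```csv
SRR8723780
SRR8723781
SRR8723782
SRR8723783
SRR8723784
SRR8723785
SRR8723786
SRR8723787
```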
Problem: The rnaseq container only mounted its own `run_dir`, so it couldn't read FASTQ files downloaded by a previous fetchngs run.
Fix: Changed `run_nextflow()` in `sapporo/run.sh` to mount `${SAPPORO_RUN_DIR}` (the full runs directory) instead of `${run_dir}`:
```shell
# Before:
-v "${run_dir}:${run_dir}"
# After:
-v "${SAPPORO_RUN_DIR}:${SAPPORO_RUN_DIR}"
```

Submitted nf-core/fetchngs v1.12.0 via Sapporo:
- Run ID: `71265eb3-6d6e-47d4-9899-e29067bc2a2d`
- Duration: 3h 19m 56s
- Result: 16 FASTQ files (~32 GB), samplesheet auto-generated at `outputs/samplesheet/samplesheet.csv`
- Note: SRR8723787 required 3 FTP retries (intermittent ENA connection issues); all 8 samples completed successfully.
Parameters (`fetchngs_params.json`):

```json
{
  "input": "<runs>/shared/ids.csv",
  "outdir": "results",
  "nf_core_pipeline": "rnaseq",
  "max_memory": "6.GB",
  "max_cpus": 4
}
```

Run ID: `f7eeab2d-36d3-40d5-8b1c-ec615d45b44d`
Duration: 13h 9m 54s | Tasks: 66/66 succeeded | CPU hours: 52.6
Parameters (`rnaseq_params.json`):

```json
{
  "input": "<fetchngs-run>/outputs/samplesheet/samplesheet.csv",
  "outdir": "results",
  "genome": "GRCh38",
  "pseudo_aligner": "salmon",
  "skip_alignment": true,
  "max_memory": "30.GB",
  "max_cpus": 4
}
```

Two preliminary runs failed because nf-core/rnaseq's resource requests exceeded Docker Desktop's 31.3 GB memory allocation:
- Run 1 (`32cc2b3b`): FastQC requested 36 GB (6 threads × 6 GB) — the pipeline's `max_memory: 24.GB` did not cap it.
- Run 2 (`57276298`): After adding a custom Nextflow config to cap FastQC, TrimGalore requested 72 GB.
Fix: Created `runs/shared/rnaseq.config`, explicitly overriding all process labels:

```groovy
params.max_memory = '28.GB'
params.max_cpus   = 4
process {
    withLabel: 'process_high'   { cpus = 4; memory = '24.GB' }
    withLabel: 'process_medium' { cpus = 4; memory = '24.GB' }
    withLabel: 'process_low'    { cpus = 2; memory = '12.GB' }
    withLabel: 'process_single' { cpus = 1; memory = '6.GB' }
    // ...
}
```

Passed via engine parameters: `{"-profile": "docker", "-c": "<path>/rnaseq.config"}`.
361 output files (304.9 MB) across 6 directories:
| Directory | Files | Size | Contents |
|---|---|---|---|
| `salmon` | 121 | 252.1 MB | Per-sample quant, merged count/TPM matrices, SummarizedExperiment RDS |
| `multiqc` | 138 | 20.1 MB | Aggregated QC report |
| `fastqc` | 64 | 29.3 MB | Per-sample FastQC reports |
| `trimgalore` | 16 | 75.6 KB | Trim reports |
| `fq_lint` | 16 | 22.5 KB | FASTQ validation |
| `pipeline_info` | 6 | 3.3 MB | Execution metadata |
Key output files:
- `salmon/salmon.merged.gene_counts.tsv` — gene-level count matrix (8 samples)
- `salmon/salmon.merged.gene_tpm.tsv` — TPM expression matrix
- `salmon/salmon.merged.gene.SummarizedExperiment.rds` — ready for DESeq2/edgeR
- `multiqc/multiqc_report.html` — aggregated QC across all samples
After a successful run, Sapporo generates a Workflow Run RO-Crate (ro-crate-metadata.json) containing structured metadata about the execution. A Python script (summarize_crate.py) was created to parse this file and produce a human-readable Markdown summary.
- Location: `summarize_crate.py`
- Dependencies: Python stdlib only (`json`, `sys`, `datetime`, `collections`)
- Usage: `python summarize_crate.py <path/to/ro-crate-metadata.json> > summary.md`
- Header — Run name, ID, and completion status
- Run Overview — Workflow name/URL, language (Nextflow DSL2), engine versions (nextflow, sapporo 2.2.2), container image (nextflow/nextflow:25.10.4), start/end times, duration (15m 21s), exit code
- Input Parameters — 3 parameters: `outdir`, `max_memory`, `max_cpus`
- Output Summary — 955 files totalling 75.7 MB, broken down by 9 top-level directories, with a list of 14 output MIME types
- Alignment Statistics — Per-sample stats (total reads, mapped reads/rate, duplicate reads/rate) for 5 samples, derived from FileStats entities linked to BAM files
- Software Versions — All SoftwareApplication entities: nextflow, samtools (1.23), sapporo (2.2.2)
- Builds an `@id` → entity lookup dict for reference resolution
- Parses `actionStatus` URLs to friendly text (e.g. `CompletedActionStatus` → `Completed`)
- Computes duration from ISO 8601 timestamps
- Formats file sizes human-readably (B, KB, MB, GB)
- Groups output files by first path component after `outputs/`
- Links FileStats back to parent BAM File entities via reverse lookup on the `stats` field
- Handles missing fields gracefully