@inutano
Last active February 25, 2026 09:36
WES Test Report: nf-core/rnaseq via Sapporo on macOS (Apple Silicon)


Environment

  • Host: macOS (Apple Silicon / arm64), Darwin 24.5.0
  • Docker Desktop: v29.2.1, API v1.53 (minimum API v1.44)
  • Sapporo WES: sapporo-wes-2.1.0 (ghcr.io/sapporo-wes/sapporo-service:latest)
  • Nextflow: 25.10.4 (via nextflow/nextflow:25.10.4 container)
  • Pipeline: nf-core/rnaseq (test profile)
  • Date: 2026-02-22

Step 1: Clone and start Sapporo

cd ~/work/wes-test
git clone https://github.com/sapporo-wes/sapporo-service.git
cd sapporo-service
docker compose up -d

Verified with:

curl -s localhost:1122/service-info | jq .

The response listed Nextflow (NFL / DSL2) under both workflow_type_versions and workflow_engine_versions.
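The same check can be scripted instead of read by eye. The following is a minimal sketch, assuming the service-info response is a JSON object in which workflow_type_versions is keyed by type name ("NFL") and workflow_engine_versions is keyed by engine name ("nextflow"). Both key names are assumptions and should be compared against the actual response.

```python
def supports_nextflow(service_info: dict) -> bool:
    """Return True if a service-info document advertises Nextflow.

    Assumes "NFL" and "nextflow" are the keys used by the service;
    adjust them if the real response uses different names.
    """
    types = service_info.get("workflow_type_versions", {})
    engines = service_info.get("workflow_engine_versions", {})
    return "NFL" in types and "nextflow" in engines
```

The input would be the parsed output of the curl command above, for example via json.load on the response body.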

Step 2: Prepare workflow parameters

workflow_params.json:

{
  "outdir": "results",
  "max_memory": "6.GB",
  "max_cpus": 2
}

workflow_engine_parameters.json:

{
  "-profile": "test,docker"
}

Step 3: Submit and iterate

Run 1 — OOM

Process requirement exceeds available memory -- req: 12 GB; avail: 7.7 GB

The default nf-core/rnaseq test profile requested 12 GB for FQ_LINT, exceeding Docker Desktop's memory allocation.

Fix: Added "max_memory": "6.GB" and "max_cpus": 2 to workflow_params.json.
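To make the comparison in the error message concrete, here is a small sketch that parses memory strings in both the Nextflow style ("6.GB") and the style used in the error message ("7.7 GB"), then checks whether a request fits. The helper names are hypothetical, and binary units (1 GB = 1024^3 bytes) are an assumption.

```python
import re

# Binary multipliers; an assumption about how the sizes are interpreted.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*\.?\s*([KMGT]?B)\s*", re.IGNORECASE)


def parse_memory(text: str) -> int:
    """Convert strings such as '6.GB', '12 GB' or '7.7 GB' to bytes."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised memory string: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def fits(requested: str, available: str) -> bool:
    """Return True if the requested memory does not exceed what is available."""
    return parse_memory(requested) <= parse_memory(available)
```

With the numbers from this run, fits("12 GB", "7.7 GB") is False, while the capped request fits("6.GB", "7.7 GB") is True.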

Run 2 — Docker API version mismatch

docker: Error response from daemon: client version 1.32 is too old.
Minimum supported API version is 1.44, please upgrade your client to a newer version.

The nextflow/nextflow:25.10.4 image bundles a Docker client that speaks API v1.32, but the Docker Desktop daemon requires >= v1.44.

Fix: Added -e DOCKER_API_VERSION=1.44 to the run_nextflow() function in sapporo/run.sh. Also added a Nextflow config with docker.envWhitelist = 'DOCKER_API_VERSION' to propagate the variable into child process containers.

Note: the stock run.sh already sets DOCKER_API_VERSION for cwltool, toil, and ep3, but not for Nextflow. Local edits to run.sh also had to be bind-mounted into the container via compose.yml to take effect:

volumes:
  - ${PWD}/sapporo/run.sh:/app/sapporo/run.sh:ro
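The version check behind this error compares the versions numerically, component by component. A minimal sketch of that comparison follows; the function names are hypothetical and this is not the daemon's actual implementation.

```python
def parse_api_version(version: str) -> tuple[int, ...]:
    """Turn '1.44' or 'v1.44' into (1, 44) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def client_too_old(client: str, minimum: str) -> bool:
    """Return True if the client API version is below the daemon's minimum."""
    return parse_api_version(client) < parse_api_version(minimum)
```

Comparing tuples rather than raw strings matters: as strings, "1.9" sorts after "1.44", which would give the wrong answer.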

Run 3 — Mount denied for pipeline assets

docker: Error response from daemon: mounts denied:
The path /.nextflow/assets/nf-core/rnaseq/bin is not shared from the host
and is not known to Docker.

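Nextflow stores cloned pipeline assets at /.nextflow/assets/ inside its own container. When spawning child containers, it tries to bind-mount that path, but the child containers are started by the host's Docker daemon, which resolves bind-mount sources on the host. A path that exists only inside the Nextflow container is therefore unknown to Docker Desktop on macOS.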

Fix: Set NXF_HOME and NXF_ASSETS environment variables to point into the shared run directory (${run_dir}/nxf_home), which is host-mounted and accessible to child containers.
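As an illustration of how these variables resolve for a given run, the sketch below builds the two values and the matching docker run flags. The actual change was made in the shell function run_nextflow(). The helper names here are hypothetical, and placing NXF_ASSETS at nxf_home/assets is an assumption, since the report only says both variables point into ${run_dir}/nxf_home.

```python
import posixpath


def nextflow_env(run_dir: str) -> dict:
    """Environment that keeps Nextflow's home and assets under the run directory."""
    nxf_home = posixpath.join(run_dir, "nxf_home")
    return {
        "NXF_HOME": nxf_home,
        # Assumed layout: assets live in a subdirectory of NXF_HOME.
        "NXF_ASSETS": posixpath.join(nxf_home, "assets"),
    }


def docker_env_flags(env: dict) -> list:
    """Expand a mapping into repeated '-e KEY=VALUE' arguments for docker run."""
    flags = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags
```

Because the run directory is mounted at the same path on the host and in the container, the paths produced here are valid bind-mount sources for the child containers.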

Run 4 — Success

Pipeline ran for ~16 minutes and completed successfully.

Step 4: Verify results

curl -s localhost:1122/runs/25fedfa2-9792-4f62-a391-6a2da2a72628 | jq '.state'
# "COMPLETE"
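For long runs it can be convenient to poll the same endpoint until the run reaches a terminal state. This is a rough sketch using only the standard library. The set of terminal states follows the GA4GH WES state names, and the polling interval and helper names are assumptions.

```python
import json
import time
import urllib.request

# Terminal states as named in the GA4GH WES specification.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def fetch_state(base_url: str, run_id: str) -> str:
    """Read the 'state' field from GET /runs/<run_id>."""
    with urllib.request.urlopen(f"{base_url}/runs/{run_id}") as response:
        return json.load(response)["state"]


def wait_for_run(run_id, base_url="http://localhost:1122",
                 fetch=fetch_state, interval=30.0, sleep=time.sleep):
    """Poll until the run reaches a terminal state and return that state."""
    while True:
        state = fetch(base_url, run_id)
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
```

The fetch and sleep parameters are injectable so the loop can be exercised without a running Sapporo instance.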

955 output files produced across 9 directories:

| Directory | Contents |
| --- | --- |
| bbsplit | Contamination screening stats |
| custom | Merged genome + GTF (with GFP spike-in) |
| fastqc | Raw read quality reports |
| fq_lint | FASTQ format validation |
| multiqc | Aggregated QC report |
| pipeline_info | Execution metadata and resource usage |
| salmon | Transcript-level quantification |
| star_salmon | STAR alignment + Salmon quantification |
| trimgalore | Adapter-trimmed reads and trim reports |

Summary of changes to Sapporo

Two files were modified from the upstream defaults:

sapporo/run.sh: run_nextflow() function

  • Added DOCKER_API_VERSION=1.44 env var
  • Added NXF_HOME / NXF_ASSETS env vars pointing to the shared run directory
  • Added a Nextflow config file (sapporo.config) with docker.envWhitelist

compose.yml

  • Bind-mounted the local sapporo/run.sh into the container at /app/sapporo/run.sh (read-only)

Part 2: Human HCC RNA-seq Analysis (GSE128274)

Dataset

Study: GSE128274, "Analyses of a panel of transcripts and construction of RNA networks in hepatocellular carcinoma"
BioProject: PRJNA526922
Organism: Homo sapiens
Design: 4 HCC patients, paired tumor (P) + adjacent normal (C), paired-end Illumina (NextSeq 500)

| Accession | Sample | Type |
| --- | --- | --- |
| SRR8723780 | P1 | Tumor |
| SRR8723781 | C1 | Normal |
| SRR8723782 | P2 | Tumor |
| SRR8723783 | C2 | Normal |
| SRR8723784 | P3 | Tumor |
| SRR8723785 | C3 | Normal |
| SRR8723786 | P5 | Tumor |
| SRR8723787 | C5 | Normal |
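The sample names encode both the condition (P = tumor, C = adjacent normal) and the patient number, which is what a paired downstream analysis needs. A small sketch that derives this metadata from the table above is shown here. The column names (run, sample, patient, condition) are illustrative choices, not something the pipeline requires.

```python
# Accession -> sample name, copied from the table above.
RUNS = {
    "SRR8723780": "P1", "SRR8723781": "C1",
    "SRR8723782": "P2", "SRR8723783": "C2",
    "SRR8723784": "P3", "SRR8723785": "C3",
    "SRR8723786": "P5", "SRR8723787": "C5",
}

CONDITION_BY_PREFIX = {"P": "tumor", "C": "normal"}


def describe_sample(name: str) -> dict:
    """Split a name such as 'P5' into its patient ID and condition."""
    return {
        "sample": name,
        "patient": name[1:],
        "condition": CONDITION_BY_PREFIX[name[0]],
    }


def sample_table(runs: dict) -> list:
    """One metadata row per sequencing run, sorted by accession."""
    return [{"run": acc, **describe_sample(name)} for acc, name in sorted(runs.items())]
```

Each patient ID then appears exactly twice, once per condition, which is the structure a paired design relies on.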

Step 1: Fix Sapporo for cross-run data access

Problem: The rnaseq container only mounted its own run_dir, so it couldn't read FASTQ files downloaded by a previous fetchngs run.

Fix: Changed run_nextflow() in sapporo/run.sh to mount ${SAPPORO_RUN_DIR} (the full runs directory) instead of ${run_dir}:

# Before:
-v "${run_dir}:${run_dir}"
# After:
-v "${SAPPORO_RUN_DIR}:${SAPPORO_RUN_DIR}"

Step 2: Download data with nf-core/fetchngs

Submitted nf-core/fetchngs v1.12.0 via Sapporo:

  • Run ID: 71265eb3-6d6e-47d4-9899-e29067bc2a2d
  • Duration: 3h 19m 56s
  • Result: 16 FASTQ files (~32 GB), samplesheet auto-generated at outputs/samplesheet/samplesheet.csv
  • Note: SRR8723787 required 3 FTP retries (intermittent ENA connection issues); all 8 samples completed successfully.

Parameters (fetchngs_params.json):

{
  "input": "<runs>/shared/ids.csv",
  "outdir": "results",
  "nf_core_pipeline": "rnaseq",
  "max_memory": "6.GB",
  "max_cpus": 4
}

Step 3: Run nf-core/rnaseq with Salmon pseudo-alignment

Run ID: f7eeab2d-36d3-40d5-8b1c-ec615d45b44d
Duration: 13h 9m 54s | Tasks: 66/66 succeeded | CPU hours: 52.6

Parameters (rnaseq_params.json):

{
  "input": "<fetchngs-run>/outputs/samplesheet/samplesheet.csv",
  "outdir": "results",
  "genome": "GRCh38",
  "pseudo_aligner": "salmon",
  "skip_alignment": true,
  "max_memory": "30.GB",
  "max_cpus": 4
}

Iteration required: memory limits

Two preliminary runs failed because nf-core/rnaseq's resource requests exceeded the 31.3 GB of memory available to Docker Desktop:

  • Run 1 (32cc2b3b): FastQC requested 36 GB (6 threads × 6 GB); the pipeline parameter max_memory: 24.GB did not cap the request.
  • Run 2 (57276298): After adding a custom Nextflow config to cap FastQC, TrimGalore requested 72 GB.

Fix: Created runs/shared/rnaseq.config explicitly overriding all process labels:

params.max_memory = '28.GB'
params.max_cpus = 4

process {
    withLabel: 'process_high'   { cpus = 4; memory = '24.GB' }
    withLabel: 'process_medium' { cpus = 4; memory = '24.GB' }
    withLabel: 'process_low'    { cpus = 2; memory = '12.GB' }
    withLabel: 'process_single' { cpus = 1; memory = '6.GB'  }
    // ...
}

Passed via engine parameters: {"-profile": "docker", "-c": "<path>/rnaseq.config"}.
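The arithmetic behind the two failures and the fix can be checked in a few lines. The sketch below uses the figures reported above (31.3 GB available, 36 GB and 72 GB requested, and the four label overrides). The helper name is hypothetical.

```python
DOCKER_MEMORY_GB = 31.3  # memory available to Docker Desktop, as reported above

# Requests that caused the two failed runs.
FAILED_REQUESTS_GB = {
    "FastQC": 6 * 6,     # 6 threads x 6 GB per thread = 36 GB
    "TrimGalore": 72,
}

# Memory per label after the overrides in rnaseq.config.
LABEL_OVERRIDES_GB = {
    "process_high": 24,
    "process_medium": 24,
    "process_low": 12,
    "process_single": 6,
}


def oversized(requests_gb: dict, limit_gb: float) -> list:
    """Names of requests that exceed the limit, sorted alphabetically."""
    return sorted(name for name, gb in requests_gb.items() if gb > limit_gb)
```

Here oversized(FAILED_REQUESTS_GB, DOCKER_MEMORY_GB) returns both process names, while the overridden labels all fit within the limit.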

Step 4: Results

361 output files (304.9 MB) across 6 directories:

| Directory | Files | Size | Contents |
| --- | --- | --- | --- |
| salmon | 121 | 252.1 MB | Per-sample quant, merged count/TPM matrices, SummarizedExperiment RDS |
| multiqc | 138 | 20.1 MB | Aggregated QC report |
| fastqc | 64 | 29.3 MB | Per-sample FastQC reports |
| trimgalore | 16 | 75.6 KB | Trim reports |
| fq_lint | 16 | 22.5 KB | FASTQ validation |
| pipeline_info | 6 | 3.3 MB | Execution metadata |

Key output files:

  • salmon/salmon.merged.gene_counts.tsv — gene-level count matrix (8 samples)
  • salmon/salmon.merged.gene_tpm.tsv — TPM expression matrix
  • salmon/salmon.merged.gene.SummarizedExperiment.rds — ready for DESeq2/edgeR
  • multiqc/multiqc_report.html — aggregated QC across all samples

Step 5: Generate run summary from RO-Crate

After a successful run, Sapporo generates a Workflow Run RO-Crate (ro-crate-metadata.json) containing structured metadata about the execution. A Python script (summarize_crate.py) was created to parse this file and produce a human-readable Markdown summary.

Script overview

  • Location: summarize_crate.py
  • Dependencies: Python stdlib only (json, sys, datetime, collections)
  • Usage: python summarize_crate.py <path/to/ro-crate-metadata.json> > summary.md

Generated sections

  1. Header — Run name, ID, and completion status
  2. Run Overview — Workflow name/URL, language (Nextflow DSL2), engine versions (nextflow, sapporo 2.2.2), container image (nextflow/nextflow:25.10.4), start/end times, duration (15m 21s), exit code
  3. Input Parameters — 3 parameters: outdir, max_memory, max_cpus
  4. Output Summary — 955 files totalling 75.7 MB, broken down by 9 top-level directories, with a list of 14 output MIME types
  5. Alignment Statistics — Per-sample stats (total reads, mapped reads/rate, duplicate reads/rate) for 5 samples, derived from FileStats entities linked to BAM files
  6. Software Versions — All SoftwareApplication entities: nextflow, samtools (1.23), sapporo (2.2.2)
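The Output Summary section boils down to grouping file entities by their first directory under outputs/ and summing sizes. The following is an illustrative sketch, not the actual summarize_crate.py. It assumes File entities carry their path in @id and their size in bytes in contentSize, and that sizes are formatted with 1024-based units.

```python
from collections import defaultdict


def human_size(num_bytes: float) -> str:
    """Format a byte count as B, KB, MB or GB (1024-based, an assumption)."""
    if num_bytes < 1024:
        return f"{int(num_bytes)} B"
    for unit in ("KB", "MB"):
        num_bytes /= 1024
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
    return f"{num_bytes / 1024:.1f} GB"


def summarize_outputs(entities: list) -> dict:
    """Map each top-level output directory to (file count, total bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for entity in entities:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        path = entity.get("@id", "")
        if "File" not in types or not path.startswith("outputs/"):
            continue
        parts = path[len("outputs/"):].split("/")
        top = parts[0] if len(parts) > 1 else "."  # files directly under outputs/
        totals[top][0] += 1
        totals[top][1] += int(entity.get("contentSize", 0))
    return {name: tuple(pair) for name, pair in sorted(totals.items())}
```

Entities without a contentSize count as zero bytes, which is one way to handle missing fields without failing.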

Implementation notes

  • Builds an @id → entity lookup dict for reference resolution
  • Parses actionStatus URLs to friendly text (e.g. CompletedActionStatus → Completed)
  • Computes duration from ISO 8601 timestamps
  • Formats file sizes human-readably (B, KB, MB, GB)
  • Groups output files by first path component after outputs/
  • Links FileStats back to parent BAM File entities via reverse lookup on the stats field
  • Handles missing fields gracefully
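A few of these notes can be sketched with the standard library alone. The helpers below are illustrative, not the contents of summarize_crate.py. They assume the crate is a JSON-LD document with a top-level @graph list and that timestamps are ISO 8601 strings with a UTC offset or a trailing Z.

```python
from datetime import datetime


def build_index(crate: dict) -> dict:
    """Build the @id -> entity lookup used to resolve references."""
    return {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}


def resolve(index: dict, ref):
    """Follow a {'@id': ...} reference; return the value unchanged otherwise."""
    if isinstance(ref, dict) and "@id" in ref:
        return index.get(ref["@id"], ref)
    return ref


def friendly_status(url) -> str:
    """Turn '.../CompletedActionStatus' into 'Completed'."""
    if not url:
        return "Unknown"
    name = url.rstrip("/").rsplit("/", 1)[-1]
    suffix = "ActionStatus"
    return name[: -len(suffix)] if name.endswith(suffix) else name


def parse_iso(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def format_duration(start: str, end: str) -> str:
    """Format the elapsed time as 'Hh Mm Ss', dropping hours when zero."""
    total = int((parse_iso(end) - parse_iso(start)).total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m {seconds}s"
    return f"{minutes}m {seconds}s"
```

For the start and end times of the Part 2 run, 2026-02-24 12:25:18 UTC and 2026-02-25 01:35:37 UTC, format_duration yields 13h 10m 19s.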

Sapporo workflow run f7eeab2d-36d3-40d5-8b1c-ec615d45b44d

Run ID: f7eeab2d-36d3-40d5-8b1c-ec615d45b44d
Status: Completed

Run Overview

| Item | Value |
| --- | --- |
| Workflow | rnaseq |
| Workflow language | Nextflow DSL2 |
| Engine | nextflow, sapporo 2.2.2 |
| Container image | nextflow/nextflow:25.10.4 |
| Start time | 2026-02-24 12:25:18 UTC |
| End time | 2026-02-25 01:35:37 UTC |
| Duration | 13h 10m 19s |
| Exit code | 0 |

Input Parameters

| Name | Value |
| --- | --- |
| input | /Users/inutano/work/wes-test/sapporo-service/runs/71/71265eb3-6d6e-47d4-9899-e29067bc2a2d/outputs/samplesheet/samplesheet.csv |
| outdir | results |
| genome | GRCh38 |
| pseudo_aligner | salmon |
| skip_alignment | True |
| max_memory | 30.GB |
| max_cpus | 4 |

Output Summary

Total files: 361
Total size: 304.9 MB

Breakdown by Directory

| Directory | Files | Size |
| --- | --- | --- |
| fastqc | 64 | 29.3 MB |
| fq_lint | 16 | 22.5 KB |
| multiqc | 138 | 20.1 MB |
| pipeline_info | 6 | 3.3 MB |
| salmon | 121 | 252.1 MB |
| trimgalore | 16 | 75.6 KB |

Output Formats

  • application/gzip
  • application/json
  • application/octet-stream
  • application/pdf
  • application/zip
  • image/png
  • image/svg+xml
  • text/html
  • text/plain

Software Versions

| Name | Version | URL |
| --- | --- | --- |
| nextflow | N/A | https://www.nextflow.io |
| sapporo | 2.2.2 | https://github.com/sapporo-wes/sapporo-service |