inutano/REPORT.md

## REPORT.md

      
WES Test Report: nf-core/rnaseq via Sapporo

Environment


Host: macOS (Apple Silicon / arm64), Darwin 24.5.0
Docker Desktop: v29.2.1, API v1.53 (minimum API v1.44)
Sapporo WES: sapporo-wes-2.1.0 (ghcr.io/sapporo-wes/sapporo-service:latest)
Nextflow: 25.10.4 (via nextflow/nextflow:25.10.4 container)
Pipeline: nf-core/rnaseq (test profile)
Date: 2026-02-22

Step 1: Clone and start Sapporo

cd ~/work/wes-test
git clone https://github.com/sapporo-wes/sapporo-service.git
cd sapporo-service
docker compose up -d
Verified with:
curl -s localhost:1122/service-info | jq .
Confirmed Nextflow (NFL / DSL2) listed in workflow_type_versions and workflow_engine_versions.
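
The same check can be scripted instead of read by eye. The sketch below assumes the service-info response follows the GA4GH WES layout, where `workflow_type_versions` maps each workflow type to an object holding a `workflow_type_version` list. The `example` payload is a hypothetical, trimmed response, not output captured from this run.

```python
def supports_workflow(service_info, wf_type, wf_version):
    """Return True if service-info advertises wf_type at wf_version."""
    type_versions = service_info.get("workflow_type_versions", {})
    entry = type_versions.get(wf_type, {})
    return wf_version in entry.get("workflow_type_version", [])


# Hypothetical, trimmed response used only for illustration.
example = {
    "workflow_type_versions": {
        "NFL": {"workflow_type_version": ["1.0", "DSL2"]},
    },
    "workflow_engine_versions": {"nextflow": "25.10.4"},
}

print(supports_workflow(example, "NFL", "DSL2"))  # True
print(supports_workflow(example, "CWL", "v1.2"))  # False
```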
Step 2: Prepare workflow parameters

workflow_params.json:
{
  "outdir": "results",
  "max_memory": "6.GB",
  "max_cpus": 2
}
workflow_engine_parameters.json:
{
  "-profile": "test,docker"
}
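
For illustration, the two JSON documents above could be packed into WES `POST /runs` form fields roughly as follows. This is a sketch under assumptions: the field names follow the GA4GH WES specification, `workflow_engine` and the `workflow_url` value `nf-core/rnaseq` are guesses at what this Sapporo version expects, and `build_run_form` is a hypothetical helper. Nothing is sent over the network.

```python
import json


def build_run_form(workflow_url, params, engine_params):
    """Build the form fields for a WES run request (not submitted here)."""
    return {
        "workflow_type": "NFL",
        "workflow_type_version": "DSL2",
        "workflow_url": workflow_url,
        "workflow_engine": "nextflow",  # assumed field name
        # WES expects these two fields as JSON-encoded strings.
        "workflow_params": json.dumps(params),
        "workflow_engine_parameters": json.dumps(engine_params),
    }


form = build_run_form(
    "nf-core/rnaseq",  # assumed workflow_url
    {"outdir": "results", "max_memory": "6.GB", "max_cpus": 2},
    {"-profile": "test,docker"},
)
print(form["workflow_engine_parameters"])
```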
Step 3: Submit and iterate

Run 1 — OOM

Process requirement exceeds available memory -- req: 12 GB; avail: 7.7 GB

The default nf-core/rnaseq test profile requested 12 GB for FQ_LINT, more than the 7.7 GB available to Docker Desktop.
Fix: Added "max_memory": "6.GB" and "max_cpus": 2 to workflow_params.json.
Run 2 — Docker API version mismatch

docker: Error response from daemon: client version 1.32 is too old.
Minimum supported API version is 1.44, please upgrade your client to a newer version.

The nextflow/nextflow:25.10.4 image bundles Docker client API v1.32, but Docker Desktop requires >= v1.44.
Fix: Added -e DOCKER_API_VERSION=1.44 to the run_nextflow() function in sapporo/run.sh. Also added a Nextflow config with docker.envWhitelist = 'DOCKER_API_VERSION' to propagate the variable into child process containers.
Note: the stock run.sh already applied this fix for cwltool, toil, and ep3, but not for Nextflow. For the local edits to take effect, run.sh also had to be bind-mounted into the container via compose.yml:
volumes:
  - ${PWD}/sapporo/run.sh:/app/sapporo/run.sh:ro
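
The rejection in Run 2 comes down to a version comparison on the daemon side. A minimal sketch of that comparison, assuming API versions are plain `major.minor` strings (the helper names are hypothetical):

```python
def parse_api_version(version):
    """Turn '1.44' into (1, 44) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def client_is_supported(client, minimum):
    """True if the client API version meets the daemon's minimum."""
    return parse_api_version(client) >= parse_api_version(minimum)


print(client_is_supported("1.32", "1.44"))  # False: the bundled client
print(client_is_supported("1.44", "1.44"))  # True: with DOCKER_API_VERSION=1.44
```

Comparing tuples rather than raw strings avoids ordering mistakes such as treating "1.5" as newer than "1.44".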
Run 3 — Mount denied for pipeline assets

docker: Error response from daemon: mounts denied:
The path /.nextflow/assets/nf-core/rnaseq/bin is not shared from the host
and is not known to Docker.

Nextflow stores cloned pipeline assets at /.nextflow/assets/ inside its container. When spawning child containers, it tries to bind-mount that path — but Docker Desktop on macOS cannot access paths inside another container.
Fix: Set NXF_HOME and NXF_ASSETS environment variables to point into the shared run directory (${run_dir}/nxf_home), which is host-mounted and accessible to child containers.
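
As a sketch of what the patched `run_nextflow()` passes to `docker run`, the helper below assembles the `-e` flags for one run directory. The function name is hypothetical, and placing `NXF_ASSETS` at `nxf_home/assets` is an assumption; the report only states that both variables point into `${run_dir}/nxf_home`.

```python
def nextflow_env_flags(run_dir, docker_api_version="1.44"):
    """Return docker-run '-e' flags for the Nextflow container."""
    nxf_home = f"{run_dir}/nxf_home"
    env = {
        "DOCKER_API_VERSION": docker_api_version,
        "NXF_HOME": nxf_home,
        "NXF_ASSETS": f"{nxf_home}/assets",  # assumed subdirectory
    }
    flags = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


# '/runs/ab/abc' is a placeholder run directory.
print(" ".join(nextflow_env_flags("/runs/ab/abc")))
```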
Run 4 — Success

Pipeline ran for ~16 minutes and completed successfully.
Step 4: Verify results

curl -s localhost:1122/runs/25fedfa2-9792-4f62-a391-6a2da2a72628 | jq '.state'
# "COMPLETE"
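
For long runs, polling the state endpoint is more practical than repeating the `curl` call by hand. The sketch below assumes the terminal state names from the GA4GH WES specification and takes the state-fetching function as a parameter, so it can be demonstrated without a running server. `wait_for_run` is a hypothetical helper.

```python
import time

# Terminal states as named in the GA4GH WES specification.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(fetch_state, interval=30.0, max_polls=1000, sleep=time.sleep):
    """Poll fetch_state() until the run reaches a terminal state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
    raise TimeoutError("run did not reach a terminal state")


# Demonstration with a fake state sequence instead of HTTP calls.
fake_states = iter(["QUEUED", "RUNNING", "RUNNING", "COMPLETE"])
print(wait_for_run(lambda: next(fake_states), sleep=lambda _s: None))  # COMPLETE
```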
955 output files produced across 9 directories:


| Directory | Contents |
|---|---|
| `bbsplit` | Contamination screening stats |
| `custom` | Merged genome + GTF (with GFP spike-in) |
| `fastqc` | Raw read quality reports |
| `fq_lint` | FASTQ format validation |
| `multiqc` | Aggregated QC report |
| `pipeline_info` | Execution metadata and resource usage |
| `salmon` | Transcript-level quantification |
| `star_salmon` | STAR alignment + Salmon quantification |
| `trimgalore` | Adapter-trimmed reads and trim reports |


Summary of changes to Sapporo

Two files were modified from the upstream defaults:
sapporo/run.sh — run_nextflow() function


Added DOCKER_API_VERSION=1.44 env var
Added NXF_HOME / NXF_ASSETS env vars pointing to the shared run directory
Added a Nextflow config file (sapporo.config) with docker.envWhitelist

compose.yml


Bind-mounted the local sapporo/run.sh into the container at /app/sapporo/run.sh (read-only)


Part 2: Human HCC RNA-seq Analysis (GSE128274)

Dataset

Study: GSE128274 — "Analyses of a panel of transcripts and construction of RNA networks in hepatocellular carcinoma"
BioProject: PRJNA526922 | Organism: Homo sapiens
Design: 4 HCC patients, paired tumor (P) + adjacent normal (C), paired-end Illumina (NextSeq 500)


| Accession | Sample | Type |
|---|---|---|
| SRR8723780 | P1 | Tumor |
| SRR8723781 | C1 | Normal |
| SRR8723782 | P2 | Tumor |
| SRR8723783 | C2 | Normal |
| SRR8723784 | P3 | Tumor |
| SRR8723785 | C3 | Normal |
| SRR8723786 | P5 | Tumor |
| SRR8723787 | C5 | Normal |


Step 1: Fix Sapporo for cross-run data access

Problem: The rnaseq container only mounted its own run_dir, so it couldn't read FASTQ files downloaded by a previous fetchngs run.
Fix: Changed run_nextflow() in sapporo/run.sh to mount ${SAPPORO_RUN_DIR} (the full runs directory) instead of ${run_dir}:
# Before:
-v "${run_dir}:${run_dir}"
# After:
-v "${SAPPORO_RUN_DIR}:${SAPPORO_RUN_DIR}"
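
The effect of widening the mount can be checked with a small path test. In this sketch `/runs` is a hypothetical stand-in for `${SAPPORO_RUN_DIR}`, and the `f7/` prefix of the rnaseq run directory is inferred from the `71/71265eb3-...` layout of the fetchngs run.

```python
from pathlib import PurePosixPath


def visible_in_container(path, mounts):
    """True if path equals, or lies beneath, any bind-mounted directory."""
    target = PurePosixPath(path)
    for mount in mounts:
        root = PurePosixPath(mount)
        if target == root or root in target.parents:
            return True
    return False


RUNS_DIR = "/runs"  # hypothetical stand-in for ${SAPPORO_RUN_DIR}
rnaseq_run_dir = f"{RUNS_DIR}/f7/f7eeab2d-36d3-40d5-8b1c-ec615d45b44d"
samplesheet = (
    f"{RUNS_DIR}/71/71265eb3-6d6e-47d4-9899-e29067bc2a2d"
    "/outputs/samplesheet/samplesheet.csv"
)

print(visible_in_container(samplesheet, [rnaseq_run_dir]))  # False (before)
print(visible_in_container(samplesheet, [RUNS_DIR]))        # True (after)
```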
Step 2: Download data with nf-core/fetchngs

Submitted nf-core/fetchngs v1.12.0 via Sapporo:

Run ID: 71265eb3-6d6e-47d4-9899-e29067bc2a2d
Duration: 3h 19m 56s
Result: 16 FASTQ files (~32 GB), samplesheet auto-generated at outputs/samplesheet/samplesheet.csv
Note: SRR8723787 required 3 FTP retries (intermittent ENA connection issues); all 8 samples completed successfully.

Parameters (fetchngs_params.json):
{
  "input": "<runs>/shared/ids.csv",
  "outdir": "results",
  "nf_core_pipeline": "rnaseq",
  "max_memory": "6.GB",
  "max_cpus": 4
}
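
The `ids.csv` input is not shown in this report. Assuming it follows the usual fetchngs convention of one accession per line, it could be produced from the eight consecutive run accessions in the dataset table like this:

```python
# Run accessions SRR8723780 to SRR8723787, as listed in the dataset table.
accessions = [f"SRR{number}" for number in range(8723780, 8723788)]

# One ID per line; this layout is an assumption about the file's format.
ids_csv = "\n".join(accessions) + "\n"

print(len(accessions))
print(ids_csv, end="")
```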
Step 3: Run nf-core/rnaseq with Salmon pseudo-alignment

Run ID: f7eeab2d-36d3-40d5-8b1c-ec615d45b44d
Duration: 13h 9m 54s | Tasks: 66/66 succeeded | CPU hours: 52.6
Parameters (rnaseq_params.json):
{
  "input": "<fetchngs-run>/outputs/samplesheet/samplesheet.csv",
  "outdir": "results",
  "genome": "GRCh38",
  "pseudo_aligner": "salmon",
  "skip_alignment": true,
  "max_memory": "30.GB",
  "max_cpus": 4
}
Iteration required: memory limits

Two preliminary runs failed due to nf-core/rnaseq's resource requests exceeding Docker Desktop's 31.3 GB memory:

Run 1 (32cc2b3b): FastQC requested 36 GB (6 threads × 6 GB) — pipeline max_memory: 24.GB did not cap it.
Run 2 (57276298): After adding a custom Nextflow config to cap FastQC, TrimGalore requested 72 GB.

Fix: Created runs/shared/rnaseq.config explicitly overriding all process labels:
params.max_memory = '28.GB'
params.max_cpus = 4

process {
    withLabel: 'process_high'   { cpus = 4; memory = '24.GB' }
    withLabel: 'process_medium' { cpus = 4; memory = '24.GB' }
    withLabel: 'process_low'    { cpus = 2; memory = '12.GB' }
    withLabel: 'process_single' { cpus = 1; memory = '6.GB'  }
    // ...
}
Passed via engine parameters: {"-profile": "docker", "-c": "<path>/rnaseq.config"}.
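
The arithmetic behind these overrides can be sanity-checked in a few lines. The sketch below copies the label caps from rnaseq.config and compares them with the 31.3 GB reported for Docker Desktop. It only checks each task's cap, so it says nothing about several tasks running at once.

```python
# Docker Desktop memory as reported above.
DOCKER_MEMORY_GB = 31.3

# (cpus, memory in GB) per label, copied from rnaseq.config.
LABEL_LIMITS = {
    "process_high": (4, 24),
    "process_medium": (4, 24),
    "process_low": (2, 12),
    "process_single": (1, 6),
}


def labels_over_budget(limits, available_gb):
    """Return the labels whose memory cap exceeds the available memory."""
    return [
        label
        for label, (_cpus, mem_gb) in limits.items()
        if mem_gb > available_gb
    ]


# The failed FastQC request from Run 1: 6 threads x 6 GB each.
fastqc_request_gb = 6 * 6
print(fastqc_request_gb > DOCKER_MEMORY_GB)                # True
print(labels_over_budget(LABEL_LIMITS, DOCKER_MEMORY_GB))  # []
```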
Step 4: Results

361 output files (304.9 MB) across 6 directories:


| Directory | Files | Size | Contents |
|---|---|---|---|
| `salmon` | 121 | 252.1 MB | Per-sample quant, merged count/TPM matrices, SummarizedExperiment RDS |
| `multiqc` | 138 | 20.1 MB | Aggregated QC report |
| `fastqc` | 64 | 29.3 MB | Per-sample FastQC reports |
| `trimgalore` | 16 | 75.6 KB | Trim reports |
| `fq_lint` | 16 | 22.5 KB | FASTQ validation |
| `pipeline_info` | 6 | 3.3 MB | Execution metadata |


Key output files:

salmon/salmon.merged.gene_counts.tsv — gene-level count matrix (8 samples)
salmon/salmon.merged.gene_tpm.tsv — TPM expression matrix
salmon/salmon.merged.gene.SummarizedExperiment.rds — ready for DESeq2/edgeR
multiqc/multiqc_report.html — aggregated QC across all samples
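
A quick way to inspect the count matrix without R is the stdlib `csv` module. This sketch assumes the file is tab-separated with `gene_id` and `gene_name` columns followed by one column per sample. The rows in `demo` are made-up placeholders, not values from this run.

```python
import csv
import io


def read_gene_counts(handle):
    """Parse a merged gene count TSV into (sample names, counts by gene)."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [
        column
        for column in reader.fieldnames
        if column not in ("gene_id", "gene_name")
    ]
    counts = {
        row["gene_id"]: {sample: float(row[sample]) for sample in samples}
        for row in reader
    }
    return samples, counts


# Made-up two-gene, two-sample table for illustration only.
demo = io.StringIO(
    "gene_id\tgene_name\tP1\tC1\n"
    "GENE_A\tgeneA\t10\t20.5\n"
    "GENE_B\tgeneB\t0\t3\n"
)
samples, counts = read_gene_counts(demo)
print(samples)
print(counts["GENE_A"])
```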


Step 5: Generate run summary from RO-Crate

After a successful run, Sapporo generates a Workflow Run RO-Crate (ro-crate-metadata.json) containing structured metadata about the execution. A Python script (summarize_crate.py) was created to parse this file and produce a human-readable Markdown summary.
Script overview


Location: summarize_crate.py
Dependencies: Python stdlib only (json, sys, datetime, collections)
Usage: python summarize_crate.py <path/to/ro-crate-metadata.json> > summary.md

Generated sections


Header — Run name, ID, and completion status
Run Overview — Workflow name/URL, language (Nextflow DSL2), engine versions (nextflow, sapporo 2.2.2), container image (nextflow/nextflow:25.10.4), start/end times, duration (15m 21s), exit code
Input Parameters — 3 parameters: outdir, max_memory, max_cpus
Output Summary — 955 files totalling 75.7 MB, broken down by 9 top-level directories, with a list of 14 output MIME types
Alignment Statistics — Per-sample stats (total reads, mapped reads/rate, duplicate reads/rate) for 5 samples, derived from FileStats entities linked to BAM files
Software Versions — All SoftwareApplication entities: nextflow, samtools (1.23), sapporo (2.2.2)

Implementation notes


Builds an @id → entity lookup dict for reference resolution
Parses actionStatus URLs to friendly text (e.g. CompletedActionStatus → Completed)
Computes duration from ISO 8601 timestamps
Formats file sizes human-readably (B, KB, MB, GB)
Groups output files by first path component after outputs/
Links FileStats back to parent BAM File entities via reverse lookup on the stats field
Handles missing fields gracefully
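
Several of these notes can be sketched with the stdlib alone. The helpers below are an illustration written for this report, not the contents of summarize_crate.py: the function names, the 1024-based size units, and the example inputs are all assumptions. The FileStats reverse lookup is left out.

```python
from collections import Counter
from datetime import datetime


def build_index(crate):
    """Map each entity's @id to the entity for reference resolution."""
    return {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}


def friendly_status(url):
    """'http://schema.org/CompletedActionStatus' -> 'Completed'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    suffix = "ActionStatus"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name or "Unknown"


def format_duration(start, end):
    """Format the gap between two ISO 8601 timestamps as 'Xh Ym Zs'."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    hours, rest = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h {minutes}m {seconds}s"
    return f"{minutes}m {seconds}s"


def human_size(num_bytes):
    """Format a byte count using B, KB, MB, or GB (1024-based, assumed)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def group_by_top_dir(paths):
    """Count files by the first path component after 'outputs/'."""
    counts = Counter()
    for path in paths:
        relative = path.split("outputs/", 1)[-1]
        counts[relative.split("/", 1)[0]] += 1
    return counts


print(friendly_status("http://schema.org/CompletedActionStatus"))
print(format_duration("2026-02-24T12:25:18+00:00", "2026-02-25T01:35:37+00:00"))
print(human_size(1536))
```

With the start and end times from the HCC run, `format_duration` gives 13h 10m 19s.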


## rnaseq_hcc_summary.md

      
Sapporo workflow run f7eeab2d-36d3-40d5-8b1c-ec615d45b44d

Run ID: f7eeab2d-36d3-40d5-8b1c-ec615d45b44d
Status: Completed
Run Overview


| Item | Value |
|---|---|
| Workflow | rnaseq |
| Workflow language | Nextflow DSL2 |
| Engine | nextflow, sapporo 2.2.2 |
| Container image | nextflow/nextflow:25.10.4 |
| Start time | 2026-02-24 12:25:18 UTC |
| End time | 2026-02-25 01:35:37 UTC |
| Duration | 13h 10m 19s |
| Exit code | 0 |


Input Parameters


| Name | Value |
|---|---|
| input | `/Users/inutano/work/wes-test/sapporo-service/runs/71/71265eb3-6d6e-47d4-9899-e29067bc2a2d/outputs/samplesheet/samplesheet.csv` |
| outdir | `results` |
| genome | `GRCh38` |
| pseudo_aligner | `salmon` |
| skip_alignment | `True` |
| max_memory | `30.GB` |
| max_cpus | `4` |


Output Summary

Total files: 361
Total size: 304.9 MB
Breakdown by Directory


| Directory | Files | Size |
|---|---|---|
| fastqc | 64 | 29.3 MB |
| fq_lint | 16 | 22.5 KB |
| multiqc | 138 | 20.1 MB |
| pipeline_info | 6 | 3.3 MB |
| salmon | 121 | 252.1 MB |
| trimgalore | 16 | 75.6 KB |


Output Formats


application/gzip
application/json
application/octet-stream
application/pdf
application/zip
image/png
image/svg+xml
text/html
text/plain

Software Versions


| Name | Version | URL |
|---|---|---|
| nextflow | N/A | https://www.nextflow.io |
| sapporo | 2.2.2 | https://github.com/sapporo-wes/sapporo-service |