@bede
Last active November 26, 2025 12:59

Comparing coverage & abundance estimates using Grate, and on-target length distributions

Objectives:

  • 'Fish' for reference genomes in demultiplexed FASTQs for one or more experimental conditions (i.e. Prom runs).
  • Plot estimated coverage and abundance by target genome, and condition/barcode.
  • Calculate and compare the length distributions of on-target (Zepto) reads across conditions/barcodes.

1. Fetch data

e.g. sync a run from CLIMB object storage:

aws s3 sync s3://quick-research-bede-storage/runs-cat/DS_NEBtagged_sequenase_Zepto_18_11_2025 DS_NEBtagged_sequenase_Zepto_18_11_2025

2. Install dependencies

Grate

Since grate isn't yet packaged with Bioconda, we must build it locally. First install Rust if the cargo command isn't found:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then you can install grate from GitHub using cargo:

RUSTFLAGS="-C target-cpu=native" cargo install --git https://github.com/bede/grate
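The "install Rust only if cargo is missing" check can be scripted before running the install. A minimal sketch; `have_cmd` and `check_prereqs` are hypothetical helper names, not part of grate or rustup:

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: report every missing command on stderr and
# return non-zero if at least one is missing.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if ! have_cmd "$cmd"; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_prereqs cargo git || echo "install the missing tools first"` before the `cargo install` step.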

Everything else

Inside an existing environment:

conda install -yc bioconda python=3.12 pandas altair vl-convert-python seqkit deacon

Or by creating a new environment:

conda create -yn grate -c bioconda python=3.12 pandas altair vl-convert-python seqkit deacon

Clone the grate repo, since we need its plotting script to hand, and the zmrp repo for the latest Zepto refs:

git clone https://github.com/bede/grate
git clone https://github.com/bede/zmrp

3. Using grate

Grate README

Test grate cov

e.g.

grate cov zmrp21.combined-segments.fa reads.fastq.gz

Estimate and plot containment and abundance for Zepto refs in 3 fastq files using custom sample names

grate cov -f csv --sample-names sampleA,sampleB,sampleC zmrp21.combined-segments.fa barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz > out.csv
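With many barcodes, typing the sample-name list by hand gets error-prone. A minimal sketch that derives it from the FASTQ file names, assuming (as the example above suggests) that names are matched to input files by position; `sample_names` is a hypothetical helper, not part of grate:

```shell
# Hypothetical helper: build a comma-separated sample-name list from FASTQ paths,
# e.g. runs/barcode16.fastq.gz -> barcode16
sample_names() {
    names=""
    for f in "$@"; do
        base=$(basename "$f")
        base=${base%.gz}      # drop compression suffix, if any
        base=${base%.fastq}   # drop FASTQ suffix (either spelling)
        base=${base%.fq}
        names="${names:+$names,}$base"
    done
    printf '%s\n' "$names"
}
```

The result can then be passed through, e.g. `grate cov -f csv --sample-names "$(sample_names barcode*.fastq.gz)" zmrp21.combined-segments.fa barcode*.fastq.gz > out.csv`.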

To normalise throughput to a fixed upper limit (here 1G), use grate's --limit option:

grate cov -f csv --limit 1G --sample-names sampleA,sampleB,sampleC zmrp21.combined-segments.fa barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz > out.csv

With the conda environment we used/created earlier active, plot the results:

python grate/viz.py out.csv  # Creates containment.png

4. Estimating on-target read length distributions with Deacon and Seqkit

Build Deacon index of Zepto refs:

deacon index build zmrp/zmrp21.fa > zmrp.idx

Search for reads with Zepto hits using Deacon and view their length distribution with Seqkit:

deacon filter -a 1 zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode16.fastq.gz | seqkit watch --bins 20 -O barcode16-lendist.png

As above, but instead of plotting a PNG image, dump the histogram data to a TSV file:

deacon filter -a 1 zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode16.fastq.gz | seqkit watch --bins 20 --dump &> barcode16-lendist.tsv
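To compare conditions, the same pipeline can be run once per barcode. A minimal sketch wrapping the command above in a loop; `lendist_all` is a hypothetical function name, and the flags are exactly those used above:

```shell
# Hypothetical wrapper: write one length-distribution TSV per barcode FASTQ.
# Usage: lendist_all INDEX FASTQ...
lendist_all() {
    idx=$1
    shift
    for fq in "$@"; do
        # barcode16.fastq.gz -> barcode16
        name=$(basename "$fq" .fastq.gz)
        # Capture both stdout and stderr, as with &> in the command above
        deacon filter -a 1 "$idx" "$fq" \
            | seqkit watch --bins 20 --dump > "${name}-lendist.tsv" 2>&1
    done
}
```

For example, `lendist_all zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode*.fastq.gz` writes `barcode16-lendist.tsv`, `barcode17-lendist.tsv`, and so on into the current directory.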