Objectives:
- 'Fish' for reference genomes in demultiplexed FASTQs from one or more experimental conditions (i.e. PromethION runs).
- Plot estimated coverage and abundance by target genome and by condition/barcode.
- Compare the length distributions of Zepto reads across conditions/barcodes.
Download the run data from S3:
aws s3 sync s3://quick-research-bede-storage/runs-cat/DS_NEBtagged_sequenase_Zepto_18_11_2025 DS_NEBtagged_sequenase_Zepto_18_11_2025
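If only some barcodes are needed, the sync can be restricted with aws s3 sync's --exclude/--include filters, which are applied in order so that later --include patterns override an initial --exclude "*". A minimal sketch follows; sync_barcodes is a hypothetical helper, and it assumes each barcode is stored as a single <barcode>.fastq.gz object directly under the run prefix (matching the filenames used later), which may not hold for this bucket.

```sh
# Hypothetical helper: sync only the named barcodes from an S3 run prefix.
# Assumes each barcode is stored as <barcode>.fastq.gz directly under the prefix.
# aws s3 sync evaluates filters in order, so the later --include patterns
# override the initial --exclude "*".
sync_barcodes() {
  src=$1
  dest=$2
  shift 2
  # Turn each remaining argument (a barcode) into "--include <barcode>.fastq.gz".
  # The for-loop word list is fixed up front, so rebuilding "$@" inside it is safe.
  for bc in "$@"; do
    set -- "$@" --include "$bc.fastq.gz"
    shift
  done
  aws s3 sync "$src" "$dest" --exclude "*" "$@"
}

# Usage:
# sync_barcodes s3://quick-research-bede-storage/runs-cat/DS_NEBtagged_sequenase_Zepto_18_11_2025 \
#   DS_NEBtagged_sequenase_Zepto_18_11_2025 barcode16 barcode17 barcode18
```

If the objects are nested in subdirectories, the include patterns would need adjusting to match their paths relative to the prefix.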
Since grate isn't yet available from Bioconda, we must build it locally.
First, install Rust if the cargo command is not found:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
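Then install grate from GitHub using cargo (RUSTFLAGS="-C target-cpu=native" optimises the build for this machine's CPU, so the resulting binary may not run on other hardware):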
RUSTFLAGS="-C target-cpu=native" cargo install --git https://github.com/bede/grate
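Install the remaining dependencies (pandas, Altair and vl-convert-python for plotting, plus Seqkit and Deacon) with conda, either inside an existing environment:
conda install -y -c conda-forge -c bioconda python=3.12 pandas altair vl-convert-python seqkit deacon
Or by creating and activating a new environment:
conda create -y -n grate -c conda-forge -c bioconda python=3.12 pandas altair vl-convert-python seqkit deacon
conda activate grate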
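Before running anything long, it can help to confirm that every tool used here is on PATH. A minimal POSIX-shell sketch; require_tools is a hypothetical helper name, and the tool list in the usage comment is just the set of commands this walkthrough calls.

```sh
# Hypothetical helper: report any command that is not on PATH.
# Returns 0 if all are found, 1 otherwise (missing names go to stderr).
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_tools cargo grate deacon seqkit python git aws
```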
Clone the grate repo (for its plotting script, viz.py) and the zmrp repo (for the latest Zepto reference sequences):
git clone https://github.com/bede/grate
git clone https://github.com/bede/zmrp
grate cov takes a reference FASTA followed by one or more FASTQ files, e.g.:
grate cov zmrp21.combined-segments.fa reads.fastq.gz
Estimate containment and abundance of the Zepto refs in three FASTQ files, with CSV output and custom sample names:
grate cov -f csv --sample-names sampleA,sampleB,sampleC zmrp21.combined-segments.fa barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz > out.csv
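Typing sample names by hand risks mismatching them with the file order. A small sketch that derives the names from the FASTQ filenames instead; sample_names is a hypothetical helper, and it assumes every input ends in .fastq.gz.

```sh
# Hypothetical helper: join FASTQ basenames (minus .fastq.gz) with commas,
# in the same order as the files are given.
sample_names() {
  names=""
  for f in "$@"; do
    name=$(basename "$f" .fastq.gz)
    # Add a comma separator only when names is already non-empty
    names="${names:+$names,}$name"
  done
  printf '%s\n' "$names"
}

# Usage (filenames without spaces, so $files can be word-split):
# files="barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz"
# grate cov -f csv --sample-names "$(sample_names $files)" zmrp21.combined-segments.fa $files > out.csv
```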
To normalise throughput to a fixed upper quantity (here 1G), use grate's --limit option:
grate cov -f csv --limit 1G --sample-names sampleA,sampleB,sampleC zmrp21.combined-segments.fa barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz > out.csv
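A cap only equalises samples that have at least that much data. Assuming (unverified) that --limit counts sequenced bases per sample, the sketch below finds the smallest sample's throughput, a cap every sample can reach. fastq_bases and min_bases are hypothetical helpers that assume plain 4-line FASTQ records; seqkit stats reports the same totals in its sum_len column.

```sh
# Hypothetical helpers. Assumes plain 4-line FASTQ records (no wrapped sequences).
# fastq_bases: total sequence length in one gzipped FASTQ file.
fastq_bases() {
  # Line 2 of every 4-line record is the sequence; %.0f avoids
  # exponent notation for very large totals in some awk versions.
  gzip -cd -- "$1" | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }'
}

# min_bases: print "<bases><TAB><file>" for the file with the fewest bases.
min_bases() {
  for f in "$@"; do
    printf '%s\t%s\n' "$(fastq_bases "$f")" "$f"
  done | sort -n -k1,1 | head -n 1
}

# Usage:
# min_bases barcode16.fastq.gz barcode17.fastq.gz barcode18.fastq.gz
```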
Then plot the results, with the conda environment used or created earlier active:
python grate/viz.py out.csv # Creates containment.png
Build a Deacon index of the Zepto refs:
deacon index build zmrp/zmrp21.fa > zmrp.idx
Search for reads with Zepto hits using Deacon, and plot their length distribution with Seqkit (here for barcode16):
deacon filter -a 1 zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode16.fastq.gz | seqkit watch --bins 20 -O barcode16-lendist.png
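To repeat this for every barcode, a minimal loop sketch that reuses the same Deacon and Seqkit options and names each PNG after its input file. plot_lendists is a hypothetical helper; it writes the images to the current directory.

```sh
# Hypothetical helper: one length-distribution PNG per barcode FASTQ,
# e.g. .../barcode16.fastq.gz -> barcode16-lendist.png in the current directory.
plot_lendists() {
  idx=$1
  shift
  for f in "$@"; do
    name=$(basename "$f" .fastq.gz)
    deacon filter -a 1 "$idx" "$f" | seqkit watch --bins 20 -O "$name-lendist.png"
  done
}

# Usage:
# plot_lendists zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode*.fastq.gz
```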
As above, but instead of plotting a PNG image, dump the histogram data to a TSV file (&> captures both stdout and stderr):
deacon filter -a 1 zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode16.fastq.gz | seqkit watch --bins 20 --dump &> barcode16-lendist.tsv
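For a compact numeric comparison between barcodes, the filtered reads can also be summarised directly. A sketch using only awk and sort; read_length_summary is a hypothetical helper that assumes 4-line FASTQ records on stdin and holds all lengths in memory to compute the median.

```sh
# Hypothetical helper: summarise read lengths (count, mean, median, max)
# from FASTQ on stdin. Assumes plain 4-line FASTQ records.
read_length_summary() {
  # Extract one length per read, sort numerically, then summarise
  awk 'NR % 4 == 2 { print length($0) }' | sort -n | awk '
    { len[NR] = $1; total += $1 }
    END {
      if (NR == 0) { print "n=0"; exit }
      # Median of the sorted lengths (mean of the two middle values if NR is even)
      if (NR % 2) median = len[(NR + 1) / 2]
      else median = (len[NR / 2] + len[NR / 2 + 1]) / 2
      printf "n=%d\tmean=%.1f\tmedian=%.1f\tmax=%d\n", NR, total / NR, median, len[NR]
    }'
}

# Usage:
# deacon filter -a 1 zmrp.idx DS_NEBtagged_sequenase_Zepto_18_11_2025-partial/barcode16.fastq.gz | read_length_summary
```

For quartiles and N50 as well, seqkit stats -a reports these from the same stream.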