This guide details how to set up a llama.cpp HTTP server with GPU acceleration on a fresh install of Windows 11 (25H2). With far more capable frontier AI models available exclusively online, and the hardware needed to run them locally priced out of reach for most of us, there are few practical reasons to run local LLMs. But I have found that tinkering with the runtime configuration of local models is the best way to learn how these models work. A local model can also serve as a tool operated by a smarter AI agent, cutting the token usage of more expensive models. And finally, it puts my GeForce RTX 5090 GPU to work when it is not running Rocket League.
Press Windows Key + R, type cmd, and press Enter to open a black window running Windows Command Prompt, a command-line interface (CLI) first written for OS/2 in 1987 that has shipped with every NT-based version of Windows since 1993. It will probably never die. Copy and paste the following command into the CLI and press Enter to install the tools we'll need. If this is the first time you're