@avilaHugo
Created November 5, 2023 19:53
Get n genes
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import sys
import argparse

import pandas as pd


def main(table: str, n_genes: int, target_colname: str) -> None:
    # Keep rows whose padj is not missing, sort the target column in
    # descending order, and take the row labels of the first n_genes rows.
    data = (
        pd.read_csv(
            table,
            sep='\t',
        )
        .loc[
            lambda df_: df_.padj.notna(),
            target_colname,
        ]
        .sort_values(ascending=False)
        .head(n_genes)
        .index
    )
    # Print one label per line.
    print(
        *data,
        sep='\n',
        file=sys.stdout
    )


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        prog='get_top_n',
        description='Reads a table and extracts the top N genes ranked by the target column.',
    )
    parser.add_argument(
        'table',
        help='Tab-separated table with a padj column.',
    )
    parser.add_argument(
        '-n', '--n_genes',
        help='Number of top genes to take.',
        default=5,
        type=int
    )
    # Required: without it target_colname would be None and the lookup fails.
    parser.add_argument(
        '-t', '--target_colname',
        help='Target column to sort.',
        type=str,
        required=True
    )
    args = vars(parser.parse_args())
    main(
        **args
    )
@avilaHugo

Instructions

Download test table

# md5sum -> 9fa6acda7b96d7b1bf423d5f843425f5  DESeq2_results.txt
wget https://raw.githubusercontent.com/snandiDS/prokseq-v2.0/master/example_output/Output/DiffExpResults/DESeq2_results.txt
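If `md5sum` is not available, the checksum in the comment above can also be verified from Python. This is a minimal sketch using only the standard library; the helper names `md5_of_file` and `check_download` are hypothetical, and it assumes the file was saved as `DESeq2_results.txt` in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum quoted in the instructions above.
EXPECTED_MD5 = "9fa6acda7b96d7b1bf423d5f843425f5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path="DESeq2_results.txt"):
    """Report whether the file at `path` matches EXPECTED_MD5 (hypothetical helper)."""
    target = Path(path)
    if not target.exists():
        print(f"{path} not found; download it first")
        return False
    ok = md5_of_file(target) == EXPECTED_MD5
    print("checksum OK" if ok else "checksum mismatch")
    return ok


if __name__ == "__main__":
    check_download()
```

A mismatch most likely means the upstream file changed or the download was incomplete.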

Run the script

This command pulls a Python image with pandas preinstalled and runs the script. Feel free to change the argument values (`-n`, `-t`, and the table path).

docker run --rm -it \
    -v "$PWD":/tmp/run \
    -w /tmp/run \
    amancevice/pandas:alpine-2.1.2 \
        python3 ./get_n_genes.py  \
            -n 10 \
            -t 'padj' \
            DESeq2_results.txt
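To see what the script selects without Docker, the same pandas chain can be run on a small in-memory table. This is only a sketch: the toy gene names and values are made up, the `top_n` helper is hypothetical, and `index_col=0` is an assumption added here because the toy table names its first column. The script itself does not pass `index_col` and presumably relies on pandas using the first column as the index when the header row has one fewer field than the data rows.

```python
import io

import pandas as pd

# Made-up toy table; gene names and values are illustrative only.
TOY_TABLE = (
    "gene\tbaseMean\tpadj\n"
    "geneA\t10.0\t0.50\n"
    "geneB\t20.0\t\n"  # missing padj, so this row is filtered out
    "geneC\t30.0\t0.01\n"
    "geneD\t40.0\t0.20\n"
)


def top_n(table, n_genes, target_colname):
    """Return the row labels of the n_genes largest values in target_colname."""
    return (
        pd.read_csv(table, sep="\t", index_col=0)  # index_col=0 is an assumption
        .loc[lambda df_: df_["padj"].notna(), target_colname]
        .sort_values(ascending=False)
        .head(n_genes)
        .index
        .tolist()
    )


print(top_n(io.StringIO(TOY_TABLE), 2, "baseMean"))  # ['geneD', 'geneC']
print(top_n(io.StringIO(TOY_TABLE), 2, "padj"))      # ['geneA', 'geneD']
```

Note that the sort is descending, so `-t padj` returns the genes with the largest adjusted p-values, not the most significant ones.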
