Welcome to GSEAPY’s documentation!
GSEAPY: Gene Set Enrichment Analysis in Python.
GSEAPY is a Python wrapper for GSEA and Enrichr.
It is used for convenient GO enrichment analysis and for producing publication-quality figures from Python.
GSEAPY can be used for RNA-seq, ChIP-seq, and microarray data.
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
GSEA is far too extensive to describe here; see the GSEA documentation for more information.
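To give a rough feel for what "concordant differences" means, here is a minimal sketch of an unweighted running-sum enrichment score, in the spirit of the Kolmogorov-Smirnov-like statistic GSEA is built on. It is a toy illustration, not how GSEAPY or GSEA compute their results: the function name `running_enrichment_score` and the gene names `G1` to `G6` are hypothetical, and weighting by ranking metric, permutation testing, and normalization are all left out.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Toy, unweighted running-sum enrichment score (illustration only)."""
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain both hits and misses")

    hit_step = 1.0 / n_hits    # step up each time we meet a gene from the set
    miss_step = 1.0 / n_miss   # step down for every other gene

    running = 0.0
    best = 0.0                 # running-sum value with the largest magnitude so far
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, most up-regulated gene first.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]

es_top = running_enrichment_score(ranked, ["G1", "G2"])     # set sits at the top
es_bottom = running_enrichment_score(ranked, ["G5", "G6"])  # set sits at the bottom
print(es_top, es_bottom)
```

A gene set clustered at the top of the ranking gets a positive score, and one clustered at the bottom gets a negative score. Real analyses should rely on the GSEA or GSEAPY implementations rather than on this sketch.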
Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr .
I wanted to use pandas to explore my data, but I could not find a convenient tool for gene set enrichment analysis in Python. So, here are my reasons for writing GSEAPY:
- Ability to run inside the Python interactive console without having to switch to R!
- User-friendly for both wet and dry lab users.
- Produce or reproduce publishable figures.
- Perform batch jobs easily.
- Easy to use in bash shell or your data analysis workflow, e.g. snakemake.
GSEA Java version output:
This is an example of GSEA desktop application output.
Prerank module output
Using the same data from GSEA, GSEAPY reproduces the example above.
The replot module reproduces the same figure from GSEA Java desktop outputs.
GSEAPY figures can be saved in any figure format supported by matplotlib.
You can easily modify GSEA plots saved as .pdf files. Please enjoy.
A graphical introduction to Enrichr
The only thing you need to prepare is a gene list file in txt format (one gene ID per row), or a Python list object.
Note: Enrichr uses a list of Entrez gene symbols as input. You should convert all gene names to uppercase.
For example, both a list object and a txt file are supported as input:
# If you prefer to run gseapy.enrichr() inside a Python console, you can assign a list object to
# gseapy like this.
gene_list = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1',
             'TACSTD2', 'DKKL1', 'CSF1', 'CITED1', 'SYNPO2L']
# Alternatively, you can provide a gene list txt file, which looks like this:
with open('data/gene_list.txt') as genes:
    print(genes.read())
CTLA2B
SCARA3
LOC100044683
CMBL
CLIC6
IL13RA1
TACSTD2
DKKL1
CSF1
CITED1
SYNPO2L
TINAGL1
PTX3
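If your gene names are not yet clean, a small helper can turn a one-gene-per-row txt file into an uppercase Python list, as the note above recommends. This is only a sketch: `load_gene_list` is a hypothetical helper, not part of GSEAPY, the demo file is written to a temporary directory, and dropping blank lines and duplicates is an extra assumption rather than something Enrichr is stated to require.

```python
import os
import tempfile


def load_gene_list(path):
    """Read one gene symbol per row; return an uppercase, de-duplicated list."""
    genes = []
    seen = set()
    with open(path) as handle:
        for line in handle:
            symbol = line.strip().upper()   # Enrichr expects uppercase symbols
            if symbol and symbol not in seen:
                seen.add(symbol)            # assumption: keep first occurrence only
                genes.append(symbol)
    return genes


# Demo with a throwaway file containing mixed case, a blank row, and a duplicate.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "gene_list.txt")
    with open(demo_path, "w") as handle:
        handle.write("Ctla2b\nSCARA3\n\ncmbl\nSCARA3\n")
    gene_list = load_gene_list(demo_path)

print(gene_list)  # ['CTLA2B', 'SCARA3', 'CMBL']
```

The resulting `gene_list` is a plain Python list, which is one of the two input forms described above.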
# if you have conda
$ conda install -c conda-forge -c bioconda gseapy
# or use pip to install the latest release
$ pip install gseapy
$ pip install git+git://github.com/BioNinja/gseapy.git#egg=gseapy