Welcome to GSEAPY’s documentation!

GSEAPY: Gene Set Enrichment Analysis in Python.


GSEAPY is a Python wrapper for GSEA and Enrichr.

It is used for convenient GO enrichment analysis and produces publication-quality figures from Python.

GSEAPY can be used for RNA-seq, ChIP-seq, and microarray data.

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
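
To make the definition above more concrete, here is a minimal sketch of the running-sum statistic that underlies GSEA, in its simplest unweighted (Kolmogorov-Smirnov-style) form. It is an illustration only and not GSEAPY's implementation: the function name `enrichment_score` and the toy gene names are hypothetical, and the real method additionally weights hits by the ranking metric and estimates significance by permutation.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the unweighted running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers ordered by their correlation with the
                  phenotype (most up-regulated first).
    gene_set:     the a priori defined collection of gene identifiers.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a gene in the set, step down otherwise.
        if is_hit:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        # Keep the largest deviation from zero seen so far.
        if abs(running) > abs(best):
            best = running
    return best


# Toy example (hypothetical gene names).
ranked = ["A", "B", "C", "D", "E", "F"]
print(enrichment_score(ranked, {"A", "B"}))  # 1.0  (set sits at the top)
print(enrichment_score(ranked, {"E", "F"}))  # -1.0 (set sits at the bottom)
```

A score near +1 or -1 means the set is concentrated at one end of the ranking; a score near 0 means its genes are spread throughout the list.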

The full GSEA is far too extensive to describe here; see the GSEA documentation for more information.

Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr .


I wanted to use Pandas to explore my data, but I could not find a convenient tool for gene set enrichment analysis in Python. These are my reasons for writing GSEAPY:

  • It runs inside the Python interactive console, with no need to switch to R.
  • It is user-friendly for both wet-lab and dry-lab users.
  • It produces publication-quality figures.
  • Batch jobs are easy to run (using for loops).
  • It is easy to use from a bash shell or within a data analysis workflow, e.g. snakemake.

GSEA Java version output:

This is an example of GSEA desktop application output


GSEAPY Prerank module output

Using the same data as GSEA, GSEAPY reproduces the example above.

The Prerank or replot module reproduces the same figure as the GSEA Java desktop output.


Generated by GSEAPY

GSEAPY figures can be saved in any figure format that matplotlib supports.

You can easily modify GSEA plots saved as .pdf files. Enjoy!

GSEAPY enrichr module

A graphical introduction to Enrichr


The only thing you need to prepare is a gene list, either as a txt file (one gene ID per row) or as a Python list object.

Note: Enrichr uses a list of Entrez gene symbols as input. You should convert all gene names to uppercase.
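
The note above can be sketched as a small helper that prepares a gene list before it is submitted. This is a minimal sketch using only the standard library; the names `normalize_genes` and `read_gene_file` are hypothetical and not part of GSEAPY, and dropping blank lines and duplicates is an extra assumption beyond the uppercase rule.

```python
def normalize_genes(symbols):
    """Upper-case gene symbols, dropping blanks and duplicates (order kept)."""
    seen = set()
    genes = []
    for symbol in symbols:
        symbol = symbol.strip().upper()
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


def read_gene_file(path):
    """Read a txt file with one gene ID per row and normalize it."""
    with open(path) as handle:
        return normalize_genes(handle)


# Mixed-case input with a blank entry and a repeated symbol.
print(normalize_genes(["Scara3", " cmbl\n", "", "CMBL", "Clic6"]))
# ['SCARA3', 'CMBL', 'CLIC6']
```

The same helper works for both input styles: pass a Python list to `normalize_genes`, or a file path to `read_gene_file`.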

For example, both a list object and a txt file are supported by the enrichr API:

# If you prefer to run gseapy.enrichr() inside a Python console, you can define a list object
# to pass to gseapy like this:
gene_list = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1',
             'CSF1', 'CITED1', 'SYNPO2L']
# Alternatively, you can provide a gene list txt file, which looks like this:
with open('data/gene_list.txt') as genes:
    # Print the file contents (one gene ID per row).
    print(genes.read())



Install the gseapy package from bioconda or PyPI.
# if you have conda
$ conda install -c bioconda gseapy

# for Windows users
$ conda install -c bioninja gseapy

# or use pip to install the latest release
$ pip install gseapy
You may instead want to use the development version from GitHub, by running
$ pip install git+https://github.com/BioNinja/gseapy.git#egg=gseapy


GSEAPY requires Python 2.7 or 3.4+ and the following packages:


  • NumPy
  • Pandas
  • Matplotlib
  • BeautifulSoup4
  • Requests (for the Enrichr API)

You may also need to install lxml and html5lib if you cannot parse XML files.
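
To check quickly whether the packages listed above are available in the current environment, something like the following sketch may help. It assumes Python 3 (`importlib.util.find_spec` is not available on Python 2.7) and uses the usual import names for each package (for example `bs4` for BeautifulSoup4); the helper `missing_modules` is hypothetical and not part of GSEAPY.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "bs4", "requests"]
# Only needed if XML files cannot be parsed otherwise.
OPTIONAL = ["lxml", "html5lib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required packages:", missing_modules(REQUIRED))
print("Missing optional packages:", missing_modules(OPTIONAL))
```

An empty list means every package in that group can be imported.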

For more information on how to use this library, see How to Use GSEAPY.