GSEAPY Example

Examples to use GSEApy inside python console

[1]:
# %matplotlib inline
# %config InlineBackend.figure_format='retina' # mac
%load_ext autoreload
%autoreload 2
import pandas as pd
import gseapy as gp
import matplotlib.pyplot as plt

Check gseapy version

[2]:
gp.__version__
[2]:
'1.0.0'

1. (Optional) Convert IDs Using Biomart API

Don’t use this if you don’t know Biomart

from gseapy import Biomart
bm = Biomart()
## view validated marts
marts = bm.get_marts()
## view validated dataset
datasets = bm.get_datasets(mart='ENSEMBL_MART_ENSEMBL')
## view validated attributes
attrs = bm.get_attributes(dataset='hsapiens_gene_ensembl')
## view validated filters
filters = bm.get_filters(dataset='hsapiens_gene_ensembl')
## query results
queries = ['ENSG00000125285','ENSG00000182968'] # need to be a python list
results = bm.query(dataset='hsapiens_gene_ensembl',
                   attributes=['ensembl_gene_id', 'external_gene_name', 'entrezgene_id', 'go_id'],
                   filters={'ensemble_gene_id': queries})

Mouse gene symbols maps to Human, or vice verse

[3]:
from gseapy import Biomart
bm = Biomart()
m2h = bm.query_simple(dataset='mmusculus_gene_ensembl',
                          attributes=['ensembl_gene_id','external_gene_name',
                                      'hsapiens_homolog_ensembl_gene',
                                      'hsapiens_homolog_associated_gene_name'])

h2m = bm.query_simple(dataset='hsapiens_gene_ensembl',
                          attributes=['ensembl_gene_id','external_gene_name',
                                      'mmusculus_homolog_ensembl_gene',
                                      'mmusculus_homolog_associated_gene_name'])
[4]:
h2m.sample(10)
[4]:
ensembl_gene_id external_gene_name mmusculus_homolog_ensembl_gene mmusculus_homolog_associated_gene_name
10960 ENSG00000233754 NaN NaN NaN
34758 ENSG00000286642 NaN NaN NaN
1771 ENSG00000274619 CFD NaN NaN
65839 ENSG00000239710 RN7SL692P NaN NaN
5812 ENSG00000225255 PSLNR NaN NaN
37023 ENSG00000227542 MYO1B-AS1 NaN NaN
65896 ENSG00000260158 NaN NaN NaN
39506 ENSG00000102003 SYP ENSMUSG00000031144 Syp
20261 ENSG00000250405 NaN NaN NaN
44283 ENSG00000222982 RN7SKP216 NaN NaN

2. Enrichr Example

[5]:
# read in an example gene list
gene_list = pd.read_csv("./tests/data/gene_list.txt",header=None, sep="\t")
gene_list.head()
[5]:
0
0 IGKV4-1
1 CD55
2 IGKC
3 PPFIBP1
4 ABHD4
[6]:
# convert dataframe or series to list
glist = gene_list.squeeze().str.strip().tolist()
print(glist[:10])
['IGKV4-1', 'CD55', 'IGKC', 'PPFIBP1', 'ABHD4', 'PCSK6', 'PGD', 'ARHGDIB', 'ITGB2', 'CARD6']

See all supported enrichr library names

Select database from { ‘Human’, ‘Mouse’, ‘Yeast’, ‘Fly’, ‘Fish’, ‘Worm’ }

Enrichr library could be used for gsea, ssgsea, and prerank, too

[7]:
names = gp.get_library_name() # default: Human
names[:10]
[7]:
['ARCHS4_Cell-lines',
 'ARCHS4_IDG_Coexp',
 'ARCHS4_Kinases_Coexp',
 'ARCHS4_TFs_Coexp',
 'ARCHS4_Tissues',
 'Achilles_fitness_decrease',
 'Achilles_fitness_increase',
 'Aging_Perturbations_from_GEO_down',
 'Aging_Perturbations_from_GEO_up',
 'Allen_Brain_Atlas_10x_scRNA_2021']
[8]:
yeast = gp.get_library_name(organism='Yeast')
yeast[:10]
[8]:
['Cellular_Component_AutoRIF',
 'Cellular_Component_AutoRIF_Predicted_zscore',
 'GO_Biological_Process_2018',
 'GO_Biological_Process_AutoRIF',
 'GO_Biological_Process_AutoRIF_Predicted_zscore',
 'GO_Cellular_Component_2018',
 'GO_Cellular_Component_AutoRIF',
 'GO_Cellular_Component_AutoRIF_Predicted_zscore',
 'GO_Molecular_Function_2018',
 'GO_Molecular_Function_AutoRIF']

2.1 Assign enrichr with pd.Series, pd.DataFrame, or list object

2.1.1 gene_sets support list, str.

Multi-libraries names supported, separate each name by comma or input a list.

For example:

# gene_list
gene_list="./data/gene_list.txt",
gene_list=glist
# gene_sets
gene_sets='KEGG_2016'
gene_sets='KEGG_2016,KEGG_2013'
gene_sets=['KEGG_2016','KEGG_2013']
[9]:
# run enrichr
# if you are only intrested in dataframe that enrichr returned, please set no_plot=True

# list, dataframe, series inputs are supported
enr = gp.enrichr(gene_list="./tests/data/gene_list.txt",
                 gene_sets=['KEGG_2016','KEGG_2013'],
                 organism='human', # don't forget to set organism to the one you desired! e.g. Yeast
                 outdir=None, # don't write to disk
                )
[10]:
# obj.results stores all results
enr.results.head(5)
[10]:
Gene_set Term Overlap P-value Adjusted P-value Old P-value Old Adjusted P-value Odds Ratio Combined Score Genes
0 KEGG_2016 Osteoclast differentiation Homo sapiens hsa04380 28/132 3.104504e-13 7.885440e-11 0 0 6.659625 191.802220 LILRA6;ITGB3;LILRA2;LILRA5;PPP3R1;FCGR3B;SIRPA...
1 KEGG_2016 Tuberculosis Homo sapiens hsa05152 31/178 4.288559e-12 5.446470e-10 0 0 5.224941 136.763196 RAB5B;ITGB2;PPP3R1;HLA-DMA;FCGR3B;HLA-DMB;CASP...
2 KEGG_2016 Phagosome Homo sapiens hsa04145 28/154 1.614009e-11 1.366528e-09 0 0 5.490501 136.437381 ATP6V1A;RAB5B;ITGB5;ITGB3;ITGB2;HLA-DMA;FCGR3B...
3 KEGG_2016 Rheumatoid arthritis Homo sapiens hsa05323 19/90 2.197884e-09 1.395656e-07 0 0 6.554453 130.668081 ATP6V1A;ATP6V1G1;ATP6V0B;TGFB1;ITGB2;FOS;ITGAL...
4 KEGG_2016 Leishmaniasis Homo sapiens hsa05140 17/73 3.132614e-09 1.591368e-07 0 0 7.422186 145.336773 TGFB1;IFNGR1;PRKCB;IFNGR2;ITGB2;FOS;MAPK14;HLA...

2.1.2 Local mode of GO analysis

If input a .gmt file or gene_set dict object, enrichr runs local.
You have to specify the background genes, if local mode used

For example:

gene_sets="./data/genes.gmt",
gene_sets={'A':['gene1', 'gene2',...],
           'B':['gene2', 'gene4',...],
           ...}
[11]:
enr2 = gp.enrichr(gene_list="./tests/data/gene_list.txt", # or gene_list=glist
                 gene_sets="./tests/data/genes.gmt",
                 background='hsapiens_gene_ensembl', # or the number of genes, e.g 20000
                 outdir=None,
                 verbose=True)
2022-07-26 16:29:12,288 Connecting to Enrichr Server to get latest library names
2022-07-26 16:29:12,290 User Defined gene sets is given: ./tests/data/genes.gmt
2022-07-26 16:29:12,342 Using all annotated genes with GO_ID as background: hsapiens_gene_ensembl
2022-07-26 16:29:12,353 Background: found 24978 genes
[12]:
enr2.results.head(5)
[12]:
Gene_set Term Overlap P-value Adjusted P-value Odds Ratio Genes
0 CUSTOM140534526718208 BvA_UpIN_A 8/130 0.029765 0.069452 2.343931 HAL;PADI2;IL1R1;MAP3K5;MBOAT2;IQGAP2;MSRB2;PCSK6
1 CUSTOM140534526718208 BvA_UpIN_B 11/124 0.000729 0.005100 3.339341 MBNL3;LPAR1;ARHGDIB;IL1RAP;DYSF;HEBP1;ST3GAL6;...
2 CUSTOM140534526718208 CvA_UpIN_A 1/11 0.267894 0.375052 4.669004 MBOAT2
3 CUSTOM140534526718208 DvA_UpIN_A 16/259 0.002705 0.009466 2.302945 KIF1B;ANXA11;BCL3;NMNAT1;IFNGR2;VNN1;PTGS1;HAL...
4 CUSTOM140534526718208 DvA_UpIN_D 11/218 0.043392 0.075936 1.895541 MBNL3;LPAR1;GNB4;TXNDC5;IL1RAP;DYSF;HEBP1;ST3G...

2.1.3 Plotting

[13]:
# simple plotting function
from gseapy.plot import barplot, dotplot

# to save your figure, make sure that ``ofname`` is not None
barplot(enr.res2d,title='KEGG_2013')
_images/gseapy_example_22_0.png
[14]:
# to save your figure, make sure that ``ofname`` is not None
dotplot(enr.res2d, title='KEGG_2013',cmap='viridis_r')
[14]:
<AxesSubplot:title={'center':'KEGG_2013'}, xlabel='-log$_{10}$(Adjusted P-value)'>
_images/gseapy_example_23_1.png

2.2 Command line usage

You may also want to use enrichr in command line

the option -v will print out the progress of your job

[15]:
# !gseapy enrichr -i ./data/gene_list.txt \
#                 --ds BP2017 \
#                 -g GO_Biological_Process_2017 \
#                 -v -o test/enrichr_BP

3. Prerank example

3.1 Assign prerank() with a pd.DataFrame, pd.Series , or a txt file

Do not include header in your gene list !
GSEApy will skip any data after “#”.
Only contains two columns, or one cloumn with gene_name indexed when assign a DataFrame to prerank
[16]:
rnk = pd.read_csv("./tests/data/temp.rnk", header=None, index_col=0, sep="\t")
rnk.head()
[16]:
1
0
ATXN1 16.456753
UBQLN4 13.989493
CALM1 13.745533
DLG4 12.796588
MRE11A 12.787631
[17]:
rnk.shape
[17]:
(22922, 1)
[18]:
# # run prerank
# # enrichr libraries are supported by prerank module. Just provide the name
# # use 4 process to acceralate the permutation speed
pre_res = gp.prerank(rnk="./tests/data/temp.rnk", # or rnk = rnk,
                     gene_sets='KEGG_2016',
                     processes=4,
                     min_size=5,
                     max_size=1000,
                     permutation_num=1000, # reduce number to speed up testing
                     outdir=None, # don't write to disk
                     seed=6, verbose=True)
2022-07-26 16:29:13,284 Parsing data files for GSEA.............................
2022-07-26 16:29:13,286 Enrichr library gene sets already downloaded in: /home/fangzq/.cache/gseapy, use local file
2022-07-26 16:29:19,494 0001 gene_sets have been filtered out when max_size=1000 and min_size=5
2022-07-26 16:29:19,500 0292 gene_sets used for further statistical testing.....
2022-07-26 16:29:19,501 Start to run GSEA...Might take a while..................
2022-07-26 16:29:58,671 Congratulations. GSEApy runs successfully................

3.2 How to generate your GSEA plot inside python console

Visualize it using gseaplot

Make sure that ofname is not None, if you want to save your figure to the disk

[19]:
pre_res.res2d.head(10)
[19]:
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank Adherens junction Homo sapiens hsa04520 0.784625 1.912548 0.0 0.0 0.0 47/74 10.37% CTNNB1;EGFR;RAC1;TGFBR1;SMAD4;MET;EP300;CDC42;...
1 prerank Glioma Homo sapiens hsa05214 0.784678 1.906706 0.0 0.0 0.0 52/65 16.29% CALM1;GRB2;EGFR;PRKCA;KRAS;HRAS;TP53;MAPK1;PRK...
2 prerank Estrogen signaling pathway Homo sapiens hsa04915 0.766347 1.897957 0.0 0.0 0.0 74/99 16.57% CALM1;PRKACA;GRB2;SP1;EGFR;KRAS;HRAS;HSP90AB1;...
3 prerank Thyroid hormone signaling pathway Homo sapiens... 0.7577 1.891815 0.0 0.0 0.0 84/118 16.29% CTNNB1;PRKACA;PRKCA;KRAS;NOTCH1;EP300;CREBBP;H...
4 prerank Long-term potentiation Homo sapiens hsa04720 0.778249 1.888739 0.0 0.0 0.0 42/66 9.01% CALM1;PRKACA;PRKCA;KRAS;EP300;CREBBP;HRAS;PRKA...
5 prerank GnRH signaling pathway Homo sapiens hsa04912 0.761416 1.882101 0.0 0.0 0.0 61/91 16.57% CALM1;PRKACA;GRB2;EGFR;PRKCA;KRAS;CDC42;HRAS;P...
6 prerank ErbB signaling pathway Homo sapiens hsa04012 0.765574 1.881161 0.0 0.0 0.0 65/87 16.29% GRB2;EGFR;PRKCA;KRAS;HRAS;MAPK1;PRKCB;SRC;NRAS...
7 prerank Pancreatic cancer Homo sapiens hsa05212 0.777715 1.879742 0.0 0.0 0.0 53/66 15.88% EGFR;RAC1;TGFBR1;KRAS;SMAD4;CDC42;TP53;SMAD2;M...
8 prerank Endometrial cancer Homo sapiens hsa05213 0.782427 1.863277 0.0 0.0 0.0 42/52 15.88% CTNNB1;GRB2;EGFR;KRAS;HRAS;TP53;MAPK1;NRAS;MAP...
9 prerank Colorectal cancer Homo sapiens hsa05210 0.77014 1.862508 0.0 0.0 0.0 48/62 15.88% CTNNB1;RAC1;TGFBR1;KRAS;SMAD4;TP53;SMAD2;MAPK1...
[20]:
## easy way
from gseapy import gseaplot
terms = pre_res.res2d.Term
i = 1
# to save your figure, make sure that ofname is not None
gseaplot(rank_metric=pre_res.ranking,
         term=terms[i],
         cmap=plt.cm.seismic,
         **pre_res.results[terms[i]])

# save figure
# gseaplot(rank_metric=pre_res.ranking, term=terms[0], ofname='your.plot.pdf', **pre_res.results[terms[0]])
_images/gseapy_example_33_0.png

3) Command line usage

You may also want to use prerank in command line

[21]:
# ! gseapy prerank -r temp.rnk -g temp.gmt -o prerank_report_temp

4. GSEA Example

4.1 Assign gsea() with a pandas DataFrame, .gct format file, or a text file

and cls with a list object or just .cls format file

[22]:
phenoA, phenoB, class_vector =  gp.parser.gsea_cls_parser("./tests/extdata/Leukemia.cls")
[23]:
#class_vector used to indicate group attributes for each sample
print(class_vector)
['ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'ALL', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML', 'AML']
[24]:
gene_exp = pd.read_csv("./tests/extdata/Leukemia_hgu95av2.trim.txt", sep="\t")
gene_exp.head()
[24]:
Gene NAME ALL_1 ALL_2 ALL_3 ALL_4 ALL_5 ALL_6 ALL_7 ALL_8 ... AML_15 AML_16 AML_17 AML_18 AML_19 AML_20 AML_21 AML_22 AML_23 AML_24
0 MAPK3 1000_at 1633.6 2455.0 866.0 1000.0 3159.0 1998.0 1580.0 1955.0 ... 1826.0 2849.0 2980.0 1442.0 3672.0 294.0 2188.0 1245.0 1934.0 13154.0
1 TIE1 1001_at 284.4 159.0 173.0 216.0 1187.0 647.0 352.0 1224.0 ... 1556.0 893.0 1278.0 301.0 797.0 248.0 167.0 941.0 1398.0 -502.0
2 CYP2C19 1002_f_at 285.8 114.0 429.0 -43.0 18.0 366.0 119.0 -88.0 ... -177.0 64.0 -359.0 68.0 2.0 -464.0 -127.0 -279.0 301.0 509.0
3 CXCR5 1003_s_at -126.6 -388.0 143.0 -915.0 -439.0 -371.0 -448.0 -862.0 ... 237.0 -834.0 -1940.0 -684.0 -1236.0 -1561.0 -895.0 -1016.0 -2238.0 -1362.0
4 CXCR5 1004_at -83.3 33.0 195.0 85.0 54.0 -6.0 55.0 101.0 ... 86.0 -5.0 487.0 102.0 33.0 -153.0 -50.0 257.0 439.0 386.0

5 rows × 50 columns

[25]:
print("positively correlated: ", phenoA)
positively correlated:  ALL
[26]:
print("negtively correlated: ", phenoB)
negtively correlated:  AML
[27]:
# run gsea
# enrichr libraries are supported by gsea module. Just provide the name

gs_res = gp.gsea(data=gene_exp, # or data='./P53_resampling_data.txt'
                 gene_sets='./tests/extdata/h.all.v7.0.symbols.gmt', # enrichr library names
                 cls= "./tests/extdata/Leukemia.cls", # cls=class_vector
                 # set permutation_type to phenotype if samples >=15
                 permutation_type='phenotype',
                 permutation_num=1000, # reduce number to speed up test
                 outdir=None,  # do not write output to disk
                 method='signal_to_noise',
                 processes=4, seed= 7)
2022-07-26 16:29:59,370 Warning: dropping duplicated gene names, only keep the first values
[28]:
#access the dataframe results throught res2d attribute
gs_res.res2d.head()
[28]:
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 gsea HALLMARK_E2F_TARGETS 0.574187 1.661335 0.052521 0.3216 0.345 87/200 23.65% DCK;BARD1;NASP;SRSF2;STMN1;SRSF1;TRA2B;EZH2;SM...
1 gsea HALLMARK_MITOTIC_SPINDLE 0.430183 1.646924 0.026804 0.3216 0.37 74/199 31.44% SPTAN1;SEPT9;ATG4B;SMC1A;MYH10;BIN1;CYTH2;TUBG...
2 gsea HALLMARK_WNT_BETA_CATENIN_SIGNALING 0.438876 1.586567 0.013834 0.3216 0.483 10/42 16.91% LEF1;SKP2;HDAC2;GNAI1;CUL1;MAML1;WNT1;HDAC5;AX...
3 gsea HALLMARK_MYC_TARGETS_V1 0.535105 1.519305 0.156448 0.3216 0.635 111/200 31.98% HNRNPA3;HDDC2;RFC4;SRSF2;SRSF1;TRA2B;RRM1;HNRN...
4 gsea HALLMARK_G2M_CHECKPOINT 0.438496 1.517946 0.094845 0.3216 0.637 80/200 28.04% MTF2;BARD1;NASP;CUL5;SRSF2;STMN1;NUP50;SRSF1;O...

4.2 Show the gsea plots

The gsea module will generate heatmap for genes in each gene sets in the backgroud.
But if you need to do it yourself, use the code below
[29]:
from gseapy import gseaplot, heatmap
terms = gs_res.res2d.Term
i = 2
# Make sure that ``ofname`` is not None, if you want to save your figure to disk
gseaplot(gs_res.ranking, term=terms[i], **gs_res.results[terms[i]])
_images/gseapy_example_46_0.png
[30]:
# plotting heatmap
genes = gs_res.res2d.Lead_genes[i].split(";")
# Make sure that ``ofname`` is not None, if you want to save your figure to disk
heatmap(df = gs_res.heatmat.loc[genes], z_score=0, title=terms[i], figsize=(18,6), cmap=plt.cm.RdBu_r)
_images/gseapy_example_47_0.png

4.3 Command line usage

You may also want to use gsea in command line

[31]:
# !gseapy gsea -d ./data/P53_resampling_data.txt \
#              -g KEGG_2016 -c ./data/P53.cls \
#              -o test/gsea_reprot_2 \
#              -v --no-plot \
#              -t phenotype

5. Single Sample GSEA example

Note: When you run ssGSEA, all genes names in your gene_sets file should be found in your expression table

What’s ssGSEA? Which one should I use? Prerank or ssGSEA

see FAQ here

5.1 Input format

Assign ssgsea() with a txt file, gct file, pd.DataFrame, or pd.Seires(gene name as index)

[32]:
# txt, gct file input
ss = gp.ssgsea(data=gene_exp, # or data='./P53_resampling_data.txt'
               gene_sets='./tests/extdata/h.all.v7.0.symbols.gmt',
               outdir=None,
               sample_norm_method='rank', # choose 'custom' for your own rank list
               no_plot=True,
               processes=4)
2022-07-26 16:30:06,584 Warning: dropping duplicated gene names, only keep the first values
[33]:
ss.res2d.head()
[33]:
Name Term ES NES Tag % Gene % Lead_genes
0 ALL_1 HALLMARK_MYC_TARGETS_V1 6338.956444 0.679758 147/200 31.53% RPS3;RPS6;RPS5;PABPC1;PGK1;RPL18;RACK1;RPL34;E...
1 ALL_2 HALLMARK_MYC_TARGETS_V1 6989.571066 0.749527 144/200 21.79% RPS6;PABPC1;HSP90AB1;ACTG1P16;RPL34;HNRNPA2B1;...
2 ALL_3 HALLMARK_MYC_TARGETS_V1 5523.472832 0.59231 118/200 24.17% RPS3;RPL34;RPS6;RPS5;EIF4G2;PABPC1;RPL18;RACK1...
3 ALL_4 HALLMARK_MYC_TARGETS_V1 6724.813934 0.721136 139/200 22.27% RPS6;PGK1;RPS5;PABPC1;HSP90AB1;RPS3;RPL34;EIF4...
4 ALL_5 HALLMARK_MYC_TARGETS_V1 5945.884371 0.637607 129/200 26.16% RPS6;RPS3;RPS5;RPL18;RPL34;RACK1;EIF4G2;PABPC1...
[34]:
# or assign a dataframe, or Series to ssgsea()
ssdf = pd.read_csv("./tests/data/temp.rnk", header=None,index_col=0,  sep="\t")
ssdf.head()
[34]:
1
0
ATXN1 16.456753
UBQLN4 13.989493
CALM1 13.745533
DLG4 12.796588
MRE11A 12.787631
[35]:
# dataframe with one column is also supported by ssGSEA or Prerank
# But you have to set gene_names as index
ssdf2 = ssdf.squeeze()
[36]:
# Series, DataFrame Example
# supports dataframe and series
import gseapy as gp
temp  = gp.ssgsea(data=ssdf2, gene_sets="./tests/data/temp.gmt")

5.2 Access Enrichment Score (ES) and NES

results are saved to obj.res2d

[37]:
# NES and ES
ss.res2d.sort_values('Name').head()
[37]:
Name Term ES NES Tag % Gene % Lead_genes
0 ALL_1 HALLMARK_MYC_TARGETS_V1 6338.956444 0.679758 147/200 31.53% RPS3;RPS6;RPS5;PABPC1;PGK1;RPL18;RACK1;RPL34;E...
1200 ALL_1 HALLMARK_P53_PATHWAY 2588.210197 0.277547 93/200 44.37% RPS12;RPL18;RACK1;TXNIP;LDHB;CD81;HIST1H1C;VAM...
1872 ALL_1 HALLMARK_COMPLEMENT 513.268027 0.05504 60/200 34.00% GNAI2;PSMB9;HSPA5;PIM1;PFN1;CALM1;WAS;TNFAIP3;...
1248 ALL_1 HALLMARK_INFLAMMATORY_RESPONSE 50.76128 0.005443 121/200 71.89% KLF6;NFKBIA;TAPBP;CD69;HIF1A;NMI;GNA15;BTG2;AB...
240 ALL_1 HALLMARK_NOTCH_SIGNALING 697.587326 0.074806 21/32 79.81% SKP1;KAT2A;JAG1;FOSL2;CUL1;FBXW11;PPARD;NME4;D...
[38]:
nes = ss.res2d.pivot(index='Term', columns='Name', values='NES')
nes.head()
[38]:
Name ALL_1 ALL_10 ALL_11 ALL_12 ALL_13 ALL_14 ALL_15 ALL_16 ALL_17 ALL_18 ... AML_22 AML_23 AML_24 AML_3 AML_4 AML_5 AML_6 AML_7 AML_8 AML_9
Term
HALLMARK_ADIPOGENESIS 0.331232 0.329075 0.330213 0.343893 0.362546 0.355699 0.334983 0.306043 0.371784 0.347331 ... 0.303826 0.292327 0.221319 0.354471 0.397949 0.2832 0.334483 0.458101 0.457889 0.39459
HALLMARK_ANDROGEN_RESPONSE 0.144927 0.136943 0.211038 0.233724 0.16325 0.159404 0.188494 0.138497 0.212663 0.209312 ... 0.181847 0.195235 0.11322 0.195834 0.174462 0.189198 0.16957 0.190042 0.229602 0.21032
HALLMARK_ANGIOGENESIS -0.110603 -0.169102 -0.16132 -0.074429 -0.189447 -0.078742 -0.073851 -0.129031 -0.062332 -0.125366 ... 0.01231 -0.044895 0.067698 -0.090903 0.042762 0.010191 0.060879 -0.040782 0.003717 -0.040145
HALLMARK_APICAL_JUNCTION 0.09589 0.11086 0.102473 0.072587 0.094981 0.091434 0.110588 0.091577 0.098605 0.116882 ... 0.134789 0.115729 0.14266 0.133337 0.164631 0.155894 0.17089 0.146885 0.131404 0.139515
HALLMARK_APICAL_SURFACE -0.052286 0.001903 -0.004585 0.043133 0.045221 0.085853 0.023781 0.050865 0.077711 -0.015706 ... 0.020596 -0.095771 0.108809 0.17544 0.050516 0.103885 0.001697 0.100576 0.106986 -0.034715

5 rows × 48 columns

[39]:
# txt, gct file input
# ss_permut = gp.ssgsea(data="./tests/extdata/Leukemia_hgu95av2.trim.txt",
#                gene_sets="./tests/extdata/h.all.v7.0.symbols.gmt",
#                outdir=None,
#                sample_norm_method='rank', # choose 'custom' for your own rank list
#                permutation_num=20, # skip permutation procedure, because you don't need it
#                no_plot=True, # skip plotting, because you don't need these figures
#                processes=4, seed=9)
# ss_permut.res2d.head(5)

3) command line usage of single sample gsea

[40]:
# !gseapy ssgsea -d ./data/testSet_rand1200.gct \
#                -g data/temp.gmt \
#                -o test/ssgsea_report2  \
#                -p 4 --no-plot

6. Replot Example

6.1 locate your directory

notes: replot module need to find edb folder to work properly. keep the file tree like this:

data
 |--- edb
 |    |--- C1OE.cls
 |    |--- gene_sets.gmt
 |    |--- gsea_data.gsea_data.rnk
 |    |--- results.edb
[41]:
# run command inside python console
rep = gp.replot(indir="./tests/data", outdir="test/replot_test")

6.2 command line usage of replot

[42]:
# !gseapy replot -i data -o test/replot_test