6. Frequently Asked Questions
6.1. Q: What kind of gene identifiers are supported in GSEApy?
A:
- If you select an Enrichr library as your input gene_sets (GMT format), gene symbols in upper case are required.
- If you use your own GMT file, the gene identifiers in the GMT file and in your input gene list must be of the same type.
6.2. Q: Why are gene symbols in Enrichr libraries all UPPER case for mouse, fly, fish, worm?
A: GSEApy can't change the Enrichr databases. Convert your gene symbols to upper case first, then run the analysis you want.
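The conversion can be done in plain Python before the gene list is passed to any GSEApy function. The snippet below is a minimal sketch; the gene symbols and variable names are made up for illustration.

```python
# Hypothetical mouse gene symbols in their usual mixed-case form.
mouse_genes = ["Trp53", "Brca1 ", "Myc"]

# Strip stray whitespace and convert every symbol to upper case
# so that it can match the upper-case symbols in Enrichr libraries.
upper_genes = [gene.strip().upper() for gene in mouse_genes]

print(upper_genes)
```

The resulting list can then be used as the input gene list.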
6.3. Q: Why is the P-value or FDR 0, not a very small number?
A: The GSEA methodology uses a random permutation procedure (e.g. 1000 permutations) to obtain a null distribution. The observed ES is then compared with the 1000 shuffled ESs to calculate a P-value. When the observed ES falls outside the range of the null ESs, you get 0. If you don't want 0, you could:
1. set the smallest P-value to 1 / (number of permutations), or
2. increase the number of permutations (at the cost of longer running time).
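To make the first option concrete, here is a simplified sketch of an empirical P-value and the 1 / (number of permutations) floor. It is not GSEApy's internal code: the null distribution is simulated with random numbers, only a positive observed ES is handled, and the function names are hypothetical.

```python
import random


def empirical_pvalue(observed_es, null_es):
    """Fraction of null ES values at least as large as the observed ES."""
    hits = sum(1 for es in null_es if es >= observed_es)
    return hits / len(null_es)


def floor_pvalue(pval, n_perm):
    """Replace values below 1 / n_perm (including 0) with 1 / n_perm."""
    return max(pval, 1.0 / n_perm)


random.seed(0)
n_perm = 1000
# Simulated null ESs, all within [-0.5, 0.5] (illustrative only).
null_es = [random.uniform(-0.5, 0.5) for _ in range(n_perm)]
observed_es = 0.8  # more extreme than every simulated null ES

raw_p = empirical_pvalue(observed_es, null_es)  # no null ES reaches 0.8
adj_p = floor_pvalue(raw_p, n_perm)             # floored to 1 / n_perm
print(raw_p, adj_p)
```

With the floor applied, a P-value of 0 is reported as the smallest value the permutation count can resolve.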
6.4. Q: Which Enrichr databases are supported?
A: modEnrichr (https://amp.pharm.mssm.edu/modEnrichr/) is supported. Currently, Human, Mouse, Fly, Yeast, Worm, and Fish are all supported.
6.5. Q: How to use a custom-defined GMT file as input in Jupyter?
A: The gene_sets argument accepts dict input. This is useful when you define your own gene sets. An example dict looks like this:
gene_sets = {
    "term_1": ["gene_A", "gene_B", ...],
    "term_2": ["gene_B", "gene_C", ...],
    ...
    "term_100": ["gene_A", "gene_T", ...]
}
The following APIs accept a dict object as input: gsea, prerank, ssgsea, enrichr.
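If your gene sets already live in a GMT file, one way to build such a dict yourself is to parse the file line by line. The sketch below assumes the usual tab-separated GMT layout (term name, description, then genes); read_gmt is a hypothetical helper written for this example, not a GSEApy function.

```python
def read_gmt(lines):
    """Parse GMT-formatted lines into a {term: [genes]} dict.

    Each line is expected to be: term <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term = fields[0]
        genes = [gene for gene in fields[2:] if gene]  # drop empty trailing fields
        gene_sets[term] = genes
    return gene_sets


# Two made-up GMT lines; with a real file, pass the open file handle instead.
example_lines = [
    "term_1\tdescription 1\tgene_A\tgene_B\n",
    "term_2\tdescription 2\tgene_B\tgene_C\n",
]
gene_sets = read_gmt(example_lines)
print(gene_sets)
```

The resulting gene_sets dict has the same shape as the example above.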
6.6. Q: How to use the Yeast database in gseapy.enrichr()?
A: Because some library names are the same across different Enrichr databases, you have to set the additional argument organism when not using Human:
gss = gseapy.get_library_name(organism='Yeast')
enr = gseapy.enrichr(gene_list=...,
                     gene_sets=gss,
                     organism='Yeast',  # don't forget to set organism="Yeast"
                     )
6.7. Q: How to use the Yeast database in gseapy.prerank()?
A: There is no organism argument in prerank, gsea, or ssgsea, but you can pass these Enrichr libraries as follows:
# get libraries you'd like to use
gss = gseapy.get_library_name(organism='Yeast')
# get a custom gmt_dict
gmt_dict = gseapy.parser.gsea_gmt_parser('GO_Biological_Process_2018', organism='Yeast')
# run
prn_res = gseapy.prerank(..., gene_sets=gmt_dict, ...)
6.8. Q: How to save plots using gseaplot, barplot, dotplot, heatmap in Jupyter?
A: Pass the ofname argument, e.g. gseaplot(..., ofname='your.plot.pdf'). That's it.
6.9. Q: What does cutoff mean in functions like enrichr(), dotplot, barplot?
A: This argument controls which terms (e.g. FDR < 0.05) are shown on figures. It does not filter the result table output.
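The distinction can be pictured with a toy example: the cutoff selects which rows are drawn, while the full table stays intact. This is only an illustration of the idea, not GSEApy's plotting code; the rows and the field names term and fdr are hypothetical.

```python
# Hypothetical result table: one row per enriched term.
results = [
    {"term": "term_1", "fdr": 0.001},
    {"term": "term_2", "fdr": 0.04},
    {"term": "term_3", "fdr": 0.30},
]

cutoff = 0.05

# Only rows passing the cutoff would appear on a figure...
terms_to_plot = [row["term"] for row in results if row["fdr"] < cutoff]

# ...while the result table itself still contains every row.
print(terms_to_plot, len(results))
```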
6.10. Q: ssGSEA missing p value and FDR?
A: The original ssGSEA algorithm does not give you a pval or FDR, so please ignore the gseaplot generated by ssgsea. A p-value or FDR there would be useless and misleading, so fdr and pval are not shown on the plot. If you are looking for ssGSEA with p-value output, please see here: https://github.com/broadinstitute/ssGSEA2.0
ssGSEA2.0 uses the same method as GSEApy to calculate the P-value, but not the FDR.
6.11. Q: What is the difference between ssGSEA and Prerank?
A: In short:
- prerank is used for comparing two groups of samples (e.g. control and treatment), where the gene ranking is defined by your own ranking method (such as t-statistic, signal-to-noise, etc.).
- ssGSEA is used for comparing individual samples to all the rest, to find gene signatures that samples share (use ssGSEA when you have a lot of samples).
The statistics used by prerank (GSEA) and ssGSEA are different. Assuming the running enrichment scores of your ranked input genes have already been calculated, then:
- es for GSEA: max(running enrichment scores) or min(running enrichment scores), whichever deviates further from zero
- es for ssGSEA: sum(running enrichment scores)
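The two definitions can be compared on a toy ranking. The sketch below uses an unweighted running sum (equal step up for every hit, equal step down for every miss), which is a simplification: the real methods weight the steps and ssGSEA applies further normalization. All function names and gene labels are hypothetical.

```python
def running_scores(ranked_genes, gene_set):
    """Unweighted running-sum statistic: step up on hits, step down on misses."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    scores, running = [], 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        scores.append(running)
    return scores


def es_gsea(scores):
    """GSEA-style ES: the running score that deviates most from zero (sign kept)."""
    return max(scores, key=abs)


def es_ssgsea(scores):
    """ssGSEA-style ES: the sum of all running scores."""
    return sum(scores)


ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # already sorted by rank metric
gene_set = {"g1", "g2", "g4"}

scores = running_scores(ranked_genes, gene_set)
print(es_gsea(scores), es_ssgsea(scores))
```

On the same running scores, the GSEA-style value reflects only the single most extreme point, whereas the ssGSEA-style value accumulates the whole curve.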