EnrichIntersect Tutorial

Zhi Zhao

EnrichIntersect is a flexible tool for enrichment analysis based on user-defined sets. It allows users to perform over-representation analysis of custom sets in any specified ranked feature list, which makes enrichment analysis applicable to various types of data from different scientific fields. EnrichIntersect also provides an interactive means of visualizing associations identified by, for example, the mix-lasso model (Zhao et al., 2022) or similar methods.

To get started, load the package with

library("EnrichIntersect")

Examples

Plot enrichment map

The example input data object cancers_drug_groups, provided in the package, is an R list. It contains a data.frame with 147 cancer drugs as rows and nine cancer types as columns, and a second data.frame that groups the 147 drugs (second column) into nine user-defined drug classes (first column). By default, enrichment() uses the classic K-S test statistic to calculate a normalized enrichment score, which quantifies the degree to which the features in a feature set are over-represented at the top or bottom of the entire ranked list of features (e.g. a list of drugs). The empirical null distribution of the test statistic is estimated from 100 permutations by default. In the visualization, feature sets that are statistically significantly enriched at a pre-specified significance level are marked with red circles. The pre-specified significance level is adjusted automatically if the argument padj.method is one of c("holm","hochberg","hommel","bonferroni","BH","BY","fdr","none").

Users can specify the argument alpha for calculating a weighted enrichment score, the argument normalize=FALSE for using the standard enrichment score rather than the normalized score, the argument permute.n for the number of permutations of the ranked feature list used for estimating the empirical null test statistic, and the argument pvalue.cutoff for marking enriched categories at a specific significance level. See the following code for an example.

data(cancers_drug_groups, package = "EnrichIntersect")
x <- cancers_drug_groups$score
custom.set <- cancers_drug_groups$custom.set
set.seed(123)
enrich <- enrichment(x, custom.set, permute.n = 100)
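
To make the statistic described above more concrete, the following base-R sketch computes a K-S-type running-sum enrichment score, an empirical null from permutations, and adjusted p-values. It is only an illustration under stated assumptions and is not the EnrichIntersect source code: the function enrichment_score(), the toy scores, the set membership and the extra p-values are all hypothetical, the null is built by shuffling set labels over the ranked list, and the normalization (dividing by the mean of the same-sign null scores) is one common convention that may differ from what enrichment() does internally.

```r
# Illustrative sketch only; names and data are hypothetical, not package internals.
enrichment_score <- function(scores, in_set, alpha = 0) {
  ord <- order(scores, decreasing = TRUE)  # rank features from top to bottom
  s <- scores[ord]
  hit <- in_set[ord]
  w <- ifelse(hit, abs(s)^alpha, 0)        # alpha = 0 gives unweighted (classic K-S) steps
  p_hit <- cumsum(w) / sum(w)              # running weighted fraction of set members
  p_miss <- cumsum(!hit) / sum(!hit)       # running fraction of non-members
  dev <- p_hit - p_miss
  dev[which.max(abs(dev))]                 # signed maximum deviation from zero
}

set.seed(1)
scores <- setNames(rnorm(50), paste0("drug", 1:50))
# Toy feature set: the eight top-ranked features
in_set <- names(scores) %in% names(sort(scores, decreasing = TRUE))[1:8]
es_obs <- enrichment_score(scores, in_set)

# Empirical null: shuffle the set labels over the ranked list
n_perm <- 100
es_null <- replicate(n_perm, enrichment_score(scores, sample(in_set)))

# Assumed normalization: divide by the mean null score of the same sign
nes <- es_obs / mean(es_null[sign(es_null) == sign(es_obs)])

# One-sided empirical p-value with the usual +1 correction
p_emp <- (sum(es_null >= es_obs) + 1) / (n_perm + 1)

# Multiple-testing adjustment with stats::p.adjust();
# setB and setC are made-up p-values for two other hypothetical sets
p_raw <- c(setA = p_emp, setB = 0.04, setC = 0.20)
p_adj <- p.adjust(p_raw, method = "BH")
print(p_adj)
```

Setting alpha to a positive value makes features with large absolute scores contribute more to the running sum, which corresponds to the weighted enrichment score mentioned above.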

Plot Sankey diagram for intersecting sets through an array

The EnrichIntersect function intersectSankey() creates a Sankey diagram to visualize intersecting sets from an array object, in which the first dimension represents intermediate variables, and the second and third dimensions represent multiple levels and multiple tasks, respectively. An intersecting set is a list of intermediate variables associated with a combination of a subset of levels and a subset of tasks, which is difficult to visualize when there are many possible combinations of the two. The function intersectSankey() adapts sankeyNetwork() from the R package networkD3 to create a D3 JavaScript interactive Sankey diagram that is suitable for several levels, multiple tasks and many intermediate variables. Besides saving the Sankey diagram as an interactive HTML file, as in networkD3, the user can also save the Sankey diagram as a PDF or PNG file via the R package webshot2. The argument out.fig=c(NA,"html","pdf","png") in the function intersectSankey() determines whether the figure is shown on the user's R graphics device or saved as an HTML, PDF or PNG file.
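
The array layout described above can be sketched with a small toy example in base R. This is a hedged illustration rather than the internals of intersectSankey(): the variable, level and task names are made up, and it assumes that a non-zero cell marks an association. It shows how each association yields two links of a three-step Sankey diagram (level to intermediate variable, and intermediate variable to task) and how one intersecting set can be read off.

```r
# Toy 3-D association array; all names are hypothetical.
arr <- array(0, dim = c(3, 2, 2),
             dimnames = list(c("geneA", "geneB", "geneC"),  # intermediate variables
                             c("level1", "level2"),         # levels
                             c("task1", "task2")))          # tasks
arr["geneA", "level1", "task1"] <- 1
arr["geneA", "level2", "task1"] <- 1
arr["geneB", "level1", "task2"] <- 1

# Assumption: each non-zero cell is one (variable, level, task) association
idx <- which(arr != 0, arr.ind = TRUE)
assoc <- data.frame(
  variable = dimnames(arr)[[1]][idx[, 1]],
  level    = dimnames(arr)[[2]][idx[, 2]],
  task     = dimnames(arr)[[3]][idx[, 3]],
  stringsAsFactors = FALSE
)

# Links for a three-step diagram: level -> variable and variable -> task
links <- unique(rbind(
  data.frame(source = assoc$level, target = assoc$variable,
             stringsAsFactors = FALSE),
  data.frame(source = assoc$variable, target = assoc$task,
             stringsAsFactors = FALSE)
))
print(links)

# One intersecting set: variables associated with level1 and task1
set_l1_t1 <- assoc$variable[assoc$level == "level1" & assoc$task == "task1"]
print(set_l1_t1)
```

A source/target table of this kind is the sort of input that Sankey plotting functions typically work from; variables without any association (geneC here) do not produce links.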

The example input data object cancers_genes_drugs, provided in the package, is an array of associations between 56 genes (first dimension), three cancer types (second dimension) and two drugs (third dimension). The user can set the argument out.fig to choose the output type of the Sankey diagram and use the argument step.names to label the three kinds of variables in the diagram, i.e., the name of the multiple levels, the name of the intermediate variables, and the name of the multiple tasks. See the following code for an example.

data(cancers_genes_drugs, package = "EnrichIntersect")
# to save the figure to a file, set the argument `out.fig`, e.g. `out.fig = "pdf"`
intersectSankey(cancers_genes_drugs, step.names = c("Cancers", "Genes", "Drugs"))

[Figure: interactive Sankey diagram with three steps labelled Cancers, Genes and Drugs, linking the cancer types (Colorectal, Melanoma, Multiple myeloma) through the genes to the drugs (RG-108, JQ-1).]

Citation

Zhi Zhao, Manuela Zucknick, Tero Aittokallio (2022). EnrichIntersect: an R package for custom set enrichment analysis and interactive visualization of intersecting sets. Bioinformatics Advances, 2(1), vbac073. DOI: 10.1093/bioadv/vbac073.

Zhi Zhao, Shixiong Wang, Manuela Zucknick, Tero Aittokallio (2022). Tissue-specific identification of multi-omics features for pan-cancer drug response prediction. iScience, 25(8), 104767. DOI: 10.1016/j.isci.2022.104767.