---
title: 'DR-SC: simulation'
author: "Wei Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DR-SC Simulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Generate the simulated data
First, we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform by using the function `gendata_RNAExp` in `DR.SC` package, which is a Seurat object format. It is noted that the `meta.data` must include spatial coordinates in columns named "row" (x coordinates) and "col" (y coordinates)!
```{r  eval = FALSE}
library(DR.SC)
seu <- gendata_RNAExp(height=30, width=30,p=500, K=4)
head(seu@meta.data)
```
## Fit DR-SC using simulated data

### Data preprocessing
This preprocessing includes Log-normalization and feature selection. Here we select highly variable genes for example first. The selected genes' names are saved in "seu@assays$RNA@var.features"
```{r  eval = FALSE}
### Given K
library(Seurat)
seu <- NormalizeData(seu)
# choose highly variable features using Seurat
seu <- FindVariableFeatures(seu, nfeatures = 400)
```

### Fit DR-SC based on highly variable genes(HVGs)
For function `DR.SC`, users can specify the number of clusters $K$ or set `K` to be an integer vector by using modified BIC(MBIC) to determine $K$. First, we try using user-specified number of clusters. Then we show the version chosen by MBIC.
```{r  eval = FALSE}
### Given K
seu2 <- DR.SC(seu, q=30, K=4, platform = 'ST',  verbose=F, approxPCA=T)
```

After finishing model fitting, we use ajusted rand index (`ARI`) to check the performance of clustering
```{r  eval = FALSE}
mclust::adjustedRandIndex(seu2$spatial.drsc.cluster, seu$true_clusters)
```
Next, we show the application of DR-SC in visualization. First, we can visualize the  clusters from DR-SC on the spatial coordinates.
```{r  eval = FALSE}
spatialPlotClusters(seu2)
```

We can also visualize the clusters from DR-SC on the two-dimensional tSNE based on the extracted features from DR-SC.
```{r  eval = FALSE}
drscPlot(seu2)
```
Show the UMAP plot based on the extracted features from DR-SC.
```{r  eval = FALSE}
drscPlot(seu2, visu.method = 'UMAP')
```

Use MBIC to choose number of clusters:
```{r  eval = FALSE}
seu2 <- DR.SC(seu, q=10, K=2:6, platform = 'ST', verbose=F,approxPCA=T)
mbicPlot(seu2)
```




### Fit DR-SC based on spatially variable genes(SVGs)
First, we select the spatilly variable genes using funciton `FindSVGs`.
```{r  eval = FALSE}
### Given K
seu <- NormalizeData(seu, verbose=F)
# choose 400 spatially variable features using FindSVGs
seus <- FindSVGs(seu, nfeatures = 400, verbose = F)
seu2 <- DR.SC(seus, q=4, K=4, platform = 'ST', verbose=F)
```

Using ARI to check the performance of clustering
```{r  eval = FALSE}
mclust::adjustedRandIndex(seu2$spatial.drsc.cluster, seu$true_clusters)
```
### DR-SC can enhance visualization
Show the spatial scatter plot for clusters
```{r  eval = FALSE}
spatialPlotClusters(seu2)
```

Show the tSNE plot based on the extracted features from DR-SC.
```{r  eval = FALSE}
drscPlot(seu2)
```
Show the UMAP plot based on the extracted features from DR-SC.
```{r  eval = FALSE}
drscPlot(seu2, visu.method = 'UMAP')
```

### DR-SC can automatically determine the number of clusters
Use MBIC to choose number of clusters:
```{r  eval = FALSE}
seu2 <- DR.SC(seus, q=4, K=2:6, platform = 'ST',  verbose=F)
mbicPlot(seu2)
# or plot BIC or AIC
# mbicPlot(seu2, criteria = 'BIC')
# mbicPlot(seu2, criteria = 'AIC')
# tune pen.const
seu2 <- selectModel(seu2, pen.const = 0.7)
mbicPlot(seu2)
```




## DR-SC can help differentially expression analysis 
Conduct visualization of marker gene expression.
### Ridge plots 
Visualize single cell expression distributions in each cluster from Seruat.
```{r  eval = FALSE}
dat <- FindAllMarkers(seu2)
suppressPackageStartupMessages(library(dplyr) )
# Find the top 1 marker genes, user can change n to access more marker genes
dat %>%group_by(cluster) %>%
    top_n(n = 1, wt = avg_log2FC) -> top
genes <- top$gene
RidgePlot(seu2, features = genes, ncol = 2)
```
### Violin plot
Visualize single cell expression distributions in each cluster
```{r  eval = FALSE}
VlnPlot(seu2, features = genes, ncol=2)
```
### Feature plot
We extract tSNE based on the features from DR-SC and then visualize feature expression in the low-dimensional space
 
```{r  eval = FALSE}
seu2 <- RunTSNE(seu2, reduction="dr-sc", reduction.key='drsctSNE_')
FeaturePlot(seu2, features = genes, reduction = 'tsne' ,ncol=2)

```


### Dot plots 
The size of the dot corresponds to the percentage of cells expressing the
feature in each cluster. The color represents the average expression level
```{r  eval = FALSE}
DotPlot(seu2, features = genes)
```

### Heatmap plot
Single cell heatmap of feature expression
```{r  eval = FALSE}
# standard scaling (no regression)
dat %>%group_by(cluster) %>%
    top_n(n = 30, wt = avg_log2FC) -> top
### select the marker genes that are also the variable genes.

genes <- intersect(top$gene, seu2[['RNA']]@var.features)
## Change the HVGs to SVGs
#  <- topSVGs(seu2, 400)
seu2 <- ScaleData(seu2, verbose = F)
DoHeatmap(subset(seu2, downsample = 500),features = genes, size = 5)
```



## Session information
```{r  eval = FALSE}
sessionInfo()
```