---
title: "Example 2: Using SEAGLE with Simulated Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{example2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This tutorial demonstrates how to use the `SEAGLE` package when the user inputs a matrix ${\bf G}$.  We'll begin by loading the `SEAGLE` package.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(SEAGLE)
```

As an example, we'll generate some synthetic data for usage in this tutorial.  Let's consider a dataset with $n=5000$ individuals and $L=100$ loci, where the first $40$ are causal.  

The `makeSimData` function generates a covariate matrix $\widetilde{\bf X} \in \mathbb{R}^{n \times 3}$, where the first column is the all ones vector for the intercept and the second and third columns are ${\bf X} \sim \text{N}(0,1)$ and ${\bf E}  \sim \text{N}(0,1)$, respectively.  The last two columns are scaled to have $0$ mean and unit variance.

The `makeSimData` function additionally generates the genetic marker matrix ${\bf G}$ with synthetic haplotype data from the COSI software.  Detailed procedures for generating ${\bf G}$ can be found in the accompanying journal manuscript.  Finally, the `makeSimData` function also generates a continuous phenotype ${\bf y}$ according to the following fixed effects model
$$
{\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}_{GE} + {\bf e}.
$$
Here, $\boldsymbol{\gamma}_{\tilde{\bf X}}$ is the all ones vector of length $P=3$, $\boldsymbol{\gamma}_{G} \in \mathbb{R}^{L}$, $\boldsymbol{\gamma}_{GE}\in \mathbb{R}^{L}$, and ${\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}_{n})$.  The entries of $\boldsymbol{\gamma}_{G}$ and $\boldsymbol{\gamma}_{GE}$ pertaining to causal loci are set to be $\gamma_{G}$ = `gammaG` and $\gamma_{GE}$ = `gammaGE`, respectively. The remaining entries of $\boldsymbol{\gamma}_{G}$ and $\boldsymbol{\gamma}_{GE}$ pertaining to non-causal loci are set to $0$.

```{r}
dat <- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1)

```

Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our ${\bf y}$, ${\bf X}$, ${\bf E}$, and ${\bf G}$ into the `prep.SEAGLE` function.  The `intercept = 1` parameter indicates that the first column of ${\bf X}$ is the all ones vector for the intercept.

This preparation procedure formats the input data for the `SEAGLE` function by checking the dimensions of the input data.  It also pre-computes a QR decomposition for $\widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n} & {\bf X} & {\bf E} \end{pmatrix}$, where ${\bf 1}_{n}$ denotes the all ones vector of length $n$.

```{r}
objSEAGLE <- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G)
```

Finally, we'll input the prepared data into the `SEAGLE` function to compute the score-like test statistic $T$ and its corresponding p-value.  The `init.tau` and `init.sigma` parameters are the initial values for $\tau$ and $\sigma$ employed in the REML EM algorithm.

```{r}
res <- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res$T
res$pv
```

The score-like test statistic $T$ for the G$\times$E effect and its corresponding p-value can be found in `res$T` and `res$pv`, respectively.