---
title: "RaMS and friends"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{RaMS-and-friends}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
options(rmarkdown.html_vignette.check_title = FALSE)
data.table::setDTthreads(2)
```

**Table of contents:**

* [Standard export to CSV]
* [Fancier export to Excel]
* [Exporting to SQL database]
* [Interfacing with Python via `reticulate`]


The strength of `RaMS` is its simple data format. Table-like data structures are
common in most programming languages, and they can always be converted to the
nigh-universal matrix format. The goal of this vignette is to illustrate this
strength by exporting MS data to several formats that can be used outside of R.

## Standard export to CSV

As with all rectangular data, `RaMS` objects can be easily exported to CSV files
with base R functions. This works best with a few chromatograms at a time, as
the millions of data points found in most MS files can overwhelm common file
readers.

```{r}
library(RaMS)

# Locate an MS file
single_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")

# Grab the MS data
msdata <- grabMSdata(single_file, grab_what = "everything")

# Write out MS1 data to .csv file
write.csv(x = msdata$MS1, file = "MS1_data.csv")

# Clean up afterward
file.remove("MS1_data.csv")
```

## Fancier export to Excel

Excel workbooks are a common format because of their intuitive GUI and
widespread adoption. They can also encode more information than CSV files
due to their multiple "sheets" within a single workbook - perfect for encoding
both MS1 and MS2 information in one place. This vignette uses the 
[`openxlsx`](https://cran.r-project.org/package=openxlsx)
package, although there are several alternatives with identical functionality.

```{r}
library(openxlsx)

# Locate an MS2 file
MS2_file <- system.file("extdata", "S30657.mzML.gz", package = "RaMS")

# Grab the MS1 and MS2 data
msdata <- grabMSdata(MS2_file, grab_what=c("MS1", "MS2"))

# Write out MS data to Excel file
# openxlsx writes each object in a list to a unique sheet
# Produces one sheet for MS1 and one for MS2
write.xlsx(msdata, file = "MS2_data.xlsx")

# Clean up afterward
file.remove("MS2_data.xlsx")
```

## Exporting to SQL database

For more robust data processing and storage, or to work with larger-than-memory
data sets, SQL databases are an excellent choice. This vignette will demo the
[`RSQLite`](https://cran.r-project.org/package=RSQLite) 
package's engine, although several other database engines have similar
functionality.

```{r}
library(DBI)
# Get data from multiple files to show off
mzml_files <- system.file(c("extdata/LB12HL_AB.mzML.gz", 
                            "extdata/LB12HL_CD.mzML.gz"), 
                          package = "RaMS")
msdata <- grabMSdata(mzml_files)

# Create the sqlite database and connect to it
MSdb <- dbConnect(RSQLite::SQLite(), "MSdata.sqlite")

# Export MS1 and MS2 data to sqlite tables
dbWriteTable(MSdb, "MS1", msdata$MS1)
dbWriteTable(MSdb, "MS2", msdata$MS2)
dbListTables(MSdb)

# Perform a simple query to ensure data was exported correctly
dbGetQuery(MSdb, 'SELECT * FROM MS1 LIMIT 3')

# Perform EIC extraction in SQL rather than in R
EIC_query <- 'SELECT * FROM MS1 WHERE mz BETWEEN :lower_bound AND :upper_bound'
query_params <- list(lower_bound=118.086, upper_bound=118.087)
EIC <- dbGetQuery(MSdb, EIC_query, params = query_params)

# Append with additional files
extra_file <- system.file("extdata", "LB12HL_EF.mzML.gz", package = "RaMS")
extra_msdata <- grabMSdata(extra_file, grab_what = "everything")
dbGetQuery(MSdb, 'SELECT COUNT(*) FROM MS1') # Initial number of rows
dbAppendTable(MSdb, "MS1", extra_msdata$MS1)
# Confirm three different files exist in DB
dbGetQuery(MSdb, 'SELECT DISTINCT filename FROM MS1')
# Confirm new rows have been added
dbGetQuery(MSdb, 'SELECT COUNT(*) FROM MS1')

# Disconnect after export
dbDisconnect(MSdb)

# Clean up afterward
unlink("MSdata.sqlite")
```

## Interfacing with Python via `reticulate`

R and Python are commonly used together, and the 
[`reticulate`](https://rstudio.github.io/reticulate/) package makes this
even easier by enabling a Python interpreter within R. RStudio, in which
this vignette was written, supports both R and Python code chunks as shown
below.

### R code chunk: {r}

```{r}
# Locate a couple MS files
data_dir <- system.file("extdata", package = "RaMS")
file_paths <- list.files(data_dir, pattern = "HL.*mzML", full.names = TRUE)

msdata <- grabMSdata(files = file_paths, grab_what = "BPC")$BPC
```

### Python code chunk: {python}

```{python, fig.height=3, eval=FALSE}
# Not run to pass R CMD check on GitHub
# Make sure python, matplotlib, and seaborn are installed

import seaborn as sns
import matplotlib.pyplot as plt

sns.relplot(data=r.msdata, kind="line", x="rt", y="int", hue="filename")
plt.show()
```