---
title: "Tutorial on optimal framed clustering"
author: "Tathagata Debnath and Joe Song"
date: "Updated: 2021-07-27; 2020-12-12; Created: 2020-12-03"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tutorial on optimal framed clustering}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ../inst/REFERENCES.bib
---
  
Given a set of univariate data points, the `FramedClust` function [@Debnath21] finds the frame, a run of consecutive points in sorted order, whose points can be clustered best, that is, with the smallest within-cluster sum of squares. The `frame.size` parameter specifies the number of points in each frame. The actual width of a frame is the distance between the smallest and largest data points inside it.
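
To make the notion of a frame concrete, the following base-R sketch sorts a small vector, lists every frame of `m` consecutive points, and computes each frame's width. It is our own illustration, not code from OptCirClust; the toy vector `x_demo` and the frame size `m` are made up for this example.

```{r}
# Illustration only (not part of OptCirClust): enumerate all frames of
# m consecutive sorted points and compute each frame's width.
x_demo <- c(4.1, 0.5, 2.3, 7.8, 3.0, 5.6, 1.2)  # made-up, unsorted data
m <- 4                                           # made-up frame size
x_sorted <- sort(x_demo)
starts <- seq_len(length(x_sorted) - m + 1)      # n - m + 1 possible frames
frames <- data.frame(
  start = starts,                                # index of first point in frame
  end   = starts + m - 1,                        # index of last point in frame
  width = x_sorted[starts + m - 1] - x_sorted[starts]  # largest minus smallest
)
frames
```

With 7 points and `m = 4` there are 7 - 4 + 1 = 4 frames, and the narrowest frame is not necessarily the one that clusters best.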

Here we illustrate framed clustering by running the `FramedClust` function and visualizing its results.

## Data preparation

The `FramedClust` function accepts any univariate (linear) data set. In this example, the input vector `X` holds 70 points drawn at random from a gamma distribution with shape 6; `X` does not have to be sorted. We will find the best `K = 7` clusters among all possible frames of 50 consecutive points (`frame.size = 50`); with 70 points there are 70 - 50 + 1 = 21 such frames.

```{r,  message=FALSE, warning=FALSE}
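library(OptCirClust)
set.seed(1)         # make the random data (and the kmeans run) reproducible
X = rgamma(70, 6)   # 70 points from a gamma distribution with shape 6
K = 7               # number of clusters
frame.size = 50     # number of consecutive points in each frame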
```


## Performing framed clustering on the data 

We now cluster the data with three methods: the recommended fast and optimal `"linear.polylog"` algorithm; a brute-force optimal algorithm that repeatedly calls `Ckmeans.1d.dp` (`method = "Ckmeans.1d.dp"`); and a slow heuristic that repeatedly calls `kmeans` (`method = "kmeans"`).

```{r,  message=FALSE, warning=FALSE}
# Recommended: the fast and optimal "linear.polylog" method
result_linear.polylog <- FramedClust(X, K, frame.size, method = "linear.polylog")

# Slow but optimal: repeatedly calls Ckmeans.1d.dp
result_Ckmeans.1d.dp <- FramedClust(X, K, frame.size, method = "Ckmeans.1d.dp")

# Slow and heuristic (not guaranteed optimal): repeatedly calls kmeans
result_kmeans <- FramedClust(X, K, frame.size, method = "kmeans")
```
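
For intuition about what a brute-force optimal search computes, here is a minimal base-R sketch under our own assumptions: for every frame of consecutive sorted points, it solves one-dimensional k-means exactly with a textbook O(K n^2) dynamic program and keeps the frame with the smallest within-cluster sum of squares. The helpers `optimal_sse_1d()` and `framed_bruteforce()` and the toy data `x_small` are hypothetical names made up for this illustration; they are not part of OptCirClust and do not reproduce its `"Ckmeans.1d.dp"` or `"linear.polylog"` implementations.

```{r}
# Hypothetical helper (illustration only): exact minimal within-cluster sum
# of squares (SSE) for splitting SORTED x into K clusters. Optimal 1-D
# clusters are contiguous in sorted order, so a DP over segments is exact.
optimal_sse_1d <- function(x, K) {
  n <- length(x)
  stopifnot(K >= 1, K <= n)
  s1 <- c(0, cumsum(x))    # prefix sums of x
  s2 <- c(0, cumsum(x^2))  # prefix sums of x^2
  # SSE of the segment x[i..j] (inclusive), computed in O(1)
  sse <- function(i, j) {
    sm <- s1[j + 1] - s1[i]
    max(0, (s2[j + 1] - s2[i]) - sm^2 / (j - i + 1))
  }
  # D[k, j]: minimal SSE of splitting x[1..j] into k clusters
  D <- matrix(Inf, nrow = K, ncol = n)
  for (j in seq_len(n)) D[1, j] <- sse(1, j)
  if (K >= 2) {
    for (k in 2:K) {
      for (j in k:n) {
        # last cluster is x[i..j]; the first k - 1 clusters cover x[1..i-1]
        for (i in k:j) {
          D[k, j] <- min(D[k, j], D[k - 1, i - 1] + sse(i, j))
        }
      }
    }
  }
  D[K, n]
}

# Hypothetical helper (illustration only): try every frame, keep the best.
framed_bruteforce <- function(x, K, frame_size) {
  xs <- sort(x)
  stopifnot(frame_size >= K, frame_size <= length(xs))
  starts <- seq_len(length(xs) - frame_size + 1)
  costs <- vapply(starts, function(s) {
    optimal_sse_1d(xs[s:(s + frame_size - 1)], K)
  }, numeric(1))
  best <- which.min(costs)  # first frame with the smallest SSE
  list(start = best, end = best + frame_size - 1,
       frame = xs[best:(best + frame_size - 1)], sse = costs[best])
}

# Toy usage on made-up, unsorted data
x_small <- c(31, 1, 12, 5, 33, 10, 30, 11)
res <- framed_bruteforce(x_small, K = 2, frame_size = 5)
res$frame
res$sse
```

Here the winning frame is 10, 11, 12, 30, 31, split into {10, 11, 12} and {30, 31} with a total SSE of 2.5. Because this sketch re-solves the clustering from scratch for each of the n - m + 1 frames, it scales poorly; `"linear.polylog"` is the fast alternative recommended above.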


## Visualizing framed clusters

The clustering outcomes obtained from the `FramedClust` function can be visualized using the `plot` function.

```{r,  message=FALSE, warning=FALSE, fig.width = 5, fig.asp = .92}
plot(result_linear.polylog, main = "linear.polylog: optimal\n***Recommended***")

plot(result_Ckmeans.1d.dp, main = "Repeated Ckmeans.1d.dp: quadratic time\nalways optimal")

plot(result_kmeans, main = "Repeated kmeans: heuristic\nnot always optimal")
```


In each plot, points inside the frame are colored by cluster, and all points outside the frame are gray. The black dashed line marks the selected frame; for the two optimal methods, this is a frame with the minimum sum of squared within-cluster distances. The `"kmeans"` method does not guarantee an optimal solution, whereas `"linear.polylog"` and `"Ckmeans.1d.dp"` do.
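
Written as a formula in our own notation, and assuming the criterion is the sum of squared distances from each point to its cluster mean, let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the sorted data and $m$ the frame size (`frame.size`). The candidate frames are $F_i = \{x_{(i)}, \ldots, x_{(i+m-1)}\}$ for $i = 1, \ldots, n-m+1$, each of width $x_{(i+m-1)} - x_{(i)}$, and framed clustering seeks

$$
\min_{1 \le i \le n-m+1} \; \min_{C_1, \ldots, C_K} \; \sum_{k=1}^{K} \sum_{x \in C_k} \left(x - \mu_k\right)^2,
\qquad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x,
$$

where $C_1, \ldots, C_K$ range over the partitions of $F_i$ into $K$ nonempty clusters.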

## References