# The ‘OptCirClust’ R package

### Overview

The package provides fast, optimal, and reproducible clustering for
circular, periodic, or framed data. The performance is guaranteed by
embedding divide-and-conquer, dynamic programming, and another
divide-and-conquer. The runtime of the algorithms is \(O(K N \log^2 N)\), where \(K\) is the number of clusters and \(N\) is the number of circular data
points.

The core algorithm solves the framed clustering problem by minimizing
the within-cluster sum of squared distances. In contrast to heuristic
clustering, the efficiency and accuracy of fast optimal framed
clustering is evident both theoretically and practically. The fast
algorithm use divide-and-conquer inside dynamic programming inside
another divide-and-conquer, guaranteeing cluster optimality—the total of
within-cluster sums of squared distances is always the minimum given the
number of clusters. Two plot functions visualize circular and framed
clusters to reveal patterns in data. On a desktop computer using a
single processor core, millions of data points can be clustered within
seconds. The algorithms offer a general high-performance solution to
circular, periodic, or framed data clustering.

### Main methods

The function to perform fast optimal circular data clustering is
`CirClust()`

. The function to perform the fast framed
clustering is `FramedClust()`

. Both functions with default
`method`

have a runtime of \(O(K N
\log^2N)\), linear polylogarithmic in sample size \(N\) and linear in the number of clusters
\(K\). The space complexity of both
functions is \(O(KN)\). Their
implementations scale large input data to improve numerical
precision.

### When to use the package

One can use the algorithms to cluster circular, periodic, or framed
data. Here circular data broadly refer to data points on any
non-self-intersecting loop. Periodic data clustering can be formulated
as circular clustering. Framed clustering finds one frame of fixed size
that contains best clusters. One can use them to characterize events
along circular DNA molecules, circular RNA molecules, and circular
genomes of bacteria, chloroplast, and mitochondria. One can also cluster
climate data along any given longitude or latitude.

### To download and install the
package

`install.packages("OptCirClust")`