R package **mdendro**
enables the calculation of **agglomerative hierarchical
clustering** (AHC), extending the standard functionalities in
several ways:

Native handling of both

**similarity**and**dissimilarity**(distances) matrices.Calculation of pair-group dendrograms and variable-group

**multidendrograms**[1].Implementation of the most common AHC methods in both

**weighted**and**unweighted**forms: single linkage, complete linkage, average linkage (UPGMA and WPGMA), centroid (UPGMC and WPGMC), and Ward.Implementation of two additional parametric families of methods:

**versatile linkage**[2], and**beta flexible**. Versatile linkage leads naturally to the definition of two additional methods:*harmonic linkage*, and*geometric linkage*.Calculation of the

**cophenetic**(or**ultrametric**) matrix.Calculation of five

**descriptors**of the final dendrogram:*cophenetic correlation coefficient*,*space distortion ratio*,*agglomerative coefficient*,*chaining coefficient*, and*tree balance*.Plots of the descriptors for the parametric methods.

All this functionality is obtained with functions
`linkage`

, `descval`

and `descplot`

.
Function `linkage`

may be considered as a replacement for
functions `hclust`

(in package stats)
and `agnes`

(in package cluster).
To enhance usability and interoperability, the `linkage`

class includes several methods for plotting, summarizing information,
and class conversion.

There exist two main ways to install **mdendro**:

Installation from CRAN (recommended method):

`install.packages("mdendro")`

RStudio has a menu entry (Tools \(\rightarrow\) Install Packages) for this job.

Installation from GitHub (you may need to install first devtools):

`install.packages("devtools") library(devtools) install_github("sergio-gomez/mdendro")`

Since

**mdendro**includes C++ code, you may need to install first Rtools in Windows, or Xcode in MacOS.

Let us start by using the `linkage`

function to calculate
the complete linkage AHC of the `UScitiesD`

dataset, a matrix
of distances between a few US cities:

```
library(mdendro)
<- linkage(UScitiesD, method = "complete") lnk
```

Now we can plot the resulting dendrogram:

`plot(lnk)`

The summary of this dendrogram is:

`summary(lnk)`

```
## Call:
## linkage(prox = UScitiesD,
## type.prox = "distance",
## digits = 0,
## method = "complete",
## group = "variable")
##
## Number of objects: 10
##
## Binary dendrogram: TRUE
##
## Descriptive measures:
## cor sdr ac cc tb
## 0.8077859 1.0000000 0.7738478 0.3055556 0.9316262
```

In particular, you can recognize the calculated descriptors:

`cor`

: cophenetic correlation coefficient`sdr`

: space distortion ratio`ac`

: agglomerative coefficient`cc`

: chaining coefficient`tb`

: tree balance

It is possible to work with similarity data without having to convert them to distances, provided they are in range [0.0, 1.0]. A typical example would be a matrix of non-negative correlations:

```
<- as.dist(Harman23.cor$cov)
sim <- linkage(sim, type.prox = "sim")
lnk plot(lnk, main = "Harman23")
```

There is also the option to choose between unweighted (default) and weighted methods:

```
par(mfrow = c(1, 2))
<- round(dist(scale(mtcars)), digits = 3)
cars <- list(cex = 0, lab.cex = 0.4)
nodePar
<- linkage(cars, method = "arithmetic")
lnk1 plot(lnk1, main = "unweighted", nodePar = nodePar)
<- linkage(cars, method = "arithmetic", weighted = TRUE)
lnk2 plot(lnk2, main = "weighted", nodePar = nodePar)
```

When there are tied minimum distances in the agglomeration process,
you may ignore them and proceed choosing a random pair (pair-group
methods) or agglomerate them all at once (variable-group
multidendrograms). With `linkage`

you can use both
approaches, being multidendrograms the default one:

```
par(mfrow = c(1, 2))
<- round(dist(scale(mtcars)), digits = 1)
cars <- list(cex = 0, lab.cex = 0.4)
nodePar
<- linkage(cars, method = "complete")
lnk1 plot(lnk1, main = "multidendrogram", nodePar = nodePar)
<- linkage(cars, method = "complete", group = "pair")
lnk2 plot(lnk2, main = "pair-group", nodePar = nodePar)
```

While multidendrograms are unique, you may obtain structurally different pair-group dendrograms by just reordering the data. As a consequence, the descriptors are invariant to permutations for multidendrograms, but not for pair-group dendrograms:

```
<- round(dist(scale(mtcars)), digits = 1)
cars <- linkage(cars, method = "complete")
lnk1 <- linkage(cars, method = "complete", group = "pair")
lnk2
# apply random permutation to data
set.seed(666)
<- sample(attr(cars, "Size"))
ord <- as.dist(as.matrix(cars)[ord, ord])
cars_p <- linkage(cars_p, method = "complete")
lnk1p <- linkage(cars_p, method = "complete", group = "pair")
lnk2p
# compare original and permuted cophenetic correlation
c(lnk1$cor, lnk1p$cor)
## [1] 0.7782257 0.7782257
c(lnk2$cor, lnk2p$cor)
## [1] 0.7780010 0.7780994
# compare original and permuted tree balance
c(lnk1$tb, lnk1p$tb)
## [1] 0.9564568 0.9564568
c(lnk2$tb, lnk2p$tb)
## [1] 0.9472909 0.9424148
```

In multidendrograms, the ranges (rectangles) show the heterogeneity between distances within the group, but they are optional in the plots:

```
par(mfrow = c(1, 2))
<- round(dist(scale(mtcars)), digits = 1)
cars <- list(cex = 0, lab.cex = 0.4)
nodePar <- linkage(cars, method = "complete")
lnk plot(lnk, col.rng = "bisque", main = "with ranges", nodePar = nodePar)
plot(lnk, col.rng = NULL, main = "without ranges", nodePar = nodePar)
```

Plots including ranges are only available if you directly use the
`plot.linkage`

function from **mdendro**.
Anyway, you may still take advantage of other dendrogram plotting
packages, such as dendextend and
ape:

```
par(mfrow = c(1, 2))
<- round(dist(scale(mtcars)), digits = 1)
cars <- linkage(cars, method = "complete")
lnk
<- as.dendrogram(lnk)
lnk.dend plot(dendextend::set(lnk.dend, "branches_k_color", k = 4),
main = "dendextend package",
nodePar = list(cex = 0.4, lab.cex = 0.5))
<- as.hclust(lnk)
lnk.hcl <- c("red", "forestgreen", "purple", "orange")
pal4 <- cutree(lnk.hcl, 4)
clu4 plot(ape::as.phylo(lnk.hcl),
type = "fan",
main = "ape package",
tip.color = pal4[clu4],
cex = 0.5)
```