The ‘RcppML’ package provides high-performance machine learning algorithms using Rcpp with a focus on matrix factorization.

Install the latest development version of RcppML from github:

RcppML contains extremely fast NNLS solvers. Use the `nnls`

function to solve systems of equations subject to non-negativity constraints.

The `RcppML::solve`

function solves the equation for where is symmetric positive definite matrix of dimensions and is a vector of length or a matrix of dimensions .

```
# construct a system of equations
X <- matrix(rnorm(2000),100,20)
btrue <- runif(20)
y <- X %*% btrue + rnorm(100)
a <- crossprod(X)
b <- crossprod(X, y)
# solve the system of equations
x <- RcppML::nnls(a, b)
# use only coordinate descent
x <- RcppML::nnls(a, b, fast_nnls = FALSE, cd_maxit = 1000, cd_tol = 1e-8)
```

`RcppML::solve`

implements a new and fastest-in-class algorithm for non-negative least squares:

*initialization*is done by solving for the unconstrained least squares solution.*forward active set tuning*(FAST) provides a near-exact solution (often exact for well-conditioned systems) by setting all negative values in the unconstrained solution to zero, re-solving the system for only positive values, and repeating the process until the solution for values not constrained to zero is strictly positive. Set`cd_maxit = 0`

to use only the FAST solver.*Coordinate descent*refines the FAST solution and finds the best solution discoverable by gradient descent. The coordinate descent solution is only used if it gives a better error than the FAST solution. Generally, coordinate descent re-introduces variables constrained to zero by FAST back into the feasible set, but does not dramatically change the solution.

Project dense linear factor models onto real-valued sparse matrices (or any matrix coercible to `Matrix::dgCMatrix`

) using `RcppML::project`

.

`RcppML::project`

solves the equation for .

`RcppML::nmf`

finds a non-negative matrix factorization by alternating least squares (alternating projections of linear models and ).

There are several ways in which the NMF algorithm differs from other currently available methods:

- Diagonalized scaling of factors to sum to 1, permitting convex L1 regularization along the entire solution path
- Fast stopping criteria, based on correlation between models across consecutive iterations
- Extremely fast algorithms using the Eigen C++ library, optimized for matrices that are >90% sparse
- Support for NMF or unconstrained matrix factorization
- Parallelized using OpenMP multithreading

The following example runs rank-10 NMF on a random 1000 x 1000 matrix that is 90% sparse:

```
A <- rsparsematrix(100, 100, 0.1)
model <- RcppML::nmf(A, 10, verbose = F)
w <- model$w
d <- model$d
h <- model$h
model_tolerance <- tail(model$tol, 1)
```

Tolerance is simply a measure of the average correlation between \eqn{w_{i-1} and and and for a given iteration .

For symmetric factorizations (when ), tolerance becomes a measure of the correlation between and , and diagonalization is automatically performed to enforce symmetry:

Mean squared error of a factorization can be calculated for a given model using the `RcppML::mse`

function: