This package considers the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. Data are assumed to follow a tensor normal distribution whose covariance has a Kronecker product structure. For estimation, this package provides an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, and attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. For inference, this package provides a large-scale multiple testing method for support recovery of sparse precision matrix. A FDR control procedure can be easily implmented into the inference. Test consistency and FDR convergence achieves with only two tensor samples.
A tensor T∈Rm1×m2×⋯×mK follows the tensor normal distribution with zero mean and covariance matrices Σ1,…,ΣK, denoted as T∼TN(0;Σ1,…,ΣK), if its probability density function is p(T|Σ1,…,ΣK)=(2π)−m/2{K∏k=1|Σk|−m/(2mk)}exp(−‖ where m = \prod_{k=1}^K m_k and \bf{\Sigma}^{-1/2} := \{\bf{\Sigma}_1^{-1/2},\ldots,\bf{\Sigma}_K^{-1/2}\}.
A standard approach to estimate precision matrix \bf{\Omega}_k^*, k=1,\ldots,K, is to use the maximum likelihood method. Up to a constant, the negative log-likelihood function of the tensor normal distribution is \ell(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{2}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \frac{1}{2}\sum_{k=1}^K \frac{m}{m_k} \log |\bf{\Omega}_k|, where \bf{S} := \frac{1}{n} \sum_{i=1}^n \textrm{vec}({\cal T}_i) \textrm{vec}({\cal T}_i)^{\top}. To encourage the sparsity of each precision matrix in the high-dimensional scenario, a penalized log-likelihood estimator is proposed which minimizes q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{m}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \sum_{k=1}^K \frac{1}{m_k} \log |\bf{\Omega}_k| + \sum_{k=1}^K P_{\lambda_k}(\bf{\Omega}_k), where P_{\lambda_k}(\bf{\Omega}_k) = \lambda_k \sum_{i\ne j} |[\bf{\Omega}_{k}]_{i,j}| is penalty function and \lambda_k is tuning parameter.
q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) is jointly non-convex with respect to \bf{\Omega}_1, \ldots, \bf{\Omega}_K. Nevertheless, q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) is bi-convex problem since q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) is convex in \bf{\Omega}_k when the rest K-1 precision matrices are fixed. According to its bi-convex property, this package solves this non-convex problem by alternatively update one precision matrix with other matrices fixed. Note that, for any k = 1,\ldots, K, minimizing q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) with respect to \bf{\Omega}_k while fixing the rest K-1 precision matrices is equivalent to minimizing L(\bf{\Omega}_k) := \frac{1}{m_k}\textrm{tr}(\bf{S}_k \bf{\Omega}_k) - \frac{1}{m_k} \log |\bf{\Omega}_k| + \lambda_k \|\bf{\Omega}_k\|_{1,\textrm{off}}. Here \bf{S}_k := \frac{m_k}{n m}\sum_{i=1}^n \bf{V}_i^k \bf{V}_i^{k\top}, where \bf{V}_i^k := \big[ {\cal T}_i \times \big\{\bf{\Omega}_1^{1/2},\ldots,\bf{\Omega}_{k-1}^{1/2}, 1_{m_k}, \bf{\Omega}_{k+1}^{1/2},\ldots,\bf{\Omega}_{K}^{1/2} \big\} \big]_{(k)} with \times the tensor product operation and [\cdot]_{(k)} the mode-k matricization operation defined in . Note that minimizing L(\bf{\Omega}_k) corresponds to estimating vector-valued Gaussian graphical model and can be solved efficiently via the glasso algorithm Friedman et al. (2008). See Lyu et al. (2019) for detailed algorithm, named Tlasso.
Multiple testing method is established to testing the entries of precision matrix for each way of the tensor, Lyu et al. (2019). We focus on \bf{\Omega}_1^*, and procedures on the rest K-1 precision matrices are symmetric. Hypothesis for \bf{\Omega}_1^* is, \forall \,1 \le i < j \le m_1, H_{0 1 , i j } : \; [{\bf{\Omega}}_1^*]_{i,j} =0 \quad\quad \textrm{vs} \quad\quad H_{11 , ij} : \; [\bf{\Omega}_1^*]_{i,j} \ne 0
Let index -i in k-th mode denote all elements of mode-k except the i-th one. From the distribution of {\cal T }_{: , i_2 , \ldots , i_K}, we have, for every i_k \in \{ 1 , \ldots , m_k\}, k \in \{ 1 , \ldots , K\}, {\cal T}_{i_1 , i_2 , \ldots , i_K} | {\cal T}_{-i_1 , i_2 , \ldots , i_K} \sim \textrm{N} ( - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} {\cal T}_{-i_1 , i_2 , \ldots , i_K} ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K[\bf{\Sigma}_k^*]_{i_k, i_k} ). An equivalent linear model is {\cal T }_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; - i_1, i_2 , \ldots , i_K}^\top \bf{\theta}_{i_1} + \xi_{l ; i_1, i_2 , \ldots , i_K}, \forall \, l \in \{ 1 , \ldots , n \} where \bf{\theta}_{i_1} = - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} \text{ , and }\, \xi_{l ;i_1, i_2 , \ldots , i_K} \sim \textrm{N} (0 \, ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K [\bf{\Sigma}_k^*]_{i_k, i_k}). Note that intercept term is eliminated since zero mean of {\cal T }_{: , i_2 , \ldots , i_K}.
Test statistic is constructed by both bias and variance correction of sample covariance of residuals. Let \hat{\bf{\theta}}_{i_1} = (\hat{\theta}_{1, i_1} , \ldots , \hat{\theta}_{m_1-1, i_1})^\top = - [\hat{\bf{\Omega}}_1]_{i_1,i_1}^{-1} [\hat{\bf{\Omega}}_1]_{i_1, - i_1} be the estimator of \bf{\theta}_{i_1}, where \hat{\bf{\Omega}}_1 is the output of Tlasso Algorithm. Given \{ \hat{\bf{\theta}}_{i_1} \}_{i_1 = 1 }^{m_1}, the residual of the linear model is defined as
\hat{\xi}_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ i_1, i_2 , \ldots , i_K} - ( {\cal T }_{l ; - i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ - i_1, i_2 , \ldots , i_K} )^\top \hat{\bf{\theta}}_{i_1},
where \bar{{\cal T }} = \frac{1}{n}\sum \limits_{l=1}^{n} {\cal T }_{l }. The sample covariance of residuals is
\hat{\varrho}_{i,j}= \frac{m_1}{(n-1) m } \sum \limits_{l=1}^{n} \sum \limits_{i_2=1}^{m_2} \cdots \sum \limits_{i_K=1}^{m_K} \hat{\xi}_{l ; i, i_2 , \ldots , i_K} \hat{\xi}_{l ; j, i_2 , \ldots , i_K} .
Test statistic is \tau_{i,j} = \frac{\hat{\varrho}_{i.j} + \mu_{i,j}}{\varpi}, \forall 1 \le i < j \le m_1. The bias correction term \mu_{i,j}=\hat{\varrho}_{i,i} \hat{\theta}_{i , j} + \hat{\varrho}_{j,j} \hat{\theta}_{j-1, i} translates the mean of \tau_{i,j} to zero under H_{01, ij}. The variance correction term
\varpi^2 = \frac{m \cdot \|\widehat{\bf{S}}_2\|_F^2 \cdots \|\widehat{\bf{S}}_K\|_F^2}{m_1 \cdot (\textrm{tr}(\widehat{\bf{S}}_2))^2 \cdots (\textrm{tr}(\widehat{\bf{S}}_K))^2},
plays a significant role in rescaling \tau_{i,j} into an asymptotic standard normal distribution, where $ k := {i=1}^n _i _i^{} $ is the estimate of \bf{\Sigma}_k with \widehat{\bf{V}}_i := \big[ {\cal T}_i \times \bigl\{\widehat{\bf{\Omega}}_1^{1/2},\ldots,\widehat{\bf{\Omega}}_{k-1}^{1/2}, 1_{m_k}, \widehat{\bf{\Omega}}_{k+1}^{1/2},\ldots,\widehat{\bf{\Omega}}_{K}^{1/2} \bigr\} \big]_{(k)} and \widehat{\bf{\Omega}}_{k} from Tlasso Algorithm, k \in \{2 , \ldots , K \}. Lyu et al. (2019) proves that, under certain conditions, \sqrt{\frac{(n-1) m }{ m_1 \hat{\varrho}_{i,i} \hat{\varrho}_{j,j} }} \tau_{i,j} \rightarrow \textrm{N} ( 0 ;1 ) in distribution, as nm/m_1 \rightarrow \infty.
Let \tilde{\tau}_{i,j}= \sqrt{(n-1) m/ (m_1\hat{\varrho}_{i,i} \hat{\varrho}_{j,j}) } \tau_{i,j}. Given the values of \tilde{\tau}_{ij} and thresholding level \varsigma, we define a test procedure \varphi_{\varsigma} (\tilde{\tau}_{i,j})= 1 \{ |\tilde{\tau}_{i,j}| \ge \varsigma \} and reject H_{01, ij} if \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1. The definition of FDR/FDP in our notation is \textrm{FDP} = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \, \text{ , and } \,\textrm{FDR} = \bf{E} (\textrm{FDP}). where {\cal H}_0 = \{ (i,j) : [\bf{\Omega}_1^*]_{i,j} = 0 , 1 \le i < j \le m_1 \}. To recover the support of \bf{\Omega}_1^*, (m_1-1)m_1/2 tests are simultaneously conducted. Hence, a small enough \varsigma that enhances power while controls FDP under a pre-specific level \upsilon \in (0,1) is an ideal choice. In particular, \varsigma_{*} = \inf \{ \varsigma > 0: \text{FDP} \le \upsilon \}. However, \varsigma_* is oracle since we have no access to the information of {\cal H}_0. Due to sparsity, w_0 =|{\cal H}_0| can be approximated by w= m_1(m_1 -1 )/2. P(\varphi_{\varsigma} (\tilde{\tau}_{i,j})=1) is close to 2(1 - \Phi( \varsigma)) by test consistency. The approximation of \varsigma_* is \hat{\varsigma}= \inf \bigg \{ \varsigma > 0: \frac{2(1-\Phi( \varsigma )) w}{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \le \upsilon \bigg \}.
FDR/FDP control procedure is to infer the support of \bf{\Omega}_1^* at thresholding level \hat{\varsigma}. In particular, FDR and FDP in the procedure are \textrm{FDP}_1 = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \text{ , and } \,\textrm{FDR}_1 = \bf{E} (\textrm{FDP}_1). Note that we set \tilde{\tau}_{i,j} = \tilde{\tau}_{j,i} for 1 \le i < j \le m_1. The inference of \textrm{supp}(\bf{\Omega}_k^*) , \, k \in \{2 , \ldots , K \}, is a symmetric procedure.
Lyu et al. (2019) gives the asymptotic control of \text{FDP}_1 and \text{FDR}_1 for the support of \bf{\Omega}_{1}^*: under certain conditions, \textrm{FDP}_1 w / \upsilon w_0 \rightarrow 1 , \; \textrm{FDR}_1 w / \upsilon w_0 \rightarrow 1 in probability as nm/m_1 \rightarrow \infty.
This packages contains following functions:
ChainOmega
NeighborOmega
knn
nearest-neighbor graph, this function first randomly picks p points from a unit square and computes all pairwise distances among the points. Then it searches for the knn nearest-neighbors of each point and a pair of symmetric entries in the precision matrix that has a random chosen value from [-1, -0.5] \cup [0.5, 1]. Finally, to ensure positive definite property, it normalizes the matrix as \Omega <- \Omega + (\lambda_{\min} (\Omega) + 0.2 ) 1_p where \lambda_{\min} (\cdot ) refers to the samllest eigenvalue.
Trnorm
Sigma.list
is not given, default distribution is from either triangle graph or nearest-neighbor graph (depends on type
).
Tlasso.fit
T
and thres
. Algorithm will be terminated if output in certain iteration change less than thres
. Otherwise, T iterations will be fully operated.
covres
biascor
) for inference is off-diagnoal. Elements in Omega.list are true precision matrices or estimation of the true ones, the latter can be output of Tlasso.fit
.
biascor
Omega.list
are true precision matrices or estimation of the true ones, the latter can be output of Tlasso.fit
.
varcor
est.analysis
infer.analysis
graph.pattern
The purpose of this section is to show users the basic usage of this package. We will briefly go through main functions, see what they can do and have a look at outputs. An detailed example of complete procedures of estimation and inference will be presented to give users a general sense of the pakcage.
First, we load Tlasso
package:
Then, we generate a list of precision matrices of triangle graph.
m.vec = c(5,5,5) # dimensionality of a tensor
# m1, m2, m3
n = 5 # sample size
Omega.true.list = list()
for (k in 1:length(m.vec)) {
Omega.true.list[[k]] = ChainOmega(m.vec[k], sd = k)
}
Omega.true.list[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.3168143 -0.2308893 0.0000000 0.0000000 0.0000000
## [2,] -0.2308893 0.4674875 -0.2123306 0.0000000 0.0000000
## [3,] 0.0000000 -0.2123306 0.4234692 -0.1841058 0.0000000
## [4,] 0.0000000 0.0000000 -0.1841058 0.3658496 -0.1499391
## [5,] 0.0000000 0.0000000 0.0000000 -0.1499391 0.2415995
ChainOmega
returns a precision matrix of triangle graph with dimension m.vec[k]
and seed number k
. Given precision matrices, we generate obervations from corresponding tensor normal distribution.
Sigma.true.list = list()
for (k in 1:length(m.vec)) {
Sigma.true.list[[k]] = solve(Omega.true.list[[k]])
} # generate covariance matrices list
DATA=Trnorm(n,m.vec,Sigma.list=Sigma.true.list)
# obersavations from tensor normal distribution
Trnorm
generates observations from separable tensor normal distribution with covariance matrix Sigma.list[[k]]
for kth mode. DATA
is a m_1 * m_2 * m_3 * n array, i.e, a 10 \times 10 \times 10 \times 10 array. Default distribution is from triangle graph or 4 near-neighbor graph (depends on type
).
DATA2=Trnorm(n,m.vec)
# default is triangle graph
# equivalent to DATA2 = Trnorm(n,m.vec, type='Chain', sd=1)
DATA3=Trnorm(n,m.vec,type='Neighbor')
# 4 nearest-neighbor graph
# equivalent to DATA3 = Trnorm(n,m.vec, type='Neighbor', sd=1, knn=4)
Given observations DATA
, we use Tlasso.fit
to conduct alternating optimization.
lambda.thm = 20*c( sqrt(log(m.vec[1])/(n*prod(m.vec))),
sqrt(log(m.vec[2])/(n*prod(m.vec))),
sqrt(log(m.vec[3])/(n*prod(m.vec))))
# lambda.thm is regularization parameter
out.tlasso = Tlasso.fit(DATA,T=1,lambda.vec = lambda.thm)
# output is a list of estimation of precision matrices
out.tlasso[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.361422637 -0.23704270 0.002333856 -0.06126152 0.01539202
## [2,] -0.237046526 0.46822624 -0.192384949 -0.05154091 0.05256162
## [3,] 0.002304876 -0.19236551 0.424993048 -0.17817458 -0.07814616
## [4,] -0.061257515 -0.05155747 -0.178173408 0.37108685 -0.08295747
## [5,] 0.015405071 0.05255995 -0.078142842 -0.08296053 0.19265707
Tlasso.fit
generates a list of precision matrices out.tlasso
. Default is maximal iteration T=1
and termination thresholding level thres=1e-05
. If estimation output changes less than thres
, in terms of Frobenius norm, after certain iteration, Tlasso.fit
will be terminated immediately (before Tth iteration). The performance of Tlasso.fit
can be evaluated by est.analysis
.
# compare out.tlasso and Omega.true.list
# main diagnoal is taken into consideration
est.analysis(out.tlasso,Omega.true.list,offdiag=FALSE)
## $error.kro
## [1] 0.3656762
##
## $tpr.kro
## [1] 1
##
## $tnr.kro
## [1] 0.4989574
##
## $av.error.f
## [1] 0.1986636
##
## $av.error.max
## [1] 0.08618703
##
## $av.tpr
## [1] 1
##
## $av.tnr
## [1] 0.3333333
##
## $error.f
## [1] 0.2130124 0.0980819 0.2848964
##
## $error.max
## [1] 0.07814616 0.05873962 0.12167531
##
## $tpr
## [1] 1 1 1
##
## $tnr
## [1] 0.0000000 0.3333333 0.6666667
est.analysis
returns a list of estimation performance measures:
Argument | Explanation |
---|---|
Out$error.kro |
error in Frobenius norm of kronecker product |
Out$tpr.kro |
TPR of kronecker product |
Out$tnr.kro |
TNR of kronecker product |
Out$av.error.f |
averaged Frobenius norm error across all modes |
Out$av.error.max |
averaged Max norm error across all modes |
Out$av.tpr |
averaged TPR across all modes |
Out$av.tnr |
averaged TNR across all modes |
Out$error.f |
vector; error in Frobenius norm of each mode |
Out$error.max |
vector; error in Max norm of each mode |
Out$tpr |
vector; TPR of each mode |
Out$tnr |
vector; TNR of each mode |
Given DATA
and out.tlasso
, we next show how to compute test statistic.
mat.list=list() # list of matrices of test statistic value
for ( k in 1:length(m.vec)) {
rho=covres(DATA, out.tlasso, k = k)
# sample covariance matrix of residuals, including diagnoal
bias_rho=biascor(rho,out.tlasso,k=k)
# bias corrected sample covariance of residuals, excluding diagnoal
varpi2=varcor(DATA, out.tlasso, k = k)
# variance correction term for kth mode's sample covariance of residuals
tautest=matrix(0,m.vec[k],m.vec[k])
for( i in 1:(m.vec[k]-1)) {
for ( j in (i+1):m.vec[k]){
tautest[j,i]=tautest[i,j]=sqrt((n-1)*prod(m.vec[-k]))*
bias_rho[i,j]/sqrt(varpi2*rho[i,i]*rho[j,j])
# compute final test statistic
}
}
mat.list[[k]]=tautest
}
mat.list[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 2.761395 -0.231802 1.509672 -0.4881653
## [2,] 2.7613945 0.000000 2.411092 0.797850 -1.2962198
## [3,] -0.2318020 2.411092 0.000000 2.153166 1.2980072
## [4,] 1.5096719 0.797850 2.153166 0.000000 2.3900777
## [5,] -0.4881653 -1.296220 1.298007 2.390078 0.0000000
To compute test statistic, we first need to compute sample covariance of residuals via rho
. rho
returns a sample covariance matrix, including diagnoal. Then we conduct bias correction biascor
and variance correction varcor
. biascor
returns a bias corrected sample covariance matrix. varcor
returns a scalar to scale test statistic into standard normal. Given test statistic value mat.list
, we turn to test hypothesis. The significant level we choose is 0.95.
# inference measures (off-diagnoal), critical value is 0.975 quantile of standard normal
infer.analysis(mat.list, qnorm(0.975), Omega.true.list, offdiag=TRUE)
## $fp
## [1] 0 0 2
##
## $fn
## [1] 0 0 4
##
## $d
## [1] 8 8 6
##
## $nd
## [1] 12 12 14
##
## $t
## [1] 8 8 8
infer.analysis
returns a list of inference performance measures:
Argument | Explanation |
---|---|
Out$fp |
vector; number of false positive of each mode |
Out$fn |
vector; number of false negative of each mode |
Out$d |
vector; number of all discovery of each mode |
Out$nd |
vector; number of all non-discovery of each mode |
Out$t |
vector; number of all true non-zero entries of each mode |
Due to the fact that this inference procedure relies on multiple testing, FDR control is indispensible. Lyu et al. (2019) provides an easy-implemented and efficient FDR control procedure. This procedure asymptotically controls FDR via selecting the smallest critical value that contrains FDP under certain level.
k=1 # interested mode
upsilon=0.1 # control level
# compute the difference between FDP and upsilon
fun=function(varsigma,mk,upsilon,tautest) {
return((2*(1-pnorm(varsigma))*mk*(mk-1))/max(1,sum(sign(abs(tautest) > varsigma))) - upsilon)
}
# select a critical value in (0,6) that has the samllest difference
diff=c();ind=1;inter=seq(0,6,0.0001)
for (varsigma in inter) {
diff[ind]=fun(varsigma,mk=m.vec[k],upsilon=upsilon,tautest=mat.list[[k]])
ind=ind+1
}
# the smallest critical value that constrains FDP under upsilon
critical=inter[min(which(diff < 0))]
# testing hypothesis with the critcal value
# FDR will converge to the limit proved in Lyu et al. 2019.
inference.FDR=infer.analysis(mat.list, critical, Omega.true.list, offdiag=TRUE)
Finally, we would like to visualize graph structure of inference via graph.pattern
.
k=1 # interested mode
# true graph structure.
# set thres=0 in case true edge is eliminated
graph.pattern(Omega.true.list[[1]],main='True graph of mode 1',thres=0)
inf.mat=mat.list[[k]] > qnorm(0.975)
# set thres=0 (<1) since inf.mat is logical
graph.pattern(inf.mat,main='Inference of mode 1',thres=0)
From graphs we can see that inference result quite matches true graph.