
GREMLINS. Quick Mathematical Background

Sophie Donnet, Pierre Barbillon

2023-03-10

The goal of GREMLINS is to perform statistical analysis of multipartite networks through a block model approach.

Multipartite networks consist of the joint observation of several networks that share some common individuals. The individuals (or entities represented by nodes) at stake are partitioned into groups defined by their nature. In what follows, these groups will be referred to as functional groups.

library(GREMLINS)

Mathematical Background

The model is introduced and described in Bar-Hen, Barbillon, and Donnet (2018).

A collection of networks

Assume that $Q$ functional groups of individuals are at stake. A multipartite network is a collection of networks, each of them involving one or two functional groups. Thus, each network may be

- *simple* if it represents the relations inside a functional group
- *bipartite* if it represents the relations between individuals of two functional groups.

We index the collection of networks by pairs of functional groups $(q,q')$. The set $E$ denotes the list of pairs of functional groups for which we observe an interaction network.

For any pair $(q,q') \in E$, the interaction network is encoded in a matrix $X^{qq'}$ such that $X^{qq'}_{ii'} \neq 0$ if there is an edge from unit $i$ of functional group $q$ to unit $i'$ of functional group $q'$, and $X^{qq'}_{ii'} = 0$ otherwise.

For any $(q,q')$, $X^{qq'}_{ii'}$ may be in $\{0,1\}$ or a numeric value for weighted networks.

Note that if $q \neq q'$, $X^{qq'}$ is said to be an incidence matrix (corresponding to a bipartite network). If $q = q'$, $X^{qq'}$ is an adjacency matrix. Moreover, if the relation inside the functional group $q$ is non-oriented, $X^{qq}$ is symmetric.
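To make the encoding concrete, here is a minimal base R sketch of such a collection. The two functional groups ("plants" and "pollinators"), their sizes, and the list layout are hypothetical choices made for illustration; this is not the input format expected by the package functions.

```r
# Hypothetical toy data: two functional groups with 4 and 3 units.
set.seed(1)
n <- c(plants = 4, pollinators = 3)

# Simple, non-oriented network inside "plants":
# a symmetric binary adjacency matrix with an empty diagonal.
X_pp <- matrix(0, n[["plants"]], n[["plants"]])
X_pp[upper.tri(X_pp)] <- rbinom(sum(upper.tri(X_pp)), size = 1, prob = 0.5)
X_pp <- X_pp + t(X_pp)  # mirror the upper triangle; the diagonal stays 0

# Bipartite network between "plants" (rows) and "pollinators" (columns):
# a rectangular incidence matrix, here weighted by counts.
X_pa <- matrix(rpois(n[["plants"]] * n[["pollinators"]], lambda = 1),
               nrow = n[["plants"]], ncol = n[["pollinators"]])

# The collection, indexed by pairs (q, q') of functional groups.
networks <- list(
  list(q = "plants", qprime = "plants",      X = X_pp),
  list(q = "plants", qprime = "pollinators", X = X_pa)
)

isSymmetric(X_pp)  # adjacency matrix of a non-oriented relation
dim(X_pa)          # n_q rows, n_q' columns
```

The adjacency matrix is square and symmetric because the relation is non-oriented, whereas the incidence matrix has one row per unit of the first group and one column per unit of the second.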

A probabilistic latent variables model

Let $n_q$ be the number of individuals in the $q$-th functional group. Assume that each functional group $q$ is divided into $K_q$ blocks, or equivalently clusters. For all $q$ and all $i$, let $Z^q_i$ be the latent random variable such that $Z^q_i = k$ if individual $i$ of functional group $q$ belongs to cluster $k$. The random variables $Z^q_i$ are assumed to be independent and such that, $\forall (i,k,q) \in \{1,\dots,n_q\} \times \{1,\dots,K_q\} \times \{1,\dots,Q\}$:

$$P(Z^q_i = k) = \pi^q_k, \quad \text{with } \sum_{k=1}^{K_q} \pi^q_k = 1, \quad \forall q = 1,\dots,Q.$$
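The latent layer can be simulated directly from this definition. The sketch below uses base R only; the group sizes and the proportion vectors are hypothetical values chosen for illustration.

```r
set.seed(42)
n_q  <- c(50, 30)               # hypothetical sizes n_1 and n_2
pi_q <- list(c(0.5, 0.3, 0.2),  # hypothetical pi^1, so K_1 = 3
             c(0.6, 0.4))       # hypothetical pi^2, so K_2 = 2

# Each vector of proportions must sum to one.
stopifnot(all(abs(sapply(pi_q, sum) - 1) < 1e-12))

# Independent draws: Z[[q]][i] = k with probability pi_q[[q]][k].
Z <- lapply(seq_along(n_q), function(q) {
  sample(seq_along(pi_q[[q]]), size = n_q[q], replace = TRUE,
         prob = pi_q[[q]])
})

# Empirical block proportions, to compare with pi_q.
emp <- lapply(seq_along(Z), function(q) {
  tabulate(Z[[q]], nbins = length(pi_q[[q]])) / n_q[q]
})
emp
```

With groups this small, the empirical proportions only roughly match the prescribed ones; they get closer as $n_q$ grows.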

Conditionally on the clustering, the entries of the matrices $(X^{qq'}_{ii'})$ are assumed to be independent and distributed as follows: $\forall (i,i') \in \{1,\dots,n_q\} \times \{1,\dots,n_{q'}\}$, $X^{qq'}_{ii'} \mid Z^q_i = k, Z^{q'}_{i'} = k' \overset{\text{i.i.d.}}{\sim} F_{qq'}(\theta^{qq'}_{kk'})$, meaning that the probability of connection from unit $i$ of functional group $q$ to unit $i'$ of functional group $q'$ depends only on the clusters to which they belong.

For any pair $(q,q')$, $F_{qq'}(\cdot)$ is either:

- Bernoulli, resulting in binary interactions

- Poisson for weighted networks of counts

- Gaussian or Laplace for continuous weighted networks. 

As a consequence, the collection of networks may contain weighted and/or binary networks.
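As a rough illustration of how interactions are generated once the clusters are known, the base R sketch below draws one binary bipartite network and one count-valued simple network from the same latent clustering. All sizes and parameter values are hypothetical, and the simple network is assumed to be oriented and without self-loops.

```r
set.seed(7)
n1 <- 6; n2 <- 5
Z1 <- sample(1:2, n1, replace = TRUE, prob = c(0.5, 0.5))       # K_1 = 2
Z2 <- sample(1:3, n2, replace = TRUE, prob = c(0.4, 0.3, 0.3))  # K_2 = 3

# Hypothetical Bernoulli parameters theta^{12}_{kk'}: K_1 x K_2 probabilities.
theta12 <- matrix(c(0.9, 0.1, 0.50,
                    0.2, 0.8, 0.05), nrow = 2, byrow = TRUE)
# Hypothetical Poisson parameters theta^{11}_{kk'}: K_1 x K_1 mean counts.
theta11 <- matrix(c(5, 1,
                    1, 3), nrow = 2, byrow = TRUE)

# Entry-wise parameters: P12[i, i'] = theta12[Z1[i], Z2[i']].
P12 <- theta12[Z1, Z2, drop = FALSE]
X12 <- matrix(rbinom(n1 * n2, size = 1, prob = P12), n1, n2)  # binary incidence

# Same construction for the weighted simple network inside group 1.
M11 <- theta11[Z1, Z1, drop = FALSE]
X11 <- matrix(rpois(n1 * n1, lambda = M11), n1, n1)           # count adjacency
diag(X11) <- 0  # assumption: no self-loops
```

Both matrices share the clustering `Z1` of the first functional group, which is what ties the networks of the collection together in the model.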

Statistical inference

Inference consists of selecting the numbers of clusters $(K_q)_{q=1,\dots,Q}$ and estimating the parameters $(\theta^{qq'})$. Model selection is performed with the ICL, a penalized likelihood criterion. The parameters are estimated with a variational version of the EM algorithm. The estimation procedure also provides a clustering of the entities at stake.
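For reference, the independence assumptions of the model imply the following complete-data log-likelihood, written here as a sketch. The notation $f_{qq'}$ for the density or probability mass function of $F_{qq'}$ is introduced for this illustration only, and the range of the sum over $(i,i')$ is left implicit: for a simple network one would typically exclude $i = i'$, and keep only $i < i'$ when the relation is non-oriented.

```latex
\log p(X, Z; \theta, \pi)
  = \sum_{q=1}^{Q} \sum_{i=1}^{n_q} \sum_{k=1}^{K_q}
      \mathbf{1}\{Z^q_i = k\} \, \log \pi^q_k
  + \sum_{(q,q') \in E} \sum_{i, i'} \sum_{k=1}^{K_q} \sum_{k'=1}^{K_{q'}}
      \mathbf{1}\{Z^q_i = k\} \, \mathbf{1}\{Z^{q'}_{i'} = k'\} \,
      \log f_{qq'}\bigl(X^{qq'}_{ii'}; \theta^{qq'}_{kk'}\bigr)
```

The likelihood of the observed networks is obtained by summing $p(X, Z; \theta, \pi)$ over all possible clusterings $Z$, which quickly becomes intractable as the $n_q$ grow; this is the usual motivation for resorting to a variational approximation in block models.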

References

Bar-Hen, Avner, Pierre Barbillon, and Sophie Donnet. 2018. “Block models for multipartite networks. Applications in ecology and ethnobiology.” arXiv e-Prints, July, arXiv:1807.10138. https://arxiv.org/abs/1807.10138.