This vignette describes how to control for variables, a feature introduced in **BGGM** version `2.0.0`.

When controlling for variables, **BGGM** fits a multivariate regression. In fact, a GGM itself can be understood as a multivariate regression in which each node has an intercept-only model.
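To make the intercept-only point concrete, here is a small base `R` sketch (using `mtcars` purely for illustration, not **BGGM** code): for an intercept-only multivariate regression, the residual covariance equals the sample covariance, so the partial correlations (the GGM edges) are unchanged.

```
# a GGM's partial correlations can be computed from the residual
# covariance of a multivariate regression; with intercepts only,
# this is just the sample covariance of the data
Y <- mtcars[, c("mpg", "wt", "hp")]

# intercept-only multivariate regression
fit <- lm(as.matrix(Y) ~ 1)
res_cov <- cov(residuals(fit))

# precision matrix -> partial correlations
pcor <- -cov2cor(solve(res_cov))
diag(pcor) <- 1

# identical to the partial correlations computed from cov(Y) directly
pcor_direct <- -cov2cor(solve(cov(Y)))
diag(pcor_direct) <- 1
all.equal(pcor, pcor_direct)
#> [1] TRUE
```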

**BGGM** does not use the typical approach to multivariate regression in `R`. This avoids having to write out each outcome variable, of which there are typically many in a GGM. Instead, **BGGM** assumes that the data matrix includes only the variables to be included in the GGM and the control variables.

Suppose that we want to control for education level, with five variables included in the GGM.

```
# data
Y <- bfi[,c(1:5, 27)]
# head
head(Y)
#> A1 A2 A3 A4 A5 education
#> 61617 2 4 3 4 4 NA
#> 61618 2 4 5 2 5 NA
#> 61620 5 4 5 4 4 NA
#> 61621 4 4 6 5 5 NA
#> 61622 2 3 3 4 5 NA
#> 61623 6 6 5 6 5 3
```

Notice that `Y` includes **only** the five variables and `education`.

This model can then be fitted with

`fit <- estimate(Y, formula = ~ as.factor(education))`

To show this is indeed a multivariate regression, here are the summarized regression coefficients for the first outcome.

```
summ_coef <- regression_summary(fit)
# outcome one
summ_coef$reg_summary[[1]]
#> Post.mean Post.sd Cred.lb Cred.ub
#> (Intercept) 0.256 0.095 0.072 0.442
#> as.factor(education)2 0.073 0.128 -0.177 0.323
#> as.factor(education)3 -0.202 0.104 -0.405 -0.001
#> as.factor(education)4 -0.462 0.119 -0.691 -0.233
#> as.factor(education)5 -0.578 0.117 -0.815 -0.346
```

And here are the coefficients from `lm` (a univariate regression for `A1`):

```
round(
  cbind(
    # summary: coefficients and standard errors
    summary(lm(scale(A1, scale = FALSE) ~ as.factor(education), data = Y))$coefficients[, 1:2],
    # confidence intervals
    confint(lm(scale(A1, scale = FALSE) ~ as.factor(education), data = Y))
  ), 3)
#> Estimate Std. Error 2.5 % 97.5 %
#> (Intercept) 0.256 0.093 0.073 0.438
#> as.factor(education)2 0.072 0.125 -0.172 0.316
#> as.factor(education)3 -0.203 0.101 -0.401 -0.004
#> as.factor(education)4 -0.461 0.116 -0.690 -0.233
#> as.factor(education)5 -0.578 0.115 -0.804 -0.351
```

The estimates are very similar.

Note that all the other functions work just the same. For example, the relations controlling for education are summarized with

```
summary(fit)
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: continuous
#> Analytic: FALSE
#> Formula: ~ as.factor(education)
#> Posterior Samples: 5000
#> Observations (n):
#> Nodes (p): 5
#> Relations: 10
#> ---
#> Call:
#> estimate(Y = Y, formula = ~as.factor(education))
#> ---
#> Estimates:
#> Relation Post.mean Post.sd Cred.lb Cred.ub
#> A1--A2 -0.239 0.020 -0.278 -0.200
#> A1--A3 -0.109 0.020 -0.150 -0.070
#> A2--A3 0.276 0.019 0.239 0.312
#> A1--A4 -0.013 0.021 -0.055 0.026
#> A2--A4 0.156 0.020 0.117 0.196
#> A3--A4 0.173 0.020 0.134 0.214
#> A1--A5 -0.010 0.020 -0.050 0.029
#> A2--A5 0.150 0.020 0.111 0.189
#> A3--A5 0.358 0.018 0.322 0.392
#> A4--A5 0.121 0.020 0.082 0.159
#> ---
```

Now, if we wanted to control for education but also had `gender` in `Y`, the following would be incorrect:

```
Y <- bfi[,c(1:5, 26:27)]
head(Y)
#> A1 A2 A3 A4 A5 gender education
#> 61617 2 4 3 4 4 1 NA
#> 61618 2 4 5 2 5 2 NA
#> 61620 5 4 5 4 4 2 NA
#> 61621 4 4 6 5 5 2 NA
#> 61622 2 3 3 4 5 1 NA
#> 61623 6 6 5 6 5 2 3
```

In this case, with `estimate(Y, formula = ~ as.factor(education))`, the GGM would also include `gender` (six variables instead of the desired five). This is because all variables not included in `formula` are included in the GGM. This convention was adopted in **BGGM** to save the user from having to write out each outcome.
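If the aim is to control for both education and gender, one option (a sketch, assuming the same `bfi` columns as above) is to include both variables in `formula`, so that only `A1`-`A5` enter the GGM:

```
# control for both variables; only A1-A5 remain in the GGM
fit <- estimate(Y, formula = ~ as.factor(education) + as.factor(gender))
```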

This differs from `lm`, where each outcome needs to be written out, for example `cbind(A1, A2, A3, A4, A5) ~ as.factor(education)`. This is quite cumbersome for a model that includes many nodes.
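With many nodes, one base `R` workaround (a sketch, not a **BGGM** function) is to build the multivariate formula for `lm` programmatically:

```
# build cbind(A1, ..., A5) ~ as.factor(education) without typing each outcome
outcomes <- c("A1", "A2", "A3", "A4", "A5")
f <- as.formula(
  paste0("cbind(", paste(outcomes, collapse = ", "), ") ~ as.factor(education)")
)
f
#> cbind(A1, A2, A3, A4, A5) ~ as.factor(education)
```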

The above data are ordinal. In this case, it is possible to fit a multivariate probit model (this is also the approach for binary data in **BGGM**), implemented with

```
fit <- estimate(Y, formula = ~ as.factor(education),
                type = "ordinal", iter = 1000)
```

Note that the multivariate probit models can also be summarized with `regression_summary`.

This final example fits a Gaussian copula graphical model, which can be used for mixed data. In this case, `formula` is not used; instead, all of the variables are included in the GGM.

This model is estimated with

```
# data
Y <- na.omit(bfi[,c(1:5, 27)])
# fit type = "mixed"
fit <- estimate(Y, type = "mixed", iter = 1000)
# summary
summary(fit)
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: mixed
#> Analytic: FALSE
#> Formula:
#> Posterior Samples: 1000
#> Observations (n):
#> Nodes (p): 6
#> Relations: 15
#> ---
#> Call:
#> estimate(Y = Y, type = "mixed", iter = 1000)
#> ---
#> Estimates:
#> Relation Post.mean Post.sd Cred.lb Cred.ub
#> A1--A2 -0.217 0.048 -0.294 -0.114
#> A1--A3 -0.063 0.027 -0.113 -0.011
#> A2--A3 0.364 0.023 0.317 0.410
#> A1--A4 0.116 0.038 0.048 0.192
#> A2--A4 0.241 0.031 0.182 0.303
#> A3--A4 0.228 0.026 0.174 0.275
#> A1--A5 0.057 0.031 0.003 0.120
#> A2--A5 0.186 0.027 0.135 0.241
#> A3--A5 0.438 0.019 0.399 0.474
#> A4--A5 0.151 0.025 0.103 0.199
#> A1--education -0.016 0.069 -0.125 0.119
#> A2--education 0.063 0.049 -0.016 0.162
#> A3--education 0.049 0.025 0.002 0.099
#> A4--education 0.053 0.026 0.005 0.105
#> A5--education 0.072 0.024 0.024 0.120
#> ---
```

Here it is clear that education is included in the model, as its relations with the other nodes appear in the output.

The graph is selected with

`select(fit)`

It is possible to control for variables with all methods in
**BGGM**, including when comparing groups, Bayesian
hypothesis testing, etc.