---
title: "Multi model usage"
output:
  rmarkdown::html_vignette:
vignette: >
  %\VignetteIndexEntry{Multi model usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: bayesnec.bib
---
[e1]: https://open-aims.github.io/bayesnec/articles/example1.html
[e2]: https://open-aims.github.io/bayesnec/articles/example2.html
[e2b]: https://open-aims.github.io/bayesnec/articles/example2b.html
[e3]: https://open-aims.github.io/bayesnec/articles/example3.html
[e4]: https://open-aims.github.io/bayesnec/articles/example4.html
# `bayesnec`
The background of `bayesnec` is covered in the [Single model usage][e1] vignette. Here we explain multi-model usage with `bayesnec`. In `bayesnec` it is possible to fit a custom model set, a specific pre-defined model set, or all of the available models. When multiple models are specified, the `bnec` function returns a model-weighted estimate of predicted posterior values, based on `"pseudobma"` weights obtained using Bayesian bootstrap via `loo_model_weights` [@vehtari2020; @vehtari2017]. These weights are reasonably analogous to the way model weights are generated using AIC or AICc [@Burnham2002].
It is also possible to obtain all individual model fits from the fitted `bayesmanecfit` model object if required using the `pull_out` function, and to update an existing model fit with additional models, or to drop models, using the function `amend`.
Multi-model inference can be useful where there are a range of plausible models that could be used [@Burnham2002] and has recently been adopted in ecotoxicology for Species Sensitivity Distribution (SSD) model inference [@Thorley2018]. The approach may have considerable value in concentration-response modelling because there is often no *a priori* knowledge of the functional form that the response relationship should take. In this case model averaging can be a useful way of allowing the data to drive the model selection process, with weights proportional to how well each individual model fits the data. Well-fitting models will have high weights, dominating the model-averaged outcome. Conversely, poorly fitting models will have very low model weights and will therefore have little influence on the outcome. Where multiple models fit the data equally well, they influence the outcome equally, and the resultant posterior predictions reflect that model uncertainty. It is possible to specify the `"stacking"` method [@Yao2018] for model weights if desired (through the argument `loo_controls`), which aims to minimise prediction error. We do not currently recommend using stacking weights, given the typical sample sizes associated with most concentration-response experiments, and because the main motivation for model averaging within the `bayesnec` package is to properly capture model uncertainty rather than to reduce prediction error.
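As a sketch of how stacking weights would be requested, based on our reading of the `bnec` documentation, the `loo_controls` argument takes a named list; check `?bnec` to confirm the exact list structure expected by your installed version (the data frame `dat` here is a hypothetical placeholder):

```r
library(bayesnec)

# Sketch only: request "stacking" model weights instead of the default
# "pseudobma"; assumes a data frame dat with columns x and y
fit_stacking <- bnec(y ~ crf(x, model = "decline"), data = dat,
                     loo_controls = list(weights = list(method = "stacking")))
```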
# Examples
## Fitting multiple models and model averaging using the `bnec` function
### Fitting a `bnec` model
So far we have explored how to fit individual models via the function `bnec`. The `bayesnec` package also has the capacity to fit a custom selection of models, pre-specified sets of models, or even all the available models in the package. Note that, as these are Bayesian methods requiring multiple Hamiltonian Monte Carlo (HMC) chains, `bnec` can be very slow when `models = "all"` is specified within a `bayesnecformula`. See `?bnec` for more information on the models and model sets that can be specified, as well as the [Model details][e2b] vignette, which contains considerable information on the models available in `bnec` and their appropriate usage. In general it is safe to call `models = "all"`, because by default `bnec` will discard invalid models, and the model-averaging approach should result in an overall fit that reflects the models that best fit the data. However, because HMC can be slow for `models = "all"`, we recommend first testing your fit using a single (likely) model, to make sure there are no issues with dispersion, that an appropriate distribution has been selected, and that model fitting appears robust (see the [Single model usage][e1] vignette for more details).
To run this vignette, we will need the `ggplot2` package:
```r
library(ggplot2)
```
```r
library(bayesnec)
# function which generates an "ecx4param" model
make_ecx_data <- function(top, bot, ec50, beta, x) {
  top + (bot - top) / (1 + exp((ec50 - x) * exp(beta)))
}
x <- seq(0, 10, length = 12)
y <- make_ecx_data(x = x, bot = 0, top = 1, beta = 0.5, ec50 = 5)
set.seed(333)
df_ <- data.frame(x = rep(x, 15), y = rnorm(15 * length(x), y, 0.2))
exp_5 <- bnec(y ~ crf(x, model = "decline"), data = df_, iter = 2e3,
              open_progress = FALSE)
```
Here we run `bnec` with `model = "decline"` on a simulated data example with a Gaussian-distributed response variable. We use the `"decline"` set because we are not going to consider hormesis models (those allow an initial increase in the response), largely to save time in fitting this example. You might also want to consider saving the output as an `.RData` file; fitting large model sets (i.e. `model = "all"` or `model = "decline"`) can be very slow, so they can be run at a convenient time (overnight, for example) and reloaded later to explore, plot, extract values, and amend the model set as required.
Whenever the `model` argument of `crf` is a model set, or a concatenation of specific models, `bnec` will return an object of class `bayesmanecfit`.
### Exploring a `bayesmanecfit` model
We have created some plotting method functions for both the `bayesnecfit` and `bayesmanecfit` model fits, so we can plot a `bayesmanecfit` model object simply with `autoplot`.
```r
autoplot(exp_5)
```
The default plot looks exactly the same as our regular `bayesnecfit` plot, but the output is based on a weighted average of all the model fits in the `model = "decline"` set that fitted successfully using the default `bnec` settings. Because the model set contains a mix of *NEC* and *ECx* models, the no-effect toxicity estimate reported by default on this plot (and in the summary output below) is an N(S)EC (see @fisher2023ieam).
The fitted `bayesmanecfit` object contains different elements from a `bayesnecfit`. In particular, `mod_stats` contains the table of model fit statistics for all the fitted models. This includes the model name, the WAIC (as returned by `brms`), wi (the model weight, currently defaulting to `"pseudobma"` weights using Bayesian bootstrap from `loo`), and the dispersion estimates (only reported if the response is modelled with a Poisson or binomial distribution; otherwise NA).
```r
exp_5$mod_stats
#> model waic wi dispersion_Estimate dispersion_Q2.5 dispersion_Q97.5
#> nec4param nec4param -84.71235 0.3266459430 NA NA NA
#> neclin neclin -52.72181 0.0004911001 NA NA NA
#> ecxlin ecxlin -27.81568 0.0006015808 NA NA NA
#> ecx4param ecx4param -81.67824 0.0972614372 NA NA NA
#> ecxwb1 ecxwb1 -75.70936 0.0183140713 NA NA NA
#> ecxwb2 ecxwb2 -84.77620 0.2454512947 NA NA NA
#> ecxll5 ecxll5 -84.61205 0.2204366725 NA NA NA
#> ecxll4 ecxll4 -81.55602 0.0907979003 NA NA NA
```
We can obtain a neater summary of the model fit by using the `summary` method for a `bayesmanecfit` object. A list of the fitted models and their model weights is provided. In addition, the model-averaged no-effect estimate is reported, in this case as an N(S)EC. For this example the **nec4param** and **ecxwb2** models have the highest weights.
All these model fits are satisfactory despite the relatively low number of iterations set in our example, but the summary would also include a warning if there were fits with divergent transitions.
```r
summary(exp_5)
#> Object of class bayesmanecfit
#>
#> Family: gaussian
#> Links: mu = identity; sigma = identity
#>
#> Number of posterior draws per model: 1600
#>
#> Model weights (Method: pseudobma_bb_weights):
#> waic wi
#> nec4param -84.71 0.33
#> neclin -52.72 0.00
#> ecxlin -27.82 0.00
#> ecx4param -81.68 0.10
#> ecxwb1 -75.71 0.02
#> ecxwb2 -84.78 0.25
#> ecxll5 -84.61 0.22
#> ecxll4 -81.56 0.09
#>
#>
#> Summary of weighted N(S)EC posterior estimates:
#> NB: Model set contains a combination of ECx and NEC
#> models, and is therefore a model averaged
#> combination of NEC and NSEC estimates.
#> Estimate Q2.5 Q97.5
#> N(S)EC 3.37 2.11 3.98
#>
#>
#> Bayesian R2 estimates:
#> Estimate Est.Error Q2.5 Q97.5
#> nec4param 0.86 0.01 0.84 0.87
#> neclin 0.83 0.01 0.80 0.84
#> ecxlin 0.80 0.01 0.78 0.82
#> ecx4param 0.85 0.01 0.83 0.87
#> ecxwb1 0.85 0.01 0.83 0.86
#> ecxwb2 0.86 0.01 0.84 0.87
#> ecxll5 0.86 0.01 0.84 0.87
#> ecxll4 0.85 0.01 0.83 0.87
```
The `bayesmanecfit` object also contains all of the individual `brms` model fits, which can be extracted using the `pull_out` function. For example, we can pull out and plot the **ecx4param** model.
```r
exp_5_ecx4param <- pull_out(exp_5, model = "ecx4param")
autoplot(exp_5_ecx4param)
```
This extracts the **ecx4param** model from the `bayesmanecfit` and creates a new object of class `bayesnecfit` containing just a single fit, identical to fitting **ecx4param** as a single model using `bnec`. All of the models in the `bayesmanecfit` can be plotted simultaneously using the argument `all_models = TRUE`.
```r
autoplot(exp_5, all_models = TRUE)
```
You can see that some of these models represent very bad fits, and accordingly have extremely low model weights, such as the **ecxlin** and **neclin** models in this example. There is no harm in leaving poorly fitting models in the set, precisely because their low weight means they have little influence on posterior predictions. However, it is important to assess the adequacy of all the model fits, because a poor fit may instead indicate a model that has not converged.
We can assess the chains for one of the higher weighted models to make sure they show good convergence. It is good practice to do this for all models with a high weight, as these are the models most influencing the multi-model inference.
```r
plot(pull_brmsfit(exp_5, "ecxwb1"))
```
Assessing chains for all the models in a `bayesmanecfit` does not work well with the default `brms` plotting method. Instead, use `check_chains`, making sure to pass a `filename` argument, so the plots are automatically saved to a PDF (with a message).
```r
check_chains(exp_5, filename = "example_5_all_chains")
```
We can also compare the posterior probability density to that of the prior using the `check_priors` function, either for an individual model fit or, by passing a `filename` argument, saving the plots for all fits to a file in the working directory.
```r
check_priors(pull_out(exp_5, "nec4param"))
```
```r
check_priors(exp_5, filename = "example_5_all_priors")
```
Where a large number of models fail to converge, it would be better to adjust `iter` and `warmup` in the `bnec` call, as well as some of the other arguments passed on to `brms`, such as `adapt_delta`. See the `?brm` documentation for more details. It is also possible to check whether the models in a `bayesmanecfit` achieved convergence according to Rhat:
```r
rhat(exp_5) |>
  sapply("[[", "failed")
#> nec4param neclin ecxlin ecx4param ecxwb1 ecxwb2 ecxll5 ecxll4
#> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```
Here we get no warning message, because none of our models failed the default Rhat criterion. A more conservative cut-off of 1.01 can be used by changing the default argument to the desired value. In this case one model fails, although we note that this is a very stringent criterion, and that we also used fewer iterations than the default `bayesnec` value of `iter` (10,000).
```r
rhat(exp_5, rhat_cutoff = 1.01) |>
  sapply("[[", "failed")
#> nec4param neclin ecxlin ecx4param ecxwb1 ecxwb2 ecxll5 ecxll4
#> FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
```
### Extracting toxicity values
The models prefixed with **ecx** are all models that do not include the *NEC* as a parameter. That is, they are smooth curves as a function of concentration with no breakpoint (no true No-Effect-Concentration, *NEC*). For these models the reported no-effect toxicity estimate is the NSEC [@fisherfox2023] (see the [Model details][e2b] vignette for more details). If a true model-averaged estimate of the *NEC* based only on threshold models is desired, it can be obtained with `model = "nec"`. We can use the helper functions `pull_out` and `amend` to alter the model set as required. `pull_out` has a `model` argument and can be used to pull out a single model (as above) or a specific set of models.
We can use this to obtain first a set of *NEC* only models from the existing set.
```r
exp_5_nec <- pull_out(exp_5, model = "nec")
```
In this case, because we have already fitted the `"decline"` models, we can ignore the message regarding the missing *NEC* models: these are models that are either not appropriate for the specified family and link function, or that allow hormesis, which we did not consider in this example.
Now we have two model sets: an *NEC*-only set, and the original mixed *NEC* and *ECx* set. Of course, before using either set for inference, we would need to check the chain mixing and autocorrelation (`acf`) plots for each of the input models. For the full `"decline"` set, the model with the highest weight is **nec4param**.
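Alternatively, rather than pulling models out into a new object, `amend` can modify an existing `bayesmanecfit` directly. As a sketch (the `drop` argument here reflects our understanding of the `amend` interface; see `?amend` to confirm):

```r
# Drop the two poorly fitting linear models from the existing fit;
# model weights are recalculated for the remaining models
exp_5_trimmed <- amend(exp_5, drop = c("neclin", "ecxlin"))
```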
Now we can use the `ecx` function to obtain *EC~10~* and *EC~50~* values. We can do this using the full model set, because it is valid to use *NEC* models for estimating *ECx* (see more information in the [Model details][e2b] vignette).
```r
ECx10 <- ecx(exp_5, ecx_val = 10)
ECx50 <- ecx(exp_5, ecx_val = 50)
ECx10
#> Q50 Q2.5 Q97.5
#> 3.641915 2.699609 4.177541
#> attr(,"resolution")
#> [1] 1000
#> attr(,"ecx_val")
#> [1] 10
#> attr(,"toxicity_estimate")
#> [1] "ecx"
ECx50
#> Q50 Q2.5 Q97.5
#> 5.100009 4.842115 5.357903
#> attr(,"resolution")
#> [1] 1000
#> attr(,"ecx_val")
#> [1] 50
#> attr(,"toxicity_estimate")
#> [1] "ecx"
```
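If the full posterior of an *ECx* estimate is required, for example to calculate custom probability statements, `ecx` can return the posterior draws rather than a summary. As a sketch (the `posterior` argument reflects our understanding of the `ecx` interface; see `?ecx` to confirm):

```r
# Return the full posterior sample of EC10 values, then summarise manually
ec10_post <- ecx(exp_5, ecx_val = 10, posterior = TRUE)
quantile(ec10_post, probs = c(0.025, 0.5, 0.975))
```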
The weighted *NEC* estimates can be extracted directly from the *NEC* model set object, as the *NEC* is an explicit parameter in these models.
```r
NECvals <- exp_5_nec$w_ne
NECvals
#> Estimate Q2.5 Q97.5
#> 3.528531 3.129615 4.031603
```
Note that the new *NEC* estimates from the **nec**-containing model fits are slightly higher than those reported in the summary output of all the fitted models. This can happen for smooth curves, which is what was used as the underlying data-generation model in the simulations here, and is explored in more detail in the [Compare posteriors vignette][e4].
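Such differences can also be examined more formally using the `compare_posterior` function in `bayesnec`, which contrasts posteriors across fitted objects. As a sketch (the argument structure here reflects our understanding of its interface; see `?compare_posterior` to confirm):

```r
# Compare the no-effect posteriors of the full and NEC-only model sets
post_comp <- compare_posterior(list(all = exp_5, nec = exp_5_nec),
                               comparison = "nec")
```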
### Putting it all together
Now we can make a combined plot of our output, showing the model averaged *NEC* model and the "all averaged model", along with the relevant thresholds.
```r
preds <- exp_5_nec$w_pred_vals$data

autoplot(exp_5, nec = FALSE, all = FALSE) +
  geom_vline(mapping = aes(xintercept = ECx10, colour = "ECx 10"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  geom_vline(mapping = aes(xintercept = ECx50, colour = "ECx 50"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  geom_vline(mapping = aes(xintercept = NECvals, colour = "NEC"),
             linetype = c(1, 3, 3), key_glyph = "path") +
  scale_color_manual(values = c("ECx 10" = "orange", "ECx 50" = "blue",
                                "NEC" = "darkgrey"), name = "") +
  geom_line(data = preds, mapping = aes(x = x, y = Estimate),
            colour = "tomato", linetype = 2) +
  geom_line(data = preds, mapping = aes(x = x, y = Q2.5),
            colour = "tomato", linetype = 2) +
  geom_line(data = preds, mapping = aes(x = x, y = Q97.5),
            colour = "tomato", linetype = 2) +
  theme(legend.position = c(0.8, 0.8),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 12),
        strip.text.x = element_text(size = 16))
```
# References