This vignette gives an introduction to using **rbi**.
For the best viewing experience, use the version on
the rbi website.

**rbi** is an `R` interface to **LibBi**, a library for Bayesian inference with
state-space models using high-performance computer hardware.

The package has been tested on macOS and Linux. It requires a working
installation of **LibBi**. On macOS, this is most easily done using Homebrew:
install Homebrew, then install **LibBi** with the `brew` command from a
command shell (i.e. Terminal or similar).

On Linux, follow the instructions provided with **LibBi**.

If you have any trouble installing **LibBi**, you can get
help on the LibBi Users mailing list.

The path to the `libbi` script can be passed as an argument to **rbi**;
otherwise the package tries to find it automatically using the `which`
Linux/Unix command.

If you just want to process the output from **LibBi**,
then you do not need to have **LibBi** installed.

The easiest way to install the latest stable version of
**rbi** is via CRAN.
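
From within an R session, this is a single command:

```
install.packages("rbi")
```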

Alternatively, the current development version can be installed using
the `remotes` package.
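
Assuming the development code is hosted in the `sbfnk/rbi` repository on GitHub, this would be:

```
# install the development version from GitHub
remotes::install_github("sbfnk/rbi")
```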

The main computational engine and model grammar behind
**rbi** is provided by **LibBi**. The LibBi manual is a
good place to start for finding out everything there is to know about
**LibBi** models and inference methods.

The **rbi** package mainly provides two classes: `bi_model` and `libbi`.
The `bi_model` class is used to load, view and manipulate **LibBi** model
files. The `libbi` class is used to run **LibBi** and perform inference.

The package also provides two functions for interacting with the NetCDF files
used by **LibBi**, `bi_read` and `bi_write`. Lastly, it provides a
`get_traces` function to analyse Markov-chain Monte Carlo (MCMC) traces using
the **coda** package.

## The `bi_model` class

As an example, we consider a simplified version of the SIR model
discussed in Del Moral et al. (2014). This is included with the **rbi** package
and can be loaded with
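the following commands, assuming the model file is shipped as `SIR.bi` at the top level of the installed package:

```
library("rbi")

# locate the SIR model file shipped with the package and load it
model_file <- system.file(package = "rbi", "SIR.bi")
sir_model <- bi_model(model_file)
```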

Other ways of implementing a (deterministic or stochastic) SIR model
can be found in the collection of SIR models for LibBi, which also shows how
to load them into a `bi_model` object such as `sir_model`. Feel free to
run the commands below with different versions of the model.

The `sir_model` object now contains the model, which can
be displayed with

```
sir_model
#> bi_model:
#> =========
#> 1: model SIR {
#> 2: const h = 7
#> 3: const N = 1000
#> 4: const d_infection = 14
#> 5: noise n_transmission
#> 6: noise n_recovery
#> 7: state S
#> 8: state I
#> 9: state R
#> 10: state Z
#> 11: obs Incidence
#> 12: param p_rep
#> 13: param p_R0
#> 14: sub parameter {
#> 15: p_rep ~ uniform(0,1)
#> 16: p_R0 ~ uniform(1,3)
#> 17: }
#> 18: sub initial {
#> 19: S <- N - 1
#> 20: I <- 1
#> 21: R <- 0
#> 22: Z <- 1
#> 23: }
#> 24: sub transition {
#> 25: n_transmission ~ wiener()
#> 26: n_recovery ~ wiener()
#> 27: Z <- (t_now % 7 == 0 ? 0 : Z)
#> 28: inline i_beta = p_R0 / d_infection * exp(n_transmission)
#> 29: inline i_gamma = 1 / d_infection * exp(n_recovery)
#> 30: ode (alg='RK4(3)', h=1e-1, atoler=1e-2, rtoler=1e-5) {
#> 31: dS/dt = - i_beta * S * I / N
#> 32: dI/dt = i_beta * S * I / N - i_gamma * I
#> 33: dR/dt = i_gamma * I
#> 34: dZ/dt = i_beta * S * I / N
#> 35: }
#> 36: }
#> 37: sub observation {
#> 38: Incidence ~ poisson(p_rep * Z)
#> 39: }
#> 40: sub proposal_initial {
#> 41: S <- N - 1
#> 42: I <- 1
#> 43: R <- 0
#> 44: Z <- 1
#> 45: }
#> 46: sub proposal_parameter {
#> 47: p_rep ~ truncated_gaussian(mean = p_rep, std = 0.03, lower = 0, upper = 1)
#> 48: p_R0 ~ truncated_gaussian(mean = p_R0, std = 0.2, lower = 1, upper = 3)
#> 49: }
#> 50: }
```

A part of the model can be shown by, for example, subsetting its lines or
extracting one of its blocks. To get a list of certain variables, you can use
the `var_names` function, for example to list all state variables. All three
operations are illustrated below.
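
A short sketch (the line range picked here is arbitrary):

```
# show a few lines of the model by subsetting
sir_model[35:38]

# extract the "parameter" block of the model
get_block(sir_model, "parameter")

# list all state variables
var_names(sir_model, type = "state")
```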

There are also various methods for manipulating a model, such as
`remove_lines`, `insert_lines` and `replace_all`.

The `fix` method fixes a variable to one value. This can
be useful, for example, to run the deterministic equivalent of a
stochastic model for testing purposes:
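
For the SIR model above, this amounts to setting both noise terms to zero:

```
# fix the noise variables to 0 to obtain a deterministic version of the model
det_sir_model <- fix(sir_model, n_transmission = 0, n_recovery = 0)
```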

To get documentation for any of these methods, use the links in the
documentation for `bi_model`.

First, let’s create a data set from the SIR model.
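
A call along the following lines should work; the seed value is arbitrary:

```
set.seed(1234)  # any other seed will give a different data set

sir_data <- generate_dataset(sir_model, end_time = 16 * 7, noutputs = 16)
```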

This simulates the model a single time from time 0 until time 16 * 7
(say, 16 weeks with a daily time step), producing 16 outputs (one a
week). Note that we have specified a random seed to make this document
reproducible. If you omit the `set.seed` command or set it to
a different number, the results will be different even when run with the
same set of commands. Also note that **LibBi** compiles the model
code only the first time it is run. If you run the command above a
second time, it should run much faster.

The `generate_dataset` function returns a `libbi` object.
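
Printing the object gives a short overview:

```
sir_data
```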

The generated dataset can be viewed and/or stored in a variable using
`bi_read`.
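
Here we store the result in a variable called `dataset`, which is used again further below:

```
dataset <- bi_read(sir_data)
```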

The `bi_read` function takes the name of a NetCDF file or
a `libbi` object (in which case it locates the output file)
and stores the contents in a list of data frames or vectors, depending
on the dimensionality of the contents. Note that, if no
`working_folder` is specified, the model and output files
will be stored in a temporary folder.

We can visualise the generated incidence data with
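a simple plot, for example using base graphics:

```
# weekly incidence over time
plot(
  dataset$Incidence$time, dataset$Incidence$value,
  type = "b", xlab = "time", ylab = "cases"
)
```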

## The `libbi` class

The `libbi` class manages the interaction with **LibBi**, such as sampling
from the prior or posterior distribution. For example, the `sir_data` object
above is of type `libbi`.
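
This can be checked with `class`:

```
class(sir_data)
```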

The `bi_generate_dataset` function is one particular way of
generating a `libbi` object, used only to generate test data
from a model. The standard way of creating a `libbi` object
for Bayesian inference is using the `libbi` command.
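
Applied to the SIR model loaded earlier:

```
bi <- libbi(sir_model)
```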

This initialises a `libbi` object with the model created
earlier and assigns it to the variable `bi`.

Let’s sample from the prior of the SIR model:
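
A call along these lines matches the description that follows (1,000 samples, 112 time steps, 16 outputs):

```
bi_prior <- sample(
  bi, target = "prior", nsamples = 1000, end_time = 16 * 7, noutputs = 16
)
```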

This step calls **LibBi** to sample from the prior
distribution of the previously specified model, generating 1,000 samples
and each time running the model for 16 * 7 = 112 time steps and writing
16 outputs (i.e., every 7 time steps). **LibBi** parses the
model, creates C++ code, compiles it and runs the model. If the model is
run again, it should do so much quicker because it will use the already
compiled C++ code to run the model:
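
Calling `sample` again without new arguments re-uses the preserved settings; as described below, we assign the result back to `bi`:

```
bi <- sample(bi_prior)
```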

The `sample` command returns an updated `libbi` object which, in this case,
we just assign again to the `bi` object. Any call of `sample` preserves
options passed to the previous call of `sample` and `libbi`, unless they
are overwritten by arguments passed to `sample` (e.g., passing a new
`nsamples` argument). Let's have a closer look at the `bi` object:
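
Printing it gives an overview of its setup:

```
bi
```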

To see even more detail, try
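R’s `str` function, for example:

```
str(bi)
```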

We can see the object contains 14 fields, including the model, the
path to the `libbi` script, and the command used to run **LibBi**
(`bi$command`); the `options` field contains all
the options that **LibBi** was called with. This includes
the ones we passed to `sample`.
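
These can be inspected directly:

```
bi$options
```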

The other fields contain various bits of information about the
object, including the model used, the command used to run
**LibBi** (`bi$command`) and the output file name:
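
For instance:

```
bi$output_file_name
```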

We can get the results of the sampling run using `bi_read`, either on the
output file directly or, as a shorthand, on the `libbi` object itself, in
which case the `output_file_name` field is used to locate and read in the
data. Let's look at the returned object.
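
Here the result is stored in a variable called `res`, and `str` gives a compact view:

```
res <- bi_read(bi)
str(res)
```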

This is a list of 9 objects: 8 representing the (noise/state)
variables and parameters in the file, and one number, `clock`,
representing the time spent running the model in microseconds.

We can see that the time-varying variables are represented as data
frames with three columns: `np` (enumerating individual
simulations), `time` and `value`. Parameters don't
vary in time and just have `np` and `value` columns.

Let's perform inference using Particle Markov-chain Metropolis
Hastings (PMMH). The following command will generate 16 * 10,000 =
160,000 simulations and therefore may take a little while to run (if you
want to see the samples progress, use `verbose = TRUE` in the
`sample` call).
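
A call along the following lines should work; the numbers below (16 particles, 10,000 samples) match the 160,000 simulations mentioned above, while the initial values passed via `init` are illustrative:

```
bi <- sample(
  bi, target = "posterior", nparticles = 16, nsamples = 10000,
  obs = sir_data, init = list(p_rep = 0.5, p_R0 = 2)
)
```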

This samples from the posterior distribution. Remember that options
are preserved from previous runs (because we passed the `bi` object as
first argument), so we don't need to specify `nsamples`,
`end_time` and `noutputs` again, unless we want to
change them. The `nparticles` option specifies the number of
particles.

You can also pass a list of data frames (each element of the list
corresponding to one observed variable) as the `obs` argument,
for example:

```
df <- data.frame(
  time = c(0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112),
  value = c(1, 6, 2, 26, 99, 57, 78, 57, 15, 9, 4, 1, 1, 1, 0, 2, 0)
)
bi_df <- sample(
  bi_prior, target = "posterior", nparticles = 32, obs = list(Incidence = df)
)
```

Input, init and observation files (see the LibBi manual for
details) can be specified using the `init`, `input` and `obs`
options, respectively. They can each be specified as the name of a NetCDF
file containing the data, as a `libbi` object (in which case its output
file will be taken), or directly via an appropriate `R` object
containing the data (e.g., a character vector of length one, or a list
of data frames or numeric vectors). In the case of the command above,
`init` is specified as a list, and `obs` as a `libbi` object. The
`Incidence` variable of the `sir_data` object will be taken as observations.

The time dimension (or column, if a data frame) in the passed
`init`, `input` and/or `obs` files can be specified using the
`time_dim` option. If this is not given, it will be assumed to be `time`,
if such a dimension exists, or, if not, any numeric column not called
`value` (or the contents of the `value_column` option). If this does not
produce a unique column name, an error will be thrown. All other
dimensions/columns in the passed options will be interpreted as
additional dimensions in the data, and stored in the `dims` field of the
`libbi` object.

Any other options (apart from `log_file_name`; see the Debugging section)
will be passed on to the `libbi` command; for a complete list, see the
LibBi manual. Hyphens can be replaced by underscores so as not to confuse
R (see `end_time`). Any arguments starting with `enable`/`disable` can be
specified as boolean (e.g., `assert = TRUE`). Any `dry-` options can be
specified with a `"dry"` argument, e.g., `parse = "dry"`.

Let's get the results of the preceding `sample` command:
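
Again with `bi_read`:

```
res <- bi_read(bi)
str(res)
```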

We can see that this has two more objects than previously when we
specified `target = "prior"`: `loglikelihood` (the estimated
log-likelihood of the parameters at each MCMC step) and `logprior`
(the estimated log-prior density of the parameters at each MCMC step).

To get a summary of the parameters sampled, use
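the `summary` function:

```
summary(bi)
```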

A summary of sampled trajectories can be obtained using
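the same function with a `type` argument (here assumed to accept `"state"`):

```
summary(bi, type = "state")
```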

Any particular posterior sample can be viewed with `extract_sample`
(with indices running from 0 to `nsamples - 1`):
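
The index chosen here is arbitrary:

```
extract_sample(bi, 314)
```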

To analyse MCMC outputs, we can use the **coda** package
and the `get_traces` function of **rbi**. Note
that, to get exactly the same traces, you would have to set the seed as
above.
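
A minimal sketch, converting the traces to a **coda** `mcmc` object:

```
library("coda")

traces <- mcmc(get_traces(bi))
```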

We can, for example, visualise parameter traces and densities with
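**coda**’s `plot` method:

```
plot(traces)
```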

Compare the marginal posterior distributions to the “correct” parameters used to generate the data set:
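
Assuming the parameter values used for the simulation are included in the generated data set read in earlier, they can be extracted with:

```
dataset[c("p_rep", "p_R0")]
```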

For more details on using **coda** to further analyse
the chains, see the website of the coda package. For
more plotting functionality, the ggmcmc package is
also worth considering.

We can use the `predict` function to re-simulate the
fitted model using the estimated parameters, that is, to generate samples
from \(p(x_t|\theta)\) where the \(\theta\) are distributed according to the
marginal posterior distribution \(p(\theta|y^*_t)\) (here, \(\theta\) are fixed parameters, \(x_t\) are state trajectories and \(y^*_t\) observed data points, as in the LibBi manual). This
can be useful, for example, for comparing typical model trajectories to
the data, or for running the model beyond the last data point.

```
pred_bi <- predict(
  bi, start_time = 0, end_time = 20 * 7, output_every = 7,
  with = c("transform-obs-to-state")
)
```

where `with = c("transform-obs-to-state")` tells LibBi to
treat observations as a state variable, that is, to randomly generate
observations, i.e. samples from \(p(y_t|\theta)\) where, again, \(\theta\) are distributed according to the
posterior distribution \(p(\theta|y^*_t)\) (see the
`with-transform-obs-to-state` option in the LibBi manual).

To sample observations from sampled posterior state trajectories, that is, samples from \(p(y_t|x_t)\) where the \(x_t\) are distributed according to the posterior distribution \(p(x_t|y_t)\), you can use:
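
One way is the `sample_obs` function:

```
obs_bi <- sample_obs(bi)
```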

Compare this to the data:
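
For example, by summarising the sampled observations and putting the data next to them:

```
summary(obs_bi, type = "obs")

dataset$Incidence
```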

The other so-called clients of LibBi (besides `sample`)
are supported through commands of the same name: `filter`, `optimise` and
`rewrite`. For example, to run a particle filter on the last posterior
sample generated above, you can use:
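
Something along these lines should work (the name of the new object is an arbitrary choice):

```
bi_filtered <- filter(bi)
```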

Output from **LibBi** runs can be visualised using standard R plotting
routines or plotting packages such as `ggplot2`. The `summary`
function can help with this. For example, to plot
observations randomly generated from the posterior distribution of the
parameters and compare them to the data, we can use

```
ps <- summary(pred_bi, type = "obs")

library("ggplot2")
ggplot(ps, aes(x = time)) +
  geom_line(aes(y = Median)) +
  geom_ribbon(aes(ymin = `1st Qu.`, ymax = `3rd Qu.`), alpha = 0.5) +
  geom_point(aes(y = value), dataset$Incidence, color = "darkred") +
  ylab("cases")
```

where we have plotted the median fit as a black line, the interquartile range as a grey ribbon, and the data points as dark red dots. Compare this to observations randomly generated from the posterior distribution of trajectories:
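
A sketch mirroring the previous plot, but using the `obs_bi` object created with `sample_obs` above:

```
ps_obs <- summary(obs_bi, type = "obs")

ggplot(ps_obs, aes(x = time)) +
  geom_line(aes(y = Median)) +
  geom_ribbon(aes(ymin = `1st Qu.`, ymax = `3rd Qu.`), alpha = 0.5) +
  geom_point(aes(y = value), dataset$Incidence, color = "darkred") +
  ylab("cases")
```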

## Saving and loading `libbi` objects

**rbi** provides its own versions of the `saveRDS` and `readRDS` functions,
called `save_libbi` and `read_libbi`. These make sure
that all information (including any options, input, init and observation
files) is stored in the object.
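
For example (the file name is an arbitrary choice):

```
save_libbi(bi, "sir_posterior.rds")

# later, possibly in a new R session
bi <- read_libbi("sir_posterior.rds")
```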

## Creating `libbi` objects from previous runs

To recreate a `libbi` object from a previous R session,
use `attach_data`. For example, one could use the following
code to get the acceptance rate for a **LibBi** run with a given
output and model file:
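
A sketch of this, in which the model and output file names are placeholders for files from an earlier run:

```
library("coda")

# placeholders for an existing model file and NetCDF output file
prev_model <- bi_model("SIR.bi")
prev_output <- bi_read("previous_output.nc")

prev_bi <- attach_data(libbi(prev_model), file = "output", data = prev_output)
traces <- mcmc(get_traces(prev_bi))
1 - rejectionRate(traces) # acceptance rate
```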

For a general check of model syntax, the `rewrite` command is useful:
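
Applied to the model object created at the start:

```
rewrite(sir_model)
```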

This generates the internal representation of the model for LibBi. It
doesn't matter so much what this looks like, but it will throw an error
if there is a problem. If `libbi` throws an error, it is best
to investigate with `debug = TRUE`, and to set `working_folder` to a
folder that one can then use for debugging. Output of the `libbi` call
can be saved in a file using the `log_file_name` option (by default a
temporary file).

- Murray, L.M. (2013). Bayesian state-space modelling on high-performance hardware using LibBi.