There are multiple ways to simulate MMRM datasets using
brms
and brms.mmrm
.
brm_simulate_simple()
simulates a dataset from the prior
predictive distribution of a simple special case of an MMRM.1
library(brms.mmrm)
set.seed(0)
sim <- brm_simulate_simple(
n_group = 3,
n_patient = 100,
n_time = 4
)
The data
element has a classed tibble
you
can directly supply to brm_formula()
and
brm_model()
.
sim$data
#> # A tibble: 1,200 × 4
#> response group time patient
#> <dbl> <chr> <chr> <chr>
#> 1 1.11 group_1 time_1 patient_001
#> 2 2.15 group_1 time_2 patient_001
#> 3 2.54 group_1 time_3 patient_001
#> 4 -1.73 group_1 time_4 patient_001
#> 5 1.11 group_1 time_1 patient_002
#> 6 2.64 group_1 time_2 patient_002
#> 7 1.69 group_1 time_3 patient_002
#> 8 0.783 group_1 time_4 patient_002
#> 9 0.118 group_1 time_1 patient_003
#> 10 2.48 group_1 time_2 patient_003
#> # ℹ 1,190 more rows
The parameters
element has the corresponding parameter
values simulated from the joint prior. Arguments to
brm_simulate_simple()
control hyperparameters.
str(sim$parameters)
#> List of 5
#> $ beta : num [1:6] 1.263 -0.326 1.33 1.272 0.415 ...
#> $ tau : num [1:4] -0.092857 -0.029472 -0.000577 0.240465
#> $ sigma : num [1:4] 0.911 0.971 0.999 1.272
#> $ lambda : num [1:4, 1:4] 1 0.415 -0.818 -0.282 0.415 ...
#> $ covariance: num [1:4, 1:4] 0.831 0.368 -0.745 -0.326 0.368 ...
And the model_matrix
element has the regression model
matrix of fixed effect parameters.
brm_data_change()
can convert the outcome variable from
raw response to change from baseline. This applies to real datasets
passed through [brm_data()] as well as simulated ones from
e.g. [brm_simulate_simple()]. The dataset above uses raw response with a
baseline time point of "time_1"
sim$data
#> # A tibble: 1,200 × 4
#> response group time patient
#> <dbl> <chr> <chr> <chr>
#> 1 1.11 group_1 time_1 patient_001
#> 2 2.15 group_1 time_2 patient_001
#> 3 2.54 group_1 time_3 patient_001
#> 4 -1.73 group_1 time_4 patient_001
#> 5 1.11 group_1 time_1 patient_002
#> 6 2.64 group_1 time_2 patient_002
#> 7 1.69 group_1 time_3 patient_002
#> 8 0.783 group_1 time_4 patient_002
#> 9 0.118 group_1 time_1 patient_003
#> 10 2.48 group_1 time_2 patient_003
#> # ℹ 1,190 more rows
brm_data_change()
subtracts baseline, replaces the raw
response column with a new change from baseline column, adds a new
column for the original baseline raw response, and adjusts the internal
attributes of the classed object accordingly.
brm_data_change(
data = sim$data,
name_change = "new_change",
name_baseline = "new_baseline"
)
#> # A tibble: 900 × 5
#> new_change new_baseline group time patient
#> <dbl> <dbl> <chr> <chr> <chr>
#> 1 1.04 1.11 group_1 time_2 patient_001
#> 2 1.43 1.11 group_1 time_3 patient_001
#> 3 -2.84 1.11 group_1 time_4 patient_001
#> 4 1.53 1.11 group_1 time_2 patient_002
#> 5 0.576 1.11 group_1 time_3 patient_002
#> 6 -0.328 1.11 group_1 time_4 patient_002
#> 7 2.37 0.118 group_1 time_2 patient_003
#> 8 3.07 0.118 group_1 time_3 patient_003
#> 9 -1.14 0.118 group_1 time_4 patient_003
#> 10 1.57 1.29 group_1 time_2 patient_004
#> # ℹ 890 more rows
For a more nuanced simulation, build up the dataset layer by layer.
Begin with brm_simulate_outline()
to create an initial
structure and a random missingness pattern. In
brm_simulate_outline()
, missing responses can come from
either transitory intercurrent events or from dropouts. The
missing
column indicates which outcome values will be
missing (NA_real_
) in a later step. The
response
column is entirely missing for now and will be
simulated later.
data <- brm_simulate_outline(
n_group = 2,
n_patient = 100,
n_time = 4,
rate_dropout = 0.3
)
data
#> # A tibble: 800 × 5
#> response missing group time patient
#> <dbl> <lgl> <chr> <chr> <chr>
#> 1 NA FALSE group_1 time_1 patient_001
#> 2 NA TRUE group_1 time_2 patient_001
#> 3 NA TRUE group_1 time_3 patient_001
#> 4 NA TRUE group_1 time_4 patient_001
#> 5 NA FALSE group_1 time_1 patient_002
#> 6 NA FALSE group_1 time_2 patient_002
#> 7 NA TRUE group_1 time_3 patient_002
#> 8 NA TRUE group_1 time_4 patient_002
#> 9 NA FALSE group_1 time_1 patient_003
#> 10 NA FALSE group_1 time_2 patient_003
#> # ℹ 790 more rows
Optionally add random continuous covariates
brm_simulate_continuous()
and random categorical covariates
using brm_simulate_categorical()
. In each case, the
covariates are non-time-varying, which means each patient gets only one
unique value.
data <- data |>
brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
brm_simulate_categorical(
names = c("status1", "status2"),
levels = c("present", "absent")
)
data
#> # A tibble: 800 × 9
#> response missing group time patient biomarker1 biomarker2 status1 status2
#> <dbl> <lgl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 NA FALSE group_1 time_1 patien… 0.328 -0.655 present absent
#> 2 NA TRUE group_1 time_2 patien… 0.328 -0.655 present absent
#> 3 NA TRUE group_1 time_3 patien… 0.328 -0.655 present absent
#> 4 NA TRUE group_1 time_4 patien… 0.328 -0.655 present absent
#> 5 NA FALSE group_1 time_1 patien… 1.04 -0.779 absent absent
#> 6 NA FALSE group_1 time_2 patien… 1.04 -0.779 absent absent
#> 7 NA TRUE group_1 time_3 patien… 1.04 -0.779 absent absent
#> 8 NA TRUE group_1 time_4 patien… 1.04 -0.779 absent absent
#> 9 NA FALSE group_1 time_1 patien… 0.717 -0.954 present absent
#> 10 NA FALSE group_1 time_2 patien… 0.717 -0.954 present absent
#> # ℹ 790 more rows
As described in the next section, brms.mmrm
has a
convenient function brm_simulate_prior()
to simulate the
outcome variable response
using the data skeleton above and
the prior predictive distribution. However, if you prefer a full custom
approach, you may need granular details about the parameterization,
which requires the model matrix. Fortunately, brms
supports
a make_standata()
function to provide this, given a dataset
and a formula. You may need to temporarily set the response variable to
something non-missing, and you may wish to specify a custom prior.
library(brms)
formula <- brm_formula(data = data)
#> Warning: Rows containing NAs were excluded from the model.
#> Error: All observations in the data were removed presumably because of NA values.
formula
#> response ~ group + group:time + time + unstr(time = time, gr = patient)
#> sigma ~ 0 + time
stan_data <- make_standata(
formula = formula,
data = mutate(data, response = 0)
)
model_matrix <- stan_data$X
head(model_matrix)
#> Intercept groupgroup_2 timetime_2 timetime_3 timetime_4
#> 1 1 0 0 0 0
#> 2 1 0 1 0 0
#> 3 1 0 0 1 0
#> 4 1 0 0 0 1
#> 5 1 0 0 0 0
#> 6 1 0 1 0 0
#> groupgroup_2:timetime_2 groupgroup_2:timetime_3 groupgroup_2:timetime_4
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0
Function brm_simulate_prior()
simulates from the prior
predictive distribution. It requires a dataset and a formula, and it
accepts a custom prior constructed with
brms::set_prior()
.
formula <- brm_formula(data = data)
#> Error: All observations in the data were removed presumably because of NA values.
library(brms)
prior <- set_prior("student_t(3, 0, 1.3)", class = "Intercept") +
set_prior("student_t(3, 0, 1.2)", class = "b") +
set_prior("student_t(3, 0, 1.1)", class = "b", dpar = "sigma") +
set_prior("lkj(1)", class = "cortime")
prior
#> prior class coef group resp dpar nlpar lb ub source
#> student_t(3, 0, 1.3) Intercept <NA> <NA> user
#> student_t(3, 0, 1.2) b <NA> <NA> user
#> student_t(3, 0, 1.1) b sigma <NA> <NA> user
#> lkj(1) cortime <NA> <NA> user
sim <- brm_simulate_prior(
data = data,
formula = formula,
prior = prior,
refresh = 0
)
The output object sim
has multiple draws from the prior
predictive distribution. sim$outcome
has outcome draws, and
sim$parameters
has parameter draws.
sim$model_matrix
has the model matrix, and
sim$model
has the full brms
model fit object.
You can pass sim$model
to functions from brms
and bayesplot
such as pp_check()
.
In addition, sim$data
has a copy of the original
dataset, but with the outcome variable taken from the final draw from
the prior predictive distribution. In addition, the missingness pattern
is automatically applied so that sim$data$response
is
NA_real_
whenever sim$data$missing
equals
TRUE
.
sim$data
#> # A tibble: 800 × 9
#> response missing group time patient biomarker1 biomarker2 status1 status2
#> <dbl> <lgl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 -0.0490 FALSE group_1 time_1 patien… 0.328 -0.655 present absent
#> 2 NA TRUE group_1 time_2 patien… 0.328 -0.655 present absent
#> 3 NA TRUE group_1 time_3 patien… 0.328 -0.655 present absent
#> 4 NA TRUE group_1 time_4 patien… 0.328 -0.655 present absent
#> 5 7.19 FALSE group_1 time_1 patien… 1.04 -0.779 absent absent
#> 6 3.18 FALSE group_1 time_2 patien… 1.04 -0.779 absent absent
#> 7 NA TRUE group_1 time_3 patien… 1.04 -0.779 absent absent
#> 8 NA TRUE group_1 time_4 patien… 1.04 -0.779 absent absent
#> 9 15.6 FALSE group_1 time_1 patien… 0.717 -0.954 present absent
#> 10 0.203 FALSE group_1 time_2 patien… 0.717 -0.954 present absent
#> # ℹ 790 more rows
brms
supports posterior predictive simulations and
checks through functions posteiror_predict()
,
posterior_epred()
, and pp_check()
. These can
be used with a brms
model fit object either from
brm_model()
or from brm_simulate_prior()
.
data <- sim$data
formula <- brm_formula(data = data)
model <- brm_model(data = data, formula = formula, refresh = 0)
outcome_draws <- posterior_predict(object = model)
The returned outcome_draws
object is a numeric array of
posterior predictive draws, with one row per draw and one column per
non-missing observation (row) in the original data.
The function help file explains the details about the model parameterization.↩︎