Simulation

There are multiple ways to simulate MMRM datasets using brms and brms.mmrm.

1 Simple

brm_simulate_simple() simulates a dataset from the prior predictive distribution of a simple special case of an MMRM.1

library(brms.mmrm)
set.seed(0)
sim <- brm_simulate_simple(
  n_group = 3,
  n_patient = 100,
  n_time = 4
)

The data element has a classed tibble you can directly supply to brm_formula() and brm_model().

sim$data
#> # A tibble: 1,200 × 4
#>    response group   time   patient    
#>       <dbl> <chr>   <chr>  <chr>      
#>  1    1.11  group_1 time_1 patient_001
#>  2    2.15  group_1 time_2 patient_001
#>  3    2.54  group_1 time_3 patient_001
#>  4   -1.73  group_1 time_4 patient_001
#>  5    1.11  group_1 time_1 patient_002
#>  6    2.64  group_1 time_2 patient_002
#>  7    1.69  group_1 time_3 patient_002
#>  8    0.783 group_1 time_4 patient_002
#>  9    0.118 group_1 time_1 patient_003
#> 10    2.48  group_1 time_2 patient_003
#> # ℹ 1,190 more rows

The parameters element has the corresponding parameter values simulated from the joint prior. Arguments to brm_simulate_simple() control hyperparameters.

str(sim$parameters)
#> List of 5
#>  $ beta      : num [1:6] 1.263 -0.326 1.33 1.272 0.415 ...
#>  $ tau       : num [1:4] -0.092857 -0.029472 -0.000577 0.240465
#>  $ sigma     : num [1:4] 0.911 0.971 0.999 1.272
#>  $ lambda    : num [1:4, 1:4] 1 0.415 -0.818 -0.282 0.415 ...
#>  $ covariance: num [1:4, 1:4] 0.831 0.368 -0.745 -0.326 0.368 ...

And the model_matrix element has the regression model matrix of fixed effect parameters.

head(sim$model_matrix)
#>   groupgroup_1 groupgroup_2 groupgroup_3 timetime_2 timetime_3 timetime_4
#> 1            1            0            0          0          0          0
#> 2            1            0            0          1          0          0
#> 3            1            0            0          0          1          0
#> 4            1            0            0          0          0          1
#> 5            1            0            0          0          0          0
#> 6            1            0            0          1          0          0

2 Change from baseline

brm_data_change() can convert the outcome variable from raw response to change from baseline. This applies to real datasets passed through [brm_data()] as well as simulated ones from e.g. [brm_simulate_simple()]. The dataset above uses raw response with a baseline time point of "time_1"

sim$data
#> # A tibble: 1,200 × 4
#>    response group   time   patient    
#>       <dbl> <chr>   <chr>  <chr>      
#>  1    1.11  group_1 time_1 patient_001
#>  2    2.15  group_1 time_2 patient_001
#>  3    2.54  group_1 time_3 patient_001
#>  4   -1.73  group_1 time_4 patient_001
#>  5    1.11  group_1 time_1 patient_002
#>  6    2.64  group_1 time_2 patient_002
#>  7    1.69  group_1 time_3 patient_002
#>  8    0.783 group_1 time_4 patient_002
#>  9    0.118 group_1 time_1 patient_003
#> 10    2.48  group_1 time_2 patient_003
#> # ℹ 1,190 more rows

brm_data_change() subtracts baseline, replaces the raw response column with a new change from baseline column, adds a new column for the original baseline raw response, and adjusts the internal attributes of the classed object accordingly.

brm_data_change(
  data = sim$data,
  name_change = "new_change",
  name_baseline = "new_baseline"
)
#> # A tibble: 900 × 5
#>    new_change new_baseline group   time   patient    
#>         <dbl>        <dbl> <chr>   <chr>  <chr>      
#>  1      1.04         1.11  group_1 time_2 patient_001
#>  2      1.43         1.11  group_1 time_3 patient_001
#>  3     -2.84         1.11  group_1 time_4 patient_001
#>  4      1.53         1.11  group_1 time_2 patient_002
#>  5      0.576        1.11  group_1 time_3 patient_002
#>  6     -0.328        1.11  group_1 time_4 patient_002
#>  7      2.37         0.118 group_1 time_2 patient_003
#>  8      3.07         0.118 group_1 time_3 patient_003
#>  9     -1.14         0.118 group_1 time_4 patient_003
#> 10      1.57         1.29  group_1 time_2 patient_004
#> # ℹ 890 more rows

3 Advanced

For a more nuanced simulation, build up the dataset layer by layer. Begin with brm_simulate_outline() to create an initial structure and a random missingness pattern. In brm_simulate_outline(), missing responses can come from either transitory intercurrent events or from dropouts. The missing column indicates which outcome values will be missing (NA_real_) in a later step. The response column is entirely missing for now and will be simulated later.

data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0.3
)

data
#> # A tibble: 800 × 5
#>    response missing group   time   patient    
#>       <dbl> <lgl>   <chr>   <chr>  <chr>      
#>  1       NA FALSE   group_1 time_1 patient_001
#>  2       NA TRUE    group_1 time_2 patient_001
#>  3       NA TRUE    group_1 time_3 patient_001
#>  4       NA TRUE    group_1 time_4 patient_001
#>  5       NA FALSE   group_1 time_1 patient_002
#>  6       NA FALSE   group_1 time_2 patient_002
#>  7       NA TRUE    group_1 time_3 patient_002
#>  8       NA TRUE    group_1 time_4 patient_002
#>  9       NA FALSE   group_1 time_1 patient_003
#> 10       NA FALSE   group_1 time_2 patient_003
#> # ℹ 790 more rows

Optionally add random continuous covariates brm_simulate_continuous() and random categorical covariates using brm_simulate_categorical(). In each case, the covariates are non-time-varying, which means each patient gets only one unique value.

data <- data |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  )

data
#> # A tibble: 800 × 9
#>    response missing group   time   patient biomarker1 biomarker2 status1 status2
#>       <dbl> <lgl>   <chr>   <chr>  <chr>        <dbl>      <dbl> <chr>   <chr>  
#>  1       NA FALSE   group_1 time_1 patien…      0.328     -0.655 present absent 
#>  2       NA TRUE    group_1 time_2 patien…      0.328     -0.655 present absent 
#>  3       NA TRUE    group_1 time_3 patien…      0.328     -0.655 present absent 
#>  4       NA TRUE    group_1 time_4 patien…      0.328     -0.655 present absent 
#>  5       NA FALSE   group_1 time_1 patien…      1.04      -0.779 absent  absent 
#>  6       NA FALSE   group_1 time_2 patien…      1.04      -0.779 absent  absent 
#>  7       NA TRUE    group_1 time_3 patien…      1.04      -0.779 absent  absent 
#>  8       NA TRUE    group_1 time_4 patien…      1.04      -0.779 absent  absent 
#>  9       NA FALSE   group_1 time_1 patien…      0.717     -0.954 present absent 
#> 10       NA FALSE   group_1 time_2 patien…      0.717     -0.954 present absent 
#> # ℹ 790 more rows

As described in the next section, brms.mmrm has a convenient function brm_simulate_prior() to simulate the outcome variable response using the data skeleton above and the prior predictive distribution. However, if you prefer a full custom approach, you may need granular details about the parameterization, which requires the model matrix. Fortunately, brms supports a make_standata() function to provide this, given a dataset and a formula. You may need to temporarily set the response variable to something non-missing, and you may wish to specify a custom prior.

library(brms)
formula <- brm_formula(data = data)
#> Warning: Rows containing NAs were excluded from the model.
#> Error: All observations in the data were removed presumably because of NA values.
formula
#> response ~ group + group:time + time + unstr(time = time, gr = patient) 
#> sigma ~ 0 + time

stan_data <- make_standata(
  formula = formula,
  data = mutate(data, response = 0)
)
model_matrix <- stan_data$X
head(model_matrix)
#>   Intercept groupgroup_2 timetime_2 timetime_3 timetime_4
#> 1         1            0          0          0          0
#> 2         1            0          1          0          0
#> 3         1            0          0          1          0
#> 4         1            0          0          0          1
#> 5         1            0          0          0          0
#> 6         1            0          1          0          0
#>   groupgroup_2:timetime_2 groupgroup_2:timetime_3 groupgroup_2:timetime_4
#> 1                       0                       0                       0
#> 2                       0                       0                       0
#> 3                       0                       0                       0
#> 4                       0                       0                       0
#> 5                       0                       0                       0
#> 6                       0                       0                       0

4 Prior

Function brm_simulate_prior() simulates from the prior predictive distribution. It requires a dataset and a formula, and it accepts a custom prior constructed with brms::set_prior().

formula <- brm_formula(data = data)
#> Error: All observations in the data were removed presumably because of NA values.

library(brms)
prior <- set_prior("student_t(3, 0, 1.3)", class = "Intercept") +
  set_prior("student_t(3, 0, 1.2)", class = "b") +
  set_prior("student_t(3, 0, 1.1)", class = "b", dpar = "sigma") +
  set_prior("lkj(1)", class = "cortime")

prior
#>                 prior     class coef group resp  dpar nlpar   lb   ub source
#>  student_t(3, 0, 1.3) Intercept                             <NA> <NA>   user
#>  student_t(3, 0, 1.2)         b                             <NA> <NA>   user
#>  student_t(3, 0, 1.1)         b                 sigma       <NA> <NA>   user
#>                lkj(1)   cortime                             <NA> <NA>   user

sim <- brm_simulate_prior(
  data = data,
  formula = formula,
  prior = prior,
  refresh = 0
)

The output object sim has multiple draws from the prior predictive distribution. sim$outcome has outcome draws, and sim$parameters has parameter draws. sim$model_matrix has the model matrix, and sim$model has the full brms model fit object. You can pass sim$model to functions from brms and bayesplot such as pp_check().

names(sim)
#> [1] "data"         "model"        "model_matrix" "outcome"      "parameters"

In addition, sim$data has a copy of the original dataset, but with the outcome variable taken from the final draw from the prior predictive distribution. In addition, the missingness pattern is automatically applied so that sim$data$response is NA_real_ whenever sim$data$missing equals TRUE.

sim$data
#> # A tibble: 800 × 9
#>    response missing group   time   patient biomarker1 biomarker2 status1 status2
#>       <dbl> <lgl>   <chr>   <chr>  <chr>        <dbl>      <dbl> <chr>   <chr>  
#>  1  -0.0490 FALSE   group_1 time_1 patien…      0.328     -0.655 present absent 
#>  2  NA      TRUE    group_1 time_2 patien…      0.328     -0.655 present absent 
#>  3  NA      TRUE    group_1 time_3 patien…      0.328     -0.655 present absent 
#>  4  NA      TRUE    group_1 time_4 patien…      0.328     -0.655 present absent 
#>  5   7.19   FALSE   group_1 time_1 patien…      1.04      -0.779 absent  absent 
#>  6   3.18   FALSE   group_1 time_2 patien…      1.04      -0.779 absent  absent 
#>  7  NA      TRUE    group_1 time_3 patien…      1.04      -0.779 absent  absent 
#>  8  NA      TRUE    group_1 time_4 patien…      1.04      -0.779 absent  absent 
#>  9  15.6    FALSE   group_1 time_1 patien…      0.717     -0.954 present absent 
#> 10   0.203  FALSE   group_1 time_2 patien…      0.717     -0.954 present absent 
#> # ℹ 790 more rows

5 Posterior

brms supports posterior predictive simulations and checks through functions posteiror_predict(), posterior_epred(), and pp_check(). These can be used with a brms model fit object either from brm_model() or from brm_simulate_prior().

data <- sim$data
formula <- brm_formula(data = data)
model <- brm_model(data = data, formula = formula, refresh = 0)
outcome_draws <- posterior_predict(object = model)

The returned outcome_draws object is a numeric array of posterior predictive draws, with one row per draw and one column per non-missing observation (row) in the original data.

str(outcome_draws)
#>  num [1:4000, 1:659] 11.742 0.536 0.686 -8.36 -6.803 ...

dim(data)
#> [1] 800   9

sum(!is.na(data$response))
#> [1] 659

  1. The function help file explains the details about the model parameterization.↩︎