- 1 Introduction
- 2 Data simulation using function
`simulate_data()`

- 3 Handling of observed post-ICE data in
`rbmi`

under reference-based imputation - 4 Efficiently changing reference-based imputation strategies
- 5 Imputation under MAR with time-varying covariates
- 6 Custom imputation strategies
- 7 Custom analysis functions
- 8 Sensitivity analyses: Delta adjustments and tipping point analyses
- References

The purpose of this vignette is to provide an overview of some more advanced features of the `rbmi`

package.
The sections of the vignette are relatively self-contained, i.e. readers should be able to jump directly to the section which covers the functionality that they are most interested in.

`simulate_data()`

In order to demonstrate the advanced functions we will first create a simulated dataset with the `rbmi`

function `simulate_data()`

.
The `simulate_data()`

function generates data from a randomized clinical trial with longitudinal continuous outcomes and up to two different types of intercurrent events (ICEs).
One intercurrent event (ICE1) may be thought of as a discontinuation from study treatment due to study drug or condition related (SDCR) reasons.
The other event (ICE2) may be thought of as discontinuation from study treatment due to not study drug or condition related (NSDCR) reasons.
For the purpose of this vignette, we simulate data similarly to the simulation study reported in Wolbers et al. (2022) (though we change some of the simulation parameters) and include only one ICE type (ICE1).

Specifically, we simulate a 1:1 randomized trial of an active drug (intervention) versus placebo (control) with 100 subjects per group and 6 post-baseline assessments (bi-monthly visits until 12 months) under the following assumptions:

- The mean outcome trajectory in the placebo group increases linearly from 50 at baseline (visit 0) to 60 at visit 6, i.e. the slope is 10 points/year.
- The mean outcome trajectory in the intervention group is identical to the placebo group up to visit 2. From visit 2 onward, the slope decreases by 50% to 5 points/year.
- The covariance structure of the baseline and follow-up values in both groups is implied by a random intercept and slope model with a standard deviation of 5 for both the intercept and the slope, and a correlation of 0.25. In addition, an independent residual error with standard deviation 2.5 is added to each assessment.

- The probability of study drug discontinuation after each visit is calculated according to a logistic model which depends on the observed outcome at that visit. Specifically, a visit-wise discontinuation probability of 2% and 3% in the control and intervention group, respectively, is specified in case the observed outcome is equal to 50 (the mean value at baseline). The odds of a discontinuation is simulated to increase by +10% for each +1 point increase of the observed outcome.
- Study drug discontinuation is simulated to have no effect on the mean trajectory in the placebo group. In the intervention group, subjects who discontinue follow the slope of the mean trajectory from the placebo group from that time point onward. This is compatible with a copy increments in reference (CIR) assumption.
- Study drop-out at the study drug discontinuation visit occurs with a probability of 50% leading to missing outcome data from that time point onward.

The function `simulate_data()`

requires 3 arguments (see the function documentation `help(simulate_data)`

for more details):

`pars_c`

: The simulation parameters of the control group`pars_t`

: The simulation parameters of the intervention group`post_ice1_traj`

: Specifies how observed outcomes after ICE1 are simulated

Below, we report how data according to the specifications above can be simulated with function `simulate_data()`

:

```
library(rbmi)
library(dplyr)
library(ggplot2)
library(purrr)
set.seed(122)
<- 100
n <- c(0, 2, 4, 6, 8, 10, 12)
time
# Mean trajectory control
<- c(50.0, 51.66667, 53.33333, 55.0, 56.66667, 58.33333, 60.0)
muC
# Mean trajectory intervention
<- c(50.0, 51.66667, 53.33333, 54.16667, 55.0, 55.83333, 56.66667)
muT
# Create Sigma
<- 2.5
sd_error <- rbind(
covRE c(25.0, 6.25),
c(6.25, 25.0)
)
<- cbind(1, time / 12) %*% covRE %*% rbind(1, time / 12) + diag(sd_error^2, nrow = length(time))
Sigma
# Set probability of discontinuation
<- 0.02
probDisc_C <- 0.03
probDisc_T <- 1.10 # +1 point increase => +10% odds of discontinuation
or_outcome
# Set drop-out rate following discontinuation
<- 0.5
prob_dropout
# Set simulation parameters of the control group
<- set_simul_pars(
parsC mu = muC,
sigma = Sigma,
n = n,
prob_ice1 = probDisc_C,
or_outcome_ice1 = or_outcome,
prob_post_ice1_dropout = prob_dropout
)
# Set simulation parameters of the intervention group
<- parsC
parsT $mu <- muT
parsT$prob_ice1 <- probDisc_T
parsT
# Set assumption about post-ice trajectory
<- "CIR"
post_ice_traj
# Simulate data
<- simulate_data(
data pars_c = parsC,
pars_t = parsT,
post_ice1_traj = post_ice_traj
)
head(data)
#> id visit group outcome_bl outcome_noICE ind_ice1 ind_ice2 dropout_ice1
#> 1 id_1 0 Control 57.32704 57.32704 0 0 0
#> 2 id_1 1 Control 57.32704 54.69751 1 0 1
#> 3 id_1 2 Control 57.32704 58.60702 1 0 1
#> 4 id_1 3 Control 57.32704 61.50119 1 0 1
#> 5 id_1 4 Control 57.32704 56.68363 1 0 1
#> 6 id_1 5 Control 57.32704 66.14799 1 0 1
#> outcome
#> 1 57.32704
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 NA
# As a simple descriptive of the simulated data, summarize the number of subjects with ICEs and missing data
%>%
data group_by(id) %>%
summarise(
group = group[1],
any_ICE = (any(ind_ice1 == 1)),
any_NA = any(is.na(outcome))) %>%
group_by(group) %>%
summarise(
subjects_with_ICE = sum(any_ICE),
subjects_with_missings = sum(any_NA)
)#> # A tibble: 2 × 3
#> group subjects_with_ICE subjects_with_missings
#> <fct> <int> <int>
#> 1 Control 18 8
#> 2 Intervention 25 14
```

`rbmi`

under reference-based imputation`rbmi`

always uses all non-missing outcome data from the input data set, i.e. such data are never overwritten during the imputation step or removed from the analysis step. This implies that if there are data which are considered to be irrelevant for treatment effect estimation (e.g. data after an ICE for which the estimand specified a hypothetical strategy), then such data need to be removed from the input data set by the user prior to calling the `rbmi`

functions.

For imputation under a missing at random (`MAR`

) strategy, all observed outcome data is also included in the fitting of the base imputation model. However, for ICEs handled using reference-based imputation methods (such as `CIR`

, `CR`

, and `JR`

), `rbmi`

excludes observed post-ICE data from the base imputation model. If these data were not excluded, then the base imputation model would mistakenly estimate mean trajectories based on a mixture of observed pre- and post-ICE data which are not relevant for reference-based imputations. However, any observed post-ICE data are added back into the data set after the fitting of the base imputation model and included as is in the subsequent imputation and analysis steps.

Post-ICE data in the control or reference group are also excluded from the base imputation model if the user specifies a reference-based imputation strategy for such ICEs. This ensures that an ICE has the same impact on the data included in the base imputation model regardless whether the ICE occurred in the control or the intervention group. On the other hand, imputation in the reference group is based on a MAR assumption even for reference-based imputation methods and it may be preferable in some settings to include such post-ICE data from the control group in the base imputation model. This can be implemented by specifying a `MAR`

strategy for the ICE in the control group and a reference-based strategy for the same ICE in the intervention group. We will use this latter approach in our example below.

The simulated trial data from section 2 assumed that outcomes in the intervention group observed after the ICE “treatment discontinuation” follow the increments observed in the control group. Thus the imputation of missing data in the intervention group after treatment discontinuation might be performed under a reference-based copy increments in reference (`CIR`

) assumption.

Specifically, we implement an estimator under the following assumptions:

- The endpoint of interest is the change in the outcome from baseline at each visit.
- The imputation model includes the treatment group, the (categorical) visit, treatment-by-visit interactions, the baseline outcome, and baseline outcome-by-visit interactions as covariates.
- The imputation model assumes a common unstructured covariance matrix in both treatment groups
- In the control group, all missing data are imputed under MAR whereas in the intervention group, missing post-ICE data are imputed under a CIR assumption
- The analysis model of the endpoint in the imputed datasets is a separate ANCOVA model for each visit with the treatment group as the primary covariate and adjustment for the baseline outcome value.

For illustration purposes, we chose MI based on approximate Bayesian posterior draws with 20 random imputations which is not very demanding from a computational perspective. In practical applications, the number of random imputations may need to be increased. Moreover, other imputations are also supported in `rbmi`

. For guidance regarding the choice of the imputation approach, we refer the user to a comparison between all implemented approaches in Section 3.9 of the “Statistical Specifications” vignette (`vignette("stat_specs", package = "rbmi")`

).

We first report the code to set the variables of the imputation and analysis models. If you are not yet familiar with the syntax, we recommend that you first check the “quickstart” vignette (`vignette("quickstart", package = "rbmi")`

).

```
# Create data_ice including the subject's first visit affected by the ICE and the imputation strategy
# Imputation strategy for post-ICE data is CIR in the intervention group and MAR for the control group
# (note that ICEs which are handled using MAR are optional and do not impact the analysis
# because imputation of missing data under MAR is the default)
<- data %>%
data_ice_CIR group_by(id) %>%
filter(ind_ice1 == 1) %>% # select visits with ICEs
mutate(strategy = ifelse(group == "Intervention", "CIR", "MAR")) %>%
summarise(
visit = visit[1], # Select first visit affected by the ICE
strategy = strategy[1]
)
# Compute endpoint of interest: change from baseline and
# remove rows corresponding to baseline visits
<- data %>%
data filter(visit != 0) %>%
mutate(
change = outcome - outcome_bl,
visit = factor(visit, levels = unique(visit))
)
# Define key variables for the imputation and analysis models
<- set_vars(
vars subjid = "id",
visit = "visit",
outcome = "change",
group = "group",
covariates = c("visit*outcome_bl", "visit*group"),
strategy = "strategy"
)
<- vars
vars_an $covariates <- "outcome_bl" vars_an
```

The chosen imputation method can be set with the function `method_approxbayes()`

as follows:

`<- method_approxbayes(n_sample = 20) method `

We can now sequentially call the 4 key functions of `rbmi`

to perform the multiple imputation. Please note that the management of observed post-ICE data is performed without additional complexity for the user. `draws()`

automatically excludes post-ICE data handled with a reference-based method (but keeps post-ICE data handled using MAR) using information provided by the argument `data_ice`

. `impute()`

will impute only truly missing data in `data[[vars$outcome]]`

.

```
<- draws(
draw_obj data = data,
data_ice = data_ice_CIR,
vars = vars,
method = method,
quiet = TRUE,
ncores = 2
)
<- impute(
impute_obj_CIR
draw_obj,references = c("Control" = "Control", "Intervention" = "Control")
)
<- analyse(
ana_obj_CIR
impute_obj_CIR,vars = vars_an
)
<- pool(ana_obj_CIR)
pool_obj_CIR
pool_obj_CIR#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> ==================================================
#> parameter est se lci uci pval
#> --------------------------------------------------
#> trt_1 -0.486 0.512 -1.496 0.524 0.343
#> lsm_ref_1 2.62 0.362 1.907 3.333 <0.001
#> lsm_alt_1 2.133 0.362 1.42 2.847 <0.001
#> trt_2 -0.066 0.542 -1.135 1.004 0.904
#> lsm_ref_2 3.707 0.384 2.95 4.464 <0.001
#> lsm_alt_2 3.641 0.383 2.885 4.397 <0.001
#> trt_3 -1.782 0.607 -2.979 -0.585 0.004
#> lsm_ref_3 5.841 0.428 4.997 6.685 <0.001
#> lsm_alt_3 4.059 0.428 3.214 4.904 <0.001
#> trt_4 -2.518 0.692 -3.884 -1.152 <0.001
#> lsm_ref_4 7.656 0.492 6.685 8.627 <0.001
#> lsm_alt_4 5.138 0.488 4.176 6.1 <0.001
#> trt_5 -3.658 0.856 -5.346 -1.97 <0.001
#> lsm_ref_5 9.558 0.598 8.379 10.737 <0.001
#> lsm_alt_5 5.9 0.608 4.699 7.101 <0.001
#> trt_6 -4.537 0.954 -6.42 -2.655 <0.001
#> lsm_ref_6 11.048 0.666 9.735 12.362 <0.001
#> lsm_alt_6 6.511 0.674 5.181 7.841 <0.001
#> --------------------------------------------------
```

This last output gives an estimated difference of -4.537 (95% CI -6.420 to -2.655) between the two groups at the last visit with an associated p-value lower than 0.001.

The `draws()`

function is by far the most computationally intensive function in `rbmi`

.
In some settings, it may be important to explore the impact of a change in the
reference-based imputation strategy on the results.
Such a change does not affect the imputation model but it does
affect the subsequent imputation step.
In order to allow changes in the imputation strategy without having to re-run the
`draws()`

function, the function `impute()`

has an additional argument `update_strategies`

.

However, please note that this functionality comes with some important limitations:
As described at the beginning of Section 3, post-ICE outcomes are included in the input dataset for the base imputation model if the imputation method is `MAR`

but they are excluded for reference-based imputation methods (such as `CIR`

, `CR`

, and `JR`

).
Therefore, `updata_strategies`

cannot be applied if the imputation strategy is changed from a MAR to a non-MAR strategy in the presence of observed post-ICE outcomes. Similarly, a change from a non-MAR strategy to MAR triggers a warning in the presence of observed post-ICE outcomes because the base imputation model was not fitted to all relevant data under MAR.
Finally, `update_strategies`

cannot be applied if the timing of any of the ICEs is changed (in argument `data_ice`

) in addition to the imputation strategy.

As an example, we described an analysis under a copy increments in reference (CIR) assumption in the previous section. Let’s assume we want to change this strategy to a jump to reference imputation strategy for a sensitivity analysis. This can be efficiently implemented using `update_strategies`

as follows:

```
# Change ICE strategy from CIR to JR
<- data_ice_CIR %>%
data_ice_JR mutate(strategy = ifelse(strategy == "CIR", "JR", strategy))
<- impute(
impute_obj_JR
draw_obj,references = c("Control" = "Control", "Intervention" = "Control"),
update_strategy = data_ice_JR
)
<- analyse(
ana_obj_JR
impute_obj_JR,vars = vars_an
)
<- pool(ana_obj_JR)
pool_obj_JR
pool_obj_JR#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> ==================================================
#> parameter est se lci uci pval
#> --------------------------------------------------
#> trt_1 -0.485 0.513 -1.496 0.526 0.346
#> lsm_ref_1 2.609 0.363 1.892 3.325 <0.001
#> lsm_alt_1 2.124 0.361 1.412 2.836 <0.001
#> trt_2 -0.06 0.535 -1.115 0.995 0.911
#> lsm_ref_2 3.694 0.378 2.948 4.441 <0.001
#> lsm_alt_2 3.634 0.381 2.882 4.387 <0.001
#> trt_3 -1.767 0.598 -2.948 -0.587 0.004
#> lsm_ref_3 5.845 0.422 5.012 6.677 <0.001
#> lsm_alt_3 4.077 0.432 3.225 4.93 <0.001
#> trt_4 -2.529 0.686 -3.883 -1.175 <0.001
#> lsm_ref_4 7.637 0.495 6.659 8.614 <0.001
#> lsm_alt_4 5.108 0.492 4.138 6.078 <0.001
#> trt_5 -3.523 0.856 -5.212 -1.833 <0.001
#> lsm_ref_5 9.554 0.61 8.351 10.758 <0.001
#> lsm_alt_5 6.032 0.611 4.827 7.237 <0.001
#> trt_6 -4.36 0.952 -6.238 -2.482 <0.001
#> lsm_ref_6 11.003 0.676 9.669 12.337 <0.001
#> lsm_alt_6 6.643 0.687 5.287 8 <0.001
#> --------------------------------------------------
```

For imputations under a jump to reference assumption, we get an estimated difference of -4.360 (95% CI -6.238 to -2.482) between the two groups at the last visit with an associated p-value of <0.001.

Guizzaro et al. (2021) suggested to implement a treatment policy strategy via imputation under a MAR assumption after conditioning on the subject’s ICE status, i.e. to impute missing post-ICE data based on observed post-ICE data. One possible implementation of this proposal is to add time-varying covariates to the imputation model. A case study which implements this proposal and compares it to reference-based imputation methods for estimators in early Parkinson’s disease can be found in Noci et al. (2021).

In some settings, this may be carried out by including a binary time-varying indicator of the subject’s ICE status at each visit (defined as 0 for pre-ICE visits and as 1 for post-ICE visits) to the imputation model. However, for the simulated data introduced in section 2, it may be more plausible to assume that treatment discontinuation leads to a change in the “slope” of the mean outcome trajectory. This can be implemented by including a time-varying covariate which is equal to 0 for visits prior to the treatment discontinuation and equal to the time from the treatment discontinuation for subsequent visits. The regression coefficient of the corresponding change in the post-ICE “slope” should then be allowed to depend on the assigned treatment group, i.e. the imputation model should include an interaction between the time-varying covariate and the treatment group.

Let’s first define the time-varying covariate:

```
<- data %>%
data group_by(id) %>%
mutate(time_from_ice1 = cumsum(ind_ice1)*2/12 ) # multiplication by 2/12 because visits are bi-monthly
```

We can then include the time-varying covariate in the imputation model, crossed with the `group`

variable:

```
<- set_vars(
vars_tv subjid = "id",
visit = "visit",
outcome = "change",
group = "group",
covariates = c("visit*outcome_bl", "visit*group", "time_from_ice1*group"),
strategy = "strategy"
)
```

We now sequentially call the 4 key `rbmi`

functions:

```
<- draws(
draw_obj data = data,
data_ice = NULL, # if NULL, MAR is assumed for all missing data
vars = vars_tv,
method = method,
quiet = TRUE
)
<- impute(
impute_obj_tv
draw_obj,references = c("Control" = "Control", "Intervention" = "Intervention")
)
<- analyse(
ana_obj_tv
impute_obj_tv,vars = vars_an
)
pool(ana_obj_tv)
#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> ==================================================
#> parameter est se lci uci pval
#> --------------------------------------------------
#> trt_1 -0.492 0.515 -1.507 0.524 0.341
#> lsm_ref_1 2.623 0.362 1.908 3.338 <0.001
#> lsm_alt_1 2.131 0.366 1.409 2.854 <0.001
#> trt_2 0.018 0.55 -1.067 1.103 0.974
#> lsm_ref_2 3.697 0.382 2.943 4.45 <0.001
#> lsm_alt_2 3.715 0.394 2.936 4.493 <0.001
#> trt_3 -1.802 0.614 -3.015 -0.59 0.004
#> lsm_ref_3 5.815 0.429 4.97 6.661 <0.001
#> lsm_alt_3 4.013 0.441 3.142 4.884 <0.001
#> trt_4 -2.543 0.704 -3.932 -1.154 <0.001
#> lsm_ref_4 7.609 0.486 6.65 8.568 <0.001
#> lsm_alt_4 5.066 0.516 4.046 6.086 <0.001
#> trt_5 -3.739 0.879 -5.475 -2.004 <0.001
#> lsm_ref_5 9.499 0.606 8.302 10.695 <0.001
#> lsm_alt_5 5.759 0.636 4.502 7.017 <0.001
#> trt_6 -4.685 0.98 -6.622 -2.748 <0.001
#> lsm_ref_6 10.988 0.667 9.67 12.305 <0.001
#> lsm_alt_6 6.302 0.712 4.894 7.711 <0.001
#> --------------------------------------------------
```

The following imputation strategies are implemented in `rbmi`

:

- Missing at Random (
`MAR`

) - Jump to Reference (
`JR`

) - Copy Reference (
`CR`

) - Copy Increments in Reference (
`CIR`

) - Last Mean Carried Forward (
`LMCF`

)

In addition, `rbmi`

allows the user to implement their own imputation strategy.
To do this, the user needs to do three things:

- Define a function implementing the new imputation strategy.
- Specify which patients use this strategy in the
`data_ice`

dataset provided to`draws()`

. - Provide the imputation strategy function to
`impute()`

.

The imputation strategy function must take 3 arguments (`pars_group`

, `pars_ref`

, and `index_mar`

) and calculates the mean and covariance matrix of the subject’s marginal imputation distribution which will then be applied to subjects to which the strategy applies.
Here, `pars_group`

contains the predicted mean trajectory (`pars_group$mu`

, a numeric vector) and covariance matrix (`pars_group$sigma`

) for a subject conditional on their assigned treatment group and covariates.
`pars_ref`

contains the corresponding mean trajectory and covariance matrix conditional on the reference group and the subject’s covariates.
`index_mar`

is a logical vector which specifies for each visit whether the visit is unaffected by an ICE handled using a non-MAR method or not.
As an example, the user can check how the CIR strategy was implemented by looking at function `strategy_CIR()`

.

```
strategy_CIR#> function (pars_group, pars_ref, index_mar)
#> {
#> if (all(index_mar)) {
#> return(pars_group)
#> }
#> else if (all(!index_mar)) {
#> return(pars_ref)
#> }
#> mu <- pars_group$mu
#> last_mar <- which(!index_mar)[1] - 1
#> increments_from_last_mar_ref <- pars_ref$mu[!index_mar] -
#> pars_ref$mu[last_mar]
#> mu[!index_mar] <- mu[last_mar] + increments_from_last_mar_ref
#> sigma <- compute_sigma(sigma_group = pars_group$sigma, sigma_ref = pars_ref$sigma,
#> index_mar = index_mar)
#> pars <- list(mu = mu, sigma = sigma)
#> return(pars)
#> }
#> <bytecode: 0x7f83e21014a0>
#> <environment: namespace:rbmi>
```

To further illustrate this for a simple example, assume that a new strategy is to be implemented as follows: - The marginal mean of the imputation distribution is equal to the marginal mean trajectory for the subject according to their assigned group and covariates up to the ICE. - After the ICE the marginal mean of the imputation distribution is equal to the average of the visit-wise marginal means based on the subjects covariates and the assigned group or the reference group, respectively. - For the covariance matrix of the marginal imputation distribution, the covariance matrix from the assigned group is taken.

To do this, we first need to define the imputation function which for this example could be coded as follows:

```
<- function(pars_group, pars_ref, index_mar) {
strategy_AVG <- (pars_group$mu + pars_ref$mu) / 2
mu_mean <- pars_group
x $mu[!index_mar] <- mu_mean[!index_mar]
xreturn(x)
}
```

And an example showing its use:

```
<- list(
pars_group mu = c(1, 2, 3),
sigma = as_vcov(c(1, 3, 2), c(0.4, 0.5, 0.45))
)
<- list(
pars_ref mu = c(5, 6, 7),
sigma = as_vcov(c(2, 1, 1), c(0.7, 0.8, 0.5))
)
<- c(TRUE, TRUE, FALSE)
index_mar
strategy_AVG(pars_group, pars_ref, index_mar)
#> $mu
#> [1] 1 2 5
#>
#> $sigma
#> [,1] [,2] [,3]
#> [1,] 1.0 1.2 1.0
#> [2,] 1.2 9.0 2.7
#> [3,] 1.0 2.7 4.0
```

To incorporate this into `rbmi`

, `data_ice`

needs to be updated such that the strategy `AVG`

is specified for visits affected by the ICE. Additionally, the function needs
to be provided to `impute()`

via the `getStrategies()`

function as shown below:

```
<- data_ice_CIR %>%
data_ice_AVG mutate(strategy = ifelse(strategy == "CIR", "AVG", strategy))
<- draws(
draw_obj data = data,
data_ice = data_ice_AVG,
vars = vars,
method = method,
quiet = TRUE
)
<- impute(
impute_obj
draw_obj,references = c("Control" = "Control", "Intervention" = "Control"),
strategies = getStrategies(AVG = strategy_AVG)
)
```

Then, the analysis could proceed by calling `analyse()`

and `pool()`

as before.

By default `rbmi`

will analyse the data by using the `ancova()`

function.
This analysis function fits an ANCOVA model to the outcomes from each visit separately,
and returns the “treatment effect” estimate as well as the corresponding least square means
for each group. If the user wants to perform a different analysis, or return different
statistics from the analysis, then this can be done by using a custom analysis function.
Beware that the validity of the conditional mean imputation method has only been formally established for analysis functions corresponding to linear models (such as ANCOVA) and caution is
required when applying alternative analysis functions for this method.

The custom analysis function must take a `data.frame`

as its
first argument and return a named `list`

with each element itself being a `list`

containing at a minimum a point estimate, called `est`

.
For method `method_bayes()`

or `method_approxbayes()`

, the list must additionally contain a
standard error (element `se`

) and, if available, the degrees of freedom of the complete-data analysis model (element `df`

).

As a simple example, we replicate the ANCOVA analysis at the last visit for the CIR-based imputations with a user-defined analysis function below:

```
<- function(data, ...) {
compare_change_lastvisit <- lm(change ~ group + outcome_bl, data = data, subset = (visit == 6) )
fit <- list(
res trt = list(
est = coef(fit)["groupIntervention"],
se = sqrt(vcov(fit)["groupIntervention", "groupIntervention"]),
df = df.residual(fit)
)
)return(res)
}
<- analyse(
ana_obj_CIR6
impute_obj_CIR,fun = compare_change_lastvisit,
vars = vars_an
)
pool(ana_obj_CIR6)
#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> =================================================
#> parameter est se lci uci pval
#> -------------------------------------------------
#> trt -4.537 0.954 -6.42 -2.655 <0.001
#> -------------------------------------------------
```

As a second example, assume that for a supplementary analysis the user wants to compare the proportion of subjects with a change from baseline of >10 points at the last visit between the treatment groups with the baseline outcome as an additional covariate. This could lead to the following basic analysis function:

```
<- function(data, ...) {
compare_prop_lastvisit <- glm(
fit I(change > 10) ~ group + outcome_bl,
family = binomial(),
data = data,
subset = (visit == 6)
)<- list(
res trt = list(
est = coef(fit)["groupIntervention"],
se = sqrt(vcov(fit)["groupIntervention", "groupIntervention"]),
df = NA
)
)return(res)
}
<- analyse(
ana_obj_prop
impute_obj_CIR,fun = compare_prop_lastvisit,
vars = vars_an
)
<- pool(ana_obj_prop)
pool_obj_prop
pool_obj_prop#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> =================================================
#> parameter est se lci uci pval
#> -------------------------------------------------
#> trt -1.052 0.314 -1.667 -0.438 0.001
#> -------------------------------------------------
<- as.data.frame(pool_obj_prop) %>%
tmp mutate(
OR = exp(est),
OR.lci = exp(lci),
OR.uci = exp(uci)
%>%
) select(parameter, OR, OR.lci, OR.uci)
tmp#> parameter OR OR.lci OR.uci
#> 1 trt 0.3491078 0.188807 0.6455073
```

Note that if the user wants `rbmi`

to use a normal approximation to the pooled test statistics, then the degrees of freedom need to be set to `df = NA`

(as per the above example). If the degrees of freedom of the complete data test statistics are known or if the degrees of freedom are set to `df = Inf`

, then `rbmi`

pools the degrees of freedom across imputed datasets according to the rule by Barnard and Rubin (see the “Statistical Specifications” vignette (`vignette("stat_specs", package = "rbmi")`

for details). According to this rule, infinite degrees of freedom for the complete data analysis do not imply that the pooled degrees of freedom are also infinite.
Rather, in this case the pooled degrees of freedom are `(M-1)/lambda^2`

, where `M`

is the number of imputations and `lambda`

is the fraction of missing information (see Barnard and Rubin (1999) for details).

Delta-adjustments are used to impute missing data under a not missing at random (NMAR) assumption. This reflects the belief that unobserved outcomes would have been systematically “worse” (or “better”) than “comparable” observed outcomes. For an extensive discussion of delta-adjustment methods, we refer to Cro et al. (2020).

In `rbmi`

, a marginal delta-adjustment approach is implemented. This means that the delta-adjustment is applied to the dataset after data imputation under MAR or reference-based missing data assumptions and prior to the analysis of the imputed data.
Sensitivity analysis using delta-adjustments can therefore be performed without having to re-fit the imputation model. In `rbmi`

, they are implemented via the `delta`

argument of the `analyse()`

function.

The `delta`

argument of `analyse()`

allows users to modify the outcome variable prior to the analysis.
To do this, the user needs to provide a `data.frame`

which contains columns for the subject and visit (to identify the observation to be adjusted) plus an additional column called `delta`

which specifies the value which will be added to the outcomes prior to the analysis.

The `delta_template()`

function supports the user in creating this `data.frame`

: it creates a skeleton `data.frame`

containing one row per subject and visit with the value of `delta`

set to 0 for all observations:

```
<- delta_template(imputations = impute_obj_CIR)
dat_delta head(dat_delta)
#> id visit group is_mar is_missing is_post_ice strategy delta
#> 1 id_1 1 Control TRUE TRUE TRUE MAR 0
#> 2 id_1 2 Control TRUE TRUE TRUE MAR 0
#> 3 id_1 3 Control TRUE TRUE TRUE MAR 0
#> 4 id_1 4 Control TRUE TRUE TRUE MAR 0
#> 5 id_1 5 Control TRUE TRUE TRUE MAR 0
#> 6 id_1 6 Control TRUE TRUE TRUE MAR 0
```

Note that the output of `delta_template()`

contains additional information which can be used to properly re-set variable `delta`

.

For example, assume that the user wants to implement a delta-adjustment to the imputed values under CIR described in section 3.

Specifically, assume that a fixed “worsening adjustment” of +5 points is applied to all imputed values regardless of the treatment group. This could be programmed as follows:

```
# Set delta-value to 5 for all imputed (previously missing) outcomes and 0 for all other outcomes
<- delta_template(imputations = impute_obj_CIR) %>%
dat_delta mutate(delta = is_missing * 5)
# Repeat the analyses with the delta-adjusted values and pool results
<- analyse(
ana_delta
impute_obj_CIR,delta = dat_delta,
vars = vars_an
)pool(ana_delta)
#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> ==================================================
#> parameter est se lci uci pval
#> --------------------------------------------------
#> trt_1 -0.482 0.524 -1.516 0.552 0.359
#> lsm_ref_1 2.718 0.37 1.987 3.448 <0.001
#> lsm_alt_1 2.235 0.37 1.505 2.966 <0.001
#> trt_2 -0.016 0.56 -1.12 1.089 0.978
#> lsm_ref_2 3.907 0.396 3.125 4.688 <0.001
#> lsm_alt_2 3.891 0.395 3.111 4.671 <0.001
#> trt_3 -1.684 0.641 -2.948 -0.42 0.009
#> lsm_ref_3 6.092 0.452 5.201 6.983 <0.001
#> lsm_alt_3 4.408 0.452 3.515 5.3 <0.001
#> trt_4 -2.359 0.741 -3.821 -0.897 0.002
#> lsm_ref_4 7.951 0.526 6.913 8.99 <0.001
#> lsm_alt_4 5.593 0.522 4.563 6.623 <0.001
#> trt_5 -3.34 0.919 -5.153 -1.526 <0.001
#> lsm_ref_5 9.899 0.643 8.631 11.168 <0.001
#> lsm_alt_5 6.559 0.653 5.271 7.848 <0.001
#> trt_6 -4.21 1.026 -6.236 -2.184 <0.001
#> lsm_ref_6 11.435 0.718 10.019 12.851 <0.001
#> lsm_alt_6 7.225 0.725 5.793 8.656 <0.001
#> --------------------------------------------------
```

The same approach can be used to implement a tipping point analysis. Here, we apply different delta-adjustments to imputed data from the control and the intervention group, respectively. Assume that delta-adjustments by less then -5 points or by more than +15 points are considered implausible from a clinical perspective. Therefore, we vary the delta-values in each group between -5 to +15 points to investigate which delta combinations lead to a “tipping” of the primary analysis result, defined here as an analysis p-value \(\geq 0.05\).

```
<- function(delta_control, delta_intervention) {
perform_tipp_analysis
# Derive delta offset based on control and intervention specific deltas
<- delta_df_init %>%
delta_df mutate(
delta_ctl = (group == "Control") * is_missing * delta_control,
delta_int = (group == "Intervention") * is_missing * delta_intervention,
delta = delta_ctl + delta_int
)
<- analyse(
ana_delta
impute_obj_CIR,fun = compare_change_lastvisit,
vars = vars_an,
delta = delta_df,
)
<- as.data.frame(pool(ana_delta))
pool_delta
list(
trt_effect_6 = pool_delta[["est"]],
pval_6 = pool_delta[["pval"]]
)
}
# Get initial delta template
<- delta_template(impute_obj_CIR)
delta_df_init
<- expand.grid(
tipp_frame_grid delta_control = seq(-5, 15, by = 2),
delta_intervention = seq(-5, 15, by = 2)
%>%
) as_tibble()
<- tipp_frame_grid %>%
tipp_frame mutate(
results_list = map2(delta_control, delta_intervention, perform_tipp_analysis),
trt_effect_6 = map_dbl(results_list, "trt_effect_6"),
pval_6 = map_dbl(results_list, "pval_6")
%>%
) select(-results_list) %>%
mutate(
pval = cut(
pval_6,c(0, 0.001, 0.01, 0.05, 0.2, 1),
right = FALSE,
labels = c("<0.001", "0.001 - <0.01", "0.01- <0.05", "0.05 - <0.20", ">= 0.20")
)
)
# Show delta values which lead to non-significant analysis results
%>%
tipp_frame filter(pval_6 >= 0.05)
#> # A tibble: 3 × 5
#> delta_control delta_intervention trt_effect_6 pval_6 pval
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 -5 15 -1.99 0.0935 0.05 - <0.20
#> 2 -3 15 -2.15 0.0704 0.05 - <0.20
#> 3 -1 15 -2.31 0.0527 0.05 - <0.20
ggplot(tipp_frame, aes(delta_control, delta_intervention, fill = pval)) +
geom_raster() +
scale_fill_manual(values = c("darkgreen", "lightgreen", "lightyellow", "orange", "red"))
```

According to this analysis, the significant test result from the primary analysis under CIR could only be tipped to a non-significant result for rather extreme delta-adjustments. Please note that for a real analysis it is recommended to use a smaller step size in the grid than what has been used here.

`dlag`

and `delta`

arguments of `delta_template()`

So far, we have only discussed simple `delta`

arguments which add the same value to all imputed values.
However, the user may want to apply more flexible delta-adjustments to missing values after an intercurrent event (ICE) and vary the magnitude of the delta adjustment depending on the how far away the visit in question is from the ICE visit.

To facilitate the creation of such flexible delta-adjustments, the `delta_template()`

function has two optional additional arguments `delta`

and `dlag`

. The `delta`

argument specifies the default amount of delta
that should be applied to each post-ICE visit, whilst
`dlag`

specifies the scaling coefficient to be applied based upon the visits proximity
to the first visit affected by the ICE. By default, the delta will only be added to unobserved (i.e. imputed) post-ICE
outcomes but this can be changed by setting the optional argument `missing_only = FALSE`

.

The usage of the `delta`

and `dlag`

arguments is best illustrated with a few examples:

Assume a setting with 4 visits and that the user specified `delta = c(5,6,7,8)`

and `dlag=c(1,2,3,4)`

.

For a subject for whom the first visit affected by the ICE is visit 2, these values of `delta`

and `dlag`

would imply the following delta offset:

```
v1 v2 v3 v4
--------------
5 6 7 8 # delta assigned to each visit
0 1 2 3 # scaling starting from the first visit after the subjects ICE
--------------
0 6 14 24 # delta * scaling
--------------
0 6 20 44 # cumulative sum (i.e. delta) to be applied to each visit
```

That is, the subject would have a delta offset of 0 applied to visit v1, 6 for visit v2, 20 for visit v3 and 44 for visit v4.

Assume instead, that the subject’s first visit affected by the ICE was visit 3. Then, the above values of `delta`

and `dlag`

would imply the following delta offset:

```
v1 v2 v3 v4
--------------
5 6 7 8 # delta assigned to each visit
0 0 1 2 # scaling starting from the first visit after the subjects ICE
--------------
0 0 7 16 # delta * scaling
--------------
0 0 7 23 # cumulative sum (i.e. delta) to be applied to each visit
```

To apply a constant delta value of +5 to all visits affected by the ICE
regardless of their proximity to the first ICE visit, one could set `delta = c(5,5,5,5)`

and `dlag = c(1,0,0,0)`

.
Alternatively, it may be more straightforward for this setting to call the `delta_template()`

function without the `delta`

and `dlag`

arguments and then overwrite the `delta`

column of the resulting `data.frame`

as described in the previous section (and additionally relying on the `is_post_ice`

variable).

Another way of using these arguments is to set `delta`

to the difference in time
between visits and dlag to be the amount of delta per unit of time. For example,
let’s say that visits occur on weeks 1, 5, 6 and 9 and that we want a delta of 3
to be applied for each week after an ICE.
For simplicity, we assume that the ICE occurs immediately after the subject’s last visit which
is not affected by the ICE. This this could be achieved by setting
`delta = c(1,4,1,3)`

(the difference in weeks between each visit) and `dlag = c(3, 3, 3, 3)`

.

Assume a subject’s first visit affected by the ICE was visit v2, then these values of `delta`

and `dlag`

would imply the following delta offsets:

```
v1 v2 v3 v4
--------------
1 4 1 3 # delta assigned to each visit
0 3 3 3 # scaling starting from the first visit after the subjects ICE
--------------
0 12 3 9 # delta * scaling
--------------
0 12 15 24 # cumulative sum (i.e. delta) to be applied to each visit
```

To wrap up, we show this in action for our simulated dataset from section 2 and the imputed datasets
based on a CIR assumption from section 3.
The simulation setting specified follow-up visits at months 2, 4, 6, 8, 10, and 12.

Assume that we want to apply a delta-adjustment of 1 for every month after an ICE to unobserved post-ICE visits from the intervention group only. (E.g. if the ICE occurred immediately after the month 4 visit, then the total delta applied to a missing value from the month 10 visit would be 6.)

To program this, we first use the `delta`

and `dlag`

arguments of `delta_template()`

to set up a corresponding template `data.frame`

:

```
<- delta_template(
delta_df
impute_obj_CIR,delta = c(2, 2, 2, 2, 2, 2),
dlag = c(1, 1, 1, 1, 1, 1)
)
head(delta_df)
#> id visit group is_mar is_missing is_post_ice strategy delta
#> 1 id_1 1 Control TRUE TRUE TRUE MAR 2
#> 2 id_1 2 Control TRUE TRUE TRUE MAR 4
#> 3 id_1 3 Control TRUE TRUE TRUE MAR 6
#> 4 id_1 4 Control TRUE TRUE TRUE MAR 8
#> 5 id_1 5 Control TRUE TRUE TRUE MAR 10
#> 6 id_1 6 Control TRUE TRUE TRUE MAR 12
```

Next, we can use the additional metadata variables provided by `delta_template()`

to manually
reset the delta values for the control group back to 0:

```
<- delta_df %>%
delta_df2 mutate(delta = if_else(group == "Control", 0, delta))
head(delta_df2)
#> id visit group is_mar is_missing is_post_ice strategy delta
#> 1 id_1 1 Control TRUE TRUE TRUE MAR 0
#> 2 id_1 2 Control TRUE TRUE TRUE MAR 0
#> 3 id_1 3 Control TRUE TRUE TRUE MAR 0
#> 4 id_1 4 Control TRUE TRUE TRUE MAR 0
#> 5 id_1 5 Control TRUE TRUE TRUE MAR 0
#> 6 id_1 6 Control TRUE TRUE TRUE MAR 0
```

Finally, we can use our delta `data.frame`

to apply the desired delta offset to our analysis:

```
<- analyse(impute_obj_CIR, delta = delta_df2, vars = vars_an)
ana_delta pool(ana_delta)
#>
#> Pool Object
#> -----------
#> Number of Results Combined: 20
#> Method: rubin
#> Confidence Level: 0.95
#> Alternative: two.sided
#>
#> Results:
#>
#> ==================================================
#> parameter est se lci uci pval
#> --------------------------------------------------
#> trt_1 -0.446 0.514 -1.459 0.567 0.386
#> lsm_ref_1 2.62 0.363 1.904 3.335 <0.001
#> lsm_alt_1 2.173 0.363 1.458 2.889 <0.001
#> trt_2 0.072 0.546 -1.006 1.15 0.895
#> lsm_ref_2 3.708 0.387 2.945 4.471 <0.001
#> lsm_alt_2 3.78 0.386 3.018 4.542 <0.001
#> trt_3 -1.507 0.626 -2.743 -0.272 0.017
#> lsm_ref_3 5.844 0.441 4.973 6.714 <0.001
#> lsm_alt_3 4.336 0.442 3.464 5.209 <0.001
#> trt_4 -2.062 0.731 -3.504 -0.621 0.005
#> lsm_ref_4 7.658 0.519 6.634 8.682 <0.001
#> lsm_alt_4 5.596 0.515 4.58 6.612 <0.001
#> trt_5 -2.938 0.916 -4.746 -1.13 0.002
#> lsm_ref_5 9.558 0.641 8.293 10.823 <0.001
#> lsm_alt_5 6.62 0.651 5.335 7.905 <0.001
#> trt_6 -3.53 1.045 -5.591 -1.469 0.001
#> lsm_ref_6 11.045 0.73 9.604 12.486 <0.001
#> lsm_alt_6 7.515 0.738 6.058 8.971 <0.001
#> --------------------------------------------------
```

Barnard, John, and Donald B Rubin. 1999. “Miscellanea. Small-Sample Degrees of Freedom with Multiple Imputation.” *Biometrika* 86 (4): 948–55.

Cro, Suzie, Tim P Morris, Michael G Kenward, and James R Carpenter. 2020. “Sensitivity Analysis for Clinical Trials with Missing Continuous Outcome Data Using Controlled Multiple Imputation: A Practical Guide.” *Statistics in Medicine* 39 (21): 2815–42.

Guizzaro, Lorenzo, Frank Pétavy, Robin Ristl, and Ciro Gallo. 2021. “The Use of a Variable Representing Compliance Improves Accuracy of Estimation of the Effect of Treatment Allocation Regardless of Discontinuation in Trials with Incomplete Follow-up.” *Statistics in Biopharmaceutical Research* 13 (1): 119–27.

Noci, Alessandro, Marcel Wolbers, Markus Abt, Corine Baayen, Hans Ulrich Burger, Man Jin, and Weining Zhao Robieson. 2021. “A Comparison of Estimand and Estimation Strategies for Clinical Trials in Early Parkinson’s Disease.” https://arxiv.org/abs/2112.03700.

Wolbers, Marcel, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, and Jonathan W. Bartlett. 2022. “Standard and Reference-Based Conditional Mean Imputation.” *Pharmaceutical Statistics*. https://doi.org/10.1002/pst.2234.