Marginal standardization, `approach = "margstd_delta"`

and
`approach = "margstd_boot"`

, makes use of the good
convergence properties of the logit link, which is also guaranteed to
result in probabilities within (0; 1).

After fitting a logistic model, predicted probabilities for each observation are obtained for the levels of the exposure variable. Risk ratios and risk differences are calculated by contrasting the predicted probabilities as ratios or differences. Standard errors and confidence intervals are obtained via the delta method or via bootstrapping the entire procedure.

We use the same example data as in the Get Started vignette.

```
library(risks) # provides riskratio(), riskdiff(), postestimation functions
library(dplyr) # For data handling
library(broom) # For tidy() model summaries
data(breastcancer)
```

We fit the same risk difference model as in the Get Started vignette,
this time explicitly specifying
`approach = "margstd_delta"`

:

```
fit_margstd <- riskdiff(formula = death ~ stage + receptor,
data = breastcancer,
approach = "margstd_delta")
summary(fit_margstd)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#>
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.00000 0.00000 NaN NaN
#> stageStage II 0.16303 0.05964 2.734 0.00626 **
#> stageStage III 0.57097 0.09962 5.732 9.95e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.00000000 0.0000000
#> stageStage II 0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158
```

Consistent with earlier results, we observed that women with stage III breast cancer have a 57 percentage points higher risk of death, adjusting for hormone receptor status.

Note that coefficients and standard errors are only estimated for the exposure variable. Model fit characteristics and predicted values stem directly from the underlying logistic model.

Standardization can only be done over one exposure variable, and thus no coefficients are estimated for other variables in the model.

- By default, the first right-hand variable in the model formula will
be assumed to be the exposure. The variable types
`logical`

,`factor`

,`ordered`

, and`character`

are taken to represent categorical variables. - Using
`variable =`

, a different variable can be specified. - Using
`at =`

, levels for contrasts and the order of levels can be specified. The first level is used as the reference with a risk ratio of 1 or a risk difference of 0. A warning will be shown for continuous values if the requested levels exceed the range of data (extrapolation).

Requesting estimates for a different exposure variable, using
`variable = "receptor"`

:

```
fit_margstd2 <- riskdiff(formula = death ~ stage + receptor,
data = breastcancer,
approach = "margstd_delta", variable = "receptor")
summary(fit_margstd2)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#>
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> receptorHigh 0.00000 0.00000 NaN NaN
#> receptorLow 0.16163 0.07454 2.169 0.0301 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> receptorHigh 0.00000000 0.0000000
#> receptorLow 0.01554788 0.3077201
```

Marginal standardization cannot rely on analytical standard error
formulae. One approach (`"margstd_delta"`

) is the delta
method. Alternatively, parametric bootstrapping
(`"margstd_boot"`

with the default,
`bootci = "bca"`

) can be used: given the initial
maximum-likelihood estimates of the coefficients and the observed
predictors, the outcome vector is predicted from a binomial
distribution. The model is fit again using the predicted outcomes and
observed predictors. Repeating this process generates the empirical
distribution of the coefficients.

Confidence intervals for coefficients are calculated based on
BC_{a} confidence intervals for parametric bootstrapping (Efron
B, Narasimhan B. The Automatic Construction of Bootstrap Confidence
Intervals. J Comput
Graph Stat 2020;29(3):608-619).

For comparison purposes, alternative approaches to bootstrapping and
confidence interval estimation can requested in `tidy()`

,
`summary.margstd()`

, and `confint()`

:

`bootci = "normal"`

: Normality-based confidence intervals after parametric bootstrapping. These may give out-of-range estimates for confidence limits.`bootci = "nonpar"`

: BC_{a}intervals after*nonparametric*bootstrapping. Here, the data are resampled, rather than the outcome predicted from the model. Especially in the small datasets in which regular binomial models with log links may not converge and marginal standardization may be the only option, resampling the data comes with the risk of empty strata in the resamples, resulting in sparse-data bias or model nonconvergence. The resulting confidence intervals may be too narrow. This option should be used with caution.

When using `tidy()`

to access model results, the parameter
`bootverbose = TRUE`

will add bootstrapping parameters and
properties to the tibble:

```
fit_margstd3 <- riskdiff(formula = death ~ stage + receptor,
data = breastcancer,
approach = "margstd_boot")
summary(fit_margstd3)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with bootstrapping (margstd_boot).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#>
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.00000 0.00000 NaN NaN
#> stageStage II 0.16303 0.06133 2.658 0.00785 **
#> stageStage III 0.57097 0.09823 5.813 6.15e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (1000 BCa bootstrap repeats)
#> 2.5% 97.5%
#> stageStage I NA NA
#> stageStage II 0.02144726 0.2737863
#> stageStage III 0.37966981 0.7484015
tidy(fit_margstd3, bootverbose = TRUE) %>%
select(-statistic, -p.value) # allow the other columns to get printed instead
#> # A tibble: 3 × 10
#> term estimate std.error conf.low conf.high bootrepeats bootci jacksd.low
#> <chr> <dbl[1d> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 stageStag… 0 0 NA NA 1000 bca NA
#> 2 stageStag… 0.163 0.0587 0.0233 0.258 1000 bca 0.00942
#> 3 stageStag… 0.571 0.100 0.357 0.753 1000 bca 0.0146
#> # ℹ 2 more variables: jacksd.high <dbl>, model <chr>
```

`bootrepeats`

: Number of bootstrap repeats.`bootci`

: Type of bootstrap confidence interval (`bca`

,`normal`

, or`nonpar`

, see above).`jacksd.low`

and`jacksd.high`

: Jackknife-based Monte-Carlo standard errors for the lower and upper limits of the confidence interval. Available for`bca`

intervals only. If these simulation errors are large compared to the desired precision of the confidence limits, then the number of bootstrap repeats needs to be substantially increased.

The default are (currently) only 1000 bootstrap repeats to reduce
initial computation time. If the calculation of BC_{a}
confidence intervals fails with an error, and for final, precise
estimates, the number of repeats should be increased to >>1000.
Use the option `bootrepeats =`

in `summary()`

,
`tidy()`

, or `confint()`

.

For reproducibility, a seed should be set (e.g.,
`set.seed(123)`

).