Confidence Intervals of \(f_2\) Using Bootstrap Method

Zhengguo XU



To compare the dissolution profiles, the most widely used method is the similarity factor \(f_2\), which is usually defined in the regulatory guideline as \[f_2 = 50 \log\frac{100}{\sqrt{1 + \frac{\sum_{t=1}^{t=n}\left(\bar{R}_t-\bar{T}_t\right)^2}{n}}},\] where \(\bar{R}_t\) and \(\bar{T}_t\) are mean dissolution profile of reference ad test product at time \(t\); \(n\) is the number of time points.

Nevertheless, there are several prerequisites for the use of f2 method according to the regulatory guidelines, such as no more than one time point above 85% dissolved should be used and the variability, expressed as coefficient of variation (CV), should be no more than 20% and 10% for early and later time points, respectively. See vignette Introduction to bootf2 for details.

Recently, several guidelines recommended the use of confidence interval of \(f_2\) approach using bootstrap when the traditional \(f_2\) method cannot be applied14. However, none of the guidelines specified in details which estimators or types of confidence intervals should be used. Therefore, the function bootf2() will output various confidence intervals by default for several \(f_2\) estimators.

Types of \(f_2\)

According to Shah et al, the population \(f_2\) for the inference is \[ f_2 = 100-25\log\left(1 + \frac{1}{P}\sum_{i=1}^P \left(\mu_{\mathrm{T},i} - \mu_{\mathrm{R},i}\right)^2 \right)\, , \] where \(P\) is the number of time points; \(\mu_{\mathrm{T},i}\) and \(\mu_{\mathrm{R},i}\), typically unknown, are population mean of test and reference product at time point \(i\), respectively5. Several estimators for the population \(f_2\) has been described in the literature58. The function is able to calculate the following five estimators for \(f_2\), along with their corresponding 90% confidence intervals using bootstrap.

The estimated \(f_2\) (\(\hat{f}_2\))

The estimated \(f_2\) (\(\hat{f}_2\), denoted by est.f2 in the function) is the one written in various regulatory guidelines. It is expressed differently, but mathematically equivalently, as \[ \hat{f}_2 = 100-25\log\left(1 + \frac{1}{P}\sum_{i=1}^P\left( \bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 \right)\:, \] where \(P\) is the number of time points; \(\bar{X}_{\mathrm{T},i}\) and \(\bar{X}_{\mathrm{R},i}\) are mean dissolution data at the \(i\)th time point of random samples chosen from the test and the reference population, respectively. Compared to the equation of population \(f_2\) above, the only difference is that in the equation of \(\hat{f}_2\) the sample means of dissolution profiles replace the population means for the approximation. In other words, a point estimate is used for the statistical inference in practice.

The Bias-corrected \(f_2\) (\(\hat{f}_{2,\mathrm{bc}}\))

Bias-corrected \(f_2\) (\(\hat{f}_{2,\mathrm{bc}}\), denoted by bc.f2 in the function) was described in the article of Shah et al5, as \[ \hat{f}_{2,\mathrm{bc}} = 100-25\log\left(1 + \frac{1}{P} \left(\sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P \left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,, \] where \(S_{\mathrm{T},i}^2\) and \(S_{\mathrm{R},i}^2\) are unbiased estimates of variance at the \(i\)th time points of random samples chosen from test and reference population, respectively; and \(n\) is the sample size. \(\bar{X}_{\mathrm{T},i}\), \(\bar{X}_{\mathrm{R},i}\) and \(P\) are same as described previously. As domain of the \(\log(\cdot)\) function needs to be positive, when variance is sufficiently high, i.e., when inequality \[\frac{1}{n}\sum_{i=1}^P\left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\ge \sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i}-\bar{X}_{\mathrm{R},i}\right)^2+P\] is valid, \(\hat{f}_{2,\mathrm{bc}}\) cannot be calculated.

The variance- and bias-corrected \(f_2\) (\(\hat{f}_{2, \mathrm{vcbc}}\))

Bias-corrected \(f_2\) described earlier assumes equal weight of variance between test and reference, which might not necessarily be true, therefore, the function also includes the following variance- and bias-corrected \(f_2\) (\(\hat{f}_{2, \mathrm{vcbc}}\), denoted by vc.bc.f2 in the function) \[ \hat{f}_{2, \mathrm{vcbc}} = 100-25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 - \frac{1}{n}\sum_{i=1}^P\left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 + w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,, \] where \(w_{\mathrm{T},i}\) and \(w_{\mathrm{R},i}\) are weighting factors for variance of test and reference products, respectively, which can be calculated as follows: \[ w_{\mathrm{T},i} = 0.5 + \frac{S_{\mathrm{T},i}^2} {S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,, \] and \[ w_{\mathrm{R},i} = 0.5 + \frac{S_{\mathrm{R},i}^2} {S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2}\,. \] Similar to \(\hat{f}_{2, \mathrm{bc}}\), the \(\hat{f}_{2, \mathrm{vcbc}}\) cannot be estimated when the variance is sufficiently high, i.e., when the inequality \[ \frac{1}{n}\sum_{i=1}^P \left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 + w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right) \ge \sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i}-\bar{X}_{\mathrm{R},i}\right)^2+P \] is valid.

The expected \(f_2\) (\(\hat{f}_{2, \mathrm{exp}}\))

The mathematical expectation of \(\hat{f}_2\) is \[\begin{align}\label{eq:mathexp} \mathbb{E}\left(\hat{f}_2\right) &= \mathbb{E}\left(100 - 25\log\left(1 + \frac{1}{P} \sum_{i=1}^P\left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 \right)\right)\notag\\ &\approx 100 - 25\log\left(1 + \mathbb{E}\left(\frac{1}{P} \sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2\right)\right) \notag\\ &= 100 - 25\log\left(1 + \frac{1}{P} \left(\sum_{i=1}^P\left(\mu_{\mathrm{T},i} - \mu_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P \left(\sigma_{\mathrm{T},i}^2 + \sigma_{\mathrm{R},i}^2\right)\right)\right), \end{align}\] where \(\sigma_{\mathrm{T},i}^2\) and \(\sigma_{\mathrm{R},i}^2\) are population variance of the test and the reference product, respectively.

The expected \(f_2\) (\(\hat{f}_{2, \mathrm{exp}}\), denoted by exp.f2 in the function) is calculated based on the mathematical expectation of \(\hat{f}_2\), using mean dissolution profiles and variance from samples for the approximation68. \[ \hat{f}_{2, \mathrm{exp}} = 100-25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P\left(S_{\mathrm{T},i}^2 + S_{\mathrm{R},i}^2\right)\right)\right)\,. \]

The variance-corrected expected \(f_2\) (\(\hat{f}_{2, \mathrm{vcexp}}\)))

Similarly, since \(\hat{f}_{2, \mathrm{exp}}\) assumes equal weight of variance between test and reference, the variance-corrected version (\(\hat{f}_{2, \mathrm{vcexp}}\), denoted by vc.exp.f2 in the function) was also included in the function. \[ \hat{f}_{2, \mathrm{vcexp}} = 100-25\log\left(1 + \frac{1}{P}\left(\sum_{i=1}^P \left(\bar{X}_{\mathrm{T},i} - \bar{X}_{\mathrm{R},i}\right)^2 + \frac{1}{n}\sum_{i=1}^P\left(w_{\mathrm{T},i}\cdot S_{\mathrm{T},i}^2 + w_{\mathrm{R},i}\cdot S_{\mathrm{R},i}^2\right)\right)\right)\,. \]

Types of confidence intervals

The following 90% confidence intervals are included in the function:

The Normal interval

The Normal interval (denoted by normal in the function) with bias correction was estimated according to Davison and Hinkley9, \[\begin{align} \hat{f}_{2, \mathrm{L}} &= \hat{f}_{2, \mathrm{S}} - E_B - \sqrt{V_B}\cdot Z_{1-\alpha}\,, \\ \hat{f}_{2, \mathrm{U}} &= \hat{f}_{2, \mathrm{S}} - E_B + \sqrt{V_B}\cdot Z_{1-\alpha}\,, \end{align}\] where \(\hat{f}_{2, \mathrm{L}}\) and \(\hat{f}_{2, \mathrm{U}}\) are the lower and upper limit of the confidence interval estimated from bootstrap samples; \(\hat{f}_{2, \mathrm{S}}\) denotes the estimators as described in Section Types of \(f_2\) and calculated using the randomly selected samples from populations; \(Z_{1-\alpha}\) is \(\Phi^{-1}(1-\alpha)\), where \(\Phi(\cdot)\) represents standard normal cumulative distribution function and \(\Phi^{-1}(\cdot)\) denotes its inverse, the quantile function; \(\alpha\) is the type I error rate and equal to 0.05 in the current study; \(E_B\) and \(V_B\) are the resampling estimates of bias and variance calculated as \[\begin{align} E_B &= \frac{1}{B}\sum_{b=1}^{B}\hat{f}_{2,b}^\star - \hat{f}_{2, \mathrm{S}} = \bar{f}_2^\star - \hat{f}_{2, \mathrm{S}}\label{eq:eb}\,, \\ V_B &= \frac{1}{B-1}\sum_{b=1}^{B} \left(\hat{f}_{2,b}^\star-\bar{f}_2^\star\right)^2\label{eq:vb}\,, \end{align}\] where superscript \(^\star\) denotes that the \(f_2\) estimates are calculated from bootstrap samples; \(B\) is the number of bootstrap samples, which is equal to 10000 in the function as default value; \(\hat{f}_{2,b}^\star\) is the \(f_2\) estimate with the \(b\)th bootstrap sample and \(\bar{f}_2^\star\) is the mean value.

The basic interval

The basic interval (denoted by basic in the function) was calculated according to Davison and Hinkley9, \[\begin{align} \hat{f}_{2, \mathrm{L}} &= 2\hat{f}_{2, \mathrm{S}} - \hat{f}_{2,(B+1)(1-\alpha)}^\star\,, \\ \hat{f}_{2, \mathrm{U}} &= 2\hat{f}_{2, \mathrm{S}} - \hat{f}_{2,(B+1)\alpha}^\star\,, \end{align}\] where \(\hat{f}_{2,(B+1)\alpha}^\star\) and \(\hat{f}_{2,(B+1)(1-\alpha)}^\star\) are the \((B+1)\alpha\)th and the \((B+1)(1-\alpha)\)th ordered resampling estimates of \(f_2\), respectively. When \((B+1)\alpha\) is not an integer, the following equation is used for interpolation, \[ \hat{f}_{2,(B+1)\alpha}^\star = \hat{f}_{2,k}^\star + \frac{\Phi^{-1}\left(\alpha\right)-\Phi^{-1}\left(\frac{k}{B+1}\right)} {\Phi^{-1}\left(\frac{k+1}{B+1}\right)-\Phi^{-1}\left(\frac{k}{B+1}\right)} \left(\hat{f}_{2,k+1}^\star-\hat{f}_{2,k}^\star\right), \] where \(k = \left[(B+1)\alpha\right]\), the integer part of \((B+1)\alpha\), \(\hat{f}_{2,k+1}^\star\) and \(\hat{f}_{2,k}^\star\) are the \((k+1)\)th and the \(k\)th ordered resampling estimates of \(f_2\), respectively.

The percentile interval

The percentile intervals (denoted by percentile in the function) were estimated using nine different types of quantiles, percentile Type 1 to Type 9, as summarized in Hyndman and Fan’s article10 and implemented in R’s quantile function. Using R’s boot package, program bootf2BCA outputs a percentile interval using the equation above for interpolation. To be able to compare the results among different programs, the same interval, denoted by Percentile (boot) in the function, is also calculated in this study.

The BCa intervals

The bias-corrected and accelerated intervals were estimated as follows according to literature11, \[\begin{align} \hat{f}_{2, \mathrm{L}} &= \hat{f}_{2, \alpha_1}^\star\,,\\ \hat{f}_{2, \mathrm{U}} &= \hat{f}_{2, \alpha_2}^\star\,, \end{align}\] where \(\hat{f}_{2, \alpha_1}^\star\) and \(\hat{f}_{2, \alpha_2}^\star\) are the \(100\alpha_1\)th and the \(100\alpha_2\)th percentile of the resampling estimates of \(f_2\), respectively. Type I errors \(\alpha_1\) and \(\alpha_2\) were obtained as \[\begin{align} \alpha_1 &= \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_\alpha} {1-\hat{a}\left(\hat{z}_0 + \hat{z}_\alpha\right)}\right), \\ \alpha_2 &= \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + \hat{z}_{1-\alpha}} {1-\hat{a}\left(\hat{z}_0 + \hat{z}_{1-\alpha}\right)}\right), \end{align}\] where \(\hat{z}_0\) and \(\hat{a}\) are called bias-correction and acceleration, respectively.

There are different methods to estimate the \(\hat{z}_0\) and \(\hat{a}\). Shah et al.5 used jackknife technique (denoted by bca.jackknife) as \[\begin{equation}\label{eq:z0} \hat{z}_0 = \Phi^{-1}\left(\frac{\#\left\{\hat{f}_{2,b}^\star < \hat{f}_{2,\mathrm{S}} \right\}}{B}\right), \end{equation}\] and \[\begin{equation}\label{eq:a0} \hat{a} = \frac{\sum_{i=1}^{n}\left(\hat{f}_{2,\mathrm{m}} - \hat{f}_{2, i}\right)^3}{6\left(\sum_{i=1}^{n} \left(\hat{f}_{2,\mathrm{m}} - \hat{f}_{2, i}\right)^2\right)^{3/2}}\,, \end{equation}\] where \(\#\left\{\cdot\right\}\) denotes the number of element in the set, \(\hat{f}_{2, i}\) is the \(i\)th jackknife statistic, \(\hat{f}_{2,\mathrm{m}}\) is the mean of the jackknife statistics, and \(n\) is the sample size.

Program bootf2BCA gives a slightly different BCa interval with R’s boot package8. This approach, denoted by bca.boot in the function, was also implemented in the function bootf2 for estimating the interval.


The complete list of arguments of the function is as follows:

bootf2(test, ref,,, path.out, file.out, = 10000L, seed = 306, digits = 2L, alpha = 0.05,
       regulation = c("EMA", "FDA", "WHO"), min.points = 1L,
       both.TR.85 = FALSE, = TRUE, = c("concise", "intermediate", "detailed"),
       f2.type = c("all", "est.f2", "exp.f2", "bc.f2",
                   "vc.exp.f2", "vc.bc.f2"),
       ci.type = c("all", "normal", "basic", "percentile",
                   "bca.jackknife", "bca.boot"),
       quantile.type = c("all", 1:9, "boot"),
       jackknife.type = c("nt+nr", "nt*nr", "nt=nr"),
       time.unit = c("min", "h"), = FALSE, = FALSE)

Notes on function arguments

  1. Data input: test, ref,,
    • In the typical interactive use, test and ref are data frames with the time as the first column, and individual units for the rest of columns. In such cases, arguments and should not be used.
    • Data can be read directly from an Excel file with extension .xlsx or .xls. In this case, data of test and reference should be stored in separate worksheets. The first column should be time, the rest columns are individual units. The first row is the column head indicating the names, such as ‘time,’ ‘unit 01,’ unit 02’, …. It doesn’t matter what the names are as columns will be renamed internally by the function. The important point is that the first row will not be read, so do not put dissolution data on the first row.
    • When and are provided, the argument test and ref should be the worksheet names inside quotation mark, e.g., "lot ABCD1234 pH 6.8".
    • can be an absolute path such as"/home/myname/my.project/dat/", or a relative path such as "../dat/" if the working directory is in the folder "/home/myname/my.project/analysis/" and the data file is in the folder "/home/myname/my.project/dat/".
    • One more note for Windows user: As Windows use “" instead of”/" to separate path, you have to either escape it by an additional "", e.g., "C:\user\myname\my.project\dat\" cannot be the, you have to changed it to "C:\\user\\myname\\my.project\\dat\\", or to "C:/user/myname/my.project/dat/", the same format as used in Linux system.
  2. Output: path.out and file.out,
    • If argument path.out and file.out are not provided, but argument is TRUE, then a plain text report will be generated automatically in the current working directory with file name test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt, where test and ref are data set names of test and reference, TZ is the time zone such as CEST, YYYY-MM-DD is the numeric date format and HHMMSS is the numeric time format for hour, minute, and second.
    • For a quick check, set argument = TRUE, a summary report very similar to concise style report will be printed on screen.
  3. Argument jackknife.type
    • For any sample with size \(n\), the jackknife estimator is obtained by estimating the parameter for each subsample omitting the \(i\)th observation. However, when two samples (e.g., test and reference) are involved, there are several possible ways to do it. Assuming sample size of test and reference are \(n_\mathrm{T}\) and \(n_\mathrm{R}\), the following three possibility are considered:
      • Estimated by removing one observation from both test and reference samples. In this case, the prerequisite is \(n_\mathrm{T} = n_\mathrm{R}\), denoted by nt=nr in the function. So if there are 12 units in test and reference data sets, there will be 12 jackknife estimators.
      • Estimate the jackknife for test sample while keeping the reference data unchanged; and then estimate jackknife for reference sample while keeping the test sample unchanged. This is denoted by nt+nr in the function. This is the default method. So if there are 12 units in test and reference data sets, there will be \(12 + 12 = 24\) jackknife estimators.
      • For each observation deleted from test sample, estimate jackknife for reference sample. This is denoted by nt*nr in the function. So if there are 12 units in test and reference data sets, there will be \(12 \times 12 = 144\) jackknife estimators.
  4. Arguments f2.type and ci.type are explained in the previous section.
  5. By default, the individual bootstrapped data set are not included in the output as those data sets are only useful for validation purpose. To include the individual data sets, set the argument = TRUE.
  6. Read the function manual help("bootf2") for more details of each argument.


The minimum required arguments are test and ref. First, let’s simulate some dissolution profiles to play with.

# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)

# model.par for reference with low variability
par.r <- list(fmax = 100, = 3, mdt = 15, = 14, 
                  tlag = 0, = 0, beta = 1.5, = 8)

# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)

# model.par for test 
par.t <- list(fmax = 100, = 3, mdt = 12.29, = 12,
                  tlag = 0, = 0, beta = 1.727, = 9)

# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)

Use default setting for most arguments. To reduce the compilation time, bootstrap number is set to 100 only.

# output file will be generated automatically
bootf2(dt$sim.disso, dr$sim.disso, = 100, = FALSE, = TRUE)
European Medicines Agency. Question and answer on the adequacy of the Mahalanobis distance to assess the comparability of drug dissolution profiles (accessed 2018 -12 -04).
Davit, B. M.; Stier, E.; Jiang, X.; Anand, O. Expectations of the US-FDA Regarding Dissolution Data in Generic Drug Regulatory Submissions. Biopharma Asia 2013, 2 (2).
Lum, S. Health Canada’s Current Practice and Challenges in the Evaluation of Dissolution Profile Comparisons in Support of Minor/Moderate Product Quality Changes: Case Studies. In In vitro dissolution profiles similarity assessment in support of drug product quality: What, how, and when; 2019.
Mandula, H. Rational Statistical Analysis Practice in Dissolution Profile Comparison: FDA Perspective. In In vitro dissolution profiles similarity assessment in support of drug product quality: What, how, and when; 2019.
Shah, V. P.; Tsong, Y.; Sathe, P.; Liu, J.-P. In Vitro Dissolution Profile Comparison—Statistics and Analysis of the Similarity Factor, F2. Pharmaceutical Research 1998, 15 (6), 889–896.
Ma, M.-C.; Wang, B. B. C.; Liu, J.-P.; Tsong, Y. Assessment of Similarity Between Dissolution Profiles. Journal of Biopharmaceutical Statistics 2000, 10 (2), 229–249.
Mendyk, A.; Pacławski, A.; Szlek, J.; Jachowicz, R. PhEqbootstrap: Open-Source Software for the Simulation of \(f_2\) Distribution in Cases of Large Variability in Dissolution Profiles. Dissolution Technologies 2013, 20 (1), 13–17.
Mendyk, A. bootf2BCA (accessed 2021 -02 -03).
Davison, A. C.; Hinkley, D. V. Bootstrap Methods and Their Application; Cambridge University Press, 1997.
Hyndman, R. J.; Fan, Y. Sample Quantiles in Statistical Packages. The American Statistician 1996, 50 (4), 361–365.
Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Chapman & Hall, 1993.