The DBpower package implements power calculations for detection boundary tests, including the Berk-Jones (BJ), Generalized Berk-Jones (GBJ), and innovated Berk-Jones (iBJ) tests. These tests are commonly used for set-based inference in genetics. Two primary use cases for this package are study design for genetic association studies and post-hoc power calculation for such studies.
More specifically, this package can help determine whether an innovated-type test (such as iBJ) or a generalized-type test (such as GBJ) will have more power in a given hypothesis testing situation. The relative operating characteristics of these tests are known to vary widely (see our submitted manuscript for details). The choice of test therefore matters, because we generally do not want to apply multiple tests to one set, which would increase the multiple testing burden.
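To make "power" concrete: it is the probability that a test rejects the null when the test statistics follow a given alternative distribution. DBpower computes analytic lower and upper bounds on this quantity; the sketch below instead uses brute-force Monte Carlo simulation, a different technique shown only for intuition. It assumes independent, unit-variance z-statistics, two-sided p-values, and one common form of the BJ statistic; `bj_statistic` and `mc_power` are hypothetical helpers, not part of the DBpower API.

```python
import math
import random


def bj_statistic(z_stats):
    """Berk-Jones statistic (assumed form): max over 1 <= k <= d/2 of
    d * D(k/d, p_(k)), counting only terms where p_(k) < k/d."""
    d = len(z_stats)
    # Two-sided normal p-values, sorted ascending; floor avoids log(0).
    p_sorted = sorted(max(math.erfc(abs(z) / math.sqrt(2.0)), 1e-300) for z in z_stats)
    best = 0.0
    for k in range(1, d // 2 + 1):
        x, y = k / d, p_sorted[k - 1]
        if y < x:
            kl = x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))
            best = max(best, d * kl)
    return best


def mc_power(mean, alpha=0.05, n_null=2000, n_alt=1000, seed=1):
    """Monte Carlo power of the BJ test for independent N(mean_i, 1) statistics."""
    rng = random.Random(seed)
    d = len(mean)
    # Step 1: simulate the null (mean zero) to estimate the level-alpha critical value.
    null_stats = sorted(
        bj_statistic([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n_null)
    )
    crit = null_stats[int(math.ceil((1 - alpha) * n_null)) - 1]
    # Step 2: simulate the alternative and count how often the statistic exceeds it.
    rejections = sum(
        bj_statistic([rng.gauss(m, 1.0) for m in mean]) > crit for _ in range(n_alt)
    )
    return rejections / n_alt


print(mc_power([6.0, 0.0, 0.0, 0.0, 0.0]))  # one strong signal: power close to 1
print(mc_power([0.0] * 5))                   # no signal: power close to alpha
```

Running the same simulation for two candidate tests and comparing the estimates is the simulation analogue of the comparison described above. Monte Carlo error shrinks only with the square root of the number of replicates, so precise estimates at small significance levels require many simulations.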
Set-based association methods aggregate many individual hypothesis tests, usually under biologically interpretable groupings. These methods have several natural advantages over individual tests: they can reduce the multiple testing burden, combine smaller effects into a more detectable signal, and provide more interpretable results. As a concrete example, in eQTL analysis we can test whether a set of genetic risk variants around a particular risk gene is associated with the expression of that gene. Significant association provides evidence that gene expression mediates the relationship between causal variants and disease. Such an association may not be detectable when each variant is tested individually against gene expression.
We may also want to test the association between an individual variant and the expression values of a group of risk genes. Significant association provides evidence that the variant has a functional role in regulating the expression of those risk genes. Such a variant is therefore a better candidate for translational follow-up than non-functional variants that may simply lie in linkage disequilibrium with the true causal variants.
Detection boundary tests are popular in these settings because they reach the so-called rare-weak detection boundary: in a certain asymptotic sense, they can detect signals as sparse and as weak as any statistical test can detect. Because effects in genetic association studies are often assumed to be sparse and weak, detection boundary tests are a good choice for set-based inference.
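For a set of independent elements, the BJ statistic compares the sorted p-values with their expected positions under the null. The sketch below uses one common form, the maximum over 1 ≤ k ≤ d/2 of d·D(k/d, p_(k)), where p_(k) is the k-th smallest p-value and D(x, y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)) is the Bernoulli Kullback-Leibler divergence, counting only terms with p_(k) < k/d. The range of k, the one-sided restriction, and the two-sided p-values are assumptions here; DBpower's internal definition may differ, and `bj_statistic` is a hypothetical helper.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value P(|Z| >= |z|) for a z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bj_statistic(z_stats):
    """Berk-Jones statistic for a set of (assumed independent) z-statistics."""
    d = len(z_stats)
    # Floor the p-values so an extreme z (erfc underflow to 0) does not hit log(0).
    p_sorted = sorted(max(two_sided_p(z), 1e-300) for z in z_stats)
    best = 0.0
    for k in range(1, d // 2 + 1):
        x, y = k / d, p_sorted[k - 1]
        # Only count positions with more small p-values than expected under the null.
        if y < x:
            kl = x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))
            best = max(best, d * kl)
    return best


print(bj_statistic([0.0] * 10))           # no signal: all p-values are 1
print(bj_statistic([5.0] + [0.0] * 9))    # one strong element drives the maximum
```

Larger values indicate stronger evidence against the global null; the test rejects when the statistic exceeds a critical value determined by the null distribution.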
Because the detection boundary tests were originally developed for sets of independent elements, they must be modified before they can be applied to correlated genetic data. Two approaches are the innovated approach and the generalized approach. The innovated approach (e.g. iBJ) first decorrelates the test statistics and then applies the standard detection boundary test. The generalized approach (e.g. GBJ) modifies the detection boundary test itself to explicitly account for correlation among the elements of a set. These two approaches have distinct finite-sample power properties.
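The decorrelation step of the innovated approach can be illustrated with whitening: if Cov(z) = Σ and Σ = LLᵀ is a Cholesky factorization, then w = L⁻¹z has identity covariance, so an independent-elements test can be applied to w. This is a minimal sketch under that assumption; the transformation DBpower uses for iBJ may differ (for example a symmetric inverse square root of Σ), and `cholesky` and `decorrelate` are hypothetical helpers.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def decorrelate(z_stats, sigma):
    """Return w = L^{-1} z, which has identity covariance when Cov(z) = sigma.

    Uses forward substitution instead of forming L^{-1} explicitly.
    """
    L = cholesky(sigma)
    w = []
    for i, zi in enumerate(z_stats):
        s = sum(L[i][m] * w[m] for m in range(i))
        w.append((zi - s) / L[i][i])
    return w


sigma = [[1.0, 0.5], [0.5, 1.0]]
print(decorrelate([1.0, 1.0], sigma))  # second element shrinks: shared signal is removed
```

Note that the transformation also changes the mean vector under the alternative, which is one reason the power of innovated and generalized tests can differ for the same set.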
DBpower assumes that, under the alternative, the set of test statistics is multivariate normal with a nonzero mean vector and a covariance matrix that can be estimated consistently. The package first calculates the rejection region of the chosen test, which depends on both the covariance matrix and the test. Given this rejection region and the distribution of the test statistics under the alternative, the package then provides lower and upper bounds on the exact power of the test. Bounds are provided because the exact power of detection boundary tests is incredibly computationall