Coding-variant Allelic Series Test

Updated: 2024-11-20

Data

To run an allelic series test, there are 4 key inputs:

The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs respectively.

set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n),
)

# Annotations.
anno <- data$anno
head(anno)
## [1] 1 1 2 2 1 2
# Covariates.
covar <- data$covar
head(covar)
##      int         age sex          pc1        pc2         pc3
## [1,]   1 -0.77465210   1  0.539531581  1.2134116  0.63478280
## [2,]   1 -0.03492052   0  0.009270313  1.5256400 -0.62027221
## [3,]   1 -0.20782643   1 -0.777242512  1.2319750 -0.73311857
## [4,]   1  0.69939529   0 -0.357743663  0.1747664 -0.29754242
## [5,]   1  0.47420055   1 -1.083115262 -1.3005348 -0.68338808
## [6,]   1 -1.15038152   0 -0.539141451 -0.5159589  0.01971706
# Genotypes.
geno <- data$geno
head(geno[,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
## [6,]    0    1    0    0    0
# Phenotype.
pheno <- data$pheno
head(pheno)
## [1] 0.2958143 1.7294107 0.4886318 1.0201730 1.0304183 1.9704552

The example data generated by the preceding are available under vignettes/vignette_data.

Running the alleic series test

The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, the output of COAST includes a data.frame of counts showing the number of alleles, variants, and carriers in each class that contributed to the test, and a data.frame of p-values, with the omni test denoting the final, overall p-value. Inspection of the component p-values may be useful for determining which model(s) detected evidence of an association. In the present case, the baseline count model provided the greatest power.

results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)
## Effect Sizes:
##         test beta    se
## 1       base 0.05 0.068
## 2       base 0.06 0.079
## 3       base 0.59 0.175
## 4        ind 0.31 0.256
## 5        ind 0.22 0.216
## 6        ind 0.66 0.218
## 7  max_count 0.06 0.048
## 8    max_ind 0.44 0.130
## 9  sum_count 0.07 0.031
## 10   sum_ind 0.19 0.058
## 
## 
## Counts:
##   anno alleles variants carriers
## 1    1     186      164       84
## 2    2     127      110       74
## 3    3      27       26       23
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 6.62e-03
## 2          ind burden 9.11e-03
## 3    max_count burden 2.32e-01
## 4      max_ind burden 8.15e-04
## 5    sum_count burden 1.48e-02
## 6      sum_ind burden 1.09e-03
## 7 allelic_skat   skat 6.83e-02
## 8         omni   omni 4.68e-03

The effect sizes data.frame is accessed via:

results@Betas
##         test       beta         se
## 1       base 0.05309214 0.06836091
## 2       base 0.06427266 0.07917711
## 3       base 0.58612733 0.17472435
## 4        ind 0.30597919 0.25595872
## 5        ind 0.21943023 0.21607933
## 6        ind 0.66475646 0.21829831
## 7  max_count 0.05719340 0.04783470
## 8    max_ind 0.43655519 0.13041078
## 9  sum_count 0.07437087 0.03052869
## 10   sum_ind 0.19038906 0.05828998

the counts data.frame via:

results@Counts
##   anno alleles variants carriers
## 1    1     186      164       84
## 2    2     127      110       74
## 3    3      27       26       23

and the p-values data.frame via:

results@Pvals
##           test   type        pval
## 1     baseline burden 0.006618953
## 2          ind burden 0.009107121
## 3    max_count burden 0.231834498
## 4      max_ind burden 0.000815325
## 5    sum_count burden 0.014846685
## 6      sum_ind burden 0.001089858
## 7 allelic_skat   skat 0.068349669
## 8         omni   omni 0.004683243

Different numbers of annotation categories

COAST was originally intended to operate on the benign missense variants, damaging missense variants, and protein truncating variants within a gene, but it has been generalized to allow for an arbitrary number of discrete annotation categories. The following example simulates and analyzes data with 4 annotation categories. The main difference when analyzing a different number of annotation categories is that the weight vector should be specified, and should have length equal to the number of possible annotation categories. COAST will run, albeit with a warning, if there are possible annotation categories to which no variants are assigned (e.g. a gene contains no PTVs).

withr::local_seed(102)

# Generate data.
n <- 1e2
data <- AllelicSeries::DGP(
  n = n,
  snps = 400,
  beta = c(1, 2, 3, 4) / sqrt(n),
  prop_anno = c(0.4, 0.3, 0.2, 0.1),
  weights = c(1, 1, 1, 1)
)

# Run COAST-SS.
results <- AllelicSeries::COAST(
  anno = data$anno,
  covar = data$covar,
  geno = data$geno,
  pheno = data$pheno,
  weights = c(1, 2, 3, 4)
)
show(results)
## Effect Sizes:
##         test  beta    se
## 1       base  0.01 0.065
## 2       base  0.23 0.062
## 3       base  0.19 0.096
## 4       base  0.61 0.116
## 5        ind -0.10 0.253
## 6        ind  0.53 0.205
## 7        ind  0.31 0.169
## 8        ind  0.96 0.186
## 9  max_count  0.18 0.035
## 10   max_ind  0.47 0.090
## 11 sum_count  0.11 0.018
## 12   sum_ind  0.19 0.035
## 
## 
## Counts:
##   anno alleles variants carriers
## 1    1     194      169       86
## 2    2     159      131       78
## 3    3      71       61       51
## 4    4      41       39       31
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 9.67e-10
## 2          ind burden 5.86e-07
## 3    max_count burden 4.26e-07
## 4      max_ind burden 1.65e-07
## 5    sum_count burden 5.37e-10
## 6      sum_ind burden 8.24e-08
## 7 allelic_skat   skat 3.27e-06
## 8         omni   omni 4.11e-09

Test options

AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE,
  ptv_anno = 3
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  min_mac = 2
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)