To run an allelic series test, there are 4 key inputs:
A numeric annotation vector, of the same length as the number of variants, coded as 0 for benign missense variants (BMVs), 1 for deleterious missense variants (DMVs), and 2 for protein truncating variants (PTVs).
A covariates matrix, with as many rows as subjects and including columns such as age and sex. If omitted, it defaults to an intercept only.
A genotype matrix, with subjects as rows and variants as columns. The number of columns should correspond to the length of the annotation vector.
A numeric phenotype vector, either continuous or binary.
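The shape requirements above can be checked before calling the test. The following is a minimal base-R sketch on hypothetical toy data. The helper check_inputs() is illustrative only and is not part of the AllelicSeries package. It mirrors the documented intercept-only default when covariates are omitted.

```r
# Hypothetical toy inputs (5 subjects, 4 variants); names are prefixed with
# toy_ so they do not overwrite the vignette's own objects.
set.seed(1)
n_subj <- 5
n_var <- 4

# Annotation vector: 0 = BMV, 1 = DMV, 2 = PTV, one entry per variant.
toy_anno <- c(0, 1, 1, 2)

# Genotype matrix: subjects as rows, variants as columns (minor allele counts).
toy_geno <- matrix(rbinom(n_subj * n_var, size = 2, prob = 0.1),
                   nrow = n_subj, ncol = n_var)

# Covariates: an intercept plus hypothetical age and sex columns.
toy_covar <- cbind(int = 1, age = rnorm(n_subj), sex = rbinom(n_subj, 1, 0.5))

# Continuous phenotype, one value per subject.
toy_pheno <- rnorm(n_subj)

# Hypothetical helper that checks the shape requirements listed above.
check_inputs <- function(anno, geno, pheno, covar = NULL) {
  # Mirror the documented default: intercept-only covariates when omitted.
  if (is.null(covar)) covar <- matrix(1, nrow = nrow(geno), ncol = 1)
  stopifnot(
    all(anno %in% c(0, 1, 2)),          # only the three annotation codes
    length(anno) == ncol(geno),         # one annotation per variant
    length(pheno) == nrow(geno),        # one phenotype per subject
    nrow(covar) == nrow(geno)           # one covariate row per subject
  )
  invisible(TRUE)
}

check_inputs(toy_anno, toy_geno, toy_pheno, toy_covar)
```

Dropping a single annotation entry, for example check_inputs(toy_anno[-1], toy_geno, toy_pheno), makes the length check fail with an error.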
The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs, respectively.
set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n)
)
# Annotations.
anno <- data$anno
head(anno)
## [1] 2 0 0 0 0 1
# Covariates.
covar <- data$covar
head(covar)
## int age sex pc1 pc2 pc3
## [1,] 1 0.06340401 1 1.54356127 -2.3816890 0.3378446
## [2,] 1 -0.26084149 1 -0.34178454 -1.4227627 -0.2141389
## [3,] 1 0.10148711 1 1.25528541 0.4608104 -0.6585620
## [4,] 1 0.48607024 1 -0.87800487 0.4316516 0.3552285
## [5,] 1 -0.52207993 1 -0.03553949 -0.1124594 -0.4243847
## [6,] 1 -1.41018843 0 0.37024952 1.3373260 -0.4423871
# Genotypes.
geno <- data$geno
head(geno[, 1:5])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 0 0 0 0
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
## [6,] 0 0 0 0 0
# Phenotype.
pheno <- data$pheno
head(pheno)
## [1] 3.1059934 1.1354490 0.4306503 0.7834604 1.0421679 -0.4964317
The example data generated by the preceding code are available under vignettes/vignette_data.
The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, p-values for the component tests, as well as for the overall omnibus test (p_omni), are returned. Inspecting the component p-values is useful for determining which model(s) drove an association. In the present case, the association was most evident via the baseline count model (p_count).
results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)
## p_count p_ind p_max_count p_max_ind p_sum_count
## 3.112702e-26 1.322084e-09 3.076876e-10 5.374363e-09 1.661854e-20
## p_sum_ind p_allelic_skat p_omni
## 2.554417e-11 2.658137e-07 3.735235e-25
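To make the idea of a count model concrete, the following base-R sketch aggregates genotypes into per-subject allele counts for each annotation category and tests the three counts jointly against a covariates-only model. It assumes that the baseline count model regresses the phenotype on these three category counts. It is an illustration on simulated toy data, not the package's implementation, and its p-value is unrelated to the p_count printed above.

```r
# Illustrative sketch only; toy_ names avoid overwriting the vignette's objects.
set.seed(2)
n_toy <- 200
toy_anno <- rep(c(0, 1, 2), times = c(10, 6, 4))   # 20 hypothetical variants
toy_geno <- matrix(rbinom(n_toy * length(toy_anno), size = 2, prob = 0.02),
                   nrow = n_toy)
toy_covar <- cbind(int = 1, age = rnorm(n_toy))

# Per-subject allele counts within each annotation category (n_toy x 3 matrix).
toy_counts <- sapply(c(BMV = 0, DMV = 1, PTV = 2),
                     function(a) rowSums(toy_geno[, toy_anno == a, drop = FALSE]))

# Hypothetical phenotype whose effects increase across categories.
toy_pheno <- drop(toy_counts %*% c(0.1, 0.2, 0.3)) +
  0.1 * toy_covar[, "age"] + rnorm(n_toy)

# Joint test: covariates-only model vs. covariates plus the three counts.
fit0 <- lm(toy_pheno ~ toy_covar - 1)
fit1 <- lm(toy_pheno ~ toy_covar + toy_counts - 1)
p_count_sketch <- anova(fit0, fit1)[2, "Pr(>F)"]
p_count_sketch
```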
Setting apply_int = TRUE applies the rank-based inverse normal transformation from the RNOmni package. This transformation is expected to improve power for phenotypes that have a skewed or kurtotic (e.g. long-tailed) distribution. It is applied by default to continuous phenotypes and is ignored for binary phenotypes.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
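The transformation itself is simple to write down. Below is a minimal base-R sketch of a rank-based inverse normal transformation, assuming a Blom-type offset k = 3/8 and average ranks for ties. The helper rank_int() is hypothetical and is not RNOmni's API. It maps ranks through qnorm(), which preserves the ordering of the phenotype while removing skew.

```r
# Hypothetical helper: rank-based inverse normal transformation.
rank_int <- function(x, k = 3 / 8) {
  n <- length(x)
  r <- rank(x, ties.method = "average")   # ranks 1..n, ties averaged
  qnorm((r - k) / (n - 2 * k + 1))        # map rank proportions to normal quantiles
}

# A right-skewed (long-tailed) toy phenotype.
set.seed(3)
y <- rexp(1000)
z <- rank_int(y)
summary(z)
```

Because the offsets are symmetric in rank, the transformed values are centered at zero and have roughly unit spread, regardless of how skewed y was.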
Setting include_orig_skato_all = TRUE includes standard SKAT-O applied to all variants as a component of the omnibus test, while include_orig_skato_ptv = TRUE includes standard SKAT-O applied to PTVs only. Including standard SKAT-O as a component of the omnibus test can improve power to detect associations between the phenotype and genes that may not be allelic series.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE
)
Setting is_pheno_binary = TRUE is required to indicate that the supplied phenotype is binary and should be analyzed using a logistic regression model. In the example below, the continuous phenotype is dichotomized at zero for illustration.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
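For intuition about what a logistic burden test involves, the following base-R sketch fits a logistic regression of a simulated binary phenotype on a covariate and an allelic-series-weighted burden score, then reads off the Wald p-value for the burden term. The data, effect sizes, and the burden construction are hypothetical assumptions for illustration; this is not how COAST is implemented internally.

```r
# Illustrative sketch only; toy_ names avoid overwriting the vignette's objects.
set.seed(4)
n_toy <- 500
toy_anno <- rep(c(0, 1, 2), times = c(10, 6, 4))   # 0 = BMV, 1 = DMV, 2 = PTV
toy_geno <- matrix(rbinom(n_toy * length(toy_anno), size = 2, prob = 0.02),
                   nrow = n_toy)
toy_age <- rnorm(n_toy)

# Weighted burden score: each variant weighted 1, 2, or 3 by its annotation.
toy_burden <- drop(toy_geno %*% c(1, 2, 3)[toy_anno + 1])

# Binary phenotype drawn from a logistic model with hypothetical effects.
eta <- -0.5 + 0.3 * toy_burden + 0.2 * toy_age
toy_y <- rbinom(n_toy, size = 1, prob = plogis(eta))

# Logistic regression and Wald p-value for the burden coefficient.
fit <- glm(toy_y ~ toy_age + toy_burden, family = binomial())
p_burden <- summary(fit)$coefficients["toy_burden", "Pr(>|z|)"]
p_burden
```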
Setting return_omni_only = TRUE returns only p_omni, which is useful when the component p-values are not of interest:
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
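One common way to merge several component p-values into a single omnibus p-value is the Cauchy combination rule. The sketch below applies it to the component p-values printed by show(results) earlier. Both the rule and the weights (1/12 on each of the six burden p-values, 1/2 on the allelic SKAT p-value) are assumptions made for illustration. With these assumed weights the sketch agrees with the printed p_omni to the displayed precision, which suggests but does not confirm that they match COAST's defaults. The helper cauchy_combine() is hypothetical.

```r
# Component p-values as printed by show(results) earlier.
p <- c(p_count = 3.112702e-26, p_ind = 1.322084e-09,
       p_max_count = 3.076876e-10, p_max_ind = 5.374363e-09,
       p_sum_count = 1.661854e-20, p_sum_ind = 2.554417e-11,
       p_allelic_skat = 2.658137e-07)

# Assumed weights (sum to 1): 1/12 per burden p-value, 1/2 for allelic SKAT.
w <- c(rep(1 / 12, 6), 1 / 2)

# Hypothetical helper: weighted Cauchy combination of p-values.
cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
  # tan((0.5 - p) * pi) loses precision for tiny p; use 1 / (p * pi) instead.
  t_i <- ifelse(p < 1e-15, 1 / (p * pi), tan((0.5 - p) * pi))
  t_stat <- sum(w * t_i)
  # Upper tail of the standard Cauchy; asymptotic form for very large statistics.
  if (t_stat > 1e15) 1 / (t_stat * pi) else 0.5 - atan(t_stat) / pi
}

p_omni_sketch <- cauchy_combine(p, w)
p_omni_sketch
```

Because the Cauchy tail is heavy, the combined value is driven almost entirely by the smallest weighted p-value, here p_count.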
Setting score_test = TRUE specifies a score-type allelic series burden test. The default, score_test = FALSE, specifies a Wald-type allelic series burden test.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
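The difference between the two test types can be illustrated with an ordinary linear model. In the base-R sketch below, the Wald test fits the model that includes a burden score and tests its coefficient, while the score test fits only the null (covariates-only) model and evaluates the score for the burden term from its residuals. The simulated burden score and effect sizes are hypothetical, and this generic construction is not necessarily how COAST computes either statistic.

```r
# Illustrative sketch only; toy_ names avoid overwriting the vignette's objects.
set.seed(5)
n_toy <- 300
toy_X <- cbind(int = 1, age = rnorm(n_toy))          # covariates incl. intercept
toy_burden <- rpois(n_toy, 0.5)                      # hypothetical burden score
toy_y <- drop(toy_X %*% c(0, 0.2)) + 0.15 * toy_burden + rnorm(n_toy)

# Wald: fit the alternative model and test the burden coefficient.
fit1 <- lm(toy_y ~ toy_X + toy_burden - 1)
p_wald <- summary(fit1)$coefficients["toy_burden", "Pr(>|t|)"]

# Score: fit only the null model, then score the burden term.
fit0 <- lm(toy_y ~ toy_X - 1)
r <- residuals(fit0)
sigma2 <- sum(r^2) / (n_toy - ncol(toy_X))
b_perp <- residuals(lm(toy_burden ~ toy_X - 1))      # burden with covariates projected out
score_stat <- sum(b_perp * r)^2 / (sigma2 * sum(b_perp^2))
p_score <- pchisq(score_stat, df = 1, lower.tail = FALSE)

c(wald = p_wald, score = p_score)
```

A practical appeal of the score form is that the null model does not involve the burden score, so one null fit can be reused across many candidate burden scores.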
The weights argument specifies the relative phenotypic effects of BMVs, DMVs, and PTVs. An increasing pattern, such as the default weights = c(1, 2, 3), targets allelic series. Setting weights = c(1, 1, 1) would target a genetic architecture in which all variants have equivalent expected magnitudes.
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)
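To see how the weights shape a burden score, the following base-R sketch computes a per-subject weighted allele count on a tiny hypothetical genotype matrix, assuming each variant contributes its allele count times the weight of its annotation category. The helper weighted_burden() is illustrative and is not part of the package.

```r
# Toy genotypes: 3 subjects x 4 variants; annotations 0 = BMV, 1 = DMV, 2 = PTV.
toy_anno <- c(0, 1, 1, 2)
toy_geno <- rbind(
  c(1, 0, 0, 0),   # one BMV allele
  c(0, 1, 1, 0),   # two DMV alleles
  c(0, 0, 0, 1)    # one PTV allele
)

# Hypothetical helper: per-subject weighted allele count.
weighted_burden <- function(geno, anno, weights = c(1, 2, 3)) {
  w_var <- weights[anno + 1]   # map each variant's annotation code to its weight
  drop(geno %*% w_var)
}

weighted_burden(toy_geno, toy_anno)                          # returns c(1, 4, 3)
weighted_burden(toy_geno, toy_anno, weights = c(1, 1, 1))    # returns c(1, 2, 1)
```

Under the increasing weights, the subject carrying a single PTV allele scores higher than the subject carrying a single BMV allele; with equal weights only the number of alleles matters.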