Bayesian toolbox for quantitative proteomics. In particular, this package provides functions to generate synthetic datasets, execute Bayesian differential analysis methods, and display results.
You can install the released version of ProteoBayes from CRAN with:
install.packages("ProteoBayes")
You can install the development version of ProteoBayes like so:
# install.packages("devtools")
::install_github("mariechion/ProteoBayes") devtools
Generate a synthetic proteomic dataset with the correct format:
library(ProteoBayes)
set.seed(42)
## Generate a dataset without imputations
= simu_db()
data
## Generate a dataset with multiple imputations
= simu_db(multi_imp = TRUE)
data_imp
data#> # A tibble: 50 x 4
#> Peptide Group Sample Output
#> <chr> <int> <int> <dbl>
#> 1 Peptide_1 1 1 48.8
#> 2 Peptide_1 1 2 46.5
#> 3 Peptide_1 1 3 49.8
#> 4 Peptide_1 1 4 49.9
#> 5 Peptide_1 1 5 47.4
#> 6 Peptide_1 2 1 54.9
#> 7 Peptide_1 2 2 49.4
#> 8 Peptide_1 2 3 52.0
#> 9 Peptide_1 2 4 49.6
#> 10 Peptide_1 2 5 55.0
#> # i 40 more rows
data_imp#> # A tibble: 250 x 5
#> Peptide Group Sample Output Draw
#> <chr> <int> <int> <dbl> <int>
#> 1 Peptide_1 1 1 51.8 1
#> 2 Peptide_1 1 1 51.3 2
#> 3 Peptide_1 1 1 51.6 3
#> 4 Peptide_1 1 1 51.9 4
#> 5 Peptide_1 1 1 51.8 5
#> 6 Peptide_1 1 2 52.5 1
#> 7 Peptide_1 1 2 52.6 2
#> 8 Peptide_1 1 2 52.0 3
#> 9 Peptide_1 1 2 52.0 4
#> 10 Peptide_1 1 2 50.8 5
#> # i 240 more rows
The parameters of all posterior distributions can be computed thanks to:
= posterior_mean(data)
post
post#> # A tibble: 10 x 6
#> Peptide Group mu lambda alpha beta
#> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Peptide_1 1 48.8 6 3.5 6.81
#> 2 Peptide_1 2 51.9 6 3.5 17.2
#> 3 Peptide_2 1 51.5 6 3.5 3.54
#> 4 Peptide_2 2 53.5 6 3.5 10.6
#> 5 Peptide_3 1 18.6 6 3.5 6.61
#> 6 Peptide_3 2 20.9 6 3.5 11.8
#> 7 Peptide_4 1 44.0 6 3.5 7.00
#> 8 Peptide_4 2 46.0 6 3.5 17.8
#> 9 Peptide_5 1 34.0 6 3.5 30.7
#> 10 Peptide_5 2 37.7 6 3.5 5.40
When considering Peptides independently (no inter-Peptides correlations), the method is unaffected by missing values and no imputation is necessary.
We can also consider correlations between Peptides by using a multivariate version of the method. In this context, missing values need to be imputed beforehand (multiple imputations is accepted), and the dedicated function is:
= multi_posterior_mean(data_imp)
post_imp
post_imp#> # A tibble: 250 x 8
#> Draw Group Peptide Peptide2 mu lambda nu Sigma
#> <int> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 Peptide_1 Peptide_1 50.4 6 15 57.4
#> 2 1 1 Peptide_1 Peptide_2 50.4 6 15 -1.25
#> 3 1 1 Peptide_1 Peptide_3 50.4 6 15 11.5
#> 4 1 1 Peptide_1 Peptide_4 50.4 6 15 -8.05
#> 5 1 1 Peptide_1 Peptide_5 50.4 6 15 -3.92
#> 6 1 1 Peptide_2 Peptide_1 41.7 6 15 -1.25
#> 7 1 1 Peptide_2 Peptide_2 41.7 6 15 6.08
#> 8 1 1 Peptide_2 Peptide_3 41.7 6 15 2.85
#> 9 1 1 Peptide_2 Peptide_4 41.7 6 15 -0.206
#> 10 1 1 Peptide_2 Peptide_5 41.7 6 15 -2.40
#> # i 240 more rows
Once parameters of the posterior distributions are available. We can sample from those distributions and visualise the differences between any groups for each peptide by using:
= sample_distrib(post)
sample
plot_distrib(sample, group1 = 1, group2 = 2, peptide = 'Peptide_1' )
When comparing a large number of Peptides simultaneously, the
function identify_diff()
can be used to check for probable
differences between groups for each Peptide:
identify_diff(post)
#> # A tibble: 10 x 10
#> Peptide Group mu CI_inf CI_sup Group2 mu2 CI_inf2 CI_sup2 Distinct
#> <chr> <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <lgl>
#> 1 Peptide_1 1 48.8 47.5 50.2 2 51.9 49.7 54.0 FALSE
#> 2 Peptide_1 2 51.9 49.7 54.0 1 48.8 47.5 50.2 FALSE
#> 3 Peptide_2 1 51.5 50.6 52.5 2 53.5 51.9 55.2 FALSE
#> 4 Peptide_2 2 53.5 51.9 55.2 1 51.5 50.6 52.5 FALSE
#> 5 Peptide_3 1 18.6 17.3 20.0 2 20.9 19.1 22.7 FALSE
#> 6 Peptide_3 2 20.9 19.1 22.7 1 18.6 17.3 20.0 FALSE
#> 7 Peptide_4 1 44.0 42.6 45.4 2 46.0 43.8 48.2 FALSE
#> 8 Peptide_4 2 46.0 43.8 48.2 1 44.0 42.6 45.4 FALSE
#> 9 Peptide_5 1 34.0 31.1 36.8 2 37.7 36.5 38.9 FALSE
#> 10 Peptide_5 2 37.7 36.5 38.9 1 34.0 31.1 36.8 FALSE