This vignette demonstrates how to compute diagFDR diagnostics from mokapot PSM exports.
A key choice is to compute diagnostics on competed
winners: one PSM per run + spectrum_id, selecting
the maximum mokapot score among candidates (targets and decoys). This
corresponds to the standard target–decoy competition setting
for PSM-level inference.
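The competition step can be sketched in base R. This is a minimal illustration, not the package's implementation; the column names run, spectrum_id, score, and is_decoy are assumptions:

```r
# Keep one winner per run + spectrum_id: the candidate (target or
# decoy) with the highest mokapot score. Column names are assumed.
compete_winners <- function(psms) {
  o <- order(psms$run, psms$spectrum_id, -psms$score)
  psms <- psms[o, , drop = FALSE]
  psms[!duplicated(psms[c("run", "spectrum_id")]), , drop = FALSE]
}
```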
The workflow is:

1. Read the mokapot target and decoy PSM exports with read_mokapot_psms().
2. Construct competed winners with mokapot_competed_universe().
3. Run dfdr_run_all() and inspect tables/plots.

We start with a small simulated dataset that resembles a
competed-winner mokapot output. Any workflow that produces a table
mappable to the columns id, is_decoy,
q, pep, run, and
score can be used in the same way.
library(diagFDR)
set.seed(2)
n <- 6000
toy <- data.frame(
id = paste0("run1||scan", seq_len(n)),
is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
q = pmin(1, runif(n)^3), # many small q-values
pep = pmin(1, pmax(0, runif(n)^3 + rnorm(n, sd = 0.02))), # toy PEP on a comparable scale
run = "run1",
score = rnorm(n)
)
x <- as_dfdr_tbl(
toy,
unit = "psm",
scope = "global",
q_source = "toy mokapot",
q_max_export = 1
)
diag <- dfdr_run_all(
xs = list(mokapot = x),
alpha_main = 0.01,
alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
low_conf = c(0.2, 0.5)
)
diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 1248 33 0.0264 0.174 0.0256 0.0272 0.0184 0.0345
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, IPE <dbl>,
#> # effect_abs <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>
diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 1307 38 0.0291 0.471 0.0213 0.0397
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__mokapot
diag$tables$pep_IPE
#> # A tibble: 1 × 2
#> list IPE
#> <chr> <dbl>
#> 1 mokapot 0.0739
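Note that mean_PEP in the sumpep table is simply sum_PEP / n_targets. The per-threshold summary can be sketched from the toy columns q, pep, and is_decoy (an illustration, not the package's internals):

```r
# Sum of PEPs over targets accepted at q <= alpha: a PEP-based
# estimate of the number of false positives in the accepted list.
sumpep_at <- function(tbl, alpha) {
  hits <- tbl[!tbl$is_decoy & tbl$q <= alpha, ]
  c(n_targets = nrow(hits),
    sum_PEP = sum(hits$pep),
    mean_PEP = sum(hits$pep) / nrow(hits))
}
```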
diag$plots$pep_reliability__mokapot
# sumpep is a list-of-tibbles (one per list)
diag$tables$sumpep$mokapot
#> # A tibble: 9 × 4
#> alpha n_targets sum_PEP mean_PEP
#> <dbl> <int> <dbl> <dbl>
#> 1 0.0005 443 107. 0.241
#> 2 0.001 559 141. 0.253
#> 3 0.002 721 177. 0.246
#> 4 0.005 985 239. 0.243
#> 5 0.01 1248 304. 0.244
#> 6 0.02 1584 394. 0.249
#> 7 0.05 2155 527. 0.244
#> 8 0.1 2703 654. 0.242
#> 9 0.2       3390    822.    0.242

The code below shows how to run diagFDR directly from mokapot outputs.
library(diagFDR)
# Read mokapot outputs (targets + decoys)
raw <- read_mokapot_psms(
target_path = "path/to/your_output.mokapot.psms.txt",
decoy_path = "path/to/your_output.mokapot.decoy.psms.txt"
)
# Construct competed winners (1 per run+spectrum_id; max mokapot score)
x <- mokapot_competed_universe(
raw,
id_mode = "runxid",
unit = "psm",
scope = "global",
q_source = "mokapot q-value",
q_max_export = 1
)
# Run diagnostics
diag <- dfdr_run_all(
xs = list(mokapot = x),
alpha_main = 0.01,
alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
low_conf = c(0.2, 0.5)
)
# Write outputs to disk (tables + plots + summary + manifest + README)
dfdr_write_report(
diag,
out_dir = "diagFDR_mokapot_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_mokapot_out")

D_alpha (tail support) and D_alpha_win
(local boundary support) quantify how well the cutoff is supported by
decoys. Very strict thresholds can yield sparse decoy tails and thus
unstable operating points (large CV_hat).
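The headline row above can be checked by hand, assuming FDR_hat = D_alpha / T_alpha and CV_hat ≈ 1 / sqrt(D_alpha) (both consistent with the numbers shown):

```r
# Hand check of the headline row (alpha = 0.01): few decoys at the
# cutoff mean a noisy FDR estimate and visible +/- 1 decoy brackets.
T_alpha <- 1248; D_alpha <- 33
FDR_hat    <- D_alpha / T_alpha        # ~0.0264
CV_hat     <- 1 / sqrt(D_alpha)        # ~0.174
FDR_minus1 <- (D_alpha - 1) / T_alpha  # ~0.0256
FDR_plus1  <- (D_alpha + 1) / T_alpha  # ~0.0272
```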
S_alpha(eps) (elasticity) captures list sensitivity
to small threshold perturbations. Low Jaccard overlap indicates that
accepted IDs are sensitive to the chosen cutoff.
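The overlap idea can be sketched as follows (a hypothetical helper; the choice of eps and the use of plain Jaccard overlap are illustrative assumptions, not the package's exact definition of S_alpha):

```r
# Jaccard overlap of accepted target IDs at alpha versus a slightly
# perturbed cutoff (1 + eps) * alpha; low overlap = cutoff-sensitive list.
jaccard_at <- function(tbl, alpha, eps = 0.1) {
  a <- tbl$id[!tbl$is_decoy & tbl$q <= alpha]
  b <- tbl$id[!tbl$is_decoy & tbl$q <= alpha * (1 + eps)]
  length(intersect(a, b)) / length(union(a, b))
}
```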
Equal-chance diagnostics (decoy fraction in low-confidence q-bands) and PEP-based diagnostics (PEP reliability, ΣPEP) are internal consistency checks under target–decoy assumptions. They are informative but do not replace external validation strategies (e.g., entrapment) when decoy representativeness is uncertain.
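The pooled equal-chance columns can likewise be reproduced from a table of competed winners. A sketch, mirroring the equal_chance_pooled output; note that binom.test's default null of p = 0.5 is only a placeholder here, as diagFDR derives its own reference proportion:

```r
# Decoy fraction among low-confidence PSMs (q in [lo, hi]) with an
# exact binomial 95% CI, mirroring N_test, N_D_test, pi_D_hat, ci95_*.
equal_chance_band <- function(tbl, lo = 0.2, hi = 0.5) {
  band <- tbl[tbl$q >= lo & tbl$q <= hi, ]
  ct <- binom.test(sum(band$is_decoy), nrow(band))
  c(N_test = nrow(band),
    N_D_test = sum(band$is_decoy),
    pi_D_hat = sum(band$is_decoy) / nrow(band),
    ci95_lo = ct$conf.int[1], ci95_hi = ct$conf.int[2])
}
```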