This vignette demonstrates how to compute diagFDR diagnostics from mokapot PSM exports.
A key choice is to compute diagnostics on competed
winners: one PSM per run + spectrum_id, selecting
the maximum mokapot score among candidates (targets and decoys). This
corresponds to the standard target–decoy competition setting
for PSM-level inference.
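The competition step can be sketched in base R. This is a minimal illustration, not the package's implementation; the column names run, spectrum_id, score, and is_decoy are assumptions:

```r
# Keep one winner per run + spectrum_id: the candidate (target or
# decoy) with the highest mokapot score. Column names are assumed.
compete_winners <- function(psms) {
  o <- order(psms$run, psms$spectrum_id, -psms$score)
  psms <- psms[o, , drop = FALSE]
  psms[!duplicated(psms[c("run", "spectrum_id")]), , drop = FALSE]
}
```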
The workflow is:

1. Read the mokapot target and decoy PSM exports with read_mokapot_psms().
2. Construct competed winners with mokapot_competed_universe().
3. Run dfdr_run_all() and inspect tables/plots.

We start with a small simulated dataset that resembles a
competed-winner mokapot output. Any workflow that produces a table
mappable to the columns id, is_decoy,
q, pep, run, and
score can be used in the same way.
library(diagFDR)
set.seed(2)
n <- 6000
toy <- data.frame(
id = paste0("run1||scan", seq_len(n)),
is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
q = pmin(1, runif(n)^3), # many small q-values
pep = pmin(1, pmax(0, runif(n)^3 + rnorm(n, sd = 0.02))), # toy PEP on a comparable scale
run = "run1",
score = rnorm(n)
)
x <- as_dfdr_tbl(
toy,
unit = "psm",
scope = "global",
q_source = "toy mokapot",
q_max_export = 1
)
diag <- dfdr_run_all(
xs = list(mokapot = x),
alpha_main = 0.01,
alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
low_conf = c(0.2, 0.5)
)
diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 1248 33 0.0264 0.174 0.0256 0.0272 0.0184 0.0345
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, IPE <dbl>,
#> # effect_abs <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>
diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 1307 38 0.0291 0.471 0.0213 0.0397
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__mokapot
diag$tables$pep_IPE
#> # A tibble: 1 × 2
#> list IPE
#> <chr> <dbl>
#> 1 mokapot 0.0739
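Note that mean_PEP in the sumpep table is simply sum_PEP / n_targets. The per-threshold summary can be sketched from the toy columns q, pep, and is_decoy (an illustration, not the package's internals):

```r
# Sum of PEPs over targets accepted at q <= alpha: a PEP-based
# estimate of the number of false positives in the accepted list.
sumpep_at <- function(tbl, alpha) {
  hits <- tbl[!tbl$is_decoy & tbl$q <= alpha, ]
  c(n_targets = nrow(hits),
    sum_PEP = sum(hits$pep),
    mean_PEP = sum(hits$pep) / nrow(hits))
}
```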
diag$plots$pep_reliability__mokapot
# sumpep is a list-of-tibbles (one per list)
diag$tables$sumpep$mokapot
#> # A tibble: 9 × 4
#> alpha n_targets sum_PEP mean_PEP
#> <dbl> <int> <dbl> <dbl>
#> 1 0.0005 443 107. 0.241
#> 2 0.001 559 141. 0.253
#> 3 0.002 721 177. 0.246
#> 4 0.005 985 239. 0.243
#> 5 0.01 1248 304. 0.244
#> 6 0.02 1584 394. 0.249
#> 7 0.05 2155 527. 0.244
#> 8 0.1 2703 654. 0.242
#> 9 0.2       3390    822.    0.242

The code below shows how to run diagFDR directly from mokapot outputs.
library(diagFDR)
# Read mokapot outputs (targets + decoys)
raw <- read_mokapot_psms(
target_path = "path/to/your_output.mokapot.psms.txt",
decoy_path = "path/to/your_output.mokapot.decoy.psms.txt"
)
# Construct competed winners (1 per run+spectrum_id; max mokapot score)
x <- mokapot_competed_universe(
raw,
id_mode = "runxid",
unit = "psm",
scope = "global",
q_source = "mokapot q-value",
q_max_export = 1
)
# Run diagnostics
diag <- dfdr_run_all(
xs = list(mokapot = x),
alpha_main = 0.01,
alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
low_conf = c(0.2, 0.5)
)
# Write outputs to disk (tables + plots + summary + manifest + README)
dfdr_write_report(
diag,
out_dir = "diagFDR_mokapot_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_mokapot_out")

D_alpha (tail support) and D_alpha_win
(local boundary support) quantify how well the cutoff is supported by
decoys. Very strict thresholds can yield sparse decoy tails and thus
unstable operating points (large CV_hat).
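The headline row above can be checked by hand, assuming FDR_hat = D_alpha / T_alpha and CV_hat ≈ 1 / sqrt(D_alpha) (both consistent with the numbers shown):

```r
# Hand check of the headline row (alpha = 0.01): few decoys at the
# cutoff mean a noisy FDR estimate and visible +/- 1 decoy brackets.
T_alpha <- 1248; D_alpha <- 33
FDR_hat    <- D_alpha / T_alpha        # ~0.0264
CV_hat     <- 1 / sqrt(D_alpha)        # ~0.174
FDR_minus1 <- (D_alpha - 1) / T_alpha  # ~0.0256
FDR_plus1  <- (D_alpha + 1) / T_alpha  # ~0.0272
```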
S_alpha(eps) (elasticity) captures list sensitivity
to small threshold perturbations. Low Jaccard overlap indicates that
accepted IDs are sensitive to the chosen cutoff.
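The overlap idea can be sketched as follows (a hypothetical helper; the choice of eps and the use of plain Jaccard overlap are illustrative assumptions, not the package's exact definition of S_alpha):

```r
# Jaccard overlap of accepted target IDs at alpha versus a slightly
# perturbed cutoff (1 + eps) * alpha; low overlap = cutoff-sensitive list.
jaccard_at <- function(tbl, alpha, eps = 0.1) {
  a <- tbl$id[!tbl$is_decoy & tbl$q <= alpha]
  b <- tbl$id[!tbl$is_decoy & tbl$q <= alpha * (1 + eps)]
  length(intersect(a, b)) / length(union(a, b))
}
```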
Equal-chance diagnostics (decoy fraction in low-confidence q-bands) and PEP-based diagnostics (PEP reliability, ΣPEP) are internal consistency checks under target–decoy assumptions. They are informative but do not replace external validation strategies (e.g., entrapment) when decoy representativeness is uncertain.
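The pooled equal-chance columns can likewise be reproduced from a table of competed winners. A sketch, mirroring the equal_chance_pooled output; note that binom.test's default null of p = 0.5 is only a placeholder here, as diagFDR derives its own reference proportion:

```r
# Decoy fraction among low-confidence PSMs (q in [lo, hi]) with an
# exact binomial 95% CI, mirroring N_test, N_D_test, pi_D_hat, ci95_*.
equal_chance_band <- function(tbl, lo = 0.2, hi = 0.5) {
  band <- tbl[tbl$q >= lo & tbl$q <= hi, ]
  ct <- binom.test(sum(band$is_decoy), nrow(band))
  c(N_test = nrow(band),
    N_D_test = sum(band$is_decoy),
    pi_D_hat = sum(band$is_decoy) / nrow(band),
    ci95_lo = ct$conf.int[1], ci95_hi = ct$conf.int[2])
}
```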