---
title: "rtpcr: a package for statistical analysis of real-time PCR data in R"
author: Ghader Mirzaghaderi
output:
html_document:
toc: true
keep_md: true
output: rmarkdown::html_vignette
df_print: default
pdf_document:
toc: true
latex_engine: lualatex
word_document:
toc: No
vignette: |
%\VignetteIndexEntry{Sending Messages With Gmailr}
%\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::knitr}
editor_options:
markdown:
wrap: 90
chunk_output_type: console
---
```{r setup, include = FALSE, fig.align='center', warning = F, message=F}
options(tinytex.verbose = TRUE)
knitr::opts_chunk$set(echo = TRUE)
library(rtpcr)
```
# Overview
Quantitative real-time polymerase chain reaction (qRT-PCR or qPCR) is widely used in molecular biology research. Various analysis methods are employed to analyze qPCR data to measure the mRNA levels of a target gene under different experimental conditions.
‘rtpcr’ package was developed for amplification efficiency calculation, statistical analysis and bar plot representation of qPCR data in R. The rtpcr package was developed for amplification efficiency calculation, statistical analysis, and graphical display of qPCR data in R. The rtpcr package uses a general calculation methodology that accounts for up to two reference genes and amplification efficiency values, covering the Pfaffl method. Based on the experimental conditions, the functions of the rtpcr package use a t-test (for experiments with a two-level factor), analysis of variance (ANOVA), or analysis of covariance (ANCOVA) (where more than two levels or factors exists) to calculate the fold change (FC) or relative expression (RE) of a target gene. The functions also provide standard errors and confidence limits for the means and apply statistical mean comparisons. To facilitate function application, the rtpcr package includes different data sets as examples. The rtpcr package also provides ggplots with various editing arguments, allowing users to customize the graphical output.
A general calculation methodology described by Ganger et al. (2017)
and Taylor et al. (2019), matching both Livak and Schmittgen (2001) and Pfaffl et al. (2002) methods was implemented in the 'rtpcr' package. 'rtpcr' calculate the relative expression based on ${\Delta\Delta C_t}$ or ${\Delta C_t}$ method.
## 2.2 Data Structure for Non-Repeated Measures
For `ANOVA_DCt` and `ANOVA_DDCt`, each line should represent a separate individual, indicating a non-repeated measure experiment. The general column structure for these methods (assuming one reference gene) includes: condition columns, biological replicates, target gene efficiency ($E_{target}$), target Gene Ct ($Ct_{target}$), reference gene efficiency ($E_{ref}$), and reference gene Ct ($Ct_{ref}$). The package handles up to two reference genes.
## 2.3 Data Structure for Repeated Measures
The `REPEATED_DDCt` function is used for observations repeatedly taken over time or courses. The input data frame must be structured such that:
1. The **first column** is `id` (a unique number assigned to each individual sampled over time).
2. This is followed by factor columns (which must include the `time` course levels).
3. The remaining columns are efficiency and Ct values of target and reference genes.
## 2.4 Handling Blocking Factors
Variation between different qPCR plates can introduce noise; this can be accounted for using a blocking factor. By defining a `block` factor column, you ensure that at least one replicate of each condition is present on every plate. In the statistical models, the block effect is typically considered random, and its interaction with main effects is ignored.
# Data structure and column arrangement
To use the functions, input data should be prepared in the right format with appropriate column arrangement. The correct column arrangement is shown in Table 1 and Table 2. For `ANOVA_DDCt` or `ANOVA_DCt` analysis, ensure that each line in the data set belongs to a separate individual or biological replicate reflecting a non-repeated measure experiment.
*Table 1. Data structure and column arrangement required for ‘rtpcr’ package. rep: technical replicate; targetE and refE: amplification efficiency columns for target and reference genes, respectively. targetCt and refCt: target gene and reference gene Ct columns, respectively. factors (up to three factors is allowed): experimental factors.*
| Experiment type | Column arrangement of the input data | Example in the package |
|:---------------------|:---------------------------------------|:------------------------------------------|
|Amplification efficiency |Dilutions - geneCt ... | data_efficiency |
|t-test (accepts multiple genes) |condition (put the control level first) - gene (put reference gene(s) last.)- efficiency - Ct | data_ttest |
|Factorial (Up to three factors) |factor1 - rep - targetE - targetCt - refE - refCt | data_1factor |
| |factor1 - factor2 - rep - targetE - targetCt - refE - refCt | data_2factor |
| |factor1 - factor2 - factor3 - rep - targetE - targetCt - refE - refCt | data_3factor |
|Factorial with blocking |factor1 - block - rep - targetE - targetCt - refE - refCt | |
| |factor1 - factor2 - block - rep - targetE - targetCt - refE - refCt | data_2factorBlock |
| |factor1 - factor2 - factor3 - block - rep - targetE - targetCt - refE - refCt | |
|Two reference genes |. . . . . . rep - targetE - targetCt - ref1E - ref1Ct - ref2E - ref2Ct | |
|calculating biological replicated |. . . . . . biologicalRep - techcicalRep - Etarget - targetCt - Eref - refCt | data_withTechRep |
| |. . . . . . biologicalRep - techcicalRep - Etarget - targetCt - ref1E - ref1Ct - ref2E - ref2Ct | |
NOTE: For `ANOVA_DDCt` or `ANOVA_DCt` analysis, each line in the input data set belongs to a separate individual or biological replicate reflecting a non-repeated measure experiment.
*Table 2. Repeated measure data structure and column arrangement required for the `REPEATED_DDCt` function. targetE and refE: amplification efficiency columns for target and reference genes, respectively. targetCt and refCt: Ct columns for target and reference genes, respectively. In the "id" column, a unique number is assigned to each individual, e.g. all the three number 1 indicate a single individual.*
| Experiment type | Column arrangement of the input data | Example in the package |
|:---------------------|:----------------------------------------|:------------------------------------------|
|Repeated measure | id - time - targetE - targetCt - ref1E - ref1Ct | data_repeated_measure_1 |
| | id - time - targetE - targetCt - ref1E - ref1Ct - ref2E - ref2Ct | |
|Repeated measure | id - treatment - time - targetE - targetCt - ref1E - ref1Ct | data_repeated_measure_2 |
| | id - treatment - time - targetE - targetCt - ref1E - ref1Ct - ref2E - ref2Ct | |
To see list of data in the `rtpcr` package run `data(package = "rtpcr")`.
Example data sets can be presented by running the name of each data set. A description of the columns names in each data set is called by "?" followed by the names of the data set, for example `?data_1factor`
# 3. Amplification Efficiency
The `efficiency` function calculates the **amplification efficiency (E)**, slope, and $R^2$ statistics for genes based on a standard curve dilution series. It takes a data frame where the first column contains the dilutions.
```{r eval= T}
# Applying the efficiency function
efficiency(data_efficiency)
```
The function returns the calculated statistics, standard curve plots, and, if more than two genes are present, a slope comparison table.
# 4. Relative Expression Analysis
## 4.1 $\Delta Ct$ Method (`ANOVA_DCt`)
The `ANOVA_DCt` function performs relative expression analysis using reference gene(s) as a normalizer. It returns a list containing:
* `lm`: The linear model output including ANOVA tables.
* `Result`: A table showing treatments, RE, Lower and Upper Confidence Limits (LCL, UCL), and a letter display for pair-wise comparisons.
```{r eval= T, fig.cap = "relative expression (DDCt) of two different genes. RE tables of any number of genes can be combined and used as input data frame for `plotTwoFactor` function."}
# Example with three factors and one reference gene, without a blocking factor
ANOVA_DCt(data_3factor, numberOfrefGenes = 1, block = NULL)
# Example with a blocking factor
ANOVA_DCt(data_2factorBlock, block = "Block", numberOfrefGenes = 1)
```
## 4.2 $\Delta \Delta Ct$ Method
### 4.2.1 ANOVA using (`ANOVA_DDCt`)
The `ANOVA_DDCt` function is designed for $\Delta \Delta Ct$ analysis in uni- or multi-factorial experiments, using either ANOVA or ANCOVA.
**Required Arguments:** You must specify the `mainFactor.column` for which relative expression (RE) is calculated.
**Calibrator Selection:**
The **calibrator** (reference level) is the sample used for comparison, and its resulting RE value is 1. By default, the first level of the `mainFactor.column` is used as the calibrator. You can override this default using the `mainFactor.level.order` argument, where the first level listed in the vector will serve as the calibrator.
**Analysis Type:**
1. **ANOVA (`analysisType = "anova"`):** Performs analysis based on a full model factorial experiment.
2. **ANCOVA (`analysisType = "ancova"`):** The remaining factors (if any) are treated as covariate(s) in the analysis of variance.
**Important ANCOVA Consideration:** If the interaction between the main factor and the covariate is significant, ANCOVA is generally not appropriate. ANCOVA is primarily used when a factor is affected by uncontrolled quantitative covariate(s).
**Output:** The function returns both the `ANOVA_table` and `ANCOVA_table`, along with an `RE Table` providing relative expression values, log2 Fold Change (`log2FC`), significance, and confidence intervals. It also returns bar plots based on "RE" or "log2FC".
```{r eval= T}
# Example using two factors with a blocking factor and ANCOVA
ANOVA_DDCt(data_2factorBlock,
numberOfrefGenes = 1,
mainFactor.column = 1,
block = "block",
analysisType = "ancova")
```
### 4.2.2 Repeated Measure Analysis (`REPEATED_DDCt`)
The `REPEATED_DDCt` function is specialized for $\Delta \Delta C_T$ analysis of repeated measure data, where observations are taken over different time courses from the same individuals.
The analysis uses a mixed linear model where `id` (individual) is a random effect. You must specify the `factor` (e.g., "time") for which fold change (FC) values are analyzed.
```{r eval= T}
# Example for repeated measures data (time is the factor of interest)
REPEATED_DDCt(data_repeated_measure_1,
numberOfrefGenes = 1,
factor = "time", block = NULL)
```
### 4.2.3 T-Test Analysis (`TTEST_DDCt`)
The `TTEST_DDCt` function is used for $\Delta \Delta C_T$ expression analysis of target genes evaluated under two conditions (control vs. treatment).
**Data Requirements:** Input must be a data frame of 4 columns: Conditions, E (efficiency), Gene, and Ct values (mean of technical replicates). Biological replicates must be equal for all genes.
**Sample Types:**
* **Paired:** Samples are collected from one set of individuals at two different conditions (e.g., before and after treatment), reducing inter-individual variability.
* **Unpaired:** Two distinct sets of individuals are used (one untreated, one treated).
The function allows specification of `paired` (logical) and whether variances are equal (`var.equal`).
```{r eval= T}
# Example for unpaired t-test based analysis
TTEST_DDCt(data_ttest,
paired = FALSE,
var.equal = TRUE,
numberOfrefGenes = 1)
```
# 5. Visualization Functions
The `rtpcr` package includes dedicated plotting functions for visualizing the results returned in the statistics tables (e.g., `Result` or `RE Table`) from the analysis functions. These plots generally include bar height based on RE or log2FC, and error bars representing standard error (se) or confidence interval (ci), along with optional mean grouping letters derived from multiple comparisons.
* `plotOneFactor`: Generates a bar plot for single-factor experimental results. It requires column indices specifying the x-axis factor, y-axis value, lower error bar, upper error bar, and optionally, the grouping letters.
```{r eval= T}
# Before plotting, the result needs to be extracted as below:
res <- ANOVA_DCt(data_1factor, numberOfrefGenes = 1, block = NULL)$Result
# Bar plot
plotOneFactor(res, 1, 2, 7, 8, 11,
show.groupingLetters = TRUE)
```
* `plotTwoFactor`: Creates a bar plot for two-factorial experiment results, using one factor for the x-axis and the second factor for bar fill/grouping.
```{r eval= T}
a <- ANOVA_DCt(data_2factorBlock, block = "Block", numberOfrefGenes = 1)
data <- a$Results
plotTwoFactor(
data = data,
x_col = 2,
y_col = 3,
group_col = 1,
Lower.se_col = 8,
Upper.se_col = 9,
letters_col = 12,
show.groupingLetters = TRUE)
```
* `plotThreeFactor`: Generates a bar plot for three-factorial experiment results, typically utilizing facet grids to display the third factor.
The utility function `multiplot` is available to combine multiple `ggplot` objects (such as those generated by the `plotOneFactor`, `plotTwoFactor`, etc.) into a single plate, specifying the number of columns desired.
```{r eval= T, fig.height = 7, fig.width = 12.5, fig.align = 'center'}
res <- ANOVA_DCt(data_3factor,
numberOfrefGenes = 1,
block = NULL)
data <- res$Results
plotThreeFactor(data,
3, # x-axis factor
5, # bar height
1, # fill groups
2, # facet grid
11, # lower SE column
12, # upper SE column
letters_col = 13,
show.groupingLetters = TRUE)
```
# 6. Post-Hoc and Model Interpretation
## `Means_DDCt`
This function facilitates post-hoc analysis by taking a fitted model object (produced by `ANOVA_DDCt` or `REPEATED_DDCt`) and calculating fold change (FC) values for specific effects.
You specify the effects of interest using the `specs` argument. This allows for the calculation of simple effects, interactions, and slicing, provided an ANOVA model was used. Note that ANCOVA models returned by this package only include simple effects in the `Means_DDCt` output.
```{r eval= T}
# Returning fold change values of Conc levels sliced by Type
# Returning fold change values from a fitted model.
# Firstly, result of `qpcrANOVAFC` or `qpcrREPEATED` is
# acquired which includes a model object:
# Assume 'res' is the result from ANOVA_DDCt
res <- ANOVA_DDCt(data_3factor, numberOfrefGenes = 1, mainFactor.column = 1, block = NULL)
Means_DDCt(res$lm_ANOVA, specs = "Conc * Type")
res2 <- Means_DDCt(res$lm_ANOVA, specs = "Conc | Type")
# Returning fold change values of Conc levels sliced by Type*SA interaction
Means_DDCt(res$lm_ANOVA, specs = "Conc | (Type*SA)")
```
# Citation
```{r eval= T}
citation("rtpcr")
```
# Contact
Email: gh.mirzaghaderi at uok.ac.ir
# References
Livak, Kenneth J, and Thomas D Schmittgen. 2001. Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the Double Delta CT Method. Methods 25 (4). doi.org/10.1006/meth.2001.1262.
Ganger, MT, Dietz GD, Ewing SJ. 2017. A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments. BMC bioinformatics 18, 1-11. doi.org/10.1186/s12859-017-1949-5.
Mirzaghaderi G. 2025. rtpcr: A package for statistical analysis and graphical presentation of qPCR data in R. PeerJ 13, e20185. doi.org/10.7717/peerj.20185.
Pfaffl MW, Horgan GW, Dempfle L. 2002. Relative expression software tool (REST©) for group-wise comparison and statistical analysis of relative expression results in real-time PCR. Nucleic acids research 30, e36-e36. doi.org/10.1093/nar/30.9.e36.
Taylor SC, Nadeau K, Abbasi M, Lachance C, Nguyen M, Fenrich, J. 2019. The ultimate qPCR experiment: producing publication quality, reproducible data the first time. Trends in Biotechnology, 37(7), 761-774doi.org/10.1016/j.tibtech.2018.12.002.
Yuan, JS, Ann Reed, Feng Chen, and Neal Stewart. 2006. Statistical Analysis of Real-Time PCR Data. BMC Bioinformatics 7 (85). doi.org/10.1186/1471-2105-7-85.