Getting Started with REDCapTidieR

REDCap is electronic data capture software that is widely used in the academic research community. The REDCapR package streamlines calls to the REDCap API from an R environment. One of REDCapR’s main uses is to import records from a REDCap project. This works well for simple projects, but the output becomes unwieldy for complex databases that include longitudinal structure and/or repeating instruments.

The REDCapTidieR package aims to make the life of analysts who deal with complex REDCap databases easier. It builds upon REDCapR to make its output tidier. Instead of one large data frame that contains all the data from your project, you get to work with a set of tidy tibbles, one for each REDCap instrument.

Case Study: The Superhero Database

Let’s look at a REDCap project that has information about some 734 superheroes, derived from the Superhero Database.

Here is a screenshot of the REDCap Record Status Dashboard of this database. It has two instruments: Heroes Information, which captures “demographic” data about each individual superhero such as their name, gender, and alignment (good or evil), and Super Hero Powers, which captures each of the superpowers that a specific superhero possesses.

REDCap Record Status Dashboard for the Superhero database

Importing data from REDCap

To import data from REDCap, use the read_redcap() function. read_redcap() requires a REDCap database URI and a REDCap API token. You need to have API access to the REDCap database to use REDCapTidieR. REDCapTidieR does not work with files exported from REDCap. We use it here to import data from the Superheroes database and assign the result, a tibble, to superheroes. We use rmarkdown::paged_table() so you can explore this tibble.

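library(REDCapTidieR)

# redcap_uri and token must already hold your REDCap API URI and API token
superheroes <- read_redcap(redcap_uri, token)

# Render the result as an interactive, paged table
superheroes |>
  rmarkdown::paged_table()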
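
The call above assumes that redcap_uri and token already exist in your session. One possible way to supply them without hard-coding the token in a script is to read both from environment variables. The sketch below is only an illustration: the helper get_redcap_credentials() and the variable names REDCAP_URI and REDCAP_TOKEN are hypothetical and not part of REDCapTidieR.

```r
# Hypothetical helper (not part of REDCapTidieR): read the REDCap API URI
# and token from environment variables, e.g. ones set in ~/.Renviron.
# The variable names REDCAP_URI and REDCAP_TOKEN are assumptions.
get_redcap_credentials <- function(uri_var = "REDCAP_URI",
                                   token_var = "REDCAP_TOKEN") {
  redcap_uri <- Sys.getenv(uri_var, unset = "")
  token <- Sys.getenv(token_var, unset = "")

  # Fail early with a clear message if either value is missing
  if (!nzchar(redcap_uri) || !nzchar(token)) {
    stop(
      "Set ", uri_var, " and ", token_var, " before calling read_redcap().",
      call. = FALSE
    )
  }

  list(redcap_uri = redcap_uri, token = token)
}

# Usage (needs API access to a REDCap project, so it is commented out here):
# creds <- get_redcap_credentials()
# superheroes <- REDCapTidieR::read_redcap(creds$redcap_uri, creds$token)
```

Keeping the token out of the script makes it less likely that it ends up in version control.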

You can see that the tibble that read_redcap() returned has only two rows. This may be surprising because you might expect more rows from a database with 734 superheroes. read_redcap() returns data in a special object that we call the supertibble. The supertibble contains, among other things, tibbles with the data and metadata derived from each instrument. We call these the data tibbles and metadata tibbles.

Each row of the supertibble corresponds to one REDCap instrument. The redcap_form_name and redcap_form_label columns identify which instrument the row relates to. The redcap_data column contains the data tibbles. The redcap_metadata column contains the metadata tibbles. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data.
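
To make the layout described above more concrete, here is a small mock built with the tibble package. It is not output from read_redcap(): the values are made up, the form name super_hero_powers, the inner column names (name, power, field_name, field_label) and the summary column n_rows are assumptions for illustration, and only redcap_form_name, redcap_form_label, redcap_data and redcap_metadata are column names taken from the description above.

```r
library(tibble)

# Made-up data tibbles, one per instrument (values are placeholders)
heroes_information <- tibble(
  record_id = c(0, 1, 2),
  name = c("Hero A", "Hero B", "Hero C")
)
super_hero_powers <- tibble(
  record_id = c(0, 0, 1, 2),
  power = c("Power 1", "Power 2", "Power 1", "Power 3")
)

# Made-up metadata tibbles describing the fields of each instrument
heroes_metadata <- tibble(
  field_name = c("record_id", "name"),
  field_label = c("Record ID", "Hero name")
)
powers_metadata <- tibble(
  field_name = c("record_id", "power"),
  field_label = c("Record ID", "Superpower")
)

# Mock supertibble: one row per instrument, with the nested tibbles
# stored in list columns
mock_supertbl <- tibble(
  redcap_form_name = c("heroes_information", "super_hero_powers"),
  redcap_form_label = c("Heroes Information", "Super Hero Powers"),
  redcap_data = list(heroes_information, super_hero_powers),
  redcap_metadata = list(heroes_metadata, powers_metadata),
  # Hypothetical summary column: row count of each data tibble
  n_rows = vapply(redcap_data, nrow, integer(1))
)

mock_supertbl
```

The mock has two rows because it describes two instruments, regardless of how many records the nested data tibbles hold.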

Exploring the contents of the supertibble

We designed the supertibble so you can explore it with the RStudio Data Viewer. You can click on the table icon in the Environment tab to view the supertibble in the Data Viewer. At a glance you see an overview of the instruments in the REDCap project.

Data Viewer showing the superheroes supertibble

You can drill down into individual tables in the redcap_data and redcap_metadata columns. Note that in the heroes_information data tibble, each row represents a superhero, identified by their record_id.
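
The same drill-down can be done in code, since the nested tibbles live in ordinary list columns. The sketch below uses a made-up stand-in for the supertibble and plain list indexing. The helper get_data_tibble(), the form name super_hero_powers and the inner columns name and power are hypothetical and chosen only for illustration.

```r
library(tibble)

# Made-up stand-in for a supertibble (values are placeholders)
supertbl <- tibble(
  redcap_form_name = c("heroes_information", "super_hero_powers"),
  redcap_data = list(
    tibble(record_id = c(0, 1), name = c("Hero A", "Hero B")),
    tibble(record_id = c(0, 0, 1), power = c("Power 1", "Power 2", "Power 1"))
  )
)

# Hypothetical helper: return the data tibble for one instrument by name
get_data_tibble <- function(supertbl, form_name) {
  idx <- match(form_name, supertbl$redcap_form_name)
  if (is.na(idx)) {
    stop("No instrument named '", form_name, "'", call. = FALSE)
  }
  # [[ ]] pulls the nested tibble out of the list column
  supertbl$redcap_data[[idx]]
}

heroes <- get_data_tibble(supertbl, "heroes_information")
heroes
```

In this mock, as in the heroes_information data tibble described above, each row is one superhero and record_id identifies that row.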