REDCap is electronic data capture software that is widely used in the academic research community. The REDCapR package streamlines calls to the REDCap API from an R environment. One of REDCapR’s main uses is to import records from a REDCap project. This works well for simple projects but becomes unwieldy for complex databases that include longitudinal structure and/or repeating instruments.
The REDCapTidieR package aims to make the life of analysts who deal with complex REDCap databases easier. It builds upon REDCapR to make its output tidier. Instead of one large data frame that contains all the data from your project, you get to work with a set of tidy tibbles, one for each REDCap instrument.
Let’s look at a REDCap project that has information about some 734 superheroes, derived from the Superhero Database.
Here is a screenshot of the REDCap Record Status Dashboard of this database. It has two instruments: Heroes Information, which captures “demographic” data about each individual superhero such as their name, gender, and alignment (good or evil), and Super Hero Powers, which captures each of the superpowers that a specific superhero possesses.
To import data from REDCap, use the read_redcap() function. read_redcap() requires a REDCap database URI and a REDCap API token. You need to have API access to the REDCap database to use REDCapTidieR. REDCapTidieR does not work with files exported from REDCap. We use it here to import data from the Superheroes database. You can see that it returns a tibble named superheroes.
We use rmarkdown::paged_table() so you can explore this tibble.
library(REDCapTidieR)
# redcap_uri and token must already hold your REDCap API URI and API token
superheroes <- read_redcap(redcap_uri, token)
superheroes |>
  rmarkdown::paged_table()
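The call above assumes that redcap_uri and token already exist in your session. One possible way to supply them without hard-coding the token in a script is to read both from environment variables. The following is a minimal sketch under that assumption: the helper get_redcap_credentials() and the variable names REDCAP_URI and REDCAP_TOKEN are hypothetical and not part of REDCapTidieR.

```r
# Hypothetical helper (not part of REDCapTidieR): read the REDCap URI and
# API token from environment variables. The variable names are assumptions.
get_redcap_credentials <- function(uri_var = "REDCAP_URI",
                                   token_var = "REDCAP_TOKEN") {
  redcap_uri <- Sys.getenv(uri_var, unset = "")
  token <- Sys.getenv(token_var, unset = "")

  # Fail early with a readable message if either value is missing.
  if (!nzchar(redcap_uri) || !nzchar(token)) {
    stop("Set ", uri_var, " and ", token_var, " before calling read_redcap().")
  }

  list(redcap_uri = redcap_uri, token = token)
}

# Usage (not run here because it needs API access to a REDCap database):
# creds <- get_redcap_credentials()
# superheroes <- REDCapTidieR::read_redcap(creds$redcap_uri, creds$token)
```

Keeping the token outside the script means the script can be shared without exposing access to the database.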
You can see that the tibble that read_redcap() returned has only two rows. This may be surprising because you might expect more rows from a database with 734 superheroes. read_redcap() returns data in a special object that we call the supertibble. The supertibble contains, among other things, tibbles with the data and metadata derived from each instrument. We call these the data tibbles and metadata tibbles.
Each row of the supertibble corresponds to one REDCap instrument. The redcap_form_name and redcap_form_label columns identify which instrument the row relates to. The redcap_data column contains the data tibbles. The redcap_metadata column contains the metadata tibbles. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data.
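To make this layout concrete, here is a small sketch that builds a mock object with the same four core columns using list-columns. It is only an illustration of the structure described above, not the output of read_redcap(): the hero names, powers, and metadata fields are made-up placeholders, and the form name super_hero_powers is assumed from the instrument label.

```r
library(tibble)

# Placeholder data tibbles; real ones would come from read_redcap().
heroes_information <- tibble(
  record_id = c(0, 1, 2),
  name = c("Hero A", "Hero B", "Hero C"),
  alignment = c("good", "bad", "good")
)
super_hero_powers <- tibble(
  record_id = c(0, 0, 1, 2),
  power = c("Power X", "Power Y", "Power X", "Power Z")
)

# Placeholder metadata tibbles describing the fields of each instrument.
heroes_metadata <- tibble(
  field_name = c("name", "alignment"),
  field_label = c("Hero name", "Alignment")
)
powers_metadata <- tibble(
  field_name = "power",
  field_label = "Super power"
)

# One row per instrument; the data and metadata live in list-columns.
mock_supertbl <- tibble(
  redcap_form_name = c("heroes_information", "super_hero_powers"),
  redcap_form_label = c("Heroes Information", "Super Hero Powers"),
  redcap_data = list(heroes_information, super_hero_powers),
  redcap_metadata = list(heroes_metadata, powers_metadata)
)

# Row counts of each nested data tibble, similar in spirit to the
# summary columns that the supertibble provides.
data_rows <- vapply(mock_supertbl$redcap_data, nrow, integer(1))

print(mock_supertbl)
print(data_rows)
```

The mock has two rows even though the nested data tibbles hold more, which is the same reason the real supertibble has only two rows for 734 superheroes.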
We designed the supertibble so you can explore it with the RStudio Data Viewer. You can click on the table icon in the Environment tab to view the supertibble in the data viewer. At a glance, you see an overview of the instruments in the REDCap project.
You can drill down into individual tables in the redcap_data and redcap_metadata columns. Note that in the heroes_information data tibble, each row represents a superhero, identified by their record_id.
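Drilling down can also be done in code rather than in the viewer. The sketch below pulls one data tibble out of a mock supertibble by its form name using plain list-column subsetting. The helper get_data_tibble() is hypothetical, and the mock values are placeholders rather than real records from the Superheroes database.

```r
library(tibble)

# Minimal mock supertibble with placeholder data (not real REDCap output).
mock_supertbl <- tibble(
  redcap_form_name = c("heroes_information", "super_hero_powers"),
  redcap_data = list(
    tibble(record_id = c(0, 1), name = c("Hero A", "Hero B")),
    tibble(record_id = c(0, 0, 1), power = c("Power X", "Power Y", "Power X"))
  )
)

# Hypothetical helper: return the data tibble for a single instrument.
get_data_tibble <- function(supertbl, form_name) {
  idx <- which(supertbl$redcap_form_name == form_name)
  # Exactly one row should match, since each row is one instrument.
  stopifnot(length(idx) == 1)
  supertbl$redcap_data[[idx]]
}

heroes <- get_data_tibble(mock_supertbl, "heroes_information")

# Each row is one superhero, so record_id should not repeat.
one_row_per_hero <- anyDuplicated(heroes$record_id) == 0

print(heroes)
print(one_row_per_hero)
```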