The package provides functions to tidy a summarised result into a data frame that is easier to use for subsequent calculations.
To this end, the split functions, described in “split and unite functions”, let you work with the name-level columns. For the estimates there is the pivotEstimates function, and for the settings, addSettings. Finally, the tidy method combines the split and pivot functionality in a single function.
First, let’s load relevant libraries and create a mock summarised result table.
library(visOmopResults)
library(dplyr)
result <- mockSummarisedResult()
result |> glimpse()
#> Rows: 126
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "9337847", "4006478", "2868369", "7818476", "9065176"…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
The function pivotEstimates adds one column per unique combination of values in the columns given in pivotEstimatesBy, holding the corresponding estimate values. For instance, the following example pivots the estimates by variable_name, variable_level and estimate_name.
result |>
  pivotEstimates(
    pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")
  ) |>
  glimpse()
#> Rows: 18
#> Columns: 15
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mo…
#> $ group_name <chr> "cohort_name", "cohort_name", "coho…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "c…
#> $ strata_name <chr> "overall", "age_group &&& sex", "ag…
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&…
#> $ additional_name <chr> "overall", "overall", "overall", "o…
#> $ additional_level <chr> "overall", "overall", "overall", "o…
#> $ `number subjects_NA_count` <int> 9337847, 4006478, 2868369, 7818476,…
#> $ age_NA_mean <dbl> 30.49621, 27.51317, 19.64153, 84.40…
#> $ age_NA_sd <dbl> 3.3287556, 4.6797953, 3.8420378, 7.…
#> $ Medications_Amoxiciline_count <int> 21944, 70846, 27309, 44353, 34557, …
#> $ Medications_Amoxiciline_percentage <dbl> 12.759029, 81.434293, 99.356778, 49…
#> $ Medications_Ibuprofen_count <int> 2795, 1362, 94596, 12537, 66965, 25…
#> $ Medications_Ibuprofen_percentage <dbl> 30.713166, 8.628628, 59.166925, 83.…
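The reshaping that pivotEstimates performs can be sketched in base R. This is a hypothetical helper, not the package's code, and it makes three assumptions:

- Each new column name is the pivotEstimatesBy values pasted with "_". This matches names such as age_NA_mean above, because paste() renders a missing variable_level as "NA".
- estimate_type takes values such as "integer", "numeric" or "percentage".
- All rows of one estimate share the same type.

```r
# Hypothetical sketch of pivoting estimates to wide format (not the
# visOmopResults implementation).
pivot_estimates_sketch <- function(df, by) {
  # Convert the character estimate_value according to estimate_type
  # (assumed type labels; unknown types are left as character).
  cast <- function(value, type) {
    switch(type,
      integer = as.integer(value),
      numeric = ,
      percentage = as.numeric(value),
      value
    )
  }
  # Paste the columns of a data frame row-wise with the given separator.
  paste_cols <- function(d, sep) do.call(paste, c(unname(as.list(d)), sep = sep))
  # Columns that identify an output row.
  keep <- setdiff(names(df), c(by, "estimate_type", "estimate_value"))
  # New column name for every input row, e.g. "age_NA_mean".
  new_name <- paste_cols(df[by], "_")
  key <- paste_cols(df[keep], "\r")
  out <- unique(df[keep])
  out_key <- paste_cols(out, "\r")
  for (nm in unique(new_name)) {
    rows <- which(new_name == nm)
    # Assumes all rows of one estimate share the same estimate_type.
    vals <- cast(df$estimate_value[rows], df$estimate_type[rows[1]])
    # Align with the output rows; missing combinations become NA.
    out[[nm]] <- vals[match(out_key, key[rows])]
  }
  rownames(out) <- NULL
  out
}

est <- data.frame(
  group = c("a", "a", "b", "b"),
  variable_name = "age",
  variable_level = NA_character_,
  estimate_name = c("mean", "sd", "mean", "sd"),
  estimate_type = "numeric",
  estimate_value = c("30.5", "3.3", "27.5", "4.7"),
  stringsAsFactors = FALSE
)
pivot_estimates_sketch(est, c("variable_name", "variable_level", "estimate_name"))
```

Every column not used for pivoting (here only group) identifies an output row. This is why pivoting by more columns produces fewer rows, as in the 18-row result above.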
The nameStyle argument customises the names of the new columns using glue syntax. For instance:
result |>
  pivotEstimates(pivotEstimatesBy = "estimate_name",
                 nameStyle = "{toupper(estimate_name)}") |>
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ COUNT <int> 9337847, 4006478, 2868369, 7818476, 9065176, 2211710,…
#> $ MEAN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ SD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ PERCENTAGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
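With glue syntax, each {expression} in a nameStyle template is evaluated against the pivoting columns. The base-R function below is a hypothetical, minimal stand-in for that templating. It is not the glue package and skips glue's escaping rules, but it shows how a template such as "{toupper(estimate_name)}" maps each combination to a column name.

```r
# Hypothetical, minimal stand-in for glue-style templating (not the glue
# package): every {expression} is evaluated with one row's columns in scope.
glue_like <- function(template, data) {
  # Extract the {...} pieces once; they are the same for every row.
  exprs <- regmatches(template, gregexpr("\\{[^}]*\\}", template, perl = TRUE))[[1]]
  vapply(seq_len(nrow(data)), function(i) {
    row <- as.list(data[i, , drop = FALSE])
    out <- template
    for (e in exprs) {
      code <- substr(e, 2, nchar(e) - 1)
      value <- eval(parse(text = code), envir = row, enclos = baseenv())
      out <- sub(e, as.character(value), out, fixed = TRUE)
    }
    out
  }, character(1))
}

combos <- data.frame(
  variable_name = c("age", "age"),
  estimate_name = c("mean", "sd"),
  stringsAsFactors = FALSE
)
glue_like("{toupper(estimate_name)}", combos)          # "MEAN" "SD"
glue_like("{variable_name}_{estimate_name}", combos)   # "age_mean" "age_sd"
```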
The function addSettings adds a new column for each of the settings in the summarised result, if any:
mockSummarisedResult() |>
  addSettings() |>
  glimpse()
#> Rows: 126
#> Columns: 16
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "2703410", "3101646", "4285343", "2451643", "6496595"…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0",…
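Conceptually, adding settings is a lookup by result_id. The sketch below assumes, for illustration only, that the settings are available as a separate data frame with one row per result_id. add_settings_sketch is a hypothetical helper, not part of visOmopResults.

```r
# Hypothetical helper, not part of visOmopResults: look up each row's
# result_id in a settings table and copy every setting across as a column.
add_settings_sketch <- function(result, settings) {
  idx <- match(result$result_id, settings$result_id)
  for (col in setdiff(names(settings), "result_id")) {
    # Rows whose result_id has no settings entry receive NA.
    result[[col]] <- settings[[col]][idx]
  }
  result
}

result_rows <- data.frame(
  result_id = c(1L, 1L, 2L),
  estimate_value = c("5", "7", "9"),
  stringsAsFactors = FALSE
)
settings_tbl <- data.frame(
  result_id = c(1L, 2L),
  result_type = c("mock_summarised_result", "other_result"),
  stringsAsFactors = FALSE
)
add_settings_sketch(result_rows, settings_tbl)
```

Using match() rather than merge() keeps the original row order and row count, as in the glimpse above, where the number of rows is unchanged.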
Finally, the tidy method combines splitting the name-level columns with pivoting the estimates and the settings. By default, it splits the group, strata and additional columns, pivots the estimates by the column “estimate_name”, and also pivots the settings.
result <- mockSummarisedResult()
result |>
  tidy() |>
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock"…
#> $ cohort_name <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1",…
#> $ age_group <chr> "overall", "<40", ">=40", "<40", ">=40", "overall", "o…
#> $ sex <chr> "overall", "Male", "Male", "Female", "Female", "Male",…
#> $ variable_name <chr> "number subjects", "number subjects", "number subjects…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ count <int> 3397666, 5378334, 1665180, 7493291, 1764428, 6818035, …
#> $ mean <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ percentage <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "m…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults", …
#> $ package_version <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", …
Which name-level column pairs are split can be customised with the split arguments, while pivotEstimatesBy and nameStyle control how the estimates are pivoted. If pivotEstimatesBy is NULL or character(), the estimates are left unchanged. Settings are always pivoted if present.
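Both NULL and character() have length zero, so a single length check covers the two "do not pivot" cases. A check with is.null() alone would recognise only NULL. The guard below, should_pivot(), is a hypothetical illustration, not a package function.

```r
# Hypothetical helper: TRUE only when at least one pivot column is given.
should_pivot <- function(pivotEstimatesBy) {
  length(pivotEstimatesBy) > 0
}

is.null(character())            # FALSE: is.null() alone would miss this case
should_pivot(NULL)              # FALSE
should_pivot(character())       # FALSE
should_pivot("estimate_name")   # TRUE
```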