The evaluation of convergence is important not only for determining the dynamics of EU member states but also as support for policy makers.
The R package convergEU is a suite of functions to download, clean and analyze data on several features of convergence.
In this document, the convergEU package is described and its main functionalities are illustrated.
Two types of sources are considered: data produced by Eurofound, available without an active Internet connection, and Eurostat data that can be downloaded on the fly, when needed, through this package.
Some datasets are accessible from the convergEU package through the R function data(), for example:
Eurofound datasets are locally available within the convergEU package, see:
A description of the above data is available through the R help system, for example:
Eurofound local data are considered below:
data(dbEurofound)
head(dbEurofound)
#> # A tibble: 6 × 17
#> time geo geo_label sex lifesatisf health goodhealth_p trustlocal volunt
#> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1960 AD Andorra Females NA NA NA NA NA
#> 2 1960 AD Andorra Males NA NA NA NA NA
#> 3 1960 AD Andorra Total NA NA NA NA NA
#> 4 1960 AL Albania Females NA NA NA NA NA
#> 5 1960 AL Albania Males NA NA NA NA NA
#> 6 1960 AL Albania Total NA NA NA NA NA
#> # ℹ 8 more variables: volunt_p <dbl>, caring_h <dbl>, socialexc_i <dbl>,
#> # JQIskill_i <dbl>, JQIenviron_i <dbl>, JQIintensity_i <dbl>,
#> # JQItime_i <dbl>, exposdiscr_p <dbl>
where variable names are:
names(dbEurofound)
#> [1] "time" "geo" "geo_label" "sex"
#> [5] "lifesatisf" "health" "goodhealth_p" "trustlocal"
#> [9] "volunt" "volunt_p" "caring_h" "socialexc_i"
#> [13] "JQIskill_i" "JQIenviron_i" "JQIintensity_i" "JQItime_i"
#> [17] "exposdiscr_p"
and time ranges in the interval:
Note that the dataset is not complete over this time range for all the considered countries.
Further details on the Eurofound dataset (metainformation) are available as follows:
data(dbEUF2018meta)
print(dbEUF2018meta,n=20,width=100)
#> # A tibble: 13 × 10
#> DIMENSION SUBDIMENSION INDICATOR Code_in_database Official_code
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Quality of life Life satisfaction Mean lif… lifesatisf y16_q4
#> 2 Quality of life Health Mean hea… health y16_q48
#> 3 Quality of life Health Percenta… goodhealth_p y16_q48
#> 4 Quality of life Quality of socie… Mean lev… trustlocal y16_q35f
#> 5 Quality of life Quality of socie… Level of… volunt y16_q29a
#> 6 Quality of life Quality of socie… Percenta… volunt_p y16_q29a
#> 7 Quality of life Quality of socie… Hours pe… caring_h y16_q43a
#> 8 Quality of life Quality of socie… Social E… socialexc_i y16_socexind…
#> 9 Working conditions Working conditio… JQI_Skil… JQIskill_i wq_slim - J…
#> 10 Working conditions Working conditio… JQI_Phys… JQIenviron_i envsec_slim …
#> 11 Working conditions Working conditio… JQI_Inte… JQIintensity_i intens_slim …
#> 12 Working conditions Working conditio… JQI_Work… JQItime_i wlb_slim - J…
#> 13 Working conditions Working conditio… Expositi… exposdiscr_p disc_d - Ha…
#> Unit Source_organisation Source_reference Disaggregation Bookmark_URL
#> <chr> <chr> <chr> <chr> <chr>
#> 1 -- Eurofound EQLS sex https://www.eurofo…
#> 2 -- Eurofound EQLS sex https://www.eurofo…
#> 3 % Eurofound EQLS sex https://www.eurofo…
#> 4 -- Eurofound EQLS sex https://www.eurofo…
#> 5 -- Eurofound EQLS sex https://www.eurofo…
#> 6 % Eurofound EQLS sex https://www.eurofo…
#> 7 hours Eurofound EQLS sex https://www.eurofo…
#> 8 index Eurofound EQLS sex https://www.eurofo…
#> 9 index Eurofound EWCS sex https://www.eurofo…
#> 10 index Eurofound EWCS sex https://www.eurofo…
#> 11 index Eurofound EWCS sex https://www.eurofo…
#> 12 index Eurofound EWCS sex https://www.eurofo…
#> 13 % Eurofound EWCS sex https://www.eurofo…
NOTE: within the convergEU package, Eurofound data are stored statically. Please update the package to obtain the most recent version of the Eurofound data.
The first step of an analysis is data preparation. This amounts to choosing a time interval, an indicator and a set of countries (MS, Member States); for example, the EU12 member states are:
convergEU_glb()$EU12$memberStates$codeMS
#> [1] "BE" "DK" "FR" "DE" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "UK"
Then, selecting “lifesatisf” from the column “Code_in_database” of dbEUF2018meta, the indicator is extracted for females:
myTB <- extract_indicator_EUF(
indicator_code = "lifesatisf", #Code_in_database
fromTime=2003,
toTime=2016,
gender= c("Total","Females","Males")[2],
countries= convergEU_glb()$EU12$memberStates$codeMS
)
myTB
#> $res
#> # A tibble: 4 × 14
#> time sex BE DE DK EL ES FR IE IT LU NL PT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 Femal… 7.38 7.48 8.37 6.75 7.43 6.97 7.89 7.12 7.77 7.54 5.86
#> 2 2007 Femal… 7.49 7.17 8.51 6.55 7.14 7.34 7.66 6.54 7.87 7.90 6.09
#> 3 2011 Femal… 7.47 7.27 8.30 6.13 7.51 7.17 7.41 6.86 7.72 7.74 6.72
#> 4 2016 Femal… 7.27 7.28 8.33 5.30 6.97 7.24 7.66 6.56 7.96 7.74 6.79
#> # ℹ 1 more variable: UK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
which results in a complete dataset ready for further analysis. IMPORTANT: the analysis of convergence is performed on clean and imputed data, i.e. a tidy dataset in the format years by countries. This means that the dataset must always be a rectangular table with a time variable, one quantitative column per country and no missing values.
If missing values are present, then imputation is required, as described in the next sections.
Another illustrative example follows.
names(convergEU_glb())
#> [1] "EUcodes" "EA" "EA19" "EU12"
#> [5] "EU15" "EU25" "EU27_2007" "EU27_2019"
#> [9] "EU27_2020" "EU27" "EU28" "geoRefEUF"
#> [13] "metaEUStat" "tmpl_out" "paralintags" "rounDigits"
#> [17] "epsilonV" "scoreBoaTB" "labels_clusters"
myTB <- extract_indicator_EUF(
indicator_code = "JQIintensity_i", #Code_in_database
fromTime= 1965,
toTime=2016,
gender= c("Total","Females","Males")[1],
countries= convergEU_glb()$EU27_2020$memberStates$codeMS
)
print(myTB$res,n=35,width=250)
#> # A tibble: 5 × 29
#> time sex AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1995 Total 48.8 33.2 NA NA NA 40.8 39.0 NA 40.9 34.2 47.1
#> 2 2000 Total 42.9 37.3 43.7 53.4 42.6 40.9 37.6 38.2 43.5 36.2 46.7
#> 3 2005 Total 47.6 42.8 33.8 50.7 45.8 46.9 47.9 41.7 50.5 41.2 49.6
#> 4 2010 Total 42.1 40.2 31.2 52.5 41.9 44.9 39.1 41.9 48.6 38.0 45.9
#> 5 2015 Total 42.4 41.5 34.6 57.2 36.7 40.2 45.0 38.7 49.3 46.5 41.1
#> FR HR HU IE IT LT LU LV MT NL PL PT RO
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 38.4 NA NA 39.0 34.1 NA 31.4 NA NA 41.8 NA 36.2 NA
#> 2 39.5 NA 42.1 42.2 39.7 29.9 37.6 32.8 50.0 41.3 40.9 31.8 47.8
#> 3 40.5 31.6 47.2 36.9 41.9 37.3 40.6 34.3 48.4 40.3 35.7 40.1 45.9
#> 4 43.0 39.5 48.7 47.0 40.8 33.2 40.8 31.9 44.0 38.5 31.4 31.6 43.3
#> 5 42.7 38.4 44.7 42.8 38.1 37.8 42.4 31.5 44.8 38.7 35.0 36.8 54.2
#> SE SI SK
#> <dbl> <dbl> <dbl>
#> 1 43.3 NA NA
#> 2 47.9 29.5 41.6
#> 3 48.1 49.2 39.6
#> 4 45.9 48.2 37.6
#> 5 46.1 43.0 35.9
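Imputation must take place before any analysis. Below, with both tailMiss and headMiss set to “constant”, missing values at the start of a series are replaced by the first observed value (see BG and HR in 1995):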
myTBinp <- impute_dataset(myTB$res, timeName = "time",
countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])
print(myTBinp$res,n=35,width=250)
#> # A tibble: 5 × 29
#> time sex AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1995 Total 48.8 33.2 43.7 53.4 42.6 40.8 39.0 38.2 40.9 34.2 47.1
#> 2 2000 Total 42.9 37.3 43.7 53.4 42.6 40.9 37.6 38.2 43.5 36.2 46.7
#> 3 2005 Total 47.6 42.8 33.8 50.7 45.8 46.9 47.9 41.7 50.5 41.2 49.6
#> 4 2010 Total 42.1 40.2 31.2 52.5 41.9 44.9 39.1 41.9 48.6 38.0 45.9
#> 5 2015 Total 42.4 41.5 34.6 57.2 36.7 40.2 45.0 38.7 49.3 46.5 41.1
#> FR HR HU IE IT LT LU LV MT NL PL PT RO
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 38.4 31.6 42.1 39.0 34.1 29.9 31.4 32.8 50.0 41.8 40.9 36.2 47.8
#> 2 39.5 31.6 42.1 42.2 39.7 29.9 37.6 32.8 50.0 41.3 40.9 31.8 47.8
#> 3 40.5 31.6 47.2 36.9 41.9 37.3 40.6 34.3 48.4 40.3 35.7 40.1 45.9
#> 4 43.0 39.5 48.7 47.0 40.8 33.2 40.8 31.9 44.0 38.5 31.4 31.6 43.3
#> 5 42.7 38.4 44.7 42.8 38.1 37.8 42.4 31.5 44.8 38.7 35.0 36.8 54.2
#> SE SI SK
#> <dbl> <dbl> <dbl>
#> 1 43.3 29.5 41.6
#> 2 47.9 29.5 41.6
#> 3 48.1 49.2 39.6
#> 4 45.9 48.2 37.6
#> 5 46.1 43.0 35.9
Several functions in the convergEU package return a list with three components: res, msg and err. The first component, res, is the actual result, if computed. The second component, msg, is a message accompanying the computed result, possibly a warning. The third component, err, is an error message, or a list of errors, when a result is not computed. This behavior is illustrated below for the function check_data.
The standard dataset is a rectangular table with times in rows and countries in columns, and all its variables are quantitative. The following function checks for these features:
where the list component res is TRUE, that is, all checks are passed.
If a variable is qualitative or some data are missing, the checks fail; for example, if time is qualitative:
tmp <- emp_20_64_MS
tmp <- mutate(tmp, time=factor(emp_20_64_MS$time))
check_data(tmp)
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: qualitative variables in the dataframe."
The err component explains what went wrong.
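To make the res/msg/err convention concrete, here is a minimal sketch of a function that follows it. The function check_numeric_sketch and its messages are hypothetical, written only for illustration; they are not part of convergEU.

```r
# Hypothetical function (not part of convergEU) following the res/msg/err
# convention: it returns the mean of a numeric vector in res.
check_numeric_sketch <- function(x) {
  out <- list(res = NULL, msg = NULL, err = NULL)
  if (!is.numeric(x)) {
    out$err <- "Error: qualitative variable."
  } else if (anyNA(x)) {
    out$err <- "Error: one or more missing values."
  } else {
    out$res <- mean(x)
    # msg decorates a computed result, here with a warning
    if (length(x) < 3) out$msg <- "Warning: fewer than three values."
  }
  out
}

# Usage: inspect err before relying on res
ok  <- check_numeric_sketch(c(2, 4, 6))
bad <- check_numeric_sketch(factor(c("a", "b")))
if (is.null(bad$err)) print(bad$res) else message(bad$err)
```

Returning all three components, even when some are NULL, lets the caller always test err first without wrapping the call in tryCatch().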
Similar errors are signaled if the dataset is not complete:
Let’s consider the following indicator from the Eurofound database:
myTB <- extract_indicator_EUF(
indicator_code = "exposdiscr_p", #Code_in_database
fromTime=1966,
toTime=2016,
gender= c("Total","Females","Males")[1],
countries= convergEU_glb()$EU12$memberStates$codeMS
)
where there are no missing values, as the following count shows:
sapply(myTB$res,function(vx)sum(is.na(vx)))
#> time sex BE DE DK EL ES FR IE IT LU NL PT UK
#> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thus, for testing purposes, an artificial dataset is built by stacking three copies of the data, assigning further years, perturbing the values of the first six years and introducing some missing values:
set.seed(1999)
myTB2 <- dplyr::bind_rows(myTB$res,myTB$res,myTB$res)
myTB2 <- dplyr::mutate(myTB2, time= seq(1975,2015,5))
for(aux in 3:14){
myTB2[[aux]] <- myTB2[[aux]] + c(runif(6,-2.5,2.5),0,0,0)
}
myTB2[["BE"]][1:2] <- NA
myTB2[["DE"]][8:9] <- NA
myTB2[["IT"]][c(3,4, 6,7,8)] <- NA
myTB2[["DK"]][6] <- NA
myTB2
#> # A tibble: 9 × 14
#> time sex BE DE DK EL ES FR IE IT LU NL PT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1975 Total NA 6.68 4.62 9.77 0.134 6.25 8.94 4.20 10.8 8.56 2.97
#> 2 1980 Total NA 8.71 3.97 10.6 6.90 11.6 3.16 5.23 11.2 5.43 5.11
#> 3 1985 Total 9.57 6.35 3.49 7.86 4.67 10.8 6.37 NA 11.4 12.0 5.74
#> 4 1990 Total 5.68 6.29 4.61 9.55 3.97 6.09 6.99 NA 8.09 9.46 3.88
#> 5 1995 Total 13.0 8.74 3.06 9.36 3.97 11.9 5.60 2.68 10.7 5.04 1.56
#> 6 2000 Total 9.75 7.63 NA 6.51 5.85 11.4 8.39 NA 15.6 11.7 5.61
#> 7 2005 Total 6.14 4.53 5.66 7.89 2.13 5.08 6.75 NA 8.86 8.51 4.97
#> 8 2010 Total 11.0 NA 4.93 8.32 4.47 10.6 5.32 NA 10.9 6.02 3.88
#> 9 2015 Total 9.65 NA 5.40 7.88 4.88 11.2 6.83 6.75 13.6 12.2 3.61
#> # ℹ 1 more variable: UK <dbl>
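Now an imputation function can be called to prepare the data for the convergence calculations. The two examples below differ in how missing values at the start of a series are handled: they are cut in the first example and replaced by a constant in the second (compare BE in 1975 and 1980). In both cases the last two years are cut, because DE is missing there.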
toBeProcessed <- c( "IT","BE", "DE", "DK","UK")
impute_dataset(myTB2, countries=toBeProcessed,
timeName = "time",
tailMiss = c("cut", "constant")[1],
headMiss = c("cut", "constant")[1])
#> $res
#> # A tibble: 5 × 14
#> time sex BE DE DK EL ES FR IE IT LU NL PT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1985 Total 9.57 6.35 3.49 7.86 4.67 10.8 6.37 4.38 11.4 12.0 5.74
#> 2 1990 Total 5.68 6.29 4.61 9.55 3.97 6.09 6.99 3.53 8.09 9.46 3.88
#> 3 1995 Total 13.0 8.74 3.06 9.36 3.97 11.9 5.60 2.68 10.7 5.04 1.56
#> 4 2000 Total 9.75 7.63 4.36 6.51 5.85 11.4 8.39 3.70 15.6 11.7 5.61
#> 5 2005 Total 6.14 4.53 5.66 7.89 2.13 5.08 6.75 4.71 8.86 8.51 4.97
#> # ℹ 1 more variable: UK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
impute_dataset(myTB2, countries=toBeProcessed,
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1])
#> $res
#> # A tibble: 7 × 14
#> time sex BE DE DK EL ES FR IE IT LU NL PT
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1975 Total 9.57 6.68 4.62 9.77 0.134 6.25 8.94 4.20 10.8 8.56 2.97
#> 2 1980 Total 9.57 8.71 3.97 10.6 6.90 11.6 3.16 5.23 11.2 5.43 5.11
#> 3 1985 Total 9.57 6.35 3.49 7.86 4.67 10.8 6.37 4.38 11.4 12.0 5.74
#> 4 1990 Total 5.68 6.29 4.61 9.55 3.97 6.09 6.99 3.53 8.09 9.46 3.88
#> 5 1995 Total 13.0 8.74 3.06 9.36 3.97 11.9 5.60 2.68 10.7 5.04 1.56
#> 6 2000 Total 9.75 7.63 4.36 6.51 5.85 11.4 8.39 3.70 15.6 11.7 5.61
#> 7 2005 Total 6.14 4.53 5.66 7.89 2.13 5.08 6.75 4.71 8.86 8.51 4.97
#> # ℹ 1 more variable: UK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The above calculations passed numerical tests and comparisons. Interior gaps are filled by linear interpolation (compare IT and DK in 2000), and if a country is processed but has no missing values, its values are left unchanged.
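The imputation rules visible in the output above can be approximated in base R with approx(): linear interpolation inside a series and, with rule = 2, the nearest observed value at either end (the “constant” behavior). This is a sketch under that assumption, not the code of impute_dataset(); the helper impute_sketch is hypothetical, it does not implement the “cut” option, and the inputs are the rounded IT and BE columns of myTB2 printed above.

```r
# Hypothetical helper (not convergEU code): fill the NAs of one country series.
# Interior gaps: linear interpolation in time.
# Leading/trailing gaps: nearest observed value (rule = 2), like "constant".
impute_sketch <- function(time, y) {
  obs <- !is.na(y)
  approx(x = time[obs], y = y[obs], xout = time, rule = 2)$y
}

time <- seq(1975, 2015, 5)
IT <- c(4.20, 5.23, NA, NA, 2.68, NA, NA, NA, 6.75)
BE <- c(NA, NA, 9.57, 5.68, 13.0, 9.75, 6.14, 11.0, 9.65)

round(impute_sketch(time, IT), 2)  # 1985 and 1990 become 4.38 and 3.53
round(impute_sketch(time, BE), 2)  # 1975 and 1980 become 9.57
```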
Several measures of convergence have recently been proposed by Eurofound (Eurofound (2018), Upward convergence in the EU: Concepts, measurements and indicators, Publications Office of the European Union, Luxembourg; authors: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe).
In this section, each measure is illustrated by one or more examples.
Let us assume we have a dataset (tibble) with times sorted by row and one column per country. The calculations are performed according to the following linear model:
$$\ln(y_{m,i,t+\tau}) - \ln(y_{m,i,t}) = \beta_0 + \beta_1 \ln(y_{m,i,t}) + \epsilon_{m,i,t}$$
where $m$ represents the member state of the EU (country), $i$ refers to an indicator of interest, $t$ is the reference time and $\tau \in \{1, 2, \ldots\}$ is the length of the time window (typically one or more years).
In the simplest case just two time values, $t$ and $t+\tau$, are considered, while in a more general setup all observed times in the set $\{t, t+1, \ldots, t+\tau-1, t+\tau\}$ are included in the regression.
In this more general case, the current implementation of the beta-convergence function always keeps the same reference time across different years and, as an option, divides the left-hand side by the length of the time window, that is, the alternative formula
$$\tau^{-1}\left(\ln(y_{m,i,t+\tau}) - \ln(y_{m,i,t})\right) = \beta_0 + \beta_1 \ln(y_{m,i,t}) + \epsilon_{m,i,t}$$
is available.
The output of beta_conv() is a list reporting the transformed data, the point estimate of $\beta_1$, a standard two-tailed test (p-value) and the adjusted R squared. A one-tailed test of $H_0: \beta_1 \ge 0$ against $H_1: \beta_1 < 0$ might be of some interest, but it is not implemented.
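The two-time regression can be sketched with lm(). The sketch assumes, as the deltaIndic values printed below suggest (for example (ln 1.2 − ln 0.9)/2 ≈ 0.144 for countryA), that the log change is divided by τ = time_t − time_0; with time_0 = 2002 and time_t = 2004 it should give the β1 of about −0.0948 that beta_conv() reports with all_within = FALSE. The helper beta_sketch is hypothetical, not the package code. It also shows how the one-tailed p-value, not implemented in the package, follows from the t statistic via pt().

```r
# Toy data: the testTB values used in the examples below
testTB <- data.frame(
  time     = 2000:2005,
  countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
  countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
  countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0)
)

# Hypothetical helper (not convergEU code): regress
# tau^-1 * (ln y_t - ln y_0) on ln y_0, one observation per country
beta_sketch <- function(tb, time_0, time_t, timeName = "time") {
  vals <- tb[, setdiff(names(tb), timeName)]
  y0 <- unlist(vals[tb[[timeName]] == time_0, ])
  yt <- unlist(vals[tb[[timeName]] == time_t, ])
  tau <- time_t - time_0
  workTB <- data.frame(deltaIndic = (log(yt) - log(y0)) / tau,
                       indic = log(y0))
  fit <- lm(deltaIndic ~ indic, data = workTB)
  coefs <- summary(fit)$coefficients
  list(
    beta1 = unname(coef(fit)["indic"]),
    p_two_tailed = coefs["indic", "Pr(>|t|)"],
    # one-tailed test of H0: beta1 >= 0 against H1: beta1 < 0
    p_one_tailed = pt(coefs["indic", "t value"], df = fit$df.residual),
    adj.r.squared = summary(fit)$adj.r.squared
  )
}

beta_sketch(testTB, time_0 = 2002, time_t = 2004)
```

When the estimated $\beta_1$ is negative, the one-tailed p-value is half the two-tailed one.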
Below an example on how to invoke the function:
#library(ggplot2)
#library(dplyr)
#library(tibble)
testTB <- tribble(
~time, ~countryA , ~countryB, ~countryC,
2000, 0.8, 2.7, 3.9,
2001, 1.2, 3.2, 4.2,
2002, 0.9, 2.9, 4.1,
2003, 1.3, 2.9, 4.0,
2004, 1.2, 3.1, 4.1,
2005, 1.2, 3.0, 4.0
)
res <- beta_conv(tavDes = testTB, time_0 = 2002, time_t = 2004,
all_within = TRUE,
timeName = "time")
res
#> $res
#> $res$workTB
#> # A tibble: 6 × 3
#> deltaIndic indic countries
#> <dbl> <dbl> <chr>
#> 1 0.184 -0.105 countryA
#> 2 0 1.06 countryB
#> 3 -0.0123 1.41 countryC
#> 4 0.144 -0.105 countryA
#> 5 0.0333 1.06 countryB
#> 6 0 1.41 countryC
#>
#> $res$model
#>
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#>
#> Coefficients:
#> (Intercept) indic
#> 0.1495 -0.1156
#>
#>
#> $res$summary
#> # A tibble: 2 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.149 0.0134 11.1 0.000371
#> 2 indic -0.116 0.0131 -8.80 0.000920
#>
#> $res$beta1
#> [1] -0.1156032
#>
#> $res$adj.r.squared
#> [1] 0.9386146
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Note, however, that this is not common practice, which considers only the first and last times.
To consider just two times, the starting and the ending one, the option all_within = FALSE must be specified:
res <- beta_conv(tavDes = testTB, time_0 = 2002, time_t = 2004,
all_within = FALSE,
timeName = "time")
res
#> $res
#> $res$workTB
#> # A tibble: 3 × 3
#> deltaIndic indic countries
#> <dbl> <dbl> <chr>
#> 1 0.144 -0.105 countryA
#> 2 0.0333 1.06 countryB
#> 3 0 1.41 countryC
#>
#> $res$model
#>
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#>
#> Coefficients:
#> (Intercept) indic
#> 0.13393 -0.09475
#>
#>
#> $res$summary
#> # A tibble: 2 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.134 0.000353 380. 0.00168
#> 2 indic -0.0948 0.000345 -275. 0.00232
#>
#> $res$beta1
#> [1] -0.09475194
#>
#> $res$adj.r.squared
#> [1] 0.9999735
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Note that all_within = FALSE is the default.
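The key concept in sigma-convergence is variability with respect to the mean. Let $Y_{m,i,t}$ be the value of indicator $i$ for member state $m$ at time $t$, and $\bar{Y}_{A,i,t}$ the average over aggregation $A$, for example A = EU27_2020; then the standard deviation, the coefficient of variation, the mean and the deviance $\sum_m (Y_{m,i,t} - \bar{Y}_{A,i,t})^2$ are computed over the member states in $A$.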
For each year, the above summaries are calculated to quantify if a reduction in heterogeneity took place.
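The yearly summaries can be sketched in base R. The sketch assumes a standard deviation with divisor n (the number of countries), which reproduces the first row printed by sigma_conv(testTB) below (stdDev 1.28, CV 0.517, devianceT 4.89); sigma_sketch is a hypothetical helper, not the package code.

```r
# Toy data: the testTB values used below
testTB <- data.frame(
  time     = 2000:2005,
  countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
  countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
  countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0)
)

# Hypothetical helper (not convergEU code): per-year dispersion summaries
sigma_sketch <- function(tb, timeName = "time") {
  vals <- as.matrix(tb[, setdiff(names(tb), timeName)])
  avg <- rowMeans(vals)                  # unweighted mean over countries
  # avg is recycled down each column, so row k is centred on avg[k]
  deviance <- rowSums((vals - avg)^2)
  stdDev <- sqrt(deviance / ncol(vals))  # divisor: number of countries
  data.frame(time = tb[[timeName]], stdDev = stdDev,
             CV = stdDev / avg, mean = avg, devianceT = deviance)
}

sigma_sketch(testTB)
```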
In this section we assume that all member states contributing to the unweighted mean are contained in the dataset, for example:
testTB <- tribble(
~time, ~countryA , ~countryB, ~countryC,
2000, 0.8, 2.7, 3.9,
2001, 1.2, 3.2, 4.2,
2002, 0.9, 2.9, 4.1,
2003, 1.3, 2.9, 4.0,
2004, 1.2, 3.1, 4.1,
2005, 1.2, 3.0, 4.0
)
sigma_conv(testTB,timeName="time")
#> $res
#> # A tibble: 6 × 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 1.28 0.517 2.47 4.89
#> 2 2001 1.25 0.435 2.87 4.67
#> 3 2002 1.32 0.501 2.63 5.23
#> 4 2003 1.11 0.406 2.73 3.69
#> 5 2004 1.20 0.430 2.8 4.34
#> 6 2005 1.16 0.424 2.73 4.03
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
It is possible to select a time window, as follows:
sigma_conv(testTB,timeName="time",time_0 = 2002,time_t = 2004)
#> $res
#> # A tibble: 3 × 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 1.32 0.501 2.63 5.23
#> 2 2003 1.11 0.406 2.73 3.69
#> 3 2004 1.20 0.430 2.8 4.34
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
sigma_conv(testTB,time_0 = 2002,time_t = 2004)
#> $res
#> # A tibble: 3 × 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 1.32 0.501 2.63 5.23
#> 2 2003 1.11 0.406 2.73 3.69
#> 3 2004 1.20 0.430 2.8 4.34
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
More interesting calculations involve the Eurofound dataset emp_20_64_MS. Note that all and only the EU28 countries, those contributing to the average, are included:
data(emp_20_64_MS)
mySTB <- sigma_conv(emp_20_64_MS)
mySTB
#> $res
#> # A tibble: 17 × 5
#> time stdDev CV mean devianceT
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125.
#> 2 2003 5.95 0.0878 67.8 991.
#> 3 2004 5.70 0.0839 67.9 909.
#> 4 2005 5.54 0.0809 68.4 858.
#> 5 2006 5.57 0.0801 69.6 869.
#> 6 2007 5.47 0.0775 70.6 838.
#> 7 2008 5.36 0.0755 71.0 804.
#> 8 2009 5.03 0.0730 69.0 710.
#> 9 2010 5.24 0.0769 68.1 768.
#> 10 2011 5.59 0.0821 68.1 875.
#> 11 2012 5.98 0.0880 68 1002.
#> 12 2013 6.28 0.0922 68.0 1103.
#> 13 2014 5.98 0.0867 69.0 1000.
#> 14 2015 5.74 0.0820 70.0 922.
#> 15 2016 5.60 0.0789 71.0 879.
#> 16 2017 5.37 0.0741 72.5 808.
#> 17 2018 5.30 0.0717 73.8 786.
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
As a first step, the departures from the mean are characterized:
res <- departure_mean(oriTB = emp_20_64_MS, sigmaTB = mySTB$res)
names(res$res)
#> [1] "departures" "squaredContrib" "devianceContrib"
res$res$departures
#> # A tibble: 17 × 33
#> time stdDev CV mean devianceT AT BE BG CY CZ DE DK
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125. 0 0 -1 1 0 0 1
#> 2 2003 5.95 0.0878 67.8 991. 0 0 -1 1 0 0 1
#> 3 2004 5.70 0.0839 67.9 909. 0 0 -1 1 0 0 1
#> 4 2005 5.54 0.0809 68.4 858. 0 0 -1 1 0 0 1
#> 5 2006 5.57 0.0801 69.6 869. 0 0 0 1 0 0 1
#> 6 2007 5.47 0.0775 70.6 838. 0 0 0 1 0 0 1
#> 7 2008 5.36 0.0755 71.0 804. 0 0 0 1 0 0 1
#> 8 2009 5.03 0.0730 69.0 710. 0 0 0 1 0 1 1
#> 9 2010 5.24 0.0769 68.1 768. 1 0 0 1 0 1 1
#> 10 2011 5.59 0.0821 68.1 875. 1 0 0 0 0 1 1
#> 11 2012 5.98 0.0880 68 1002. 1 0 0 0 0 1 1
#> 12 2013 6.28 0.0922 68.0 1103. 1 0 0 0 0 1 0
#> 13 2014 5.98 0.0867 69.0 1000. 0 0 0 0 0 1 0
#> 14 2015 5.74 0.0820 70.0 922. 0 0 0 0 0 1 0
#> 15 2016 5.60 0.0789 71.0 879. 0 0 0 0 1 1 0
#> 16 2017 5.37 0.0741 72.5 808. 0 0 0 0 1 1 0
#> 17 2018 5.30 0.0717 73.8 786. 0 0 0 0 1 1 0
#> # ℹ 21 more variables: EE <dbl>, EL <dbl>, ES <dbl>, FI <dbl>, FR <dbl>,
#> # HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> # MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>,
#> # SK <dbl>, UK <dbl>
where −1, 0 and 1 indicate that the departure from the mean, measured in standard deviations, is respectively below −1, within the interval (−1, 1) and above +1. The contribution of each MS to the variance at a given time $t$ is evaluated by the squared difference $(Y_{m,i,t} - \bar{Y}_{EU27,i,t})^2$ between the indicator $i$ of country $m$ at time $t$ and the unweighted average over member states, say EU27:
res$res$squaredContrib
#> # A tibble: 17 × 28
#> AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 11.4 7.94 121. 57.5 17.5 1.64e+0 116. 0.612 22.3 18.6 32.3
#> 2 12.5 10.7 82.3 58.2 10.4 3.95e-1 92.7 1.77 15.8 12.1 26.3
#> 3 0.243 4.44 45.0 60.7 4.81 5.10e-5 104. 5.26 13.0 7.33 21.1
#> 4 3.91 3.69 42.5 35.7 5.19 9.58e-1 91.7 12.8 16.2 0.849 21.0
#> 5 4.16 9.37 19.9 38.9 2.69 2.37e+0 96.8 40.2 15.7 0.314 18.8
#> 6 4.84 8.41 4.84 38.4 1.96 5.29e+0 70.6 39.7 23.0 0.810 17.6
#> 7 7.98 8.85 0.0756 30.5 2.03 9.15e+0 59.7 37.5 21.9 6.13 23.3
#> 8 19.3 3.62 0.0414 39.6 3.60 2.70e+1 50.4 0.993 11.6 25.0 20.2
#> 9 33.5 0.264 11.7 47.4 5.22 4.74e+1 46.0 1.73 18.6 28.2 23.9
#> 10 37.7 0.579 26.6 28.5 8.06 7.12e+1 45.4 6.45 71.6 36.7 32.9
#> 11 41.0 0.640 25 4.84 12.2 7.92e+1 39.7 17.6 169 70.6 36
#> 12 43.0 0.710 20.6 0.710 19.9 8.57e+1 39.2 27.6 229. 89.2 27.6
#> 13 27.3 2.81 15.0 1.89 20.5 7.61e+1 32.8 28.4 246. 82.4 17.0
#> 14 18.6 7.74 8.31 4.34 23.2 6.43e+1 29.4 42.5 227. 63.7 8.51
#> 15 14.4 11.0 11.0 5.34 32.4 5.76e+1 24.9 31.2 219. 50.6 5.71
#> 16 8.37 16.1 1.46 2.91 35.9 4.48e+1 16.8 38.4 216. 49.1 2.87
#> 17 5.52 17.2 2.10 0.00250 36.6 3.66e+1 13.3 31.9 206. 46.9 6.00
#> # ℹ 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
It is also possible to decompose the numerator of the variance, called deviance, at each time, in order to appreciate the percentage contribution of each member state to the total deviance, $100 \cdot \frac{(Y_{m,i,t} - \bar{Y}_{EU27,i,t})^2}{\sum_m (Y_{m,i,t} - \bar{Y}_{EU27,i,t})^2}$, for the indicator $i$ of country $m$ at time $t$.
res$res$devianceContrib
#> # A tibble: 17 × 28
#> AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.02 0.706 10.8 5.11 1.56 1.46e-1 10.3 0.0544 1.98 1.66 2.87
#> 2 1.26 1.08 8.30 5.87 1.05 3.99e-2 9.35 0.178 1.59 1.22 2.65
#> 3 0.0267 0.488 4.95 6.68 0.529 5.61e-6 11.4 0.578 1.43 0.806 2.32
#> 4 0.456 0.430 4.95 4.16 0.605 1.12e-1 10.7 1.49 1.88 0.0989 2.44
#> 5 0.479 1.08 2.29 4.48 0.309 2.73e-1 11.1 4.63 1.81 0.0362 2.17
#> 6 0.578 1.00 0.578 4.59 0.234 6.31e-1 8.42 4.74 2.75 0.0967 2.11
#> 7 0.993 1.10 0.00941 3.80 0.253 1.14e+0 7.43 4.67 2.72 0.762 2.90
#> 8 2.72 0.511 0.00584 5.59 0.507 3.81e+0 7.10 0.140 1.63 3.53 2.85
#> 9 4.36 0.0344 1.52 6.17 0.680 6.17e+0 6.00 0.225 2.42 3.68 3.11
#> 10 4.31 0.0662 3.04 3.26 0.922 8.14e+0 5.19 0.737 8.18 4.20 3.77
#> 11 4.09 0.0639 2.50 0.483 1.22 7.91e+0 3.96 1.76 16.9 7.04 3.59
#> 12 3.90 0.0644 1.87 0.0644 1.80 7.77e+0 3.55 2.51 20.8 8.08 2.51
#> 13 2.73 0.280 1.50 0.189 2.05 7.61e+0 3.28 2.83 24.6 8.23 1.70
#> 14 2.02 0.839 0.901 0.470 2.52 6.97e+0 3.18 4.61 24.7 6.91 0.923
#> 15 1.63 1.25 1.25 0.608 3.68 6.55e+0 2.83 3.56 25.0 5.75 0.650
#> 16 1.04 1.99 0.180 0.360 4.44 5.54e+0 2.07 4.74 26.8 6.07 0.354
#> 17 0.703 2.19 0.268 0.000318 4.66 4.66e+0 1.70 4.06 26.2 5.97 0.764
#> # ℹ 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
Thus each row sums to 100.
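The three tables above can be sketched for a single time point. The classification rule (departure in standard deviations, coded −1 below −1 and 1 above +1) is inferred from the printed tables: for example, in 2002 AT has a squared contribution of 11.4, i.e. about 3.4 points or 0.53 standard deviations of 6.34 from the mean, and is coded 0, while DK (116, about 1.7 standard deviations) is coded 1. The values below are made up, and coding z exactly equal to ±1 as 0 is an assumption.

```r
# Hypothetical indicator values for five member states at one time point
y <- c(AT = 70, BE = 62, BG = 55, CY = 74, DK = 78)

avg <- mean(y)                       # unweighted average
stdDev <- sqrt(mean((y - avg)^2))    # divisor n, as in sigma_conv()
z <- (y - avg) / stdDev              # departure in standard deviations
departure <- ifelse(z < -1, -1, ifelse(z > 1, 1, 0))

squaredContrib  <- (y - avg)^2                               # per-country squares
devianceContrib <- 100 * squaredContrib / sum(squaredContrib) # percent of deviance

departure                  # BG below -1, DK above +1, the rest in between
round(devianceContrib, 2)  # percentages summing to 100
```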
A graphical summary of the main features of the country time series can be produced as shown below:
myGG <- graph_departure(res$res$departures,
timeName = "time",
displace = 0.25,
displaceh = 0.45,
dimeFontNum = 4,
myfont_scale = 1.35,
x_angle = 45,
color_rect = c("-1"='red1', "0"='gray80',"1"='lightskyblue1'),
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.9
)
myGG
#> $res
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Any selection of countries is feasible; below, the first ten columns are passed (time, the four summaries and the first five countries, AT to CZ):
myGG <- graph_departure(res$res$departures[1:10],
timeName = "time",
displace = 0.25,
displaceh = 0.45,
dimeFontNum = 4,
myfont_scale = 1.35,
x_angle = 45,
color_rect = c("-1"='red1', "0"='gray80',"1"='lightskyblue1'),
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.29
)
myGG
#> $res
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
We now introduce gamma-convergence through an index based on ranks.
Let $y_{m,i,t}$ be the value of indicator $i$ for member state $m$ at time $t = 0, 1, \ldots, T$, and $\{\tilde{y}_{m,i,t} : m \in A\}$ the ranks for indicator $i$ over the member states in the reference set $A$ (for example A = EU27) at a given time $t$. The sum of ranks of member state $m$ is
$$\tilde{y}^{(s)}_{m,i} = \sum_{t=0}^{T} \tilde{y}_{m,i,t},$$
thus the variance of the sum of ranks over the given interval, $Var[\{\tilde{y}^{(s)}_{m,i} : m \in A\}]$, may be compared to the variance of the ranks at the reference time $t = 0$, $Var[\{\tilde{y}_{m,i,0} : m \in A\}]$.
The Kendall index KI, with respect to aggregation $A$ of member states, for indicator $i$ over a given time interval is:
$$KI(A,i,T) = \frac{Var[\{\tilde{y}^{(s)}_{m,i} : m \in A\}]}{(T+1)^2 \, Var[\{\tilde{y}_{m,i,0} : m \in A\}]}$$
The measure of gamma-convergence is obtained with the following function:
Note that the starting time, $t = 0$, is the reference. First, a copy of the dataset is made:
(timeCounTB <- testTB)
#> # A tibble: 6 × 4
#> time countryA countryB countryC
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 0.8 2.7 3.9
#> 2 2001 1.2 3.2 4.2
#> 3 2002 0.9 2.9 4.1
#> 4 2003 1.3 2.9 4
#> 5 2004 1.2 3.1 4.1
#> 6 2005 1.2 3 4
Now we move to ranks within time using rank():
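Ranking within time and the Kendall index can be sketched in base R, following the formula as written, with T + 1 equal to the number of years and the first row as the reference time. The helpers ranks_within_time and kendall_sketch are hypothetical. Note that gamma_conv() prints 0.7346939 (= 36/49) for the unchanged ranks of timeCounTB and 0.1428571 (= 7/49) for timeCounTB2 below, whereas the formula as written gives 1 and 7/36; the package therefore appears to use a different normalizing constant, so this sketch is not a reimplementation of gamma_conv().

```r
# Same values as timeCounTB: rows are years 2000..2005, columns are countries
timeCounTB <- data.frame(
  time     = 2000:2005,
  countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
  countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
  countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0)
)

# Hypothetical helper: rank countries within each year (row).
# apply() returns one column per year, so t() puts years back in rows.
ranks_within_time <- function(tb, timeName = "time") {
  t(apply(as.matrix(tb[, setdiff(names(tb), timeName)]), 1, rank))
}

# Hypothetical helper: Kendall index as in the formula above
kendall_sketch <- function(ranks) {
  n_years <- nrow(ranks)  # T + 1
  var(colSums(ranks)) / (n_years^2 * var(ranks[1, ]))
}

kendall_sketch(ranks_within_time(timeCounTB))  # ranks never change: 1

swapped <- timeCounTB
swapped[2, 2:4] <- c(4.2, 3.2, 1.2)  # same swaps as timeCounTB2 below
swapped[4, 2:4] <- c(4.0, 1.3, 2.9)
kendall_sketch(ranks_within_time(swapped))     # 7/36
```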
therefore with the above data:
(gamma_conv(timeCounTB,ref=2000,last=2005,timeName = "time"))
#> $res
#> [1] 0.7346939
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2004,timeName = "time"))
#> $res
#> [1] 0.6944444
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2003,timeName = "time"))
#> $res
#> [1] 0.64
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2002,timeName = "time"))
#> $res
#> [1] 0.5625
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2001,timeName = "time"))
#> $res
#> [1] 0.4444444
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
and changing reference year:
(gamma_conv(timeCounTB,ref=2001,last=2005,timeName = "time"))
#> $res
#> [1] 0.7346939
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2002,last=2004,timeName = "time"))
#> $res
#> [1] 0.6944444
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Now we exchange values and calculate gamma-convergence:
timeCounTB2 <- timeCounTB
timeCounTB2[2,2:4] <- timeCounTB[2,4:2]
timeCounTB2[4,2:4] <- timeCounTB[4,c(4,2,3)]
timeCounTB2
#> # A tibble: 6 × 4
#> time countryA countryB countryC
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 0.8 2.7 3.9
#> 2 2001 4.2 3.2 1.2
#> 3 2002 0.9 2.9 4.1
#> 4 2003 4 1.3 2.9
#> 5 2004 1.2 3.1 4.1
#> 6 2005 1.2 3 4
gamma_conv(timeCounTB2,last=2005,ref=2000, timeName = "time",printRanks = T)
#> Ranks:
#> countryA countryB countryC
#> [1,] 1 2 3
#> [2,] 3 2 1
#> [3,] 1 2 3
#> [4,] 3 1 2
#> [5,] 1 2 3
#> [6,] 1 2 3
#> $res
#> [1] 0.1428571
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
and after random permutation:
timeCounTB3 <- cbind(timeCounTB[1],t(apply(timeCounTB,1,
function(vet)vet[sample(2:4,3)])))
timeCounTB3
#> time 1 2 3
#> 1 2000 0.8 2.7 3.9
#> 2 2001 1.2 3.2 4.2
#> 3 2002 4.1 2.9 0.9
#> 4 2003 1.3 4.0 2.9
#> 5 2004 4.1 3.1 1.2
#> 6 2005 1.2 4.0 3.0
(gamma_conv(timeCounTB3,last=2005,ref=2000, timeName = "time",printRanks = T))
#> Ranks:
#> 1 2 3
#> [1,] 1 2 3
#> [2,] 1 2 3
#> [3,] 3 2 1
#> [4,] 1 3 2
#> [5,] 3 2 1
#> [6,] 1 3 2
#> $res
#> [1] 0.08163265
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Delta-convergence can be calculated as follows:
Absolute change, as described in the reserved Eurofound Annex, is defined as $\Delta y_{m,i,t} = y_{m,i,t} - y_{m,i,t-1}$ for country $m$, indicator $i$ and time $t$.
The R function abso_change() calculates the above quantity; for example, on the emp_20_64_MS dataset:
data(emp_20_64_MS)
mySTB <- abso_change(emp_20_64_MS,
time_0 = 2005,
time_t = 2010,
all_within=TRUE,
timeName = "time")
names(mySTB$res)
#> [1] "abso_change" "sum_abs_change" "average_abs_change"
thus the above equation results in:
mySTB$res$abso_change
#> time AT BE BG CY CZ DE DK EE EL ES FI FR HR HU IE
#> 1 2003 0.4 -0.2 2.2 0.3 -0.7 -0.4 -0.9 0.8 1.0 1.1 -0.3 1.1 0.5 1.0 -0.4
#> 2 2004 -2.9 1.3 2.5 0.3 -0.9 -0.5 0.7 1.1 0.5 0.9 -0.4 -0.5 1.3 -0.4 0.6
#> 3 2005 2.0 0.7 0.7 -1.3 0.6 1.5 -0.1 1.8 0.1 2.3 0.5 0.2 0.3 0.2 1.6
#> 4 2006 1.2 0.0 3.2 1.4 0.5 1.7 1.4 3.9 1.2 1.5 0.9 0.0 0.6 0.4 0.8
#> 5 2007 1.2 1.2 3.3 1.0 0.8 1.8 -0.4 1.0 0.2 0.7 0.9 0.5 3.3 -0.3 1.7
#> IT LT LU LV MT NL PL PT RO SE SI SK UK
#> 1 0.9 2.7 -1.2 1.1 -0.4 -0.5 -0.4 -1.1 0.5 -0.3 -1.9 1.8 0.4
#> 2 1.6 -1.1 0.5 0.0 -0.5 -0.4 -0.3 -0.4 -0.1 -0.7 2.9 -1.5 0.2
#> 3 -0.2 1.1 1.3 1.7 0.1 -2.2 1.3 -0.4 -1.1 0.3 0.1 1.0 0.3
#> 4 0.9 0.6 0.1 4.1 0.5 1.0 1.8 0.4 1.2 0.7 0.4 1.5 0.0
#> 5 0.3 1.4 0.5 2.0 0.7 1.8 2.6 -0.1 -0.4 1.3 0.9 1.2 0.0
The sum of absolute values $\sum_{t=t_0+1} |\Delta y_{m,i,t}|$ is:
round(mySTB$res$sum_abs_change,4)
#> AT BE BG CY CZ DE DK EE EL ES FI FR HR HU IE IT
#> 7.7 3.4 11.9 4.3 3.5 5.9 3.5 8.6 3.0 6.5 3.0 2.3 6.0 2.3 5.1 3.9
#> LT LU LV MT NL PL PT RO SE SI SK UK
#> 6.9 3.6 8.9 2.2 5.9 6.4 2.4 3.3 3.3 6.2 7.0 0.9
This sum can be divided by the number of pairs of consecutive years, so that the result is an average per pair of years:
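The three quantities can be sketched in base R for a single country. The variable names mirror the components returned by abso_change() (abso_change, sum_abs_change, average_abs_change), but the yearly values are made up and the code is an illustration, not the package implementation.

```r
# Hypothetical yearly values of one indicator for one country
y <- c(`2005` = 70.0, `2006` = 71.2, `2007` = 70.8, `2008` = 72.0)

abso_change_y <- diff(y)                    # y_t - y_{t-1}, named by year t
sum_abs_change <- sum(abs(abso_change_y))   # sum of |Delta y_t|
average_abs_change <- sum_abs_change / length(abso_change_y)  # per pair of years

abso_change_y
sum_abs_change      # 2.8
average_abs_change  # about 0.933
```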
Here we assume that the larger the index, the better the performance.
Let’s load the Eurofound indicator lifesatisf:
workDF <- extract_indicator_EUF(
indicator_code ="lifesatisf", #Code_in_database
fromTime=2000,
toTime =2018,
gender= c("Total","Females","Males")[1],
countries = convergEU_glb()$EU27_2020$memberStates$codeMS)
workDF
#> $res
#> # A tibble: 4 × 29
#> time sex AT BE BG CY CZ DE DK EE EL ES FI
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 Total 7.85 7.52 4.47 7.22 6.57 7.36 8.47 5.94 6.69 7.49 8.14
#> 2 2007 Total 6.95 7.54 5.01 7.05 6.59 7.16 8.48 6.72 6.58 7.25 8.23
#> 3 2011 Total 7.66 7.38 5.55 7.16 6.43 7.20 8.37 6.28 6.16 7.47 8.08
#> 4 2016 Total 7.92 7.31 5.62 6.54 6.48 7.31 8.19 6.73 5.26 6.95 8.07
#> # ℹ 16 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> # LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> # RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
wDF <- workDF$res
Then we check whether the dataset is complete or some values are missing:
check_data(select(wDF,-sex),timeName="time")
#> $res
#> NULL
#>
#> $msg
#> NULL
#>
#> $err
#> [1] "Error: one or more missing values in the dataframe."
thus at least one missing value is present. In the next step, imputation of missing values is performed:
wDFI <- impute_dataset(select(wDF,-sex),
countries= names(select(wDF,-sex,-time)),
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1])
and the imputed dataset is checked again with check_data(wDFI$res, timeName = "time"), which returns TRUE.
First, we calculate the EU unweighted average of lifesatisf:
wwTB <- (wDFI$res %>%
average_clust(timeName="time",cluster="EU27"))$res
wwTB$EU27
#> [1] 6.829626 6.984789 7.014334 6.978321
Time series can be plotted:
mini_EU <- min(wwTB$EU27)
maxi_EU <- max(wwTB$EU27)
# EU27 unweighted average by year, points joined by a line
ggplot(wwTB, aes(x = time, y = EU27)) +
geom_point() +
geom_line(colour = "navyblue") +
ylim(mini_EU, maxi_EU) +
ylab("lifesatisf")
Now the beta-convergence is calculated for just two years:
betaRes <- beta_conv(wDFI$res,time_0=2007, time_t=2011, all_within=FALSE)
betaRes
#> $res
#> $res$workTB
#> # A tibble: 27 × 3
#> deltaIndic indic countries
#> <dbl> <dbl> <chr>
#> 1 0.0244 1.94 AT
#> 2 -0.00553 2.02 BE
#> 3 0.0256 1.61 BG
#> 4 0.00403 1.95 CY
#> 5 -0.00623 1.89 CZ
#> 6 0.00138 1.97 DE
#> 7 -0.00316 2.14 DK
#> 8 -0.0171 1.91 EE
#> 9 -0.0165 1.88 EL
#> 10 0.00719 1.98 ES
#> # ℹ 17 more rows
#>
#> $res$model
#>
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#>
#> Coefficients:
#> (Intercept) indic
#> 0.11155 -0.05679
#>
#>
#> $res$summary
#> # A tibble: 2 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.112 0.0314 3.56 0.00153
#> 2 indic -0.0568 0.0162 -3.51 0.00171
#>
#> $res$beta1
#> [1] -0.05678881
#>
#> $res$adj.r.squared
#> [1] 0.3037316
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
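The printed workTB suggests what the regression is fitted on: indic matches the natural log of the 2007 value (log(6.95) is about 1.94 for AT), and deltaIndic matches the log growth from 2007 to 2011 divided by the four years. Under that reading, which is inferred from the output rather than from the package source, beta-convergence is the slope of an ordinary least-squares fit like the one sketched below. The helper beta_sketch() is hypothetical; a negative slope means that countries starting lower grew faster.

```r
# Hypothetical helper: beta-convergence as OLS of average yearly log growth
# on the log of the initial level (reading inferred from workTB above)
beta_sketch <- function(y0, yT, years) {
  work <- data.frame(
    indic      = log(y0),                     # log initial level
    deltaIndic = (log(yT) - log(y0)) / years  # average yearly log growth
  )
  fit <- stats::lm(deltaIndic ~ indic, data = work)
  unname(stats::coef(fit)["indic"])           # beta1
}

# Synthetic data built so that growth = 0.1 - 0.05 * log(initial level)
y0 <- c(2, 4, 6, 8)
yT <- y0 * exp(4 * (0.1 - 0.05 * log(y0)))
beta_sketch(y0, yT, years = 4)   # recovers a slope of -0.05
```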
A plot of the transformed data and the fitted straight line may be useful:
mybetaplot<-beta_conv_graph(betaRes,
indiName = 'Mean Life Satisfaction',
time_0 = 2007,
time_t = 2011)
mybetaplot
Note that labels are replicated as many times as the number of subsequent years included.
Next, sigma-convergence is calculated:
It is also possible to obtain a graphical representation of the standard deviation and the coefficient of variation computed for sigma-convergence by invoking the sigma_conv_graph function as follows:
Let’s reload Eurofound data:
workDF <- extract_indicator_EUF(
indicator_code ="lifesatisf", #Code_in_database
fromTime=2000,
toTime =2018,
gender= c("Total","Females","Males")[1],
countries = convergEU_glb()$EU27_2020$memberStates$codeMS)
wDFI <- impute_dataset(select(workDF$res,-sex),
countries= names(select(wDF,-sex,-time)),
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1])
check_data(wDFI$res,timeName="time")
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
Now gamma-convergence is computed:
gamma_conv(wDFI$res,ref=2003,last=2016,timeName = "time")
#> $res
#> [1] 0.5879853
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
or equivalently:
It is also possible to perform the calculation for each pair of consecutive years in the dataset, so that each year is the reference for the following one. The imputed data are:
wDFI$res
#> # A tibble: 4 × 28
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 7.85 7.52 4.47 7.22 6.57 7.36 8.47 5.94 6.69 7.49 8.14 6.96
#> 2 2007 6.95 7.54 5.01 7.05 6.59 7.16 8.48 6.72 6.58 7.25 8.23 7.32
#> 3 2011 7.66 7.38 5.55 7.16 6.43 7.20 8.37 6.28 6.16 7.47 8.08 7.23
#> 4 2016 7.92 7.31 5.62 6.54 6.48 7.31 8.19 6.73 5.26 6.95 8.07 7.17
#> # ℹ 15 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>
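Gamma-convergence is commonly measured in the convergence literature with Kendall's index of rank concordance, which compares how stable the ranking of countries is across years: values near 1 mean the ranking barely changed, values near 0 mean it was reshuffled. The base R sketch below implements that textbook index for a years-by-countries matrix. It is an assumption that gamma_conv() uses exactly this formula (tie handling, for instance, may differ), and kendall_rc() is a hypothetical name.

```r
# Kendall's index of rank concordance (textbook form used for gamma-convergence)
# values: numeric matrix, one row per year, one column per country
kendall_rc <- function(values) {
  ranks <- t(apply(values, 1, rank))   # rank countries within each year
  n_years <- nrow(ranks)
  rank_sums <- colSums(ranks)          # sum of each country's ranks over the years
  var(rank_sums) / (n_years^2 * var(ranks[1, ]))
}

same_order <- rbind(c(5.0, 6.1, 7.2, 8.3),
                    c(5.5, 6.0, 7.9, 8.1))
reversed   <- rbind(c(5.0, 6.1, 7.2, 8.3),
                    c(8.1, 7.9, 6.0, 5.5))
kendall_rc(same_order)  # 1: ranking unchanged
kendall_rc(reversed)    # 0: ranking completely reversed
```

For the pair-by-pair version described above, the function would be applied to rows t and t+1 of the data, one pair at a time.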
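Let $y_{m,i,t}$ be the value of indicator $i$ for member state $m$ at time $t$, and $y^{(M)}_{i,t}$ the maximum value over the member states in the reference set $A$, for example $A = \text{EU27}$: $y^{(M)}_{i,t} = \max\{y_{m,i,t} : m \in A\}$.
The distance of member state $m$ from the top performer at time $t$ is $y^{(M)}_{i,t} - y_{m,i,t}$, so the overall distance at time $t$, called delta, is the sum of these distances over the reference set $A$ of MS: $\delta_{i,t} = \sum_{m \in A} \left( y^{(M)}_{i,t} - y_{m,i,t} \right)$ for the considered indicator $i$.
The measure of delta-convergence is obtained as follows (note that wwTB also contains the EU27 column added by average_clust(), which therefore enters the sum as one more unit):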
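The formula for $\delta_{i,t}$ translates directly into a few lines of base R. The sketch assumes a highBest indicator, so the top performer is the maximum (for a lowBest indicator one would presumably take the minimum and reverse the difference); delta_sketch() is a hypothetical helper, not the delta_conv() code.

```r
# delta_{i,t}: sum over member states of the distance from the best performer
# values: numeric matrix, one row per year, one column per member state in A
delta_sketch <- function(values) {
  top <- apply(values, 1, max)   # y^(M)_{i,t}: best value in each year
  rowSums(top - values)          # top is recycled down each column, row by row
}

values <- rbind(c(1, 3, 5),
                c(2, 4, 4))
delta_sketch(values)  # 6 2
```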
delta_conv(wwTB)
#> $res
#> # A tibble: 4 × 2
#> time delta
#> <dbl> <dbl>
#> 1 2003 46.0
#> 2 2007 41.8
#> 3 2011 38.0
#> 4 2016 34.0
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The delta_conv function can also return a declaration of convergence. To obtain it, set the argument extended to TRUE. For example, for the wwTB dataset the syntax is:
delta_conv(wwTB,"time", extended=TRUE)
#> $res
#> $res$delta_conv
#> # A tibble: 4 × 2
#> time delta
#> <dbl> <dbl>
#> 1 2003 46.0
#> 2 2007 41.8
#> 3 2011 38.0
#> 4 2016 34.0
#>
#> $res$differences
#> AT BE BG CY CZ DE DK EE
#> [1,] 0.6245208 0.9556756 4.003694 1.250645 1.904858 1.113082 0 2.534678
#> [2,] 1.5314507 0.9352727 3.469857 1.430196 1.883980 1.314598 0 1.756762
#> [3,] 0.7128654 0.9938769 2.823287 1.209136 1.939715 1.168355 0 2.095294
#> [4,] 0.2691503 0.8841901 2.568287 1.657201 1.713479 0.885087 0 1.464275
#> EL ES FI FR HR HU IE IT
#> [1,] 1.779027 0.9809632 0.3344421 1.508648 2.035054 2.534398 0.7697177 1.255543
#> [2,] 1.896363 1.2247257 0.2450380 1.154050 2.041460 2.884686 0.8919230 1.889798
#> [3,] 2.211353 0.9066143 0.2946043 1.145797 1.591188 2.598653 0.9794941 1.488251
#> [4,] 2.930435 1.2417102 0.1190214 1.025349 1.860298 1.677889 0.5078416 1.635120
#> LT LU LV MT NL PL PT
#> [1,] 3.028715 0.7834425 2.897604 1.1496062 0.9260612 2.3119373 2.481104
#> [2,] 2.156122 0.5780816 2.438259 0.9191999 0.6090589 1.5864158 2.288289
#> [3,] 1.671255 0.5828137 2.129854 1.1383724 0.6792879 1.3019018 1.604687
#> [4,] 1.709892 0.2899728 1.869513 0.6240749 0.4548769 0.9941764 1.321342
#> RO SE SI SK EU27
#> [1,] 2.348566 0.5827713 1.429040 2.815222 1.642186
#> [2,] 2.003722 0.1419497 1.250872 1.800439 1.493428
#> [3,] 1.638209 0.3387566 1.419971 1.986291 1.357403
#> [4,] 1.692563 0.2608399 1.338965 1.789471 1.214260
#>
#> $res$difference_last_first
#> AT BE BG CY CZ DE
#> -0.35537052 -0.07148552 -1.43540668 0.40655661 -0.19137859 -0.22799492
#> DK EE EL ES FI FR
#> 0.00000000 -1.07040215 1.15140772 0.26074696 -0.21542072 -0.48329926
#> HR HU IE IT LT LU
#> -0.17475605 -0.85650873 -0.26187611 0.37957668 -1.31882334 -0.49346972
#> LV MT NL PL PT RO
#> -1.02809095 -0.52553129 -0.47118425 -1.31776094 -1.15976238 -0.65600348
#> SE SI SK EU27
#> -0.32193136 -0.09007549 -1.02575064 -0.42792575
#>
#> $res$strict_conv_ini_last
#> [1] FALSE
#>
#> $res$label_strict
#> [1] " "
#>
#> $res$converg_ini_last
#> [1] TRUE
#>
#> $res$label_conver
#> [1] "convergence"
#>
#> $res$diffe_delta
#> [1] -11.98192
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
It is also useful to evaluate how much each MS deviates from the EU mean for a given indicator over a period of time. To obtain this information, the convergEU package provides the demea_change function:
res1<-demea_change(wwTB,
timeName="time",
time_0 = 2003,
time_t = 2016,
sele_countries= NA,
doplot=TRUE)
res1
#> $res
#> $res$resDiffe
#> # A tibble: 4 × 29
#> time AT BE BG CY CZ DE DK EE EL ES
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2003 1.02 0.687 -2.36 0.392 -0.263 0.529 1.64 -0.892 -0.137 0.661
#> 2 2007 -0.0380 0.558 -1.98 0.0632 -0.391 0.179 1.49 -0.263 -0.403 0.269
#> 3 2011 0.645 0.364 -1.47 0.148 -0.582 0.189 1.36 -0.738 -0.854 0.451
#> 4 2016 0.945 0.330 -1.35 -0.443 -0.499 0.329 1.21 -0.250 -1.72 -0.0275
#> # ℹ 18 more variables: FI <dbl>, FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>,
#> # IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>,
#> # PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, EU27 <dbl>
#>
#> $res$diffe_abs_diff
#> # A tibble: 3 × 29
#> time AT BE BG CY CZ DE DK EE EL ES
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2007 -0.980 -0.128 -0.385 -0.328 0.128 -0.350 -0.149 -0.629 0.266 -0.393
#> 2 2011 0.607 -0.195 -0.511 0.0850 0.192 0.0102 -0.136 0.475 0.451 0.182
#> 3 2016 0.301 -0.0335 -0.112 0.295 -0.0831 0.140 -0.143 -0.488 0.862 -0.423
#> # ℹ 18 more variables: FI <dbl>, FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>,
#> # IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>,
#> # PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, EU27 <dbl>
#>
#> $res$stats
#> # A tibble: 28 × 4
#> MS negaSum posiSum posi
#> <chr> <dbl> <dbl> <int>
#> 1 AT -0.980 0.907 1
#> 2 BE -0.356 0 2
#> 3 BG -1.01 0 3
#> 4 CY -0.328 0.380 4
#> 5 CZ -0.0831 0.320 5
#> 6 DE -0.350 0.150 6
#> 7 DK -0.428 0 7
#> 8 EE -1.12 0.475 8
#> 9 EL 0 1.58 9
#> 10 ES -0.816 0.182 10
#> # ℹ 18 more rows
#>
#> $res$miniX
#> [1] -1.117033
#>
#> $res$maxiX
#> [1] 1.579333
#>
#> $res$res_graph
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
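The printed components are consistent with a simple reading: resDiffe is each country's value minus the EU27 mean (AT in 2003: 7.85 - 6.83, about 1.02), diffe_abs_diff is the year-on-year change of the absolute deviation (AT in 2007: 0.038 - 1.02, about -0.98), and negaSum and posiSum add up the negative and positive changes. The hypothetical demean_sketch() below reproduces that reading in base R on toy numbers; it is inferred from the output, not taken from demea_change().

```r
# values: matrix, one row per year, one column per country
# eu_mean: EU average for each year (same order as the rows of values)
demean_sketch <- function(values, eu_mean) {
  dev <- values - eu_mean                 # deviation from the EU mean
  abs_change <- diff(abs(dev))            # change of |deviation| between consecutive years
  list(dev = dev,
       abs_change = abs_change,
       negaSum = colSums(pmin(abs_change, 0)),  # movements towards the mean
       posiSum = colSums(pmax(abs_change, 0)))  # movements away from the mean
}

values <- cbind(A = c(8.0, 7.0, 7.5),
                B = c(6.0, 6.5, 6.0))
res <- demean_sketch(values, eu_mean = c(7, 7, 7))
res$negaSum  # A: -1.0, B: -0.5
res$posiSum  # A:  0.5, B:  0.5
```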
To plot the calculated differences, the user should invoke the plot function as follows:
There are several auxiliary functions that help prepare the tidy dataset (time by member states, MS, i.e. EU countries) needed in almost all computations. The most important ones are described here.
An important summary is the unweighted average of country values.
The cluster of countries to be averaged can be specified; predefined
clusters are stored by the function convergEU_glb(), which generates
the global static objects and tables of the package. The illustration
below uses the emp_20_64_MS dataframe of the convergEU package.
First, note that the EU area is made up of the following MS:
while the labels of the 27 MS in the EU27_2020 cluster are:
convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#>
#> $memberStates
#> # A tibble: 27 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> # ℹ 17 more rows
The list of known MS labels is shown in the appendix.
For example, the unweighted average in the emp_20_64_MS dataset is:
testTB <- emp_20_64_MS
average_clust(testTB,timeName = "time",cluster = "EU27")$res[,c(1,30)]
#> # A tibble: 17 × 2
#> time EU27
#> <dbl> <dbl>
#> 1 2002 67.3
#> 2 2003 67.5
#> 3 2004 67.6
#> 4 2005 68.2
#> 5 2006 69.4
#> 6 2007 70.4
#> 7 2008 70.8
#> 8 2009 68.8
#> 9 2010 67.9
#> 10 2011 67.9
#> 11 2012 67.8
#> 12 2013 67.8
#> 13 2014 68.7
#> 14 2015 69.7
#> 15 2016 70.8
#> 16 2017 72.3
#> 17 2018 73.7
while for EU12 it is:
average_clust(testTB,timeName = "time",cluster = "EU12")$res[,c(1,30)]
#> # A tibble: 17 × 2
#> time EU12
#> <dbl> <dbl>
#> 1 2002 69.1
#> 2 2003 69.1
#> 3 2004 69.4
#> 4 2005 69.9
#> 5 2006 70.6
#> 6 2007 71.3
#> 7 2008 71.4
#> 8 2009 69.9
#> 9 2010 69.2
#> 10 2011 68.6
#> 11 2012 68.0
#> 12 2013 67.8
#> 13 2014 68.4
#> 14 2015 69.2
#> 15 2016 70.1
#> 16 2017 71.2
#> 17 2018 72.3
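An unweighted cluster average is the row mean over the member-state columns that belong to the cluster, each country weighing the same. The sketch below shows this in base R on the IT and DE values for 2002 and 2003 (taken from the emp_20_64_MS output in the smoothing section); the two-country cluster is a made-up example, whereas average_clust() looks the membership up in convergEU_glb().

```r
emp <- data.frame(time = c(2002, 2003),
                  IT   = c(59.2, 60.1),
                  DE   = c(68.8, 68.4))
cluster <- c("IT", "DE")             # hypothetical two-country cluster
emp$avg <- rowMeans(emp[, cluster])  # unweighted mean of the cluster columns
emp$avg                              # 64.00 and 64.25
```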
An unknown label, like “EUspirit”, causes a computation error:
The basic imputation method is deterministic: it assumes that the indicator changed linearly between the two observed time points flanking a run of missing values. With a single missing value, this reduces to the average of the two endpoints.
intervalTime <- c(1999,2000,2001)
intervalMeasure <- c( 66.5, NA,87.2)
currentData <- tibble(time= intervalTime, veval= intervalMeasure)
currentData
#> # A tibble: 3 × 2
#> time veval
#> <dbl> <dbl>
#> 1 1999 66.5
#> 2 2000 NA
#> 3 2001 87.2
resImputed <- impute_dataset(currentData,
countries = "veval",
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])
resImputed
#> $res
#> # A tibble: 3 × 2
#> time veval
#> <dbl> <dbl>
#> 1 1999 66.5
#> 2 2000 76.8
#> 3 2001 87.2
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
If several consecutive values are missing:
intervalTime <- c(1999,2000,2001,2002,2003)
intervalMeasure <- c( 66.5, NA,NA,NA,87.2)
currentData <- tibble(time= intervalTime, veval= intervalMeasure)
currentData
#> # A tibble: 5 × 2
#> time veval
#> <dbl> <dbl>
#> 1 1999 66.5
#> 2 2000 NA
#> 3 2001 NA
#> 4 2002 NA
#> 5 2003 87.2
resImputed <- impute_dataset(currentData,
countries = "veval",
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])
tmp <- as.data.frame(currentData[ c(1,5),] )
tmp2 <- as.data.frame(resImputed$res[2:4,] )
resImputed
#> $res
#> # A tibble: 5 × 2
#> time veval
#> <dbl> <dbl>
#> 1 1999 66.5
#> 2 2000 71.7
#> 3 2001 76.9
#> 4 2002 82.0
#> 5 2003 87.2
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
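The linear rule can be reproduced with stats::approx(), which interpolates between the observed points. The sketch below is a base R illustration, not the impute_dataset() code, and impute_linear() is a hypothetical name; treating rule = 2 (repeat the nearest observed value) as the analogue of headMiss/tailMiss = "constant" is an assumption.

```r
# Linear interpolation between the observed values flanking each gap
impute_linear <- function(time, y) {
  observed <- !is.na(y)
  # rule = 2 repeats the nearest observed value before the first and after
  # the last observation (assumed analogue of the "constant" option)
  stats::approx(time[observed], y[observed], xout = time, rule = 2)$y
}

impute_linear(1999:2003, c(66.5, NA, NA, NA, 87.2))
# 66.500 71.675 76.850 82.025 87.200
impute_linear(1:4, c(NA, 2, 4, NA))
# 2 2 4 4
```

The first call returns the same values as the impute_dataset() example above, up to the rounding of the tibble print.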
It may be of interest to assume that part of the variability observed in a country on a given index is not structural, i.e. not due to causal determinants but to transient fluctuations. Furthermore, the interest here is not in prediction but in smoothing the values observed over the whole time interval considered.
In such a case, a smoothing procedure removes sudden large changes, yielding a less variable time series than the original.
Since short time series (panel data) are considered here, a three-point weighted average is proposed. The smoother replaces the raw value $y_{m,i,t}$ of country $m$, indicator $i$, at time $t$ with the weighted average $\check{y}_{m,i,t} = \frac{1-w}{2}\, y_{m,i,t-1} + w\, y_{m,i,t} + \frac{1-w}{2}\, y_{m,i,t+1}$, where $0 < w \le 1$. The special case $w = 1$ corresponds to no smoothing. If values are missing, an NA is returned; an NA is also returned if the weight is outside the interval $(0, 1]$. The first and last values are smoothed using weights $w$ (on the value itself) and $1-w$ (on its only neighbour).
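A direct base R transcription of this smoother is sketched below; smooth3() is a hypothetical helper, not the smoo_dataset() source, and the end-point rule follows the description above. Applied to the IT series with w = 0.149, it reproduces the IT1 column of compaTB printed below.

```r
# Three-point weighted smoother: centre weight w, neighbours (1 - w) / 2 each;
# the first and last values use weights w (own value) and 1 - w (only neighbour)
smooth3 <- function(y, w) {
  n <- length(y)
  stopifnot(n >= 3)
  if (is.na(w) || w <= 0 || w > 1) return(rep(NA_real_, n))  # weight outside (0, 1]
  side <- (1 - w) / 2
  out <- numeric(n)
  for (t in 2:(n - 1)) {
    out[t] <- side * y[t - 1] + w * y[t] + side * y[t + 1]
  }
  out[1] <- w * y[1] + (1 - w) * y[2]
  out[n] <- w * y[n] + (1 - w) * y[n - 1]
  out  # an NA in y propagates to the smoothed values that use it
}

# IT employment rates 2002-2018, as printed in the comparison table below
it_emp <- c(59.2, 60.1, 61.7, 61.5, 62.4, 62.7, 62.9, 61.6, 61.0,
            61.0, 60.9, 59.7, 59.9, 60.5, 61.6, 62.3, 63.0)
round(smooth3(it_emp, w = 0.149), 1)
```

With w = 1 the function returns the input unchanged, matching the no-smoothing case.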
After loading data, imputation takes place and finally smoothing is performed. Now, countries IT and DE are considered to illustrate the procedure. First check if missing values are present:
workTB <- dplyr::select(emp_20_64_MS, time, IT,DE)
check_data(workTB)
#> $res
#> [1] TRUE
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
The check is passed, so we proceed with smoothing; the time variable is removed from the data and passed separately through timeTB:
resSM <- smoo_dataset(select(workTB,-time), leadW = 0.149, timeTB= select(workTB,time))
resSM
#> # A tibble: 17 × 3
#> time IT DE
#> <dbl> <dbl> <dbl>
#> 1 2002 60.0 68.5
#> 2 2003 60.4 68.4
#> 3 2004 60.9 68.8
#> 4 2005 62.0 69.5
#> 5 2006 62.1 71.1
#> 6 2007 62.7 72.6
#> 7 2008 62.3 73.6
#> 8 2009 61.9 74.5
#> 9 2010 61.3 75.3
#> 10 2011 61.0 76.0
#> 11 2012 60.4 76.9
#> 12 2013 60.3 77.3
#> 13 2014 60.1 77.7
#> 14 2015 60.7 78.1
#> 15 2016 61.4 78.6
#> 16 2017 62.3 79.2
#> 17 2018 62.4 79.3
and for a comparison:
tmpSM <- dplyr::rename(dplyr::select(resSM,-time),IT1=IT,DE1=DE)
compaTB <- dplyr::select(bind_cols(workTB, tmpSM), time,IT,IT1,DE,DE1)
compaTB
#> # A tibble: 17 × 5
#> time IT IT1 DE DE1
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 59.2 60.0 68.8 68.5
#> 2 2003 60.1 60.4 68.4 68.4
#> 3 2004 61.7 60.9 67.9 68.8
#> 4 2005 61.5 62.0 69.4 69.5
#> 5 2006 62.4 62.1 71.1 71.1
#> 6 2007 62.7 62.7 72.9 72.6
#> 7 2008 62.9 62.3 74 73.6
#> 8 2009 61.6 61.9 74.2 74.5
#> 9 2010 61 61.3 75 75.3
#> 10 2011 61 61.0 76.5 76.0
#> 11 2012 60.9 60.4 76.9 76.9
#> 12 2013 59.7 60.3 77.3 77.3
#> 13 2014 59.9 60.1 77.7 77.7
#> 14 2015 60.5 60.7 78 78.1
#> 15 2016 61.6 61.4 78.6 78.6
#> 16 2017 62.3 62.3 79.2 79.2
#> 17 2018 63 62.4 79.9 79.3
A graphical output shows changes for “IT”, with original index in blue and smoothed index in red:
ggplot(compaTB, aes(x = time, y = IT)) +
geom_point() +
geom_line(colour = "navyblue") +
geom_line(aes(y = IT1), colour = "red") +
geom_point(aes(y = IT1), colour = "red", shape = 8)
Similarly for Germany, i.e. “DE”:
ggplot(compaTB, aes(x = time, y = DE)) +
geom_point() +
geom_line(colour = "navyblue") +
geom_line(aes(y = DE1), colour = "red") +
geom_point(aes(y = DE1), colour = "red", shape = 8)
A weight equal to 1 leaves data unchanged:
resSM <- smoo_dataset(dplyr::select(workTB,-time), leadW = 1,
timeTB= dplyr::select(workTB,time))
resSM <- dplyr::rename(resSM,IT1=IT, DE1=DE)
compaTB <- dplyr::select(dplyr::bind_cols(workTB,
dplyr::select(resSM,-time)), time,IT,IT1,DE,DE1)
ggplot(compaTB, aes(x = time, y = IT)) +
geom_point() +
geom_line(colour = "navyblue") +
geom_line(aes(y = IT1), colour = "red") +
geom_point(aes(y = IT1), colour = "red", shape = 8)
A time window wider than 3 could be considered, but careful thought is recommended about how much economic and social change may happen in 5 consecutive years.
Several alternative smoothing algorithms are available in R. Classical moving-average (MA) smoothers are available, for example, from the caTools package.
The emp_20_64_MS dataset is now used as an example, first with Italy and then with Germany as the member state of interest.
data(emp_20_64_MS)
cuTB <- dplyr::tibble(ITori =emp_20_64_MS$IT)
cuTB <- dplyr::mutate(cuTB,time =emp_20_64_MS$time)
Centred windows of k = 3, 5 and 7 years are used; with endrule = "keep" (the fourth option selected below), the first and last k %/% 2 values of each smoothed series are copied unchanged from the original series, so the tails are not averaged:
cuTB <- dplyr::mutate(cuTB, IT_k_3= caTools::runmean(emp_20_64_MS$IT, k=3,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
cuTB <- dplyr::mutate(cuTB, IT_k_5= caTools::runmean(emp_20_64_MS$IT, k=5,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
cuTB <- dplyr::mutate(cuTB, IT_k_7= caTools::runmean(emp_20_64_MS$IT, k=7,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
myG <- ggplot(cuTB,aes(x=time,y=ITori))+geom_line()+geom_point()+
geom_line(aes(x=time,y=IT_k_3),colour="red")+
geom_point(aes(x=time,y=IT_k_3),colour="red")+
#
geom_line(aes(x=time,y=IT_k_5),colour="blue")+
geom_point(aes(x=time,y=IT_k_5),colour="blue")+
#
geom_line(aes(x=time,y=IT_k_7),colour="orange")+
geom_point(aes(x=time,y=IT_k_7),colour="orange")+
theme(legend.position = c(.5, .5),
legend.title = element_text(face = "bold"))
myG
For Germany, a similar implementation provides the following result:
cuTB <- dplyr::mutate(cuTB, DEori =emp_20_64_MS$DE)
cuTB <- dplyr::mutate(cuTB, DE_k_3= caTools::runmean(emp_20_64_MS$DE, k=3,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
cuTB <- dplyr::mutate(cuTB, DE_k_5= caTools::runmean(emp_20_64_MS$DE, k=5,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
cuTB <- dplyr::mutate(cuTB, DE_k_7= caTools::runmean(emp_20_64_MS$DE, k=7,
alg=c("C", "R", "fast", "exact")[4],
endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
align = c("center", "left", "right")[1]))
myG <- ggplot(cuTB,aes(x=time,y=DEori))+geom_line()+geom_point()+
geom_line(aes(x=time,y=DE_k_3),colour="red")+
geom_point(aes(x=time,y=DE_k_3),colour="red")+
#
geom_line(aes(x=time,y=DE_k_5),colour="blue")+
geom_point(aes(x=time,y=DE_k_5),colour="blue")+
#
geom_line(aes(x=time,y=DE_k_7),colour="orange")+
geom_point(aes(x=time,y=DE_k_7),colour="orange")+
theme(legend.position = c(.5, .5),
legend.title = element_text(face = "bold"))
myG
The time series is so short that with k=7 the first three and the last three observations are left unsmoothed, so only 11 of the 17 yearly values are actually averaged.
The above calculations are performed by a function in the convergEU package:
cuTB <- emp_20_64_MS[,c("time","IT","DE")]
ma_dataset(cuTB, kappa=3, timeName= "time")
#> $res
#> # A tibble: 17 × 3
#> time IT DE
#> <dbl> <dbl> <dbl>
#> 1 2002 59.2 68.8
#> 2 2003 60.3 68.4
#> 3 2004 61.1 68.6
#> 4 2005 61.9 69.5
#> 5 2006 62.2 71.1
#> 6 2007 62.7 72.7
#> 7 2008 62.4 73.7
#> 8 2009 61.8 74.4
#> 9 2010 61.2 75.2
#> 10 2011 61.0 76.1
#> 11 2012 60.5 76.9
#> 12 2013 60.2 77.3
#> 13 2014 60.0 77.7
#> 14 2015 60.7 78.1
#> 15 2016 61.5 78.6
#> 16 2017 62.3 79.2
#> 17 2018 63 79.9
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
which is a bit less flexible but produces standard results.
The basis of the scoreboard is the raw value of an indicator (level, $y_{m,i,t}$) for MS $m$, indicator $i$ and time $t$. Differences between consecutive years (change), namely $y_{m,i,t} - y_{m,i,t-1}$, are also important, so a function that calculates both is provided.
To calculate these quantities for the emp_20_64_MS dataset:
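For readers without caTools, the same centred moving average with endrule = "keep" can be sketched in base R with stats::filter(). The helper ma_keep() is hypothetical, and the sketch only assumes that ma_dataset() behaves like the runmean() calls above, as its output suggests (first and last values unchanged, IT in 2003 equal to the mean of 2002-2004, about 60.3).

```r
# Centred moving average of odd width k; the first and last k %/% 2 values
# are copied from the original series, like endrule = "keep" in caTools::runmean
ma_keep <- function(y, k = 3) {
  n <- length(y)
  stopifnot(k %% 2 == 1, k >= 3, n >= k)
  half <- k %/% 2
  out <- as.numeric(stats::filter(y, rep(1 / k, k), sides = 2))  # NA at both ends
  out[seq_len(half)] <- y[seq_len(half)]
  out[(n - half + 1):n] <- y[(n - half + 1):n]
  out
}

# IT values for 2002-2006: first and last kept, middle values are 3-year means
ma_keep(c(59.2, 60.1, 61.7, 61.5, 62.4), k = 3)
```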
data(emp_20_64_MS)
resTB <- scoreb_yrs(emp_20_64_MS,timeName = "time")
resTB
#> $res
#> $res$sigma_conv
#> # A tibble: 17 × 9
#> time stdDev CV mean devianceT elle1in elle1su elle2in elle2su
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 6.34 0.0939 67.5 1125. 64.3 70.7 61.2 73.9
#> 2 2003 5.95 0.0878 67.8 991. 64.8 70.7 61.8 73.7
#> 3 2004 5.70 0.0839 67.9 909. 65.1 70.8 62.2 73.6
#> 4 2005 5.54 0.0809 68.4 858. 65.7 71.2 62.9 74.0
#> 5 2006 5.57 0.0801 69.6 869. 66.8 72.3 64.0 75.1
#> 6 2007 5.47 0.0775 70.6 838. 67.9 73.3 65.1 76.1
#> 7 2008 5.36 0.0755 71.0 804. 68.3 73.7 65.6 76.3
#> 8 2009 5.03 0.0730 69.0 710. 66.5 71.5 64.0 74.0
#> 9 2010 5.24 0.0769 68.1 768. 65.5 70.7 62.9 73.4
#> 10 2011 5.59 0.0821 68.1 875. 65.3 70.9 62.5 73.7
#> 11 2012 5.98 0.0880 68 1002. 65.0 71.0 62.0 74.0
#> 12 2013 6.28 0.0922 68.0 1103. 64.9 71.2 61.8 74.3
#> 13 2014 5.98 0.0867 69.0 1000. 66.0 72.0 63.0 75.0
#> 14 2015 5.74 0.0820 70.0 922. 67.1 72.9 64.2 75.7
#> 15 2016 5.60 0.0789 71.0 879. 68.2 73.8 65.4 76.6
#> 16 2017 5.37 0.0741 72.5 808. 69.8 75.2 67.1 77.9
#> 17 2018 5.30 0.0717 73.8 786. 71.2 76.5 68.6 79.1
#>
#> $res$sco_level
#> # A tibble: 17 × 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 2002 4 3 1 5 4 3 5 3 2 2 4 3
#> 2 2003 4 2 1 5 4 3 5 3 2 2 4 3
#> 3 2004 3 3 1 5 3 3 5 3 2 3 4 3
#> 4 2005 3 3 1 5 3 3 5 4 2 3 4 3
#> 5 2006 3 2 2 5 3 3 5 5 2 3 4 3
#> 6 2007 3 2 3 5 3 3 5 5 2 3 4 3
#> 7 2008 4 2 3 5 3 4 5 5 2 3 4 3
#> 8 2009 4 3 3 5 3 5 5 3 2 2 4 3
#> 9 2010 5 3 2 5 3 5 5 3 2 1 4 3
#> 10 2011 5 3 2 4 4 5 5 3 1 1 5 3
#> 11 2012 5 3 2 3 4 5 5 4 1 1 5 3
#> 12 2013 5 3 2 3 4 5 4 4 1 1 4 3
#> 13 2014 4 3 2 3 4 5 4 4 1 1 4 3
#> 14 2015 4 3 2 3 4 5 4 5 1 1 4 3
#> 15 2016 4 2 2 3 5 5 4 4 1 1 3 3
#> 16 2017 4 2 3 3 5 5 4 5 1 1 3 3
#> 17 2018 3 2 3 3 5 5 4 5 1 1 3 3
#> # ℹ 16 more variables: HR <int>, HU <int>, IE <int>, IT <int>, LT <int>,
#> # LU <int>, LV <int>, MT <int>, NL <int>, PL <int>, PT <int>, RO <int>,
#> # SE <int>, SI <int>, SK <int>, UK <int>
#>
#> $res$sco_change
#> # A tibble: 17 × 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 NA NA NA NA NA NA NA NA NA NA NA NA
#> 2 2003 3 3 5 3 2 2 1 4 4 4 2 4
#> 3 2004 1 4 5 3 2 2 3 4 3 4 3 2
#> 4 2005 5 3 3 1 3 4 2 5 3 5 3 3
#> 5 2006 3 1 5 3 2 4 3 5 3 3 3 1
#> 6 2007 3 3 5 3 3 4 1 3 2 3 3 2
#> 7 2008 4 3 5 2 3 4 2 3 3 1 4 3
#> 8 2009 4 3 3 3 3 4 3 1 4 1 3 3
#> 9 2010 5 5 1 3 3 5 3 1 2 3 3 4
#> 10 2011 3 3 1 2 3 4 3 5 1 3 4 3
#> 11 2012 3 3 3 1 3 3 3 5 1 1 3 3
#> 12 2013 3 3 3 1 4 3 3 4 1 2 2 3
#> 13 2014 1 2 4 2 3 2 2 3 2 3 1 1
#> 14 2015 1 1 5 2 3 2 3 5 4 5 1 2
#> 15 2016 2 2 2 3 5 2 2 1 3 5 2 2
#> 16 2017 1 2 5 4 3 1 1 4 3 3 2 1
#> 17 2018 2 3 3 5 3 1 2 2 4 3 5 1
#> # ℹ 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#> $res$sco_level_num
#> # A tibble: 17 × 29
#> time AT BE BG CY CZ DE DK EE EL ES FI FR
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 0.5 0 -1 1 0.5 0 1 0 -0.5 -0.5 0.5 0
#> 2 2003 0.5 -0.5 -1 1 0.5 0 1 0 -0.5 -0.5 0.5 0
#> 3 2004 0 0 -1 1 0 0 1 0 -0.5 0 0.5 0
#> 4 2005 0 0 -1 1 0 0 1 0.5 -0.5 0 0.5 0
#> 5 2006 0 -0.5 -0.5 1 0 0 1 1 -0.5 0 0.5 0
#> 6 2007 0 -0.5 0 1 0 0 1 1 -0.5 0 0.5 0
#> 7 2008 0.5 -0.5 0 1 0 0.5 1 1 -0.5 0 0.5 0
#> 8 2009 0.5 0 0 1 0 1 1 0 -0.5 -0.5 0.5 0
#> 9 2010 1 0 -0.5 1 0 1 1 0 -0.5 -1 0.5 0
#> 10 2011 1 0 -0.5 0.5 0.5 1 1 0 -1 -1 1 0
#> 11 2012 1 0 -0.5 0 0.5 1 1 0.5 -1 -1 1 0
#> 12 2013 1 0 -0.5 0 0.5 1 0.5 0.5 -1 -1 0.5 0
#> 13 2014 0.5 0 -0.5 0 0.5 1 0.5 0.5 -1 -1 0.5 0
#> 14 2015 0.5 0 -0.5 0 0.5 1 0.5 1 -1 -1 0.5 0
#> 15 2016 0.5 -0.5 -0.5 0 1 1 0.5 0.5 -1 -1 0 0
#> 16 2017 0.5 -0.5 0 0 1 1 0.5 1 -1 -1 0 0
#> 17 2018 0 -0.5 0 0 1 1 0.5 1 -1 -1 0 0
#> # ℹ 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> # LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> # SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#>
#>
#> $msg
#> NULL
#>
#> $err
#> NULL
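where the result is a list of four components: the summary statistics (sigma_conv); the labels 1 to 5 indicating the interval of the partition that each level belongs to (sco_level); the corresponding labels for changes (sco_change); and the numerical labels for levels (sco_level_num), where labels 1, 2, 3, 4, 5 correspond to −1, −0.5, 0, 0.5, 1.
Numerical labels are assigned as follows (see DRAFT JOINT EMPLOYMENT
REPORT FROM THE COMMISSION AND THE COUNCIL), where $m$ and $s$ denote the
mean and the standard deviation across member states of the levels (or changes) in the same year:
* value −1 if the original level or change is $y \le m - 1 \cdot s$;
* value −0.5 if the original level or change is $m - 1 \cdot s < y \le m - 0.5 \cdot s$;
* value 0 if the original level or change is $m - 0.5 \cdot s < y \le m + 0.5 \cdot s$;
* value +0.5 if the original level or change is $m + 0.5 \cdot s < y \le m + 1 \cdot s$;
* value 1 if the original level or change is $y > m + 1 \cdot s$.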
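The sigma_conv table is consistent with the following computation for each year across the 28 member-state columns: the mean; the standard deviation obtained by dividing the deviance by the number of countries (for 2002, sqrt(1125/28) is about 6.34); the coefficient of variation stdDev/mean; and the bounds mean ± 0.5·stdDev (elle1in, elle1su) and mean ± stdDev (elle2in, elle2su). The base R sketch below, with the hypothetical helper sigma_sketch(), follows that reading, which is inferred from the printed numbers rather than from the package source.

```r
# Summary statistics of one year's values across countries
sigma_sketch <- function(y) {
  n <- length(y)
  m <- mean(y)
  deviance <- sum((y - m)^2)
  s <- sqrt(deviance / n)   # divides by n, not n - 1
  c(stdDev = s, CV = s / m, mean = m, devianceT = deviance,
    elle1in = m - 0.5 * s, elle1su = m + 0.5 * s,
    elle2in = m - s,       elle2su = m + s)
}

sigma_sketch(c(1, 2, 3, 4, 5))
```

For a years-by-countries matrix, t(apply(values, 1, sigma_sketch)) would give one row per year.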
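The partition above can be sketched with findInterval() on right-closed intervals. The helper score_labels() is hypothetical, and computing $s$ by dividing by the number of countries is an assumption consistent with the stdDev column of sigma_conv, not a statement about the package code.

```r
# Map one year's values to the scoreboard labels -1, -0.5, 0, 0.5, 1
score_labels <- function(y) {
  m <- mean(y)
  s <- sqrt(mean((y - m)^2))   # standard deviation dividing by n (assumption)
  breaks <- c(-Inf, m - s, m - 0.5 * s, m + 0.5 * s, m + s, Inf)
  labels <- c(-1, -0.5, 0, 0.5, 1)
  # left.open = TRUE gives intervals (b_j, b_j+1], so y <= m - s maps to -1
  labels[findInterval(y, breaks, left.open = TRUE)]
}

score_labels(c(1, 2, 3, 4, 5))  # -1.0 -0.5  0.0  0.5  1.0
```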
We note that the above summaries may also be represented as coloured plots in scoreboards.
For the comparison of a country with the EU average, the following steps are recommended, from raw data:
# library(ggplot2)
data(emp_20_64_MS)
selectedCountry <- "IT"
timeName <- "time"
myx_angle <- 45
outSig <- sigma_conv(emp_20_64_MS, timeName = timeName,
time_0=2002,time_t=2016)
# common y range over all countries
miniY <- min(emp_20_64_MS[,- which(names(emp_20_64_MS) == timeName )])
maxiY <- max(emp_20_64_MS[,- which(names(emp_20_64_MS) == timeName )])
# keep the years used in sigma_conv() and attach the country columns
estrattore<- emp_20_64_MS[[timeName]] >= 2002 & emp_20_64_MS[[timeName]] <= 2016
ttmp <- cbind(outSig$res, dplyr::select(emp_20_64_MS[estrattore,], -contains(timeName)))
myG2 <-
ggplot(ttmp) + ggtitle(
paste("EU average (black, solid) and country", selectedCountry, "(red, dotted)") )+
# EU average
geom_line(aes(x=.data[[timeName]], y=.data[["mean"]]), colour="black") +
geom_point(aes(x=.data[[timeName]], y=.data[["mean"]]), colour="black") +
ylim(c(miniY,maxiY)) + xlab("Year") + ylab("Indicator") +
theme(legend.position = "none") +
# selected country; colour is set outside aes() so that it is drawn in red
geom_line(aes(x=.data[[timeName]], y=.data[[selectedCountry]]), colour="red", linetype="dotted") +
geom_point(aes(x=.data[[timeName]], y=.data[[selectedCountry]]), colour="red") +
ggplot2::scale_x_continuous(breaks = ttmp[,timeName],
labels = ttmp[,timeName]) +
ggplot2::theme(axis.text.x=ggplot2::element_text(angle = myx_angle))
myG2
It is also possible to graphically show departures in terms of the above defined partition:
obe_lvl <- scoreb_yrs(emp_20_64_MS,timeName = timeName)$res$sco_level_num
# select subset of time
estrattore <- obe_lvl[[timeName]] >= 2009 & obe_lvl[[timeName]] <= 2016
scobelvl <- obe_lvl[estrattore,]
my_MSstd <- ms_dynam( scobelvl,
timeName = "time",
displace = 0.25,
displaceh = 0.45,
dimeFontNum = 3,
myfont_scale = 1.35,
x_angle = 45,
axis_name_y = "Countries",
axis_name_x = "Time",
alpha_color = 0.9
)
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the convergEU package.
#> Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
my_MSstd
The convergEU package provides a function that automatically prepares one or more country fiches. The function can create a directory under an existing path and copy into it the rmarkdown file that serves as template. The rmarkdown file is parameterized, so that compiling it with different parameters produces fiches for different data, say different indicators and countries.
It is very important to prepare complete data in a tibble (dataset) made of a time variable and one variable for each country that enters the calculation of the yearly average. Failing to satisfy this requirement leads to a wrong mean value in each year. One key country is then specified, and other countries of interest may be listed to enrich the graphs and compare performances.
Below, a call to the function go_ms_fi() illustrates the syntax:
go_ms_fi(
workDF ='myTB',
countryRef ='DE',
otherCountries = "c('IT','UK','FR')",
time_0 = 2002,
time_t = 2016,
tName = 'time',
indiType = "highBest",
aggregation= 'EU27_2020',
x_angle= 45,
dataNow= Sys.time(),
author = 'A.Student',
outFile = 'Germany-up2-2016',
outDir = "tt-fish",
indiName= 'emp_20_64_MS',
memstates='quintiles'
)
It is important to emphasize some constraints and unusual
ways of passing parameters to this function. First, note that the first
argument is the working dataset, passed not as an R object but
as a string: the name of a dataset that must be available in the R
workspace before invoking go_ms_fi. Similarly, otherCountries is a string
containing the R code of a character vector.
The second argument, countryRef, is a string with the short name
of a member country that will be shown in one-country plots. Less
obviously, the argument indiType specifies whether the considered
indicator is built so that a low value is good for a country
(indiType = “lowBest”) or a high value is good (indiType = “highBest”).
Of particular importance is the argument outFile, a string giving the name of the output file. Similarly, outDir is the path (drive and folders) where the final compiled html will be stored. The syntax of the path depends on the operating system; for example, outDir=‘F:/analysis/IT2018’ indicates that on the USB disk called ‘F’, within the folder ‘analysis’, is located the folder ‘IT2018’ where R will write the country fiche. Note that disk ‘F’ and the folder ‘analysis’ must already exist, whereas the folder ‘IT2018’ is created by the function if it does not already exist.
Besides the compiled html, the output directory also contains a file named as specified by outFile with the string ‘-workspace.RData’ appended; it stores the data and plots produced while compiling the country fiche, for subsequent use in other technical reports.
An auxiliary function, go_indica_fi(), is provided in the R package convergEU to produce an indicator fiche, whose output is an html file. For this purpose, an output directory must also be specified. Note that some arguments are passed as strings instead of objects, as described above for go_ms_fi().
An example of syntax to invoke the procedure is:
go_indica_fi(
time_0 = 2005,
time_t = 2010,
timeName = 'time',
workingDF = 'emp_20_64_MS' ,
indicaT = 'emp_20_64',
indiType = c('highBest','lowBest')[1],
seleMeasure = 'all',
seleAggre = 'EU27_2020',
x_angle = 45,
data_res_download = FALSE,
auth = 'A.Student',
dataNow = '2019/05/16',
outFile = "test_IT-emp_20_64_MS",
outDir = "tt-fish",
memstates='quintiles'
)
The following references may be consulted for details:
European Commission (2018). Draft Joint Employment Report from the Commission and the Council, accompanying the Communication from the Commission on the Annual Growth Survey 2019. COM(2018) 761 final, Brussels, 21.11.2018.
Eurofound (2018). Upward convergence in the EU: Concepts, measurements and indicators. Publications Office of the European Union, Luxembourg. Authors: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe.
Tuszynski, J. (2015). caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc. R package version 1.17.1.2. URL https://CRAN.R-project.org/package=caTools.
Nikiforova, N. D., Stefanini, F. M., Litardi, C., Peruffo, E. and Mascherini, M. (2020). Tutorial: analysis of convergence with the convergEU package. Package vignette. URL https://www.eurofound.europa.eu/system/files/2022-04/introduction-to-the-convergeu-package-0.6.4-tutorial-v2-apr2022.pdf
This appendix shows the groups of member states defined in the package. They are stored in the list returned by convergEU_glb(), whose elements are:
setupConvergEU <- convergEU_glb()
names(setupConvergEU)
#> [1] "EUcodes" "EA" "EA19" "EU12"
#> [5] "EU15" "EU25" "EU27_2007" "EU27_2019"
#> [9] "EU27_2020" "EU27" "EU28" "geoRefEUF"
#> [13] "metaEUStat" "tmpl_out" "paralintags" "rounDigits"
#> [17] "epsilonV" "scoreBoaTB" "labels_clusters"
The country codes are stored in the element EUcodes:
print(setupConvergEU$EUcodes,n=30)
#> # A tibble: 28 × 4
#> pae paeF paeN paeS
#> <chr> <chr> <dbl> <chr>
#> 1 1-AT 1-AT 1 AT
#> 2 10-ES 10-ES 10 ES
#> 3 11-FI 11-FI 11 FI
#> 4 12-FR 12-FR 12 FR
#> 5 13-HR 13-HR 13 HR
#> 6 14-HU 14-HU 14 HU
#> 7 15-IE 15-IE 15 IE
#> 8 16-IT 16-IT 16 IT
#> 9 17-LT 17-LT 17 LT
#> 10 18-LU 18-LU 18 LU
#> 11 19-LV 19-LV 19 LV
#> 12 2-BE 2-BE 2 BE
#> 13 20-MT 20-MT 20 MT
#> 14 21-NL 21-NL 21 NL
#> 15 22-PL 22-PL 22 PL
#> 16 23-PT 23-PT 23 PT
#> 17 24-RO 24-RO 24 RO
#> 18 25-SE 25-SE 25 SE
#> 19 26-SI 26-SI 26 SI
#> 20 27-SK 27-SK 27 SK
#> 21 28-UK 28-UK 28 UK
#> 22 3-BG 3-BG 3 BG
#> 23 4-CY 4-CY 4 CY
#> 24 5-CZ 5-CZ 5 CZ
#> 25 6-DE 6-DE 6 DE
#> 26 7-DK 7-DK 7 DK
#> 27 8-EE 8-EE 8 EE
#> 28 9-EL 9-EL 9 EL
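The rows above can look shuffled. They appear to be sorted on pae as character strings, so ‘10-ES’ comes before ‘2-BE’. The small sketch below uses only values transcribed from the listing. It rebuilds pae from paeN and paeS and then restores the numeric order. Two points are observations from the printout, not documented behaviour: that pae is built in this way, and that paeN follows the alphabetical order of the codes.

```r
# Columns paeS and paeN transcribed from the EUcodes listing above,
# in the order in which the rows are printed
paeS <- c("AT","ES","FI","FR","HR","HU","IE","IT","LT","LU","LV","BE",
          "MT","NL","PL","PT","RO","SE","SI","SK","UK","BG","CY","CZ",
          "DE","DK","EE","EL")
paeN <- c(1, 10:19, 2, 20:28, 3:9)

# In the listing, pae (and paeF) equals paeN and paeS joined by "-"
pae <- paste0(paeN, "-", paeS)

# Restore the numeric order of the index
byIndex <- order(paeN)
head(pae[byIndex], 3)                   # "1-AT" "2-BE" "3-BG"
identical(paeS[byIndex], sort(paeS))    # TRUE: index follows code order
```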
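print(setupConvergEU$Eurozone)
#> NULL
NULL is returned because the list has no element named Eurozone (see the names listed above). The euro-area groups appear to be stored in the elements EA and EA19.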
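A misspelled group name silently returns NULL. To avoid this, one can check the name before extracting the group. The sketch below uses a hypothetical helper, get_group(), and a small mock list with truncated code vectors. With the package loaded, the helper could be called as get_group(convergEU_glb(), "EA19").

```r
# Hypothetical helper (not part of convergEU): extracts a group by exact
# name and stops with an informative error instead of returning NULL.
get_group <- function(setup, groupName) {
  if (!groupName %in% names(setup)) {
    stop("Unknown group '", groupName, "'. Available: ",
         paste(names(setup), collapse = ", "))
  }
  setup[[groupName]]
}

# Mock of the list returned by convergEU_glb() (truncated, for illustration)
mockSetup <- list(EU12 = c("BE", "DK"), EU15 = c("BE", "DK", "AT"))
get_group(mockSetup, "EU12")          # c("BE", "DK")
# get_group(mockSetup, "Eurozone")    # error listing the available names
```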
setupConvergEU$EU12
#> $dates
#> [1] "01-11-1993" "31/12/1994"
#>
#> $memberStates
#> # A tibble: 12 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 United-Kingdom UK
setupConvergEU$EU15
#> $dates
#> [1] "01-01-1995" "30/04/2004"
#>
#> $memberStates
#> # A tibble: 15 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 United-Kingdom UK
#> 13 Austria AT
#> 14 Finland FI
#> 15 Sweden SE
print(setupConvergEU$EU25$dates)
#> [1] "01/05/2004" "31/12/2006"
print(setupConvergEU$EU25$memberStates,n=30)
#> # A tibble: 25 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 United-Kingdom UK
#> 13 Austria AT
#> 14 Finland FI
#> 15 Sweden SE
#> 16 Cyprus CY
#> 17 Czech-Republic CZ
#> 18 Estonia EE
#> 19 Hungary HU
#> 20 Latvia LV
#> 21 Lithuania LT
#> 22 Malta MT
#> 23 Poland PL
#> 24 Slovakia SK
#> 25 Slovenia SI
print(setupConvergEU$EU27$dates)
#> [1] "01/02/2020" "00/00/0000"
print(setupConvergEU$EU27$memberStates,n=30)
#> # A tibble: 27 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 Austria AT
#> 13 Finland FI
#> 14 Sweden SE
#> 15 Cyprus CY
#> 16 Czech-Republic CZ
#> 17 Estonia EE
#> 18 Hungary HU
#> 19 Latvia LV
#> 20 Lithuania LT
#> 21 Malta MT
#> 22 Poland PL
#> 23 Slovakia SK
#> 24 Slovenia SI
#> 25 Bulgaria BG
#> 26 Romania RO
#> 27 Croatia HR
print(setupConvergEU$EU27_2020$dates)
#> [1] "01/02/2020" "00/00/0000"
print(setupConvergEU$EU27_2020$memberStates,n=30)
#> # A tibble: 27 × 2
#> MS codeMS
#> <chr> <chr>
#> 1 Belgium BE
#> 2 Denmark DK
#> 3 France FR
#> 4 Germany DE
#> 5 Greece EL
#> 6 Ireland IE
#> 7 Italy IT
#> 8 Luxembourg LU
#> 9 Netherlands NL
#> 10 Portugal PT
#> 11 Spain ES
#> 12 Austria AT
#> 13 Finland FI
#> 14 Sweden SE
#> 15 Cyprus CY
#> 16 Czech-Republic CZ
#> 17 Estonia EE
#> 18 Hungary HU
#> 19 Latvia LV
#> 20 Lithuania LT
#> 21 Malta MT
#> 22 Poland PL
#> 23 Slovakia SK
#> 24 Slovenia SI
#> 25 Bulgaria BG
#> 26 Romania RO
#> 27 Croatia HR
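Each group stores its members in a column codeMS, so membership changes between groups can be computed with set operations. The sketch below copies the codes from the listings above so that it runs without the package. With convergEU loaded, the same vectors are available directly, for example as setupConvergEU$EU15$memberStates$codeMS. The function compare_groups() is a hypothetical helper. The comparison between EU25 and EU27_2020 covers several enlargements as well as the withdrawal of the UK.

```r
# Codes transcribed from the listings above
EU12 <- c("BE","DK","FR","DE","EL","IE","IT","LU","NL","PT","ES","UK")
EU15 <- c(EU12, "AT", "FI", "SE")
EU25 <- c(EU15, "CY", "CZ", "EE", "HU", "LV", "LT", "MT", "PL", "SK", "SI")
EU27_2020 <- c("BE","DK","FR","DE","EL","IE","IT","LU","NL","PT","ES",
               "AT","FI","SE","CY","CZ","EE","HU","LV","LT","MT","PL",
               "SK","SI","BG","RO","HR")

# Countries only in 'to' (joined) and only in 'from' (left)
compare_groups <- function(from, to) {
  list(joined = setdiff(to, from), left = setdiff(from, to))
}

compare_groups(EU12, EU15)$joined   # "AT" "FI" "SE"
compare_groups(EU15, EU25)$joined   # the ten countries that joined in 2004
compare_groups(EU25, EU27_2020)     # joined: "BG" "RO" "HR"; left: "UK"
```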