---
title: "widr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with widr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
```

**widr** provides direct API access to the [World Inequality Database](https://wid.world) (WID) from R. It offers validated variable codes, structured downloads as standard data frames, and helpers for currency conversion, inequality measurement, and plotting.

widr is an independent implementation, unaffiliated with the World Inequality Lab (WIL) or the Paris School of Economics. Data are sourced from WID and maintained by WIL.

## Installation

```{r install}
install.packages("widr")

# Development version
remotes::install_github("cherylisabella/widr")
```

## Variable codes

WID variables follow a four-part grammar:

```
<type><concept>[<age>][<pop>]
```

| Component | Width | Example | Meaning |
|-----------|-------|---------|---------|
| `type` | 1 letter | `s` | share |
| `concept` | 5-6 letters | `ptinc` | pre-tax national income |
| `age` | 3 digits | `992` | adults 20+ |
| `pop` | 1 letter | `j` | equal-split between spouses |

`sptinc992j` denotes the **share** of **pre-tax national income** for **equal-split adults aged 20+**.

The full catalogue is available in the [WID codes dictionary](https://wid.world/codes-dictionary/); widr bundles it as six searchable reference tables.

```{r codes}
wid_search("national income")                      # keyword search across concepts
wid_decode("sptinc992j")                           # parse into components
wid_encode("s", "ptinc", age = "992", pop = "j")   # build from components
wid_is_valid(series_type = "s", concept = "ptinc") # non-throwing validation
```

The six reference tables (`wid_series_types`, `wid_concepts`, `wid_ages`, `wid_pop_types`, `wid_percentiles`, `wid_countries`) are lazy-loaded and compiled from the codes dictionary by an independent script.

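To make the grammar above concrete, here is a minimal base-R sketch that splits a code into its four components with a regular expression. `split_wid_code()` is a hypothetical helper written for illustration; it is not part of widr and does not check components against the reference tables the way `wid_decode()` presumably does. It assumes the widths listed in the table and treats `age` and `pop` as optional.

```r
# Hypothetical helper (not part of widr): split a WID code by position.
split_wid_code <- function(code) {
  # 1-letter type, 5-6 letter concept, optional 3-digit age, optional 1-letter pop
  pattern <- "^([a-z])([a-z]{5,6})([0-9]{3})?([a-z])?$"
  m <- regmatches(code, regexec(pattern, code))[[1]]
  if (length(m) == 0) {
    stop("not a well-formed WID code: ", code)
  }
  # m[1] is the full match; m[2:5] are the four capture groups
  list(type = m[2], concept = m[3], age = m[4], pop = m[5])
}

parts <- split_wid_code("sptinc992j")
str(parts)
```

A purely positional split like this is ambiguous when the age is omitted (a trailing letter could belong to the concept or to `pop`), which is one reason to prefer lookup against the bundled reference tables.
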
## Downloading data

`download_wid()` returns a `wid_df`, a classed `data.frame` fully compatible with dplyr, ggplot2, and base R. Supply at least one of `indicators` or `areas`; all other parameters default to `"all"`, except `ages` (default `"992"`) and `pop` (default `"j"`).

```{r download}
library(widr)

# Top 1% pre-tax income share, United States, 2000-2022
top1 <- download_wid(
  indicators = "sptinc992j",
  areas      = "US",
  perc       = "p99p100",
  years      = 2000:2022
)

top1
#> 23 rows | 1 countries | 1 variables
#>   country   variable percentile year value age pop
#> 1      US sptinc992j    p99p100 2000 0.168 992   j
#> ...
```

Data are retrieved from the WID web service at `https://rfap9nitz6.execute-api.eu-west-1.amazonaws.com/prod`.

### Multiple countries and percentiles

```{r multi}
shares <- download_wid(
  indicators = "sptinc992j",
  areas      = c("US", "FR", "DE", "CN"),
  perc       = c("p90p100", "p99p100"),
  years      = 1980:2022
)
```

### Excluding interpolated points

Many series are linearly interpolated between survey years. Pass `include_extrapolations = FALSE` to keep only directly observed data points:

```{r extrap}
download_wid("sptinc992j", areas = "MZ", include_extrapolations = FALSE)
```

### Source metadata

`metadata = TRUE` attaches source and methodological documentation as an attribute; the shape of the data frame is unchanged:

```{r meta}
result <- download_wid("sptinc992j", areas = "US", metadata = TRUE)
attr(result, "wid_meta")
#>     variable country      source method quality       imputation
#> 1 sptinc992j      US Tax records    DFL    high adjusted surveys
```

### Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `indicators` | `"all"` | Variable codes |
| `areas` | `"all"` | ISO-2 country / region codes |
| `years` | `"all"` | Integer vector or `"all"` |
| `perc` | `"all"` | Percentile codes, e.g. `"p99p100"` |
| `ages` | `"992"` | Three-digit age code |
| `pop` | `"j"` | Population unit |
| `metadata` | `FALSE` | Attach source info as `attr(., "wid_meta")` |
| `include_extrapolations` | `TRUE` | Include interpolated points |
| `cache` | `TRUE` | Cache responses to disk |
| `verbose` | `FALSE` | Print progress messages |

## Tidyverse integration

`wid_df` is a plain `data.frame` subclass; dplyr verbs and ggplot2 work without any unwrapping:

```{r tidy-pipe}
library(dplyr)
library(ggplot2)

top1 |>
  wid_tidy(country_names = FALSE) |>
  filter(year >= 1990) |>
  ggplot(aes(year, value)) +
  geom_line(colour = "#58a6ff", linewidth = 0.9) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Top 1% pre-tax income share - United States",
       x = NULL, y = NULL) +
  theme_minimal()
```

`wid_tidy()` coerces `year` to integer and `value` to double, and optionally appends `indicator`, `series_type`, `type_label`, and `country_name` columns.

## Reusable query objects

`wid_query()` builds a query; `wid_filter()` updates it; `wid_fetch()` executes it. This is useful when iterating over parameter combinations or embedding queries in analysis pipelines:

```{r query}
q <- wid_query(indicators = "sptinc992j", areas = c("US", "FR"), cache = FALSE)
q <- wid_filter(q, years = 2010:2022)
wid_fetch(q)
```

## Caching

All responses are cached to disk by default. Cache entries are keyed to the exact query parameters and persist across sessions:

```{r cache}
wid_cache_list()   # list cached queries
wid_cache_clear()  # remove all
```

## Currency conversion

Monetary series (types `a`, `m`, `t`) are in local currency at the prior year's prices. `wid_convert()` fetches the appropriate WID exchange-rate series and divides in one step. Dimensionless series (types `s`, `g`, etc.) pass through unchanged with a message.

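The "divide in one step" idea can be sketched in base R. This is only an illustration of the arithmetic, not widr's implementation: the data frames, the column name `rate`, and every number below are made up, and the sketch assumes one conversion factor per country, expressed as local currency units per unit of the target currency.

```r
# Illustrative values only; these are NOT real WID figures.
incomes <- data.frame(
  country = c("FR", "IN"),
  year    = c(2022L, 2022L),
  value   = c(20000, 100000)  # local currency units (made up)
)

# Hypothetical conversion factors: local currency units per one target unit.
rates <- data.frame(
  country = c("FR", "IN"),
  rate    = c(0.8, 20)
)

# Join each observation to its country's factor, then divide.
converted <- merge(incomes, rates, by = "country")
converted$value_converted <- converted$value / converted$rate
converted[, c("country", "year", "value_converted")]
```

The package call that handles the rate lookup and the division together follows.
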
```{r convert}
# Bottom 50% average income, four countries - convert to 2022 USD PPP
download_wid("aptinc992j", areas = c("US", "FR", "CN", "IN"), perc = "p0p50") |>
  wid_convert(target = "ppp", base_year = "2022")
```

Supported targets: `"lcu"` (no conversion), `"usd"`, `"eur"`, `"gbp"`, `"ppp"`, `"yppp"`.

## Inequality measures

These functions operate on data already in memory; no additional API calls are needed.

### Gini coefficient

Requires a share (`s`) series with contiguous `pXpY` codes covering the full distribution:

```{r gini}
dist <- download_wid("sptinc992j", areas = c("US", "FR"),
                     perc = "all", years = 1990:2022)
wid_gini(dist)
#>   country year  gini
#> 1      FR 1990 0.411
#> 2      US 1990 0.453
```

### Top fractile share

```{r top-share}
wid_top_share(dist, top = 0.01)  # top 1%
wid_top_share(dist, top = 0.10)  # top 10%
```

### Percentile ratio

Requires a threshold (`t`) series:

```{r perc-ratio}
thresh <- download_wid("tptinc992j", areas = "US", perc = "all")
wid_percentile_ratio(thresh)                                            # P90/P10
wid_percentile_ratio(thresh, numerator = "p90", denominator = "p50")   # P90/P50
```

## Plotting

All plot functions return `ggplot` objects and accept additional layers:

```{r plot}
# Time series - one line per country; facet = TRUE for separate panels
wid_plot_timeseries(
  shares,
  country_labels = c(US = "United States", FR = "France", DE = "Germany", CN = "China")
)

# Cross-country bar chart for a single year
wid_plot_compare(shares, year = 2020)

# Lorenz curve
wid_plot_lorenz(dist, country = "US")
```

## Example

```{r full-example}
library(widr)
library(dplyr)
library(ggplot2)

download_wid(
  indicators = "aptinc992j",
  areas      = c("US", "FR", "CN", "IN"),
  perc       = "p0p50",
  years      = 1990:2022
) |>
  wid_convert(target = "ppp", base_year = "2022") |>
  wid_tidy(country_names = TRUE) |>
  ggplot(aes(year, value, colour = country_name)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Bottom 50% average pre-tax income",
       subtitle = "2022 USD PPP · equal-split adults 20+",
       x = NULL, y = NULL, colour = NULL)
```

## Quick reference

| Function | Purpose |
|---|---|
| `download_wid()` | Download data; returns a `wid_df` |
| `wid_decode()` / `wid_encode()` | Parse or build variable codes |
| `wid_validate()` / `wid_is_valid()` | Validate code components |
| `wid_search()` | Keyword search across reference tables |
| `wid_tidy()` | Decode columns, coerce types |
| `wid_convert()` | Currency conversion |
| `wid_metadata()` | Retrieve source information |
| `wid_gini()` | Gini coefficient |
| `wid_top_share()` | Top fractile income / wealth share |
| `wid_percentile_ratio()` | Percentile ratio (e.g. P90/P10) |
| `wid_plot_timeseries()` | Time-series line chart |
| `wid_plot_compare()` | Cross-country bar / point chart |
| `wid_plot_lorenz()` | Lorenz curve |
| `wid_query()` / `wid_filter()` / `wid_fetch()` | Reusable query objects |
| `wid_set_key()` | Set API key |
| `wid_cache_list()` / `wid_cache_clear()` | Cache management |

Full code dictionary: `vignette("code-dictionary")` · [wid.world/codes-dictionary](https://wid.world/codes-dictionary/)