---
title: "Introduction to phinterval"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{phinterval}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

```{r setup}
library(phinterval)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
```

# Introduction

The phinterval package extends [{lubridate}](https://lubridate.tidyverse.org/) to support disjoint ("holey") and empty time spans. It implements the `<phinterval>` vector class, a generalization of the standard contiguous `<Interval>`, which can represent:

- **Contiguous spans:** A single interval bounded by a start and end point (e.g., the year 2025).
- **Empty spans:** A set containing no time points (e.g., the intersection of your life and Napoleon's).
- **Disjoint spans:** A set of multiple time spans separated by gaps (e.g., the days you attended school, excluding weekends and holidays).

This package is designed to integrate easily into existing lubridate workflows. Any `<Interval>` vector can be converted to an equivalent `<phinterval>` vector using `as_phinterval()`, and all phinterval functions accept either `<Interval>` or `<phinterval>` inputs.

# When Time Isn't Continuous

Certain set operations on time spans naturally produce empty or disjoint results, which are difficult to represent using a standard interval. This section illustrates several such edge cases using the months of January and November 2025, along with the full calendar year.

```{r}
jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))
full_2025 <- interval(ymd("2025-01-01"), ymd("2026-01-01"))
```

## Empty Intersections

Because January and November do not overlap, their intersection should contain no time.
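As a quick sanity check before intersecting them, lubridate's `int_overlaps()` and phinterval's `phint_overlaps()` (with its default inclusive bounds) should both report that the two months share no time. The chunk below is a small sketch that re-creates `jan` and `nov` so it can run on its own; apart from `lubridate::int_overlaps()`, it uses only functions already shown in this vignette.

```{r}
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Re-create the two months so this chunk is self-contained
jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))

# Both checks treat endpoints as inclusive; neither finds any shared time
int_overlaps(jan, nov)
phint_overlaps(jan, nov)
```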
```{r}
lubridate::intersect(jan, nov)
phint_intersect(jan, nov)
```

In lubridate this is resolved by coercing the intersection to `NA`, while phinterval returns a `<hole>`, which explicitly represents an empty span of time. This distinction matters when performing downstream calculations. For example, counting the number of days contained in both January and November:

```{r}
lubridate::intersect(jan, nov) / duration(days = 1)
phint_intersect(jan, nov) / duration(days = 1)
```

## Punching Holes in Intervals

Next, consider subtracting the month of November from the full year of 2025.

```{r}
try(lubridate::setdiff(full_2025, nov))
phint_setdiff(full_2025, nov)
```

The result is two disjoint spans, January through October and December, which can't be represented by a single interval. As a result, lubridate raises an error. In phinterval, the disjoint span is represented as a single object with an explicit gap.

## Unions of Non-Overlapping Spans

Similarly, the union of January and November contains a gap from February to October.

```{r}
lubridate::union(jan, nov)
phint_union(jan, nov)
```

In this case lubridate returns the span from the beginning of January to the end of November, implicitly filling in the gap. In phinterval, the two disjoint months are represented explicitly.

## Subtracting an Interval from Itself

Finally, consider subtracting an interval from itself. Intuitively, this should result in an empty time span.

```{r}
lubridate::setdiff(jan, jan)
phint_setdiff(jan, jan)
```

In this case, lubridate returns the original interval, while phinterval returns a `<hole>`.

# Case Study: Employment History

The phinterval package is most useful when working with tabular data and vectorized workflows. To illustrate this, we'll consider an abridged employment history for several characters from the television show *Succession*.
```{r}
jobs <- tribble(
  ~name,  ~job_title,             ~start,       ~end,
  "Greg", "Mascot",               "2018-01-01", "2018-06-03",
  "Greg", "Executive Assistant",  "2018-06-10", "2020-04-01",
  "Greg", "Chief of Staff",       "2020-03-01", "2020-11-28",
  "Tom",  "Chairman",             "2019-05-01", "2020-11-10",
  "Tom",  "CEO",                  "2020-11-10", "2020-12-31",
  "Shiv", "Political Consultant", "2017-01-01", "2019-04-01"
)
```

Suppose we know that Greg, Tom, and Shiv went on a Christmas vacation in December 2017.

```{r}
vacation <- interval(ymd("2017-12-23"), ymd("2017-12-29"))
```

If we want to analyze only the time spent working, excluding time on vacation, we might try to subtract the `vacation` interval from each span in `jobs`. However, this approach breaks down when the vacation falls strictly within a job interval, as it does for Shiv's Political Consultant role.

```{r}
try(
  jobs |>
    mutate(
      span = interval(start, end),
      span = setdiff(span, vacation)
    ) |>
    select(name, job_title, span)
)
```

Handling this correctly is surprisingly involved. One option is to split Shiv's job into two rows (one pre-vacation and one post-vacation), breaking the one-row-per-job structure of the data. Another is to represent each job as a list of intervals, complicating downstream analysis.

The main purpose of phinterval is to avoid these workarounds by providing drop-in replacements for lubridate interval functions. Because phinterval functions accept either `<Interval>` or `<phinterval>` inputs, existing code can typically be adapted by simply replacing a lubridate function with its phinterval counterpart.

```{r}
jobs |>
  mutate(
    span = interval(start, end),
    span = phint_setdiff(span, vacation)
  ) |>
  select(name, job_title, span)
```

## Merging Intervals

Suppose we want to analyze only the total time each character spent employed, without distinguishing between individual jobs. This can be done using `phint_squash()`, which aggregates a vector of intervals into a minimal set of non-overlapping spans within a scalar `<phinterval>`.
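Before applying it to `jobs`, it may help to see `phint_squash()` on a few hand-made intervals. In this sketch, `spans` and `squashed` are illustrative names and the dates are made up: the first two spans overlap and the third stands apart, so the result should be a single `<phinterval>` element holding two spans. `phint_within()` then probes one time inside the merged span and one in the February gap.

```{r}
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Toy data: the first two spans overlap, the third is separate
spans <- interval(
  ymd(c("2024-01-01", "2024-01-10", "2024-03-01")),
  ymd(c("2024-01-15", "2024-02-01", "2024-03-31"))
)

# Merge all three spans into a single <phinterval> element
squashed <- phint_squash(spans)
squashed

# A time inside the merged January span vs. a time in the February gap
phint_within(ymd_hms("2024-01-20 00:00:00"), squashed)
phint_within(ymd_hms("2024-02-15 00:00:00"), squashed)
```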
```{r, include = FALSE}
opts <- options(width = 120)
```

```{r}
employment <- jobs |>
  mutate(span = interval(start, end)) |>
  group_by(name) |>
  summarize(employed = phint_squash(span))

employment
```

Notice that:

- *Greg* has multiple disjoint employment periods, which are preserved as separate spans within a single `<phinterval>` element.
- *Tom* held two back-to-back positions (Chairman followed by CEO), which `phint_squash()` correctly merges into a single contiguous span.

The `by` argument of `phint_squash()` and `datetime_squash()` (which takes `start` and `end` times directly) can be used in place of `dplyr::group_by()`. The example below is equivalent to the previous code but is usually several times faster.

```{r}
datetime_squash(
  start = ymd(jobs$start),
  end = ymd(jobs$end),
  by = jobs$name,
  keep_by = TRUE,
  order_by = TRUE
)
```

```{r, include = FALSE}
options(opts)
```

As in `dplyr::summarize()`, the `by` argument can be a vector or data frame to support multiple grouping columns.

To return the dataset to a one-row-per-span format, use `phint_unnest()`, which converts each `<phinterval>` element into separate rows:

```{r}
employment |>
  reframe(phint_unnest(employed, key = name))
```

## Finding Gaps

To analyze periods of unemployment, we need to identify the gaps between employment intervals. The `phint_invert()` function returns the gaps between spans in a `<phinterval>`.

```{r}
unemployment <- employment |>
  mutate(
    # Find the gaps between jobs
    unemployed = phint_invert(employed),
    # Calculate duration of unemployment
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, unemployed, days_unemployed)

unemployment
```

Greg was unemployed for 7 days between his time as a Mascot and his role as Executive Assistant. Tom and Shiv have no gaps in their respective employment timelines, so each of their `unemployed` values is a `<hole>`.

# Edge Cases and Gotchas

## Abutting Intervals and Intersection

Manipulating abutting intervals (intervals that share an endpoint) can sometimes produce unexpected results.
To demonstrate, consider a Monday and the following Tuesday in November 2025.

```{r}
monday <- interval(ymd("2025-11-10"), ymd("2025-11-11"))
tuesday <- interval(ymd("2025-11-11"), ymd("2025-11-12"))
```

By default, intervals in `<Interval>` and `<phinterval>` vectors have inclusive endpoints, meaning that midnight on November 11th, 2025 (the instant where Monday ends and Tuesday begins) falls within both `monday` and `tuesday`:

```{r}
midnight_monday <- ymd_hms("2025-11-11 00:00:00")
phint_within(midnight_monday, monday)
phint_within(midnight_monday, tuesday)
```

As a result, the intersection of `monday` and `tuesday` is an instantaneous interval at `midnight_monday`.

```{r}
phint_intersect(monday, tuesday) == as_phinterval(midnight_monday)
```

Perhaps surprisingly, this also means that the intersection of `monday` and its complement is not empty, but consists of the two endpoints of `monday`.

```{r}
not_monday <- phint_complement(monday)
not_monday
phint_intersect(monday, not_monday)
```

The `bounds` argument in `phint_overlaps()`, `phint_within()`, and `phint_intersect()` controls this behavior. When `bounds = "()"`, endpoints are treated as exclusive:

```{r}
phint_overlaps(monday, tuesday, bounds = "()")
phint_intersect(monday, tuesday, bounds = "()")
```

With exclusive endpoints, `monday` and `tuesday` no longer overlap, and their intersection is empty.

An instantaneous interval `(point, point)` with open bounds is degenerate (strictly speaking, it contains no points), but for convenience we allow these points to exist.
With `bounds = "()"`, instants on the endpoint of an interval are outside of the interval, while instants in the middle of an interval are considered to be within it:

```{r}
monday_at_9AM <- as_phinterval(ymd_hms("2025-11-10 09:00:00"))
phint_within(monday_at_9AM, monday, bounds = "()")
phint_within(midnight_monday, monday, bounds = "()")
```

To treat instantaneous intervals as empty, use `phint_sift()` to remove all instants from an interval vector:

```{r}
phint <- phint_squash(c(monday_at_9AM, tuesday))
phint
phint_sift(phint)
```

## Instantaneous Intervals and Set Difference

Because phinterval elements are composed of non-overlapping, non-adjacent spans, "punching" an instantaneous hole into an interval using `phint_setdiff()` has no effect on the interval. While removing a single point from an interval `[start, end]` would theoretically split it into `[start, point)` and `(point, end]`, in practice these adjacent pieces are immediately merged back together:

```{r}
monday_noon <- as_phinterval(ymd_hms("2025-11-10 12:00:00"))
monday_lunch_break <- interval(
  ymd_hms("2025-11-10 12:00:00"),
  ymd_hms("2025-11-10 13:00:00")
)

phint_setdiff(monday, monday_lunch_break) # Removes a non-zero interval
phint_setdiff(monday, monday_noon)        # Instantaneous - no effect
```

To create gaps, you must remove an interval with non-zero duration.

## Time Zones

To ensure that any `<Interval>` vector can be represented as an equivalent `<phinterval>` vector, the `phinterval()` constructor accepts any time zone permitted by `interval()`, including unrecognized zones.
```{r}
intvl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
phint <- phinterval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
intvl == phint
```

When a `<phinterval>` with an unrecognized time zone is formatted, its time points are displayed using the UTC time zone:

```{r, include = FALSE}
rlang::reset_warning_verbosity("phinterval_warning_unrecognized_tzone")
```

```{r}
print(phint)
```

The `is_recognized_tzone()` function can be used to check whether a time zone is recognized:

```{r}
is_recognized_tzone("America/New_York")
is_recognized_tzone("nozone")
is_recognized_tzone(NA_character_)
```

Some datetime vectors, such as `<POSIXct>`, are allowed to have an `NA` time zone. When converted to a `<phinterval>`, the missing time zone is silently replaced with UTC:

```{r}
na_zoned <- as.POSIXct("2021-01-01", tz = NA_character_)
as_phinterval(na_zoned)
```

Operations that combine two or more interval vectors, such as `phint_union()`, use the time zone of the first argument. If the first argument's time zone is `""` (the user's local time zone), the second argument's time zone is used instead.

```{r}
int_est <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "EST")
int_utc <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "UTC")
int_lcl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "")

phint_union(int_est, int_utc)
phint_union(int_utc, int_est)
phint_union(int_lcl, int_est)
```

## Comparison with Datetime Vectors

Comparison operators (`<=`, `<`, `>`, `>=`, `==`) behave unexpectedly when comparing datetime vectors (`<Date>`, `<POSIXct>`, `<POSIXlt>`) to `<Interval>` or `<phinterval>` vectors. For example:

```{r}
span <- phinterval(ymd("2000-08-05"), ymd("2000-11-29"))
date <- ymd("2021-01-01")
span == date
```

For the intended behavior, use `as_phinterval()` to convert datetime vectors into an equivalent `<phinterval>` first.

```{r}
span == as_phinterval(date)
```
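When the question is whether a date falls *inside* a span, rather than whether the two are identical, `phint_within()` is likely the more natural tool than `==`. The sketch below uses illustrative names (`term`, `inside`, `outside`) and converts the dates with `as_phinterval()` first, mirroring the conversion recommended above. It assumes that `phint_within()` accepts instantaneous `<phinterval>` elements as its first argument, as in the `monday_at_9AM` example earlier.

```{r}
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# A single contiguous span (same dates as `span` above)
term <- phinterval(ymd("2000-08-05"), ymd("2000-11-29"))

# One date inside the span and one well after it
inside <- as_phinterval(ymd("2000-09-01"))
outside <- as_phinterval(ymd("2021-01-01"))

# Membership tests, as opposed to equality tests
phint_within(inside, term)
phint_within(outside, term)
```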