library(phinterval)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
Introduction
The phinterval package extends {lubridate} to support
disjoint (“holey”) and empty time spans. It implements the
<phinterval> vector class, a generalization of the
standard contiguous <Interval>, which can
represent:
- Contiguous spans: A contiguous interval bounded by a start and end point (e.g., the year 2025).
- Empty spans: A set containing no time points (e.g., the intersection of your life and Napoleon’s).
- Disjoint spans: A set of multiple time spans separated by gaps (e.g., the days you attended school, excluding weekends and holidays).
This package is designed to integrate easily into existing lubridate
workflows. Any <Interval> vector can be converted to
an equivalent <phinterval> vector using
as_phinterval(), and all phinterval functions accept either
<Interval> or <phinterval>
inputs.
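As a quick sketch of that interoperability (the object names `jan_intvl` and `jan_phint` are illustrative), an <Interval> can be converted with as_phinterval() and then mixed freely with the original in a phinterval function call:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# A standard lubridate interval covering January 2025
jan_intvl <- interval(ymd("2025-01-01"), ymd("2025-02-01"))

# Convert it to the equivalent <phinterval>
jan_phint <- as_phinterval(jan_intvl)

# Both objects describe the same span of time
same_span <- jan_intvl == jan_phint

# phinterval functions accept either class, even mixed in one call
overlap <- phint_intersect(jan_intvl, jan_phint)
overlap_days <- overlap / ddays(1)
```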
When Time Isn’t Continuous
Certain set operations on time spans naturally produce empty or disjoint results, which are difficult to represent using a standard interval. This section illustrates several such edge cases using the months of January and November 2025, along with the full calendar year.
jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))
full_2025 <- interval(ymd("2025-01-01"), ymd("2026-01-01"))
Empty Intersections
Because January and November do not overlap, their intersection should contain no time.
lubridate::intersect(jan, nov)
#> [1] NA--NA
phint_intersect(jan, nov)
#> <phinterval<UTC>[1]>
#> [1] <hole>
In lubridate this is resolved by coercing the intersection to
NA, while phinterval returns a <hole>,
which explicitly represents an empty span of time.
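Because a <hole> is an ordinary vector element rather than a missing value, vectorized code keeps working when only some pairs fail to overlap. A minimal sketch, intersecting two pairs elementwise (January with November, and January with itself); the helper names `lhs` and `rhs` are illustrative:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))

# Elementwise pairs: (jan, nov) do not overlap, (jan, jan) overlap fully
lhs <- c(jan, jan)
rhs <- c(nov, jan)

# lubridate: the empty overlap becomes NA, so the total is NA as well
lub_total <- sum(lubridate::intersect(lhs, rhs) / ddays(1))

# phinterval: the empty overlap is a <hole> of length 0
phint_total <- sum(phint_intersect(lhs, rhs) / ddays(1))
```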
This distinction matters when performing downstream calculations. For example, counting the number of days contained in both January and November:
lubridate::intersect(jan, nov) / duration(days = 1)
#> [1] NA
phint_intersect(jan, nov) / duration(days = 1)
#> [1] 0
Punching Holes in Intervals
Next, consider subtracting the month of November from the full year of 2025.
try(lubridate::setdiff(full_2025, nov))
#> Error in setdiff.Interval(full_2025, nov) :
#> Cases 1 result in discontinuous intervals.
phint_setdiff(full_2025, nov)
#> <phinterval<UTC>[1]>
#> [1] {2025-01-01--2025-11-01, 2025-12-01--2026-01-01}
The result is two disjoint spans, January through October and December, which can’t be represented by a single interval. As a result, lubridate raises an error. In phinterval, the disjoint span is represented as a single object with an explicit gap.
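Because the gap is stored explicitly, the length of the result counts only the time that remains. As a sketch, dividing by a one-day duration gives the number of days in 2025 outside November (365 - 30):

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

full_2025 <- interval(ymd("2025-01-01"), ymd("2026-01-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))

# Remove November, leaving January-October and December
without_nov <- phint_setdiff(full_2025, nov)

# Total length of both remaining spans, in days
days_without_nov <- without_nov / ddays(1)
```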
Unions of Non-Overlapping Spans
Similarly, the union of January and November contains a gap from February to October.
lubridate::union(jan, nov)
#> [1] 2025-01-01 UTC--2025-12-01 UTC
phint_union(jan, nov)
#> <phinterval<UTC>[1]>
#> [1] {2025-01-01--2025-02-01, 2025-11-01--2025-12-01}
Here lubridate returns the span from the beginning of January to the end of November, implicitly filling in the gap, whereas phinterval represents the two disjoint months explicitly.
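The difference shows up directly in lengths. A sketch comparing the two unions in days, where lubridate's result also counts the February-October gap:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))

# lubridate spans January 1 to December 1, gap included
lub_days <- lubridate::union(jan, nov) / ddays(1)

# phinterval counts only the days actually covered: 31 + 30
phint_days <- phint_union(jan, nov) / ddays(1)
```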
Subtracting an Interval from Itself
Finally, consider subtracting an interval from itself. Intuitively, this should result in an empty time span.
lubridate::setdiff(jan, jan)
#> [1] 2025-01-01 UTC--2025-02-01 UTC
phint_setdiff(jan, jan)
#> <phinterval<UTC>[1]>
#> [1] <hole>
In this case, lubridate returns the original interval, while
phinterval returns a <hole>.
Case Study: Employment History
The phinterval package is most useful when working with tabular data and vectorized workflows. To illustrate this, we’ll consider an abridged employment history for several characters from the television show Succession.
jobs <- tribble(
~name, ~job_title, ~start, ~end,
"Greg", "Mascot", "2018-01-01", "2018-06-03",
"Greg", "Executive Assistant", "2018-06-10", "2020-04-01",
"Greg", "Chief of Staff", "2020-03-01", "2020-11-28",
"Tom", "Chairman", "2019-05-01", "2020-11-10",
"Tom", "CEO", "2020-11-10", "2020-12-31",
"Shiv", "Political Consultant", "2017-01-01", "2019-04-01"
)
Suppose we know that Greg, Tom, and Shiv went on a Christmas vacation in December 2017.
If we want to analyze only the time spent working, and exclude time
on vacation, we might try to subtract the vacation interval
from each span in jobs. However, this approach breaks down
when the vacation falls strictly within a job interval, as it does for
Shiv’s Political Consultant role.
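The text does not show how `vacation` is defined. For illustration only, assume it ran from December 23 to December 31, 2017 (hypothetical dates chosen to fall inside Shiv's role). Under that assumption, a sketch of the problem on Shiv's job alone:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical dates for the December 2017 Christmas vacation
vacation <- interval(ymd("2017-12-23"), ymd("2017-12-31"))

# Shiv's Political Consultant role, which strictly contains the vacation
shiv_job <- interval(ymd("2017-01-01"), ymd("2019-04-01"))

# lubridate cannot represent the two-piece result and errors
lub_result <- try(lubridate::setdiff(shiv_job, vacation), silent = TRUE)

# phinterval can; the length excludes the 8 vacation days (820 - 8)
shiv_worked <- phint_setdiff(shiv_job, vacation)
days_worked <- shiv_worked / ddays(1)
```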
try(
jobs |>
mutate(
span = interval(start, end),
span = setdiff(span, vacation)
) |>
select(name, job_title, span)
)
#> Error in mutate(jobs, span = interval(start, end), span = setdiff(span, :
#> ℹ In argument: `span = setdiff(span, vacation)`.
#> Caused by error in `setdiff.Interval()`:
#> ! Cases 6 result in discontinuous intervals.
Handling this correctly is surprisingly involved. One option is to split Shiv’s job into two rows (one pre-vacation and one post-vacation), breaking the one-row-per-job structure of the data. Another is to represent each job as a list of intervals, complicating downstream analysis.
The main purpose of phinterval is to avoid these workarounds, by
providing drop-in replacements for lubridate interval functions. Because
phinterval functions accept either <Interval> or
<phinterval> inputs, existing code can typically be
adapted by simply replacing a lubridate function with its phinterval
counterpart.
jobs |>
mutate(
span = interval(start, end),
span = phint_setdiff(span, vacation)
) |>
select(name, job_title, span)
#> # A tibble: 6 × 3
#> name job_title span
#> <chr> <chr> <phint<UTC>>
#> 1 Greg Mascot {2018-01-01--2018-06-03}
#> 2 Greg Executive Assistant {2018-06-10--2020-04-01}
#> 3 Greg Chief of Staff {2020-03-01--2020-11-28}
#> 4 Tom Chairman {2019-05-01--2020-11-10}
#> 5 Tom CEO {2020-11-10--2020-12-31}
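#> 6 Shiv Political Consultant {2017-01-01-[2]-2019-04-01}
Shiv’s row keeps its one-row-per-job structure: the [2] in the printed span indicates that the single element now consists of two spans, separated by the vacation.
Merging Intervals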
Suppose we want to analyze only the total time each character spent
employed, without distinguishing between individual jobs. This can be
done using phint_squash(), which aggregates a vector of
intervals into a minimal set of non-overlapping spans within a scalar
<phinterval>.
employment <- jobs |>
mutate(span = interval(start, end)) |>
group_by(name) |>
summarize(employed = phint_squash(span))
employment
#> # A tibble: 3 × 2
#> name employed
#> <chr> <phint<UTC>>
#> 1 Greg {2018-01-01--2018-06-03, 2018-06-10--2020-11-28}
#> 2 Shiv {2017-01-01--2019-04-01}
#> 3 Tom {2019-05-01--2020-12-31}
Notice that:
- Greg has multiple disjoint employment periods, which are preserved as separate spans within a single <phinterval> element. His overlapping Executive Assistant and Chief of Staff roles are merged into one of these spans.
- Tom held two back-to-back positions (Chairman followed by CEO), which phint_squash() correctly merges into a single contiguous span.
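Because squashing merges overlaps before measuring, it avoids double counting when computing total time. A minimal sketch on three made-up intervals (hypothetical dates), two overlapping and one abutting:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical intervals: the first two overlap, the third abuts the second
spans <- c(
  interval(ymd("2020-01-01"), ymd("2020-01-10")),
  interval(ymd("2020-01-05"), ymd("2020-01-15")),
  interval(ymd("2020-01-15"), ymd("2020-01-20"))
)

# Summing individual lengths counts January 5-10 twice: 9 + 10 + 5 days
naive_days <- sum(spans / ddays(1))

# Squashing first yields a single span, January 1-20
squashed_days <- phint_squash(spans) / ddays(1)
```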
The by argument of phint_squash() and
datetime_squash() (which takes start and
end times directly) can be used in place of
dplyr::group_by(). The example below is equivalent to the
previous code but is usually several times faster.
datetime_squash(
start = ymd(jobs$start),
end = ymd(jobs$end),
by = jobs$name,
keep_by = TRUE,
order_by = TRUE
)
#> # A tibble: 3 × 2
#> by phint
#> <chr> <phint<UTC>>
#> 1 Greg {2018-01-01--2018-06-03, 2018-06-10--2020-11-28}
#> 2 Shiv {2017-01-01--2019-04-01}
#> 3 Tom {2019-05-01--2020-12-31}
As in dplyr::summarize(), the by argument
can be a vector or data frame to support multiple grouping columns.
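As a sketch of the data frame form, the records below are invented for illustration, and the example assumes that each distinct name/dept combination yields one output row:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Hypothetical job records for two people across two departments
records <- data.frame(
  name  = c("Ann", "Ann", "Ann", "Bo"),
  dept  = c("Sales", "Sales", "Legal", "Sales"),
  start = ymd(c("2020-01-01", "2020-03-01", "2020-01-01", "2021-01-01")),
  end   = ymd(c("2020-04-01", "2020-06-01", "2020-02-01", "2021-02-01"))
)

# Group on both columns at once by passing a data frame to `by`;
# Ann's two overlapping Sales jobs should merge into one span
squashed <- datetime_squash(
  start = records$start,
  end = records$end,
  by = records[c("name", "dept")],
  keep_by = TRUE,
  order_by = TRUE
)
squashed
```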
To return the dataset to a one-row-per-span format, use
phint_unnest(), which converts each
<phinterval> element into separate rows:
employment |>
reframe(phint_unnest(employed, key = name))
#> # A tibble: 4 × 3
#> key start end
#> <chr> <dttm> <dttm>
#> 1 Greg 2018-01-01 00:00:00 2018-06-03 00:00:00
#> 2 Greg 2018-06-10 00:00:00 2020-11-28 00:00:00
#> 3 Shiv 2017-01-01 00:00:00 2019-04-01 00:00:00
#> 4 Tom 2019-05-01 00:00:00 2020-12-31 00:00:00
Finding Gaps
To analyze periods of unemployment, we need to identify the gaps
between employment intervals. The phint_invert() function
returns the gaps between spans in a <phinterval>.
unemployment <- employment |>
mutate(
# Find the gaps between jobs
unemployed = phint_invert(employed),
# Calculate duration of unemployment
days_unemployed = unemployed / ddays(1)
) |>
select(name, unemployed, days_unemployed)
unemployment
#> # A tibble: 3 × 3
#> name unemployed days_unemployed
#> <chr> <phint<UTC>> <dbl>
#> 1 Greg {2018-06-03--2018-06-10} 7
#> 2 Shiv <hole> 0
#> 3 Tom <hole> 0
Greg was unemployed for 7 days between his time as a Mascot and his
role as Executive Assistant. Tom and Shiv have no gaps within their
respective employment timelines, represented by a
<hole>.
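The same function works outside a data frame. As a sketch, inverting the union of January and November 2025 recovers the February-October gap (273 days), while a single contiguous interval has no internal gaps:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

jan <- interval(ymd("2025-01-01"), ymd("2025-02-01"))
nov <- interval(ymd("2025-11-01"), ymd("2025-12-01"))

# Gap between the two months: February 1 to November 1
gap <- phint_invert(phint_union(jan, nov))
gap_days <- gap / ddays(1)

# A contiguous span has no gaps, so its inverse is a <hole> of length 0
no_gap_days <- phint_invert(jan) / ddays(1)
```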
Edge Cases and Gotchas
Abutting Intervals and Intersection
Manipulating abutting intervals (intervals that share an endpoint) can sometimes produce unexpected results. To demonstrate, consider a Monday and a Tuesday in November 2025.
monday <- interval(ymd("2025-11-10"), ymd("2025-11-11"))
tuesday <- interval(ymd("2025-11-11"), ymd("2025-11-12"))
By default, intervals in <phinterval> and
<Interval> vectors have inclusive endpoints, meaning
that midnight at the end of Monday, November 10th, 2025 (the start of
Tuesday, November 11th) falls within both
monday and tuesday:
midnight_monday <- ymd_hms("2025-11-11 00:00:00")
phint_within(midnight_monday, monday)
#> [1] TRUE
phint_within(midnight_monday, tuesday)
#> [1] TRUE
As a result, the intersection of monday and
tuesday is an instantaneous interval at
midnight_monday.
phint_intersect(monday, tuesday) == as_phinterval(midnight_monday)
#> [1] TRUE
Perhaps surprisingly, this also means that the intersection of
monday and its complement is not empty, but consists of the
two endpoints of monday.
not_monday <- phint_complement(monday)
not_monday
#> <phinterval<UTC>[1]>
#> [1] {-Inf--2025-11-10, 2025-11-11--Inf}
phint_intersect(monday, not_monday)
#> <phinterval<UTC>[1]>
#> [1] {2025-11-10--2025-11-10, 2025-11-11--2025-11-11}
The bounds argument in phint_overlaps(),
phint_within(), and phint_intersect() controls
this behavior. When bounds = "()", endpoints are treated as
exclusive:
phint_overlaps(monday, tuesday, bounds = "()")
#> [1] FALSE
phint_intersect(monday, tuesday, bounds = "()")
#> <phinterval<UTC>[1]>
#> [1] <hole>
With exclusive endpoints, monday and
tuesday no longer overlap, and their intersection is
empty.
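Exclusive bounds only change the outcome at shared endpoints. In this sketch, a hypothetical `noon_to_noon` interval genuinely overlaps monday, so the two overlap under either setting and their open-bounds intersection still spans 12 hours:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

monday <- interval(ymd("2025-11-10"), ymd("2025-11-11"))
tuesday <- interval(ymd("2025-11-11"), ymd("2025-11-12"))
# Hypothetical interval from Monday noon to Tuesday noon
noon_to_noon <- interval(
  ymd_hms("2025-11-10 12:00:00"),
  ymd_hms("2025-11-11 12:00:00")
)

# Abutting intervals overlap only when endpoints are inclusive
abut_closed <- phint_overlaps(monday, tuesday)
abut_open <- phint_overlaps(monday, tuesday, bounds = "()")

# Genuinely overlapping intervals overlap either way
real_closed <- phint_overlaps(monday, noon_to_noon)
real_open <- phint_overlaps(monday, noon_to_noon, bounds = "()")

# The shared half-day survives exclusive bounds
shared_hours <- phint_intersect(monday, noon_to_noon, bounds = "()") / dhours(1)
```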
An instantaneous interval (point, point) with open
bounds contains no time points in a strict mathematical sense, but for
convenience we allow these instants to exist. With bounds = "()", instants on the
endpoint of an interval are outside of the interval, while instants in
the middle of an interval are considered to be within it:
monday_at_9AM <- as_phinterval(ymd_hms("2025-11-10 09:00:00"))
phint_within(monday_at_9AM, monday, bounds = "()")
#> [1] TRUE
phint_within(midnight_monday, monday, bounds = "()")
#> [1] FALSE
To treat instantaneous intervals as empty, use
phint_sift() to remove all instants from an interval
vector:
phint <- phint_squash(c(monday_at_9AM, tuesday))
phint
#> <phinterval<UTC>[1]>
#> [1] {2025-11-10 09:00:00--2025-11-10 09:00:00, 2025-11-11 00:00:00--2025-11-12 00:00:00}
phint_sift(phint)
#> <phinterval<UTC>[1]>
#> [1] {2025-11-11--2025-11-12}
Instantaneous Intervals and Set Difference
Because phinterval elements are composed of non-overlapping,
non-adjacent spans, “punching” an instantaneous hole into an interval
using phint_setdiff() has no effect on the interval. While
removing a single point from an interval [start, end] would
theoretically split it into [start, point) and
(point, end], in practice these adjacent pieces are
immediately merged back together:
monday_noon <- as_phinterval(ymd_hms("2025-11-10 12:00:00"))
monday_lunch_break <- interval(
ymd_hms("2025-11-10 12:00:00"),
ymd_hms("2025-11-10 13:00:00")
)
phint_setdiff(monday, monday_lunch_break) # Removes a non-zero interval
#> <phinterval<UTC>[1]>
#> [1] {2025-11-10 00:00:00--2025-11-10 12:00:00, 2025-11-10 13:00:00--2025-11-11 00:00:00}
phint_setdiff(monday, monday_noon) # Instantaneous - no effect
#> <phinterval<UTC>[1]>
#> [1] {2025-11-10--2025-11-11}
To create gaps, you must remove an interval with non-zero duration.
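Even a very short interval is enough. A sketch removing a single second starting at Monday noon (an arbitrary choice) produces a real gap, which phint_invert() can then recover:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

monday <- interval(ymd("2025-11-10"), ymd("2025-11-11"))
noon <- ymd_hms("2025-11-10 12:00:00")

# A one-second interval starting at noon
one_second <- interval(noon, noon + seconds(1))

# Removing it splits Monday into two spans
monday_split <- phint_setdiff(monday, one_second)

# Monday keeps all but one of its 86,400 seconds
kept_seconds <- monday_split / dseconds(1)
# The gap between the two spans is exactly one second long
gap_seconds <- phint_invert(monday_split) / dseconds(1)
```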
Time Zones
To ensure that any <Interval> vector can be
represented as an equivalent <phinterval> vector, the
phinterval() constructor accepts any time zone permitted by
interval(), including unrecognized zones.
intvl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
phint <- phinterval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "nozone")
intvl == phint
#> [1] TRUE
When a <phinterval> with an unrecognized time zone
is formatted, its time points are displayed using the UTC time zone:
print(phint)
#> <phinterval<nozone>[1]>
#> [1] {2020-01-01--2020-01-02}
The is_recognized_tzone() function can be used to check
whether a time zone is recognized:
is_recognized_tzone("America/New_York")
#> [1] TRUE
is_recognized_tzone("nozone")
#> [1] FALSE
is_recognized_tzone(NA_character_)
#> [1] FALSE
Some datetime vectors, such as <POSIXct>, are
allowed to have an NA time zone. When converted to a
<phinterval>, the missing time zone is silently
replaced with UTC:
na_zoned <- as.POSIXct("2021-01-01", tz = NA_character_)
as_phinterval(na_zoned)
#> <phinterval<UTC>[1]>
#> [1] {2021-01-01--2021-01-01}
Operations that combine two or more interval vectors, such as
phint_union(), use the time zone of the first argument. If
the first argument’s time zone is "" (the user’s local time
zone), the second argument’s time zone is used instead.
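This rule only decides how the result is displayed; the instants it contains are the same whichever argument comes first. A sketch, assuming three one-day intervals that differ only in their tzone attribute, checks that every union covers exactly 24 hours:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

int_est <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "EST")
int_utc <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "UTC")
int_lcl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "")

# Argument order changes the time zone of the result...
unions <- list(
  est_first = phint_union(int_est, int_utc),
  utc_first = phint_union(int_utc, int_est),
  lcl_first = phint_union(int_lcl, int_est)
)

# ...but each result has the same length in hours
hours <- vapply(unions, function(x) x / dhours(1), numeric(1))
```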
int_est <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "EST")
int_utc <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "UTC")
int_lcl <- interval(ymd("2020-01-01"), ymd("2020-01-02"), tzone = "")
phint_union(int_est, int_utc)
#> <phinterval<EST>[1]>
#> [1] {2019-12-31 19:00:00--2020-01-01 19:00:00}
phint_union(int_utc, int_est)
#> <phinterval<UTC>[1]>
#> [1] {2020-01-01--2020-01-02}
phint_union(int_lcl, int_est)
#> <phinterval<EST>[1]>
#> [1] {2019-12-31 19:00:00--2020-01-01 19:00:00}
Comparison with Datetime Vectors
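Comparison operators (<=, <,
>, >=, ==) give misleading results
when comparing datetime vectors
(<Date>, <POSIXct>,
<POSIXlt>) to <phinterval> or
<Interval> vectors: instead of a single logical value per element,
the result may not reflect whether the date and span are equal. For example: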
span <- phinterval(ymd("2000-08-05"), ymd("2000-11-29"))
date <- ymd("2021-01-01")
span == date
#> size starts ends
#> FALSE NA NA
For the intended behavior, use as_phinterval() to
convert datetime vectors into an equivalent
<phinterval> first.
span == as_phinterval(date)
#> [1] FALSE
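The same conversion works for membership tests. A sketch using phint_within(), with a second date (2000-09-01, chosen arbitrarily) that falls inside span:

```r
library(phinterval)
library(lubridate, warn.conflicts = FALSE)

span <- phinterval(ymd("2000-08-05"), ymd("2000-11-29"))

# Convert each date to an instantaneous <phinterval>, then test membership
outside <- phint_within(as_phinterval(ymd("2021-01-01")), span)
inside <- phint_within(as_phinterval(ymd("2000-09-01")), span)
```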