16 Introduction to Working with Dates in R
16.1 Introduction
In many data files, the date or time of day will be an important variable. In this introductory tutorial, we will learn some basics of how to handle dates.
A Reminder: Why do <date> objects even matter? Compare the following two plots: one made where the date is in <chr> form and the other where the date is in its appropriate <date> form.
```r
library(tidyverse)
library(lubridate)

animal_crossing <- read_csv("data/animal_crossing_holidays.csv")

animal_crossing %>%
  ggplot(data = ., aes(x = Date1, y = Holiday)) +
  geom_point()

animal_crossing %>%
  mutate(Date_test_plot = dmy(Date1)) %>%
  ggplot(data = ., aes(x = Date_test_plot, y = Holiday)) +
  geom_point()
```

In which plot does the ordering on the x-axis make more sense?
16.2 Dates with lubridate
Goals:
- use lubridate functions to convert a character variable to a <date> variable.
- use lubridate functions to extract useful information from a <date> variable, including the year, month, day of the week, and day of the year.
16.2.1 Converting Variables to <date>
The lubridate package is built to easily work with Date objects and DateTime objects.
To begin, here are two basic functions: today(), which prints today's date, and now(), which prints today's date and time.
```r
today()
now()
```

There are a number of built-in functions to convert character strings to Dates and Times.
16.2.1.1 Parsing Dates and Times
- ymd(): parses dates in the format "year-month-day" and returns a Date object.
- dmy(): parses dates in the format "day-month-year" and returns a Date object.
- mdy(): parses dates in the format "month-day-year" and returns a Date object.
- hm(): parses times in the format "hour-minute" and returns a period object.
- hms(): parses times in the format "hour-minute-second" and returns a period object.
Here is a quick example showing what they do.
```r
# dates in different formats
d1 <- "2023-04-19"
d2 <- "19-04-2023"
d3 <- "04-19-2023"

ymd(d1)
dmy(d2)
mdy(d3)
mdy(d2) # fails to parse b/c no month 19

# parse times in different formats
time_hm <- hm("10:15")
time_hms <- hms("10:15:30")
```

As seen before, these also work on variables within data frames (tibbles).
```r
animal_crossing %>%
  mutate(Date1_v2 = dmy(Date1)) %>%
  relocate(Date1_v2)
```

16.2.1.2 year(), month(), and mday()
The functions year(), month(), and mday() can grab the year, month, and day of the month, respectively, from a <date> variable. Like the forcats functions, these will almost always be paired with a mutate() statement because they will create a new variable.
Notice that the animal crossing data contains a number of variables related to these aspects. Here is how they were created.
```r
# starting fresh
animal_crossing2 <- animal_crossing %>% select(Holiday, Date1)

# recreate initial
animal_crossing2 %>%
  mutate(
    Date = dmy(Date1),
    Month = month(Date),
    Year = year(Date),
    Day = mday(Date),
    Month2 = month(Date, label = TRUE, abbr = FALSE),
    # a few extras
    Day_in_year = yday(Date),
    Day_of_week = wday(Date, label = TRUE, abbr = TRUE),
    week_of_year = week(Date)
  )
```

16.3 Using parse_date from the readr package
Another common way to work with dates is to use the parse_date() function from the readr package (or parse_date_time() from lubridate). These usually require us to supply a format string describing the date (or date-time) structure.
For example, if you have a string in the format "2023-04-18 09:19:59", you can use the following format string to parse it with the parse_date() function. (Note that parse_date() returns a Date, so the time portion is matched but dropped; use parse_datetime() if you want to keep it.)

```r
date_string <- "2023-04-18 09:19:59"
date <- parse_date(date_string,
  format = "%Y-%m-%d %H:%M:%S"
)
date
```

Below is a table of common formats.
| Format | Description |
|---|---|
| %d | Day of the month as a number (01-31). |
| %m | Month as a number (01-12). |
| %Y | Year with century (as a four digit number). |
| %y | Year without century (00-99). |
| %H | Hour (24-hour clock) as a decimal number (00-23). |
| %I | Hour (12-hour clock) as a decimal number (01-12). |
| %M | Minute as a decimal number (00-59). |
| %S | Second as a decimal number (00-59). |
| %z | Time zone offset from UTC (e.g., “-0800”). |
| %Z | Time zone name. |
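As a quick illustration of a few of these codes, here is a small sketch (the date strings are made up for the example; note that readr also supports %p for an AM/PM marker, which pairs with the 12-hour %I code):

```r
library(readr)

# %d/%m/%y: day, month, and two-digit year separated by slashes
parse_date("18/04/23", format = "%d/%m/%y") # 2023-04-18

# %I is the 12-hour clock, so it needs %p to resolve AM vs PM
parse_datetime("2023-04-18 09:19 PM", format = "%Y-%m-%d %I:%M %p")
```

With two-digit years, readr interprets 00-69 as 2000-2069, so "23" becomes 2023 here.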
16.4 Another Example
We have data on flights originating from New York airports in November 2022.
```r
url <- "https://raw.githubusercontent.com/iramler/stat234/main/notes/data/ny_airports_nov2022.csv"
ny_airports <- read_csv(url)

# the time portion of FL_DATE carries no information, so strip it off
ny_airports %>%
  mutate(FL_DATE = str_remove(FL_DATE, " 12:00:00 AM")) -> ny_airports
```

First, let's reduce this data to just the four airports in Albany, Buffalo, Rochester, and Syracuse.
```r
airports_to_use <- paste(c("Albany", "Buffalo", "Rochester", "Syracuse"), "NY", sep = ", ")
airports_to_use

upstate_airports <-
  ny_airports %>%
  filter(ORIGIN_CITY_NAME %in% airports_to_use)
```

Now convert the FL_DATE variable from a <chr> to a <date>.
```r
upstate_airports <-
  upstate_airports %>%
  mutate(Flight_Date = mdy(FL_DATE))
```

- Calculate the average delay time for each airport.
- Calculate the proportion of flights delayed for each airport.
```r
upstate_airports %>%
  group_by(ORIGIN_CITY_NAME) %>%
  summarise(
    avgDelay = mean(DEP_DELAY, na.rm = TRUE),
    propDelay = mean(DEP_DELAY > 0, na.rm = TRUE)
  )
```

- Which day of the week has the most flights?
```r
upstate_airports %>%
  mutate(day.of.week = wday(Flight_Date,
    label = TRUE,
    abbr = TRUE
  )) %>%
  group_by(day.of.week) %>%
  summarise(
    n_flights = n()
  ) %>%
  slice_max(n_flights, n = 1)
```

Try out this plot.
```r
library(ggTimeSeries)

upstate_airports %>%
  group_by(Flight_Date, ORIGIN_CITY_NAME) %>%
  summarise(avg_daily_delay = mean(DEP_DELAY, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot_calendar_heatmap(
    "Flight_Date",
    "avg_daily_delay"
  ) +
  facet_wrap(~ORIGIN_CITY_NAME) +
  theme(legend.position = "top") +
  scale_fill_continuous(low = "green", high = "red") +
  labs(x = "Month", y = "Day of Week", fill = "Average Delay (min)") +
  coord_flip()
```

16.5 Another Fun Example
https://trends.google.com/trends/explore?geo=US&hl=en
```r
library(gtrendsR)

search_terms <- c("pumpkin spice", "cold brew")

mysearch <- gtrends(
  keyword = search_terms, onlyInterest = TRUE,
  # time = "all", # this will let us get the data from 2004 - present
  time = "2013-04-18 2023-04-18"
)

coffee_df <- mysearch[[1]]
head(coffee_df)
tail(coffee_df)
```

- Make a plot of your popularity variables through time.
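One way to approach this task is sketched below. Since gtrends() needs a live internet connection, the sketch uses a small made-up tibble (demo_df) standing in for coffee_df; with the real data you would pipe coffee_df in directly. (In gtrendsR output, the popularity column is named hits, and it can arrive as character values such as "<1", which readr::parse_number() will coerce to numeric.)

```r
library(tidyverse)

# made-up stand-in for coffee_df; the real gtrendsR data frame
# includes (at least) date, hits, and keyword columns
demo_df <- tibble(
  date = rep(seq(as.Date("2023-01-01"), by = "month", length.out = 6), times = 2),
  hits = c(10, 20, 35, 50, 40, 30, 60, 55, 50, 45, 40, 35),
  keyword = rep(c("pumpkin spice", "cold brew"), each = 6)
)

# one line per search term, popularity on the y-axis
demo_df %>%
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line() +
  labs(x = "Date", y = "Relative Popularity", color = "Search Term")
```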