15 Dates with lubridate
Goals:
- use lubridate functions to convert a character variable to a <date> variable.
- use lubridate functions to extract useful information from a <date> variable, including the year, month, day of the week, and day of the year.
15.1 Converting Variables to <date>
The lubridate package is built to easily work with Date objects and DateTime objects.
To begin, here are two basic functions: today(), which prints today’s date, and now(), which prints today’s date and time.
library(tidyverse)
library(lubridate)
today()
now()
This first section deals with how to convert a variable in R to a Date. We will use a data set that has the holidays from the game series Animal Crossing, from January to April. This data is located in the file animal_crossing_holidays.csv, and the columns in this data set are:
- Holiday, the name of the holiday, and
- various other columns with different date formats
Read in the data set with
holiday_df <- read_csv("data/animal_crossing_holidays.csv")
holiday_df
Which columns are specified as Dates?
Which columns seem like they could be specified as Dates?
15.1.1 From <chr> to <date>
Although we’ve already seen the parse_date() function from readr in Project 1, we will learn a few lubridate functions that can be used instead.
We will use the dmy() series of functions in lubridate to convert character variables to dates. We will typically pair this new function with a mutate() statement: much like the forcats functions, we are almost always creating a new variable.
There are a series of dmy()-type functions, each corresponding to a different Day-Month-Year order.
- dmy() is used to parse a date from a character vector that has the day first, month second, and year last.
- ymd() is used to parse a date that has year first, month second, and day last.
- ydm() is used to parse a date that has year first, day second, and month last, …
and dym(), mdy(), and myd() work similarly. lubridate is usually “smart” and picks up dates in all kinds of different formats (e.g. it can pick up specifying October as the month and Oct as the month and 10 as the month).
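As a quick illustration of how flexible these parsers are, the sketch below writes the same date three different ways and parses each with the matching function. The strings are made up for this example (they are not from the Animal Crossing data), and parsing month names assumes an English locale.

```r
library(lubridate)

# The same date, October 4, 2021, written in three different orders.
# These strings are illustrative and do not come from holiday_df.
d_dmy <- dmy("4 October 2021")  # day, month, year; full month name
d_mdy <- mdy("Oct 4, 2021")     # month, day, year; abbreviated month name
d_ymd <- ymd("2021-10-04")      # year, month, day; numeric month

# Print the three parsed values side by side
c(d_dmy, d_mdy, d_ymd)

# Check that all three are the same <date>
d_dmy == d_mdy & d_mdy == d_ymd
```

The function name only tells lubridate the order of the pieces; the separators and the way the month is written can vary.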
Let’s try it out on Date1 and Date2:
holiday_df %>% mutate(Date_test = dmy(Date1)) %>%
  relocate(Date_test) # quick way to move new column to the front
holiday_df %>% mutate(Date_test = mdy(Date2)) %>%
  relocate(Date_test)
15.1.2 Making a <date> variable from Date Components
Another way to create a Date object is to assemble it with make_date() from month, day, and year components, each stored in a separate column:
holiday_df %>% mutate(Date_test2 = make_date(year = Year,
                                             month = Month,
                                             day = Day)) %>%
  relocate(Date_test2)
But when Month is stored as a character (e.g. February) instead of a number (e.g. 2), problems arise with the make_date() function:
holiday_df %>% mutate(Date_test2 = make_date(year = Year,
                                             month = Month2,
                                             day = Day)) %>%
  relocate(Date_test2)
So the make_date() function requires a specific format for the year, month, and day columns. It may take a little pre-processing to put a particular data set in that format.
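One possible form of that pre-processing is sketched below: month names are turned into month numbers before make_date() is called. The toy_df data set is hypothetical (it only mimics the Year, Month2, and Day columns), and the sketch assumes the months are spelled out in full in English, so that they match base R's month.name vector.

```r
library(tidyverse)
library(lubridate)

# Hypothetical toy data that mimics the Year, Month2, and Day columns
toy_df <- tibble(
  Year   = c(2021, 2021, 2021),
  Month2 = c("January", "February", "April"),
  Day    = c(1, 14, 22)
)

toy_dates <- toy_df %>%
  mutate(
    # match() looks up each month name's position in month.name (1 to 12)
    Month_num = match(Month2, month.name),
    # with a numeric month, make_date() can assemble the date
    Date_test2 = make_date(year = Year, month = Month_num, day = Day)
  )
toy_dates
```

match() returns NA for any value that is not spelled exactly like an entry of month.name, so abbreviated months such as Feb would need month.abb instead.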
15.1.3 Exercises
What’s the issue with trying to convert Date4 to a <date> form? You may want to investigate Date4 further to answer this question.
Practice converting Date3 and Date5 to <date> variables with lubridate functions.
# Date3
holiday_df %>%
  mutate(Date_Test3 = mdy(Date3)) %>%
  relocate(Date_Test3)
# Date5
holiday_df %>%
  mutate(Date_Test5 = ymd(Date5)) %>%
  relocate(Date_Test5)
15.2 Functions for <date> Variables
Once an object is in the <date> format, there are some special functions in lubridate that can be used on that date variable.
To investigate some of these functions, we will pull stock market data from Yahoo using the quantmod package. Run the following code, which gets stock market price data on Apple, Nintendo, Chipotle, and the S & P 500 Index from the start of 2011 up to May 19, 2021 (the end date set in the code). Note that you have the ability to understand all of the code below, but we will skip over it for now to focus on the new information in this section (information about date functions).
library(quantmod)
# date range for the price data
start <- ymd("2011-01-01")
end <- ymd("2021-5-19")
# download daily prices from Yahoo; creates the objects AAPL, NTDOY, CMG, and SPY
getSymbols(c("AAPL", "NTDOY", "CMG", "SPY"), src = "yahoo",
           from = start, to = end)
# the trading dates are stored in the index of each downloaded object
date_tib <- as_tibble(index(AAPL)) %>%
  rename(start_date = value)
app_tib <- as_tibble(AAPL)
nint_tib <- as_tibble(NTDOY)
chip_tib <- as_tibble(CMG)
spy_tib <- as_tibble(SPY)
all_stocks <- bind_cols(date_tib, app_tib, nint_tib, chip_tib, spy_tib)
# keep the adjusted prices and stack them into one long data set
stocks_long <- all_stocks %>%
  select(start_date, AAPL.Adjusted, NTDOY.Adjusted,
         CMG.Adjusted, SPY.Adjusted) %>%
  pivot_longer(2:5, names_to = "Stock_Type", values_to = "Price") %>%
  mutate(Stock_Type = fct_recode(Stock_Type,
    Apple = "AAPL.Adjusted",
    Nintendo = "NTDOY.Adjusted",
    Chipotle = "CMG.Adjusted",
    `S & P 500` = "SPY.Adjusted"
  ))
tail(stocks_long)
I’ve made a data set with three variables:
- start_date, the opening date for the stock market
- Stock_Type, a factor with 4 levels: Apple, Nintendo, Chipotle, and S & P 500
- Price, the adjusted price of the stock (taken from the .Adjusted column for each stock)
First, let’s make a line plot that shows how the S & P 500 has changed over time:
stocks_sp <- stocks_long %>% filter(Stock_Type == "S & P 500")
ggplot(data = stocks_sp, aes(x = start_date, y = Price)) +
  geom_line()
But there’s other information that we can get from the start_date variable. We might be interested in things like day of the week, monthly trends, or yearly trends. To extract variables like “weekday” and “month” from a <date> variable, there are a series of functions that are fairly straightforward to use. We will discuss the year(), month(), mday(), yday(), and wday() functions.
15.2.1 year(), month(), and mday()
# simple example
today() %>% year()
today() %>% month(label = TRUE, # use label=TRUE to get text version
                  abbr = FALSE) # use abbr=FALSE to get full text version
today() %>% mday()
The functions year(), month(), and mday() can grab the year, month, and day of the month, respectively, from a <date> variable. Like the forcats functions, these will almost always be paired with a mutate() statement because they will create a new variable:
stocks_long %>% mutate(year_stock = year(start_date))
stocks_long %>% mutate(month_stock = month(start_date))
stocks_long %>% mutate(day_stock = mday(start_date))
15.2.2 yday() and wday()
# simple example
today()+1 # GIVES TOMORROW
today() %>% yday()
today() %>% wday() # label and abbr options are available for wday
The yday() function grabs the day of the year from a <date> object. For example,
test <- mdy("November 4, 2021")
yday(test)
returns 308, indicating that November 4th is the 308th day of the year 2021. Using this function in a mutate() statement creates a new variable that has yday for each observation:
stocks_long %>% mutate(day_in_year = yday(start_date))
Finally, the function wday() grabs the day of the week from a <date>. By default, wday() returns the day of the week as a number, but this can be confusing: does 1 mean Sunday or Monday? (With lubridate's default settings, 1 means Sunday.) Adding label = TRUE creates the weekday variable as Sunday, Monday, Tuesday, etc.:
stocks_long %>% mutate(day_of_week = wday(start_date))
stocks_long %>% mutate(day_of_week = wday(start_date,
                                          label = TRUE, abbr = FALSE))
Possible uses for these functions are:
- you want to look at differences between years (with year())
- you want to look at differences between months (with month())
- you want to look at differences between days of the week (with wday())
- you want to see whether there are seasonal trends within years (with yday())
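As a minimal sketch of the first use in the list above, the code below extracts the year and the day of the week from a date column and then averages a price within each year. The toy_prices data set is hypothetical (four made-up prices, not real stock data), so that the example runs without downloading anything.

```r
library(tidyverse)
library(lubridate)

# Hypothetical prices on four dates (not real stock data)
toy_prices <- tibble(
  start_date = ymd(c("2020-01-06", "2020-01-07", "2021-01-04", "2021-01-05")),
  Price = c(10, 12, 20, 24)
)

# Create the new date-based variables with mutate()
toy_with_dates <- toy_prices %>%
  mutate(year_stock = year(start_date),
         day_of_week = wday(start_date, label = TRUE, abbr = FALSE))
toy_with_dates

# Compare years: average price within each year
yearly_avg <- toy_with_dates %>%
  group_by(year_stock) %>%
  summarize(avg_price = mean(Price))
yearly_avg
```

Swapping year_stock for day_of_week in group_by() would give the same kind of comparison across days of the week.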
Note: Working with times is extremely similar to working with dates. Instead of ymd(), mdy(), etc., you tack on a few extra letters to specify the order that the hour, minute, and seconds appear in the variable: ymd_hms() converts a character vector that has the order year, month, day, hour, minute, second to a <datetime>.
Additionally, the functions hour(), minute(), and second() grab the hour, minute, and second from a <datetime> variable.
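A short sketch of the note above, using a made-up date-time string (it is not from any data set in this section):

```r
library(lubridate)

# An illustrative date-time string in year, month, day, hour, minute, second order
dt <- ymd_hms("2021-11-04 14:30:15")
dt

# Extract the time components from the <datetime>
hour(dt)
minute(dt)
second(dt)
```

When no time zone is supplied, ymd_hms() treats the time as UTC.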
15.2.3 Exercises
- The month() function gives the numbers corresponding to each month by default. Type ?month and figure out which argument you would need to change to get the names (January, February, etc.) instead of the month numbers. What about the abbreviations (Jan, Feb, etc.) of each month instead of the month numbers? Try making the changes in the mutate() statement below.
15.3 Additional Exercises
The truncated argument to ymd(), dmy(), mdy(), etc. will allow R to parse dates that aren’t actually complete. For example,
ymd("2019", truncated = 2)
parses 2019 to be January 1, 2019 when the month and day are missing. The 2 means that the last two parts of the date (in this case, month and day) are allowed to be missing. Similarly,
dmy("19-10", truncated = 1)
truncates the year (which is given as 0000). The truncated argument is usually most useful in the context of the first example, with a truncated month and/or day.
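The same argument is available for the date-time parsers. The sketch below, with made-up strings, lets up to three trailing parts (hour, minute, and second) be missing, so a single call can handle values recorded at different levels of detail.

```r
library(lubridate)

# Illustrative strings with progressively fewer time components
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11", "2010-01-01 12", "2010-01-01")

# truncated = 3 allows the last three parts (hour, minute, second) to be missing;
# missing parts are filled in as zero
parsed <- ymd_hms(x, truncated = 3)
parsed
```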
Examine the ds_google.csv data set, which contains
- Month, the year and month from 2004 until recently
- Data_Science, the relative popularity of data science (Google keeps how it calculates “popularity” as somewhat of a mystery, but it is likely based on the number of times people search for the term “Data Science”)
library(tidyverse)
library(lubridate)
ds_df <- read_csv("data/ds_google.csv")
ds_df
Use a lubridate function with the truncated option to convert the Month variable to be in the <date> format.
Make a plot of the popularity of Data Science through Time. Add a smoother to your plot. What patterns do you notice?
The data was obtained from Google Trends. Google Trends is incredibly cool to explore, even without R.
- First, let’s check out Google Trends. On it, enter a search term, and change the Time dropdown menu to be the range you want. Then, enter a second search term that you want to compare. You can also change the country if you want to (or, you can keep the country as United States).
My search terms will be “pumpkin spice” and “cold brew” from the past 10 years, but yours should be something that interests you!
Notice that in the top-right window of the graph, you can click the down arrow to download the data set. However, there is a nice R package called gtrendsR that will allow you to query Google Trends straight from R!
Let’s try a few things with my search terms. (You’ll have a chance to try some search terms of your own later.)
library(gtrendsR)
search_terms <- c("pumpkin spice", "cold brew")
mysearch <- gtrends(
  keyword = search_terms,
  onlyInterest = TRUE,
  time = "2012-04-18 2022-04-18"
)
coffee_df <- mysearch[[1]]
coffee_df
- Make a plot of the popularity variables through time.
library(ggplot2)
ggplot(data = coffee_df, aes(x = date, y = hits, color = keyword)) +
  geom_line(size = 1.5) +
  theme_bw()
- Make a table of the average popularity for each year. Structure it such that you have a column for the year, and one for each search term.
coffee_df_v2 <-
  coffee_df %>%
  mutate(search_year = year(date))
coffee_df_v2 %>%
  group_by(search_year, keyword) %>%
  summarize(avg_pop = mean(hits))
Using the gtrends data set, make a clustered bar chart of the average popularity by month.
Reconsider question 5. Instead of using the average popularity for each year, what might be an alternative (and possibly better) way to measure the popularity of the search term in a year? Make a table (similar to that from question 5) with your alternative measurement of yearly
Now enter a search term that you’d like to investigate for the past 90 days. Feel free to pick something that interests you.
Make a plot of your popularity variable through time, adding a smoother.
Using your data set that explored a variable from the past 90 days, construct a table that compares the average popularity on each day of the week (Monday through Saturday).