7 Examples with ggplot2

library(readr)
library(ggplot2)
library(dplyr)

7.1 Brief overview of the ggplot2 package

Every ggplot2 plot needs at least three components:

  • data frame: data associated with the plot.
  • geom: to determine the type of geometric shape used to display the data, such as line, bar, point, or area. Note that you can use different data in different geoms by supplying the data option to an individual geom.
  • aes: to determine how variables in the data are mapped to visual properties (aesthetics) of geoms. This can include x position, y position, color, shape, fill, and size.
      • If the aes statement is included in the ggplot command, the mapping is available to every geom added to the plot.
      • If a variable should be mapped for only one geom, include it in an aes mapping inside that geom. (You can do something similar with the data option if you need to plot multiple data sources on the same display. However, it is generally better to think about structuring your dataset.)
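The difference between a plot-level mapping, a geom-level mapping, and a geom-level data source can be sketched as follows. This is only an illustration: it uses the built-in mtcars data set as a stand-in (not the Stat 113 survey), and the object names p_global, p_local, heavy_cars, and p_two_sources are made up for the example.

```r
library(ggplot2)

# Mapping given in ggplot(): inherited by every geom added afterwards.
p_global <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Mapping given inside one geom: only the points are colored by cyl,
# so the smoother still fits a single line to all cars.
p_local <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(mapping = aes(color = factor(cyl))) +
  geom_smooth(method = "lm", se = FALSE)

# A geom can also draw from its own data frame through its data argument.
# heavy_cars is a hypothetical subset used only to show the idea.
heavy_cars <- subset(mtcars, wt > 5)
p_two_sources <- p_local +
  geom_point(data = heavy_cars, color = "red", size = 4, shape = 1)

p_two_sources
```

In p_two_sources the extra point layer still inherits x and y from the ggplot() call; only its data source changes.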

7.2 Practice Problems with ggplot2

For this set of exercises we will use a slightly larger version of the Stat 113 first day survey data. The file stat113_f18_s19.csv contains two semesters of surveys.

stat113 <- read_csv("data/stat113_f18_s19.csv")

7.3 Univariate Analyses

We will begin with a series of basic displays for univariate analyses.

  • Investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
base_plot <-
  stat113 %>%
    ggplot(data = .,
           mapping = aes(x = TV)
           ) 

base_plot + 
  geom_histogram(
    # bins = 20 # control the number of bins
    binwidth = 1, # links to the variable's unit --> in this case hrs/wk,
    fill = "thistle3",
    color = "peachpuff4"
    ) +
  labs(x = "Number of hours of TV watched per week") +
  theme_bw()
  • Use another display to investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
base_plot +
  geom_density(color = "blue", 
               fill = "red",
               alpha = 0.5
               )
  • Combine the geoms from the previous part onto the same plot.
# This does not work well: the histogram is drawn on the count scale
# and the density curve on the density scale, so the y axes do not match
base_plot +
  geom_histogram() +
  geom_density(color = "blue")
# force the histogram to be on the density scale
base_plot +
  geom_histogram(
    aes(y =  after_stat(density)) 
    ) +
  geom_density()
# force the density to be on the count scale
base_plot +
  geom_histogram(bins = 50) +
  geom_density(aes( y = after_stat(count) ))
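To see what the two scales mean, one option is to pull out the values ggplot2 computes for the histogram layer. The sketch below assumes the built-in faithful data set as a stand-in for the survey data; p, hist_data, total_area, and total_count are names chosen just for this example.

```r
library(ggplot2)

# Stand-in data: waiting times from the built-in faithful data set.
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) +
  geom_density()

# layer_data() returns the values ggplot2 computed for a layer
# (here layer 1, the histogram), one row per bin.
hist_data <- layer_data(p, 1)

# On the density scale, bar height times bar width sums to 1.
total_area <- sum(hist_data$density * (hist_data$xmax - hist_data$xmin))

# On the count scale, the bar heights sum to the number of observations.
total_count <- sum(hist_data$count)

total_area
total_count
```

Because a density curve also encloses an area of 1, putting the histogram on the density scale makes the two layers directly comparable.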
  • Pick your favorite chart of the three and play around with a few options such as color, fill, linetype, and alpha.

  • Pick another numeric variable to explore. Play around with themes, coordinate systems, and labels.

ggplot(data = stat113,
       aes(x = Travel)
       ) +
  geom_density(fill= "dodgerblue", 
               alpha = 0.25, 
               color = "black",
               linetype = 2
               ) +
  coord_flip() +
#  coord_polar() + # useful for circular data (e.g., wind direction or time)
  labs(caption = "Figure 1: Distribution of ...",
       x = "Travel time (in hours) to SLU"
       )
  • Pick a categorical variable and make a bar chart.
award_plot <-
  stat113 %>%
    filter(!is.na(Award)) %>%
    ggplot(aes(x = Award, fill = Award)) +
    geom_bar()
award_plot
ggsave(filename = "stat113_award_plot.jpg",
  plot = award_plot,
  width = 9, 
  height = 3 )

7.4 Multivariate Analyses

We will investigate a few research questions involving two variables and produce the following graphics. (Note: You may need to use different sets of variables for some of them.)

  • Side-by-side boxplots and/or violin plots
stat113 %>%
  filter(!is.na(Award)) %>%
  ggplot(aes(x = Award, 
             y = GPA, 
             fill = Award)) +
  geom_violin() +
  scale_fill_manual(
    values = c("dodgerblue","tomato","peachpuff"),
    name = "Type of\nAward Chosen" # \n is used to insert a line break
  ) +
  labs(title = "Title of plot",
       caption = "Caption of plot"
       ) +
  coord_cartesian(ylim = c(0,NA), expand = FALSE)
  • Scatterplot

  • Scatterplot + Smoother

  • Scatterplot + Linear Smoother

stat113 %>%
  filter(!is.na(Tattoo)) %>%
  ggplot(aes(x = TV, y = GPA, shape = Tattoo,
             color = Tattoo, fill = Tattoo) ) +
  geom_point() +
  geom_smooth() +
  geom_smooth(method = lm) +
  facet_wrap(vars(Tattoo), scales = "free")
  • Stacked Bar Chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(x = Award, fill = Sport)) +
  geom_bar(position = "stack")
# clustered bar chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(x = Award, fill = Sport)) +
  geom_bar(position = "dodge")
# filled bar chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(fill = Award, x = Sport)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion") +
  scale_fill_viridis_d()
# alternative palette: scale_fill_brewer(palette = "Dark2")
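The heights in a filled bar chart are proportions within each bar. As a rough check, the same proportions can be computed by hand with dplyr and drawn with geom_col(). This sketch assumes the built-in mtcars data set in place of the survey data, and cyl_gear and filled_bars are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Count each cyl/gear combination, then convert the counts to
# proportions within each value of cyl.
cyl_gear <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Drawing the precomputed proportions gives bars that all reach 1,
# which is what geom_bar(position = "fill") does on the raw data.
filled_bars <- ggplot(cyl_gear,
                      aes(x = factor(cyl), y = prop, fill = factor(gear))) +
  geom_col() +
  labs(x = "cyl", y = "Proportion", fill = "gear")

filled_bars
```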
  • Faceted Density plots
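One possible starting point for a faceted density plot is sketched below. It assumes the built-in iris data set rather than the survey data, and faceted_density is an illustrative name; for the exercise, swap in a numeric and a categorical variable from stat113.

```r
library(ggplot2)

# One density curve per species, each drawn in its own panel.
# ncol = 1 stacks the panels so the shared x axis is easy to compare.
faceted_density <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  facet_wrap(vars(Species), ncol = 1) +
  theme_bw()

faceted_density
```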

After getting a “basic plot” constructed for each, investigate options to customize and “clean up” the plots. Try to make them look nice.

  • Recreate “example.png” (from the T drive)

  • Use Class as a factor (categorical variable) instead of a numerical variable
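Treating a numeric variable as categorical usually comes down to wrapping it in factor() inside aes(). The sketch below shows the idea with the cyl column of the built-in mtcars data set, which stands in for Class here; p_numeric and p_factor are illustrative names.

```r
library(ggplot2)

# Numeric variable: ggplot2 uses a continuous color gradient.
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# Same variable as a factor: ggplot2 uses one distinct color per level.
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(color = "cyl")

p_factor
```

The same wrapping works for other aesthetics such as fill, shape, or the x axis of a boxplot.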

7.5 Further Practice

The document 06-data_visualization contains more detailed notes on the grammar of graphics (using ggplot2). A set of exercises will have you work through material from that document, so you might as well get started on it early.