7 Practice Problems with ggplot2

For this set of exercises we will use a slightly larger version of the Stat 113 first-day survey data. It is in the file stat113_f18_s19.csv, which contains two semesters of surveys.

library(tidyverse) # provides read_csv(), ggplot(), and %>%
stat113 <- read_csv("data/stat113_f18_s19.csv")

7.1 Univariate Analyses

We will begin with a series of basic displays for univariate analyses.

  • Investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
stat113_tv <-
  ggplot(data = stat113,
         mapping = aes(x = TV) 
         ) +
    geom_histogram(
      bins = 12, # number of bins
      color = "firebrick2", # outline of bins
      fill = "cornflowerblue" # inside of bins
      # binwidth = 5 # alternative to bins: make each bin 5 hours wide
      )
stat113_tv

# saving a ggplot
# method 1: Use the Export button
# method 2: use ggsave()
ggsave(filename = "stat113_histogram.png", 
       plot = stat113_tv)
  • Use another display to investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
# try a density plot
ggplot(data = stat113,
       mapping = aes(x = TV)) +
  geom_density(color = "darkred",
               fill = "black",
               alpha = 0.3,
               linetype = 2,
               linewidth = 1.2 # line thickness (`size` before ggplot2 3.4.0)
               )
  • Combine the geoms from the previous part onto the same plot.
ggplot(data = stat113,
       mapping = aes(x = TV)
       ) +
  geom_histogram(aes(y = after_stat(density))) + # density scale; replaces ..density..
  geom_density(color = "blue")
  • Pick your favorite chart of the three and play around with a few options such as color, fill, linetype, and alpha.

  • Pick another numeric variable to explore. Play around with themes, coordinate systems, and labels.

stat113_tv + 
  theme_bw() +
  labs(x = "Hours of TV per week", 
       y = "") # +
#  coord_flip() # flip at the very end
#  coord_polar() # useful for "time of day" data
  • Pick a categorical variable and make a bar chart.
stat113 %>%
  filter(!is.na(Award)) %>%
  ggplot(data = .,
         mapping = aes(x = Award)
         ) +
    geom_bar()
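
For the exercise above that combines a histogram and a density curve, the two layers only line up when the histogram is drawn on the density scale rather than the count scale. The sketch below illustrates this on simulated data, since the survey file is not reproduced here; the `toy` data frame and its `tv_hours` column are made up for illustration, and `after_stat(density)` is the current spelling of `..density..` (it assumes ggplot2 3.3.0 or later).

```r
library(ggplot2)

# Toy data standing in for the survey; `tv_hours` is a made-up column
set.seed(113)
toy <- data.frame(tv_hours = rexp(200, rate = 1 / 6))

overlay <- ggplot(data = toy, mapping = aes(x = tv_hours)) +
  # after_stat(density) rescales bar heights so the bar areas sum to 1
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 2,
                 boundary = 0,
                 color = "white",
                 fill = "cornflowerblue") +
  # the density curve is already on that scale, so the two layers line up
  geom_density(color = "blue")
# print(overlay) draws the plot

# Check: bar height times bin width, summed over all bins, should be 1
bars <- layer_data(overlay, 1)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area
```

With the default count scale, the bars would be far taller than the curve, which is why the `y` aesthetic is remapped for the histogram layer only.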

7.2 Multivariate Analyses

We will investigate a few research questions involving two variables and produce the following graphics. (Note: You may need to use different sets of variables for some of them.)

  • Side-by-side boxplots and/or violin plots
# x & y are required
# need group aes if your x variable is stored as a number

# clean data first
stat113_noNa_award <- stat113 %>% 
  filter(!is.na(Award))

# plot cleaned data
ggplot(data = stat113_noNa_award,
       mapping = aes(x = Award, 
                     y = Pulse,
                     fill = Award # fills groups with different colors
                     )
       ) +
  geom_boxplot() +
#  geom_violin() +
  theme_bw() +
  labs(y = "Pulse Rate\n(beats per minute)",
       title = "Stat 113 Pulse Rate\nby Award Choice",
       caption = "Place a nice informative caption here"
       ) +
  theme(legend.position = "right") + 
  # scale_fill_viridis_d() +
  scale_fill_brewer(palette = "Dark2",
                    name = "Award\nChosen") +
  # scale_fill_manual(
  #   values = c("blueviolet",
  #              "lightgoldenrod4",
  #              "tomato4"),
  #   name = "Award\nChosen"
  # ) +
  coord_flip()
  • Scatterplot

  • Scatterplot + Smoother

  • Scatterplot + Linear Smoother

ggplot(data = stat113,
       mapping = aes(x = TV, y = GPA,
                     color = Tattoo)) +
  geom_point() +
#  geom_smooth(se = FALSE,
#              color = "red") +
  geom_smooth(method = "lm", se = FALSE)
  • Stacked Bar Chart
ggplot(data = stat113_noNa_award,
       mapping = aes(x = Award, 
                     fill = Sport)
       ) +
  geom_bar() +
  coord_flip()
  • Faceted Density plots
ggplot(data = stat113,
       mapping = aes(x = Haircut, fill = Sex)) +
  geom_density() +
  theme_bw() +
  facet_wrap(vars(Sex))
  facet_wrap(vars(Sex))
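
The three scatterplot exercises above differ only in the smoothing layer, so it can help to build them step by step from one base plot. The following is a minimal sketch on simulated data, not the survey itself; the `toy` data frame and its `tv_hours` and `gpa` columns are invented for illustration.

```r
library(ggplot2)

# Toy data; `tv_hours` and `gpa` are made-up stand-ins for survey columns
set.seed(113)
toy <- data.frame(tv_hours = runif(150, min = 0, max = 30))
toy$gpa <- 3.4 - 0.01 * toy$tv_hours + rnorm(150, sd = 0.3)

# 1. Scatterplot only
scatter <- ggplot(data = toy, mapping = aes(x = tv_hours, y = gpa)) +
  geom_point()

# 2. Scatterplot + smoother (loess: a flexible local fit)
scatter_smooth <- scatter +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "red")

# 3. Scatterplot + linear smoother (a least-squares line)
scatter_lm <- scatter +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The lm layer draws the same line that lm() fits directly
line_coefs <- coef(lm(gpa ~ tv_hours, data = toy))
line_coefs
```

Mapping a categorical variable to `color` inside `aes()`, as in the Tattoo example, would give one fitted line per group instead of a single line.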

After getting a “basic plot” constructed for each, investigate options to customize and “clean up” the plots. Try to make them look nice.

  • Recreate “example.png” (from the T drive)

  • Use Class as a factor (categorical variable) instead of a numerical variable
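
For the last item, there are two common ways to make ggplot2 treat a number-coded variable as categorical: wrap it in `factor()`, or keep it numeric and supply the `group` aesthetic. The sketch below shows both on simulated data. It assumes a column named `Class` coded 1 to 4, which may not match the coding in the survey file, and the `Pulse` values are made up.

```r
library(ggplot2)

# Toy data; `Class` (coded 1-4) and `Pulse` values here are made up
set.seed(113)
toy <- data.frame(
  Class = sample(1:4, size = 120, replace = TRUE),
  Pulse = round(rnorm(120, mean = 70, sd = 10))
)

# Numeric x with no grouping: all rows form one group, so only one box
one_box <- ggplot(data = toy, mapping = aes(x = Class, y = Pulse)) +
  geom_boxplot()

# Option 1: convert to a factor inside aes(), giving one box per level
by_factor <- ggplot(data = toy,
                    mapping = aes(x = factor(Class), y = Pulse)) +
  geom_boxplot() +
  labs(x = "Class")

# Option 2: keep x numeric but add the group aesthetic
by_group <- ggplot(data = toy,
                   mapping = aes(x = Class, y = Pulse, group = Class)) +
  geom_boxplot()
```

Option 1 also gives a discrete axis and works with `fill = factor(Class)`, while Option 2 keeps the numeric axis spacing.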

7.3 Further Practice

The document 06-data_visualization contains more detailed notes on the grammar of graphics (using ggplot2). The next set of exercises comes from that document, so you may as well get started on them early.