7 Examples with ggplot2
library(readr)
library(ggplot2)
library(dplyr)
7.1 Brief overview of the ggplot2 package
All ggplot functions must have at least three components:
- data frame: data associated with the plot.
- geom: to determine the type of geometric shape used to display the data, such as line, bar, point, or area. Note that you can use different data in different geoms by supplying the data option to an individual geom.
- aes: to determine how variables in the data are mapped to visual properties (aesthetics) of geoms. This can include x position, y position, color, shape, fill, and size.
  - If the aes statement is included in the ggplot command, the mapping will be available to any geoms used.
  - If you need a variable mapped to just a specific geom, you can include it in an aes mapping for that specific geom. (You can also do something similar with the data option if you need to plot multiple data sources on the same display. However, it is generally better to think about how to structure your dataset.)
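The difference between a plot-level mapping, a geom-level mapping, and a geom-level data option can be sketched as follows. This is a minimal illustration that assumes the built-in mtcars data rather than the survey data; the names cars, cyl_means, and layered_plot are made up for the example.

```r
library(ggplot2)

# Stand-in data (an assumption for illustration): the built-in mtcars data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# A second, smaller data frame: mean weight and mpg for each cylinder group.
cyl_means <- aggregate(cbind(wt, mpg) ~ cyl, data = cars, FUN = mean)

layered_plot <-
  ggplot(data = cars, mapping = aes(x = wt, y = mpg)) +  # x and y are shared by all geoms
  geom_point(aes(color = cyl)) +                          # color is mapped for these points only
  geom_smooth(method = "lm", se = FALSE) +                # inherits x and y, but not color
  geom_point(data = cyl_means, shape = 4, size = 5)       # this geom draws from its own data

layered_plot
```

Because color is mapped only inside the first geom_point, the smoother fits a single line through all of the points. Moving color = cyl into the ggplot call would instead pass it to every geom and produce one line per group.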
7.2 Practice Problems with ggplot2
For this set of exercises we will be using a slightly larger version of the Stat 113 first day survey data. It is in the file stat113_f18_s19.csv (and contains two semesters of surveys).
stat113 <- read_csv("data/stat113_f18_s19.csv")
7.3 Univariate Analyses
We will begin with a series of basic displays for univariate analyses.
- Investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
base_plot <-
  stat113 %>%
  ggplot(data = .,
         mapping = aes(x = TV)
  )

base_plot +
  geom_histogram(
    # bins = 20 # control the number of bins
    binwidth = 1, # links to the variable's unit --> in this case hrs/wk
    fill = "thistle3",
    color = "peachpuff4"
  ) +
  labs(x = "Number of hours of TV watched per week") +
  theme_bw()
- Use another display to investigate the distribution of the amount of TV watched (in hours per week) by Stat 113 students.
base_plot +
  geom_density(color = "blue",
               fill = "red",
               alpha = 0.5
  )
- Combine the geoms from the previous part onto the same plot.
# not working, b/c hist and density aren't
# on the same scale
base_plot +
  geom_histogram() +
  geom_density(color = "blue")
# force the histogram to be on the density scale
base_plot +
  geom_histogram(
    aes(y = after_stat(density))
  ) +
  geom_density()
# force the density to be on the count scale
# (the curve and the bars line up exactly only when the histogram binwidth is 1)
base_plot +
  geom_histogram(bins = 50) +
  geom_density(aes(y = after_stat(count)))
- Pick your favorite chart of the three and play around with a few options such as color, fill, linetype, and alpha.
- Pick another numeric variable to explore. Play around with themes, coordinate systems, and labels.
ggplot(data = stat113,
       aes(x = Travel)
  ) +
  geom_density(fill = "dodgerblue",
               alpha = 0.25,
               color = "black",
               linetype = 2
  ) +
  coord_flip() +
  # coord_polar() + # useful for circular data (e.g., wind direction or time)
  labs(caption = "Figure 1: Distribution of ...",
       x = "Travel time (in hours) to SLU"
  )
- Pick a categorical variable and make a bar chart.
award_plot <-
  stat113 %>%
  filter(!is.na(Award)) %>%
  ggplot(aes(x = Award, fill = Award)) +
  geom_bar()

award_plot

ggsave(filename = "stat113_award_plot.jpg",
       plot = award_plot,
       width = 9,
       height = 3)
7.4 Multivariate Analyses
We will investigate a few research questions involving two variables and produce graphics of the following. (Note: You may need to use different sets of variables for some of them.)
- Side-by-side boxplots and/or violin plots
stat113 %>%
  filter(!is.na(Award)) %>%
  ggplot(aes(x = Award,
             y = GPA,
             fill = Award)) +
  geom_violin() +
  scale_fill_manual(
    values = c("dodgerblue", "tomato", "peachpuff"),
    name = "Type of\nAward Chosen" # \n is used to insert a line break
  ) +
  labs(title = "Title of plot",
       caption = "Caption of plot"
  ) +
  coord_cartesian(ylim = c(0, NA), expand = FALSE)
- Scatterplot
- Scatterplot + Smoother
- Scatterplot + Linear Smoother
stat113 %>%
  filter(!is.na(Tattoo)) %>%
  ggplot(aes(x = TV, y = GPA, shape = Tattoo,
             color = Tattoo, fill = Tattoo)) +
  geom_point() +
  geom_smooth() +
  geom_smooth(method = lm) +
  facet_wrap(vars(Tattoo), scales = "free")
- Stacked Bar Chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(x = Award, fill = Sport)) +
  geom_bar(position = "stack")
# cluster bar chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(x = Award, fill = Sport)) +
  geom_bar(position = "dodge")
# filled bar chart
stat113 %>%
  filter(!is.na(Award), !is.na(Sport)) %>%
  ggplot(aes(fill = Award, x = Sport)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion") +
  scale_fill_viridis_d()
  # alternative palette: scale_fill_brewer(palette = "Dark2")
- Faceted Density plots
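One possible starting point for a faceted density plot is sketched below. It assumes the built-in iris data as a stand-in, so the variable names are not from the survey; with the survey data, a numeric variable such as GPA and a categorical variable such as Award could take their place. The name faceted_density is made up for the example.

```r
library(ggplot2)

# Stand-in data (an assumption for illustration): the built-in iris data.
faceted_density <-
  ggplot(data = iris, mapping = aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  facet_wrap(vars(Species), ncol = 1) +  # one panel per category, stacked vertically
  labs(x = "Sepal length (cm)", y = "Density") +
  theme_bw() +
  theme(legend.position = "none")        # the facet labels already identify the groups

faceted_density
```

Stacking the panels in a single column keeps them on a shared x axis, which makes it easier to compare where each distribution is centered.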
After getting a “basic plot” constructed for each, investigate options to customize and “clean up” the plots. Try to make them look nice.
Recreate “example.png” (from the T drive)
Use Class as a factor (categorical variable) instead of a numerical variable
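Treating a numerically coded variable as categorical can be done by wrapping it in factor() inside aes(). The sketch below assumes the built-in mtcars data, where gear plays the role that Class plays in the survey data; gear_plot is a made-up name.

```r
library(ggplot2)

# Stand-in data (an assumption for illustration): mtcars$gear is stored as a
# number (3, 4, or 5). Converting inside aes() leaves the data frame unchanged.
gear_plot <-
  ggplot(data = mtcars,
         mapping = aes(x = factor(gear), y = mpg, fill = factor(gear))) +
  geom_boxplot() +
  labs(x = "Number of gears", y = "Miles per gallon", fill = "Gears")

gear_plot
```

Without factor(), ggplot2 treats the variable as continuous, so it gets a continuous color scale and is not split into one box per value. An alternative is to convert the column once with mutate() before plotting, which avoids repeating factor() in every mapping.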