9 Practice with factors
February 12, 2009 marked the 200th anniversary of Charles Darwin’s birth. Gallup, a national polling organization, surveyed 1018 Americans about their education level and their beliefs about evolution. The results from this survey are in the file darwin.csv.
- Load the necessary packages and read in the data.
library(ggplot2)
library(dplyr)
library(readr)
library(forcats)
darwin <- read_csv("data/darwin.csv")
- Investigate the different responses for each variable. One quick way of doing this is to use something like unique(data_name$variable_name).
unique(darwin$Education)
unique(darwin$Belief)
dput(unique(darwin$Education))
Hopefully, you noticed that both variables are ordinal categorical variables. By default, however, R orders the levels of a factor alphabetically rather than in their natural order.
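To see the alphabetical default in action, here is a minimal sketch in base R. The vector edu is made up for illustration (it only borrows the education labels used later in this section) and is not read from darwin.csv.

```r
# Toy education responses (made up for illustration, not taken from darwin.csv)
edu <- c("Some College", "Postgraduate", "High School or Less", "College Graduate")

# factor() with no `levels` argument sorts the levels alphabetically
edu_default <- factor(edu)
levels(edu_default)

# Supplying `levels` explicitly gives the intended ordinal order
edu_ordered <- factor(
  edu,
  levels = c("High School or Less", "Some College", "College Graduate", "Postgraduate")
)
levels(edu_ordered)
```

Comparing the two levels() calls shows why plots and tables built from the default factor come out in a misleading order.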
- Properly order the education level and belief variables with fct_relevel(). This function works on the variable level (and hence can be used within a mutate statement). Try to do this by writing a single chain of piped commands starting with the initial data frame and releveling both factors within the same mutate.
darwin <-
  darwin %>%
  mutate(Education_fct = factor(Education),
         Belief_fct = factor(Belief)
  ) %>%
  mutate(
    Education_fct = fct_relevel(Education_fct,
      c(
        "High School or Less",
        "Some College",
        "College Graduate",
        "Postgraduate"
      )
    )
    # Belief_fct can be releveled the same way, using the response
    # labels shown by unique(darwin$Belief)
  )
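A quick way to confirm that a relevel did what you intended is to inspect levels() before and after. The sketch below uses a small hypothetical factor called size (not part of the Darwin data) to show that fct_relevel() changes only the level order, not the values.

```r
library(forcats)

# Toy factor (hypothetical values); default levels are alphabetical
size <- factor(c("low", "high", "medium", "low"))
levels(size)

# fct_relevel() moves the named levels to the front, in the order given
size_fct <- fct_relevel(size, "low", "medium", "high")
levels(size_fct)

# The underlying values are unchanged; only the level order differs
as.character(size_fct)
```

The same check, levels(darwin$Education_fct), can be run on the real data after the mutate.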
- Use ggplot2 to make a stacked bar chart where each bar is scaled to 100% (or a proportion of 1 is fine too) to visually investigate the relationship between the two variables. Also, check out the coord_flip() option and clean up the labels as needed.
darwin %>%
  ggplot(aes(x = Education_fct, fill = Belief_fct)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(fill = "Belief", x = "Education Level", y = "Proportion")
- Use the group_by statement (and any other necessary commands) to get the counts for each Education/Belief combination.
darwin_short <-
  darwin %>%
  group_by(Belief_fct, Education_fct) %>%
  summarise(
    counts = n()
  ) %>%
  ungroup()
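As an aside, dplyr's count() collapses the group_by / summarise / ungroup pattern into one call. The sketch below checks this on a made-up data frame, toy, whose column names mirror the ones above but whose values are purely illustrative.

```r
library(dplyr)

# Toy data frame (made-up rows standing in for the survey data)
toy <- data.frame(
  Belief_fct = factor(c("A", "A", "B", "B", "B")),
  Education_fct = factor(c("x", "y", "x", "x", "y"))
)

# group_by() + summarise(n()), dropping the grouping afterwards
long_way <- toy %>%
  group_by(Belief_fct, Education_fct) %>%
  summarise(counts = n(), .groups = "drop")

# count() does the grouping, counting, and ungrouping in one call
short_way <- toy %>%
  count(Belief_fct, Education_fct, name = "counts")

short_way
```

Either form yields one row per Belief/Education combination with a counts column, which is the shape needed for geom_col().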
- Use the table of counts from the previous part (not the full dataset) to create a grouped (or clustered) bar chart to investigate the relationship between Education and Belief.
darwin_short %>%
  ggplot(aes(x = Education_fct %>% fct_rev(),
             fill = Belief_fct,
             y = counts
  )) +
  # position = "dodge" places the bars side by side (grouped);
  # position = "fill" would stack them as proportions instead
  geom_col(position = "dodge") +
  coord_flip()
Refer back to the Stat 113 first day survey.
stat113 <- read_csv("data/Stat113Fall2021.csv")
Plot the GPA by Class such that the sections are ordered by the median GPA.
stat113 %>%
  mutate(Class = factor(Class)) %>%
  filter(!is.na(GPA)) %>%
  # fct_reorder() orders the levels of Class by the median GPA (its default summary)
  ggplot(aes(x = fct_reorder(Class, GPA),
             y = GPA)) +
  geom_boxplot()
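To make the reordering itself visible without a plot, the following sketch applies fct_reorder() to a made-up factor grp and numeric vector score (both hypothetical, not from the Stat 113 data) and prints the resulting level order.

```r
library(forcats)

# Toy data (made up): three groups with different median scores
grp <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c"))
score <- c(3.5, 3.6, 3.7, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2)

# Default levels are alphabetical
levels(grp)

# fct_reorder() sorts levels by a summary of `score`; the default summary is the median
by_median <- fct_reorder(grp, score)
levels(by_median)

# .desc = TRUE reverses the order (largest median first)
by_median_desc <- fct_reorder(grp, score, .desc = TRUE)
levels(by_median_desc)
```

Because the axis order in ggplot2 follows the level order, the boxplots appear sorted by median once the factor has been reordered this way.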