9 Practice with factors

February 12, 2009 marked the 200th anniversary of Charles Darwin’s birth. Gallup, a national polling organization, surveyed 1018 Americans about their education level and their beliefs about evolution. The results from this survey are in the file darwin.csv.

  1. Load the necessary packages and read in the data.
library(ggplot2)
library(dplyr)
library(readr)
library(forcats)

darwin <- read_csv("data/darwin.csv")
  1. Investigate different responses for each variable. One quick way of doing this is to use something like unique(data_name$variable_name).
unique(darwin$Education)
unique(darwin$Belief)

dput(unique(darwin$Education))

You may have noticed that both variables are ordinal categorical variables. By default, however, R orders factor levels alphabetically rather than by their natural order.
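To see the alphabetical default on a small scale, here is a minimal sketch using a made-up vector. The name `edu` is hypothetical and the values simply reuse the education labels from this exercise; nothing here is read from darwin.csv.

```r
# Hypothetical toy vector reusing the education labels from this exercise
edu <- c("Some College", "Postgraduate", "High School or Less", "College Graduate")

# factor() sorts the distinct values to build the levels
edu_default <- factor(edu)
levels(edu_default)

# Passing levels explicitly is one base-R way to impose a different order
edu_ordered <- factor(edu, levels = c("High School or Less", "Some College",
                                      "College Graduate", "Postgraduate"))
levels(edu_ordered)
```

The first call to `levels()` lists the labels alphabetically, while the second follows the order supplied. Plots and tables pick up whichever level order the factor carries.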

  1. Properly order the education level and belief variables with fct_relevel() from the forcats package. This function works at the variable level, so it can be used inside a mutate statement. Try to write a single chain of piped commands that starts with the initial data frame and relevels both factors within the same mutate.
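darwin <-
  darwin %>%
    # Convert both character columns to factors (levels start out alphabetical)
    mutate(Education_fct = factor(Education),
           Belief_fct = factor(Belief)
           ) %>%
    # Put the education levels in their natural order, lowest to highest.
    # Belief_fct keeps its alphabetical order here; relevel it the same way
    # using the labels returned by unique(darwin$Belief).
    mutate(
      Education_fct = fct_relevel(Education_fct,
                                  c(
                                    "High School or Less",
                                    "Some College",
                                    "College Graduate",
                                    "Postgraduate"
                                    )
                                  )
           )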
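As a rough illustration of releveling two factors inside one mutate, the sketch below uses a tiny made-up data frame. The names `toy`, `education`, and `belief` are hypothetical, and the belief labels ("Yes", "Unsure", "No") are placeholders rather than the wording used in the survey data.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data; the belief labels are placeholders, not the survey's wording
toy <- tibble(
  education = c("Some College", "Postgraduate", "High School or Less"),
  belief    = c("No", "Unsure", "Yes")
)

# One mutate() call relevels both columns; fct_relevel() accepts character input
toy_ordered <- toy %>%
  mutate(
    education = fct_relevel(education, "High School or Less", "Some College", "Postgraduate"),
    belief    = fct_relevel(belief, "Yes", "Unsure", "No")
  )

levels(toy_ordered$education)
levels(toy_ordered$belief)
```

Any level not named in fct_relevel() stays after the named ones in its existing order, so it is worth listing every level when the full order matters.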
  1. Use ggplot2 to make a stacked bar chart where each bar is scaled to 100% (a proportion of 1 is fine too) to visually investigate the relationship between the two variables. Also, check out the coord_flip() option and clean up the labels as needed.
darwin %>%
  ggplot(aes(x = Education_fct, fill = Belief_fct)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(fill = "Belief", x = "Education Level", y = "Proportion")
  1. Use the group_by statement (and any other necessary commands) to get the counts for each Education/Belief combination.
darwin_short <-
  darwin %>%
    group_by(Belief_fct, Education_fct) %>%
    summarise(
      counts = n()
    ) %>%
    ungroup()
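The same kind of count table can also be produced with dplyr's count(), which groups, tallies, and returns an ungrouped result in one step. The sketch below is only an illustration on made-up data; `toy`, `education`, and `belief` are hypothetical names, not columns of darwin.csv.

```r
library(dplyr)

# Hypothetical toy data standing in for the two factor columns
toy <- tibble(
  education = c("A", "A", "B", "B", "B"),
  belief    = c("Yes", "No", "Yes", "Yes", "No")
)

# count() tallies rows per combination; name = sets the count column's name
toy_counts <- toy %>%
  count(belief, education, name = "counts")

toy_counts
```

Spelling out group_by(), summarise(), and ungroup() makes each step visible, which is useful while learning; count() is a shorthand once the pattern is familiar.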
  1. Use the table from part (e) (not the full dataset) to create a grouped (or clustered) bar chart to investigate the relationship between Education and Belief.
darwin_short %>%
  ggplot(aes(x = Education_fct %>% fct_rev(), 
             fill = Belief_fct,
             y = counts
             )) +
  # "dodge" places the bars side by side (grouped); "fill" would stack them to 100%
  geom_col(position = "dodge") +
  coord_flip()

Refer back to the Stat 113 first day survey.

stat113 <- read_csv("data/Stat113Fall2021.csv")

Plot the GPA by Class such that the sections are ordered by the median GPA.

stat113 %>%
  mutate(Class = factor(Class)) %>%
  filter(!is.na(GPA)) %>%
  # fct_reorder() orders the levels of Class by the median of GPA by default
  ggplot(aes(x = fct_reorder(Class, GPA),
             y = GPA)) +
  geom_boxplot()
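To check what fct_reorder() does to the level order, here is a small sketch on made-up numbers. The vectors `section` and `gpa` are hypothetical and unrelated to the Stat 113 survey file.

```r
library(forcats)

# Hypothetical toy data: three sections with made-up GPAs
section <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c"))
gpa     <- c(3.9, 3.8, 3.7, 2.9, 3.0, 3.1, 3.4, 3.5, 2.0)

# Default summary is the median: a = 3.8, b = 3.0, c = 3.4
by_median <- fct_reorder(section, gpa)
levels(by_median)

# .fun swaps in another summary, here the mean (c is pulled down by its low value)
by_mean <- fct_reorder(section, gpa, .fun = mean)
levels(by_mean)
```

With these numbers the median ordering is b, c, a, while the mean ordering is c, b, a. The choice of summary can change the order of boxes in a plot when a group has outliers.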