11 Indicator Variables, Part 2 (Non-Parallel Lines)
Getting Started
As always, insert a chunk for the R packages (and SLU functions) we will need.
library(tidyverse)
source("~/rstudioshared/IRamler/scripts/slunova.R") # for ANOVA test
Exploratory Data Analysis
The dataset roller_bigger.csv contains data on over 400 roller coasters, including information about speed, length, height, presence of inversions, number of inversions, and type of track. Read in those data.
rc = read_csv("data/roller_bigger.csv")
- Does the relationship between length (feet) and top speed (mph) differ for roller coasters with inversions and those without? Investigate this visually by plotting a scatterplot, with smoothers, of top speed (mph) versus length (feet) that uses different colors for roller coasters with inversions and those without.
ggplot(rc, aes(x = Length, y = Speed, color = Inversions)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Length of Track (feet)", y = "Top Speed (mph)")
- Define an indicator variable so that the presence of inversions can be included in a statistical model.
rc$Inversions_Yes = ifelse(rc$Inversions == "Yes", 1, 0)
Write out the population model that would allow us to have different intercepts and slopes for the two types of roller coasters.
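One way to write such a model is sketched below, assuming Speed is the response and Inversions_Yes is the 0/1 indicator defined above. The symbols $\beta_0, \dots, \beta_3$ and $\epsilon$ are notation chosen for this sketch and may differ from the notation used in class.

$$
\begin{aligned}
Speed &= \beta_0 + \beta_1\,Length + \beta_2\,Inversions\_Yes + \beta_3\,(Length \times Inversions\_Yes) + \epsilon \\
\text{No inversions } (Inversions\_Yes = 0):\quad Speed &= \beta_0 + \beta_1\,Length + \epsilon \\
\text{Inversions } (Inversions\_Yes = 1):\quad Speed &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,Length + \epsilon
\end{aligned}
$$

Under this parameterization, $\beta_2$ is the difference in intercepts and $\beta_3$ is the difference in slopes between the two groups.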
Fit the model. Write down the equation and interpret the estimated slopes and intercepts of the two lines.
mod = lm(Speed ~ Length + Inversions_Yes + Length*Inversions_Yes,
         data=rc)
summary(mod)
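To read the two fitted lines off the coefficient table, the indicator coefficient is added to the intercept and the interaction coefficient is added to the slope. The following is a minimal sketch on simulated stand-in data. The data frame sim, the object fit, and the numbers used to generate the data are hypothetical and are not the roller coaster results.

```r
# Simulated stand-in data (NOT the roller coaster data); all values are made up.
set.seed(1)
n <- 200
length_ft <- runif(n, 500, 6000)
inv_yes <- rbinom(n, 1, 0.5)
speed <- 30 + 0.008 * length_ft + 10 * inv_yes -
  0.002 * length_ft * inv_yes + rnorm(n, sd = 5)
sim <- data.frame(Speed = speed, Length = length_ft, Inversions_Yes = inv_yes)

# Same model structure as in the handout: main effects plus interaction
fit <- lm(Speed ~ Length + Inversions_Yes + Length * Inversions_Yes, data = sim)
b <- coef(fit)

# Fitted line for the reference group (indicator = 0)
line_no <- c(intercept = unname(b["(Intercept)"]),
             slope     = unname(b["Length"]))

# Fitted line for the indicator = 1 group:
# add the indicator coefficient to the intercept and
# the interaction coefficient to the slope
line_yes <- c(intercept = unname(b["(Intercept)"] + b["Inversions_Yes"]),
              slope     = unname(b["Length"] + b["Length:Inversions_Yes"]))

rbind(line_no, line_yes)
```

With the real data, replacing sim by rc and fit by mod should give the two lines to write down and interpret.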
Is there evidence that the lines for the two groups have significantly different intercepts?
Is there evidence that the lines for the two groups have significantly different slopes?
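Both questions can be approached through the t-tests in the coefficient table: the row for the indicator concerns the difference in intercepts, and the row for the interaction concerns the difference in slopes. The sketch below pulls those two rows out of the summary, again on simulated stand-in data. The names sim, fit, and tab are hypothetical, and the output says nothing about the roller coaster data.

```r
# Simulated stand-in data (NOT the roller coaster data); all values are made up.
set.seed(2)
n <- 200
length_ft <- runif(n, 500, 6000)
inv_yes <- rbinom(n, 1, 0.5)
speed <- 30 + 0.008 * length_ft + 10 * inv_yes -
  0.002 * length_ft * inv_yes + rnorm(n, sd = 5)
sim <- data.frame(Speed = speed, Length = length_ft, Inversions_Yes = inv_yes)

fit <- lm(Speed ~ Length + Inversions_Yes + Length * Inversions_Yes, data = sim)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(fit))

# Rows for the difference in intercepts and the difference in slopes
tests <- tab[c("Inversions_Yes", "Length:Inversions_Yes"),
             c("Estimate", "t value", "Pr(>|t|)")]
tests
```

Note that the intercept comparison refers to the difference between the lines at Length = 0, which may lie outside the range of the observed track lengths.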