10 Indicator Variables (Parallel Lines)

Getting Started

As always, insert a chunk for the R packages (and SLU functions) we will need.

library(tidyverse)
source("~/rstudioshared/IRamler/scripts/slunova.R")  # for ANOVA test

Exploratory Data Analysis

The dataset PineCayLizards.csv contains measurements on small lizards (Anolis scriptus) sampled on Pine Cay in the Turks and Caicos Islands before and after Hurricanes Irma and Maria. Read in those data.

pclizards = read_csv("data/PineCayLizards.csv")
  1. Of interest to us today are snout-to-vent length (SVL, in mm), forelimb toepad area (\(mm^2\), available in the dataset as “Mean Finger Area”), and whether each measurement was taken before or after the hurricanes. Specifically, we wonder whether the relationship between SVL and forelimb toepad area differs before and after the hurricanes. Investigate this visually by making a scatterplot, with smoothers, of forelimb toepad area versus SVL that uses different colors for measurements taken before and after the hurricanes. Discuss what you see.
ggplot(pclizards, aes(x = SVL, y = MeanFingerArea, color = Hurricane)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Snout to Vent Length (mm)", y = "Forelimb Toepad Area (mm^2)")

Model Formulation

  1. Separately for the before and after groups, propose a model for the relationship between SVL and forelimb toepad area in the entire population of lizards.

  2. Are the parameters in the two proposed models related to one another?

  3. The following command will define an indicator variable. What did this do to our dataset?

pclizards$HurricaneIND = ifelse(pclizards$Hurricane == "After", 1, 0)
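
One way to check what the command did is to cross-tabulate the original column against the new one. The sketch below uses a small made-up data frame (`toy`, with invented values) rather than the lizard data, so that it runs on its own; the same `table()` call should work on `pclizards`, assuming `Hurricane` takes the values "Before" and "After".

```r
# Toy stand-in for pclizards; the values are invented for illustration only.
toy <- data.frame(
  Hurricane = c("Before", "Before", "After", "After", "Before")
)

# Same recoding as above: 1 for "After", 0 otherwise.
toy$HurricaneIND <- ifelse(toy$Hurricane == "After", 1, 0)

# Cross-tabulate to confirm each level maps to exactly one indicator value.
check <- table(toy$Hurricane, toy$HurricaneIND)
print(check)
```

Each row of the table should have all of its counts in a single column if the recoding worked as intended.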
  1. Fit the proposed model and write the estimated equation in the space provided.
mod1 = lm(MeanFingerArea ~ SVL + HurricaneIND, data = pclizards)
summary(mod1)
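
Assuming the proposed model is the parallel-lines model implied by the `lm()` formula above, it can be written as a single equation that splits into two lines with a common slope. The symbols \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\varepsilon\) are notation introduced here for illustration, not taken from the handout.

```latex
\begin{align*}
\text{MeanFingerArea} &= \beta_0 + \beta_1\,\text{SVL} + \beta_2\,\text{HurricaneIND} + \varepsilon \\
\text{Before } (\text{HurricaneIND}=0):\quad E[\text{MeanFingerArea}] &= \beta_0 + \beta_1\,\text{SVL} \\
\text{After } (\text{HurricaneIND}=1):\quad E[\text{MeanFingerArea}] &= (\beta_0 + \beta_2) + \beta_1\,\text{SVL}
\end{align*}
```

Under this notation, \(\beta_2\) is the vertical shift between the two lines, which is why the lines are parallel.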
  1. What does the t-test for the coefficient on the indicator term tell us?

  2. Construct and interpret a 95% confidence interval for the population coefficient on the indicator term.

confint(mod1)
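
To pull out only the interval for the indicator term, `confint()` accepts a `parm` argument naming the coefficient. The sketch below is a minimal example on simulated data (the data frame `toy`, the model `toy_mod`, and all numbers are invented for illustration), so that it runs without the lizard file; with the real data the call would presumably be `confint(mod1, parm = "HurricaneIND")`.

```r
# Simulated stand-in data; the numbers are invented, not the lizard measurements.
set.seed(1)
n <- 40
toy <- data.frame(
  SVL = runif(n, 35, 50),
  HurricaneIND = rep(c(0, 1), each = n / 2)
)
toy$MeanFingerArea <- 0.5 + 0.03 * toy$SVL + 0.1 * toy$HurricaneIND +
  rnorm(n, sd = 0.1)

# Same model form as mod1: common slope, intercept shifted by the indicator.
toy_mod <- lm(MeanFingerArea ~ SVL + HurricaneIND, data = toy)

# Restrict confint() to one coefficient with the parm argument.
ci <- confint(toy_mod, parm = "HurricaneIND", level = 0.95)
print(ci)
```

The result is a one-row matrix holding the lower and upper 95% limits for the shift between the two parallel lines.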