6 Inference for Simple Linear Regression
Getting Started
As always, insert a chunk for the R packages we will need.
library(tidyverse)
library(ggiraphExtra)
Initial Analysis
The dataset WaterCayLizards.csv contains measurements on small lizards (Anolis scriptus) sampled on Water Cay in the Turks and Caicos Islands before and after Hurricanes Irma and Maria. Read in those data. Of interest to us today are snout-to-vent length (SVL, mm) and forelimb toepad area (\(mm^2\), available in the dataset as MeanFingerArea).
wclizards = read_csv("data/WaterCayLizards.csv")
- Fill in the missing pieces of the code to produce a scatterplot of forelimb toepad area versus snout-to-vent length (SVL), with a smoother. Does a linear trend seem reasonable?
ggplot(data = wclizards,
       aes(x = SVL,
           y = MeanFingerArea)) +
  geom_point() +
  geom_smooth(se = FALSE)
Propose a simple linear model (for the entire population of Anolis scriptus) that uses SVL (mm) to explain forelimb toepad area (aka “MeanFingerArea”, \(mm^2\)).
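For reference, the general form of a population simple linear regression model, written with this lab's variable names, is sketched below. The symbols \(\beta_0\), \(\beta_1\), and \(\sigma_\varepsilon\) are our notation for the unknown population intercept, slope, and error standard deviation; independent, normally distributed errors are the standard assumptions behind the inference later in this section.

```latex
\text{MeanFingerArea}_i = \beta_0 + \beta_1 \, \text{SVL}_i + \varepsilon_i,
\qquad
\varepsilon_i \overset{\text{iid}}{\sim} \text{Normal}\left(\text{mean } 0,\ \text{SD } \sigma_\varepsilon\right)
```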
Fit the least squares regression equation for modeling forelimb toepad area (aka “MeanFingerArea”, \(mm^2\)) from SVL (mm). Fill in the missing pieces of the code to add the least squares regression line to the scatterplot of forelimb toepad area versus SVL. Do not show the standard error band on the plot.
mod1 <- lm(MeanFingerArea ~ SVL, data = wclizards)
ggPredict(mod1)
summary(mod1)
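As a check on what lm() computes, here is a minimal sketch of the least squares estimates obtained from summary statistics, using the standard formulas \(b_1 = r \, s_y / s_x\) and \(b_0 = \bar{y} - b_1 \bar{x}\). It assumes R's built-in cars data (stopping distance dist versus speed) as a stand-in so that it runs without the lab's CSV; ls_by_hand() is a hypothetical helper name, not part of any package. Passing wclizards$SVL and wclizards$MeanFingerArea instead should reproduce coef(mod1).

```r
# Hypothetical helper: least squares slope and intercept from summary statistics.
# Demonstrated on R's built-in `cars` data as a stand-in for wclizards.
ls_by_hand <- function(x, y) {
  r  <- cor(x, y)                 # sample correlation between x and y
  b1 <- r * sd(y) / sd(x)         # slope: b1 = r * s_y / s_x
  b0 <- mean(y) - b1 * mean(x)    # intercept: the line passes through (x-bar, y-bar)
  c(intercept = b0, slope = b1)
}

est <- ls_by_hand(cars$speed, cars$dist)
fit <- lm(dist ~ speed, data = cars)
est
coef(fit)   # should agree with est up to rounding
```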
- Produce plots to check the assumptions for the linear model we have used in this application. Do they seem reasonably met?
plot(mod1)
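For an lm object, plot(mod1) steps through R's default diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. Running par(mfrow = c(2, 2)) first puts all four on one page. The sketch below draws the two most commonly used of these by hand with base graphics, so it is clear what each one shows; it fits a stand-in model to the built-in cars data rather than the lizard data.

```r
# Stand-in model on built-in data; replace with mod1 for the lab.
fit <- lm(dist ~ speed, data = cars)

op <- par(mfrow = c(1, 2))             # two panels side by side
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)                 # look for even scatter around 0, no curve or funnel
qqnorm(resid(fit), main = "Normal Q-Q of residuals")
qqline(resid(fit))                     # points near this line suggest roughly normal errors
par(op)                                # restore the previous graphics settings
```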
Inference
- We want to construct a 95% confidence interval for the population slope. To do so, first find the standard error of the sample slope.
summary(mod1)
sd(wclizards$SVL)   # s_x, the sample standard deviation of the predictor
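The standard deviation of SVL enters the standard error of the slope through the usual formula \(SE_{b_1} = s_\varepsilon / (s_x \sqrt{n - 1})\), where \(s_\varepsilon = \sqrt{SSE/(n-2)}\) is the residual standard error. A minimal sketch of that calculation follows, again using the built-in cars data as a stand-in; the result should match the "Std. Error" entry for the slope in summary().

```r
# Stand-in model on built-in data; for the lab, use mod1 and wclizards$SVL.
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)

s_e   <- sqrt(sum(resid(fit)^2) / (n - 2))        # residual standard error
se_b1 <- s_e / (sd(cars$speed) * sqrt(n - 1))     # SE of the sample slope
se_b1
summary(fit)$coefficients["speed", "Std. Error"]  # R's value, for comparison
```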
- Now we need critical values to make a 95% confidence interval in this situation. How many degrees of freedom are there in this situation? Use those to find the appropriate critical values.
qt( (1 - 0.95)/2, df = 83 )   # lower critical value; the upper one is qt(0.975, df = 83)
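In simple linear regression the error degrees of freedom are \(n - 2\), because two parameters (intercept and slope) are estimated; with 85 lizards that gives the 83 used above. A fitted model reports this directly through df.residual(). The sketch below shows that, plus both critical values at once, on the built-in cars data as a stand-in.

```r
# Stand-in model on built-in data; df.residual(mod1) should give 83 for the lab.
fit    <- lm(dist ~ speed, data = cars)
res_df <- df.residual(fit)                  # n - 2 for simple linear regression
t_crit <- qt(c(0.025, 0.975), df = res_df)  # lower and upper critical values for 95%
res_df
t_crit
```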
Construct and interpret a 95% confidence interval for the population slope in this situation.
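The interval has the form \(b_1 \pm t^* \cdot SE_{b_1}\). A minimal sketch assembling it from the pieces above, using the built-in cars data as a stand-in for mod1:

```r
# Stand-in model on built-in data; for the lab, use mod1 and the "SVL" row.
fit <- lm(dist ~ speed, data = cars)

b1     <- coef(fit)[["speed"]]                              # sample slope
se_b1  <- summary(fit)$coefficients["speed", "Std. Error"]  # its standard error
t_star <- qt(0.975, df = df.residual(fit))                  # upper critical value, df = n - 2

ci_slope <- b1 + c(-1, 1) * t_star * se_b1                  # estimate -/+ margin of error
ci_slope
```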
As almost always, there is a way to get R to compute the confidence interval for you.
# feed in the lm object, not the data
round( confint(mod1, level = 0.95), 3)
Now suppose we want to conduct a hypothesis test to determine if there is evidence that SVL is a useful predictor of forelimb toepad area. Include all details of the appropriate hypothesis test.
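The usual test here is \(H_0: \beta_1 = 0\) versus \(H_a: \beta_1 \neq 0\), with test statistic \(t = b_1 / SE_{b_1}\) compared to a t-distribution with \(n - 2\) degrees of freedom and a two-sided p-value. The sketch below computes both by hand on the built-in cars data (a stand-in for mod1); the results should match the "t value" and "Pr(>|t|)" columns of summary().

```r
# Stand-in model on built-in data; for the lab, use mod1 and the "SVL" row.
fit   <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients

t_stat  <- coefs["speed", "Estimate"] / coefs["speed", "Std. Error"]         # (b1 - 0) / SE
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)   # two-sided
c(t = t_stat, p = p_value)
```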
Construct, by hand, the ANOVA table for this regression analysis. Is there evidence of a linear relationship between the two variables? Use the ANOVA table to perform the appropriate hypothesis test. Be sure to include all the appropriate parts.
# SST
mean_y = mean(wclizards$MeanFingerArea)
sum( (wclizards$MeanFingerArea - mean_y)^2 )
# another way: SST = (n - 1) * s_y^2
(85 - 1)*sd(wclizards$MeanFingerArea)^2
# SSE
sum( mod1$residuals^2 )
# SSM = SST - SSE
25.33 - 3.18
summary(mod1)
# MSE = SSE / (n - 2)
3.18/83
# F = MSM / MSE, where MSM = SSM / 1
22.15/0.0383
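The hand calculation above can be collected into a full ANOVA table. The sketch below does so on the built-in cars data (a stand-in for the lizard data), using SST = SSM + SSE, 1 model degree of freedom, \(n - 2\) error degrees of freedom, and \(F = MSM/MSE\) with an upper-tail p-value; anova_by_hand is just an illustrative name. In simple linear regression this F statistic equals the square of the slope's t statistic.

```r
# Stand-in model on built-in data; for the lab, use mod1 and wclizards$MeanFingerArea.
fit <- lm(dist ~ speed, data = cars)
y   <- cars$dist
n   <- length(y)

sst <- sum((y - mean(y))^2)     # total sum of squares
sse <- sum(resid(fit)^2)        # error (residual) sum of squares
ssm <- sst - sse                # model sum of squares

df_model <- 1                   # one predictor
df_error <- n - 2
msm <- ssm / df_model
mse <- sse / df_error
f_stat  <- msm / mse
p_value <- pf(f_stat, df_model, df_error, lower.tail = FALSE)

anova_by_hand <- data.frame(
  Source = c("Model", "Error", "Total"),
  Df     = c(df_model, df_error, n - 1),
  SS     = c(ssm, sse, sst),
  MS     = c(msm, mse, NA),
  F      = c(f_stat, NA, NA),
  p      = c(p_value, NA, NA)
)
anova_by_hand
```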
- Now use R to find the ANOVA table.
anova(mod1)
- Report the coefficient of determination for this model.
summary(mod1)
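The coefficient of determination is \(R^2 = SSM/SST = 1 - SSE/SST\), the proportion of variability in the response explained by the model; in simple linear regression it also equals the squared correlation \(r^2\). A minimal sketch on the built-in cars data (a stand-in for mod1), checked against the "Multiple R-squared" reported by summary(). With the rounded sums of squares from the hand-built table above, the lab's value would be about \(22.15/25.33 \approx 0.87\).

```r
# Stand-in model on built-in data; for the lab, use mod1 and wclizards.
fit <- lm(dist ~ speed, data = cars)
y   <- cars$dist

sst <- sum((y - mean(y))^2)
sse <- sum(resid(fit)^2)
r2  <- 1 - sse / sst             # equivalently SSM / SST
r2
summary(fit)$r.squared           # R's value, for comparison
cor(cars$speed, cars$dist)^2     # squared correlation gives the same number
```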