3 Introduction to Simple Linear Regression

Getting Started

Insert an R chunk to load the packages we will need.

library(readr)        # read_csv()
library(ggplot2)      # ggplot() and the geom_*() layers
library(ggiraphExtra) # ggPredict()

3.1 Guided Example: Sting Length versus Body Length

Motivating Question: “While many recent studies have focused on the pain inflicted by the stings of various stinging wasps, bees, or ants (Hymenoptera: Aculeata), little is known about how the length of the sting itself varies between species.”

  1. Upload the VelvetAntStings data from the T drive and save it in your data directory. Insert an R chunk to read in the data.
stings <- read_csv("data/VelvetAntStings.csv")
  2. Construct a histogram to examine how sting length (mm) varies.
ggplot(data = stings,
       mapping = aes(x = Sting)
       ) +
  geom_histogram(bins = 9)
  3. The researchers propose using mesosomal length as a proxy for body length, and they believe mesosomal length (i.e., body length) provides information about sting length. What is the correlation between sting length and mesosomal length?
cor(stings$Sting, stings$Mesosoma)
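To see what cor() is computing, here is a minimal sketch of the Pearson correlation as the sum of the products of z-scores divided by n - 1. So that the block runs without the CSV file, it uses R's built-in cars data (speed in mph, stopping distance in ft) as a stand-in; that substitution is an assumption for illustration only. The same lines should work with stings$Mesosoma and stings$Sting.

```r
# Stand-in data: R's built-in `cars` data set (not the velvet ant data)
x <- cars$speed
y <- cars$dist
n <- length(x)

# Standardize each variable; sd() is the sample SD (divides by n - 1)
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Correlation: sum of products of z-scores, divided by n - 1
r_by_hand <- sum(z_x * z_y) / (n - 1)

r_by_hand
cor(x, y)  # should agree with r_by_hand up to floating-point rounding
```

The sign of r tells you the direction of the linear trend, and its size (between -1 and 1) tells you how tightly the points cluster around a line.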
  4. Produce a scatterplot of sting length versus mesosomal length. Add a smoother. Does a linear trend seem reasonable?
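# Save the scatterplot so we can add different layers to it later
base_plot <- ggplot(data = stings,
       mapping = aes(x = Mesosoma, y = Sting)
       ) +
  geom_point()

# With no method given, geom_smooth() uses a loess smoother for small datasets
base_plot + geom_smooth(se = FALSE)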
  5. Fit the least squares regression line for modeling sting length from mesosomal length. Write down the fitted equation below.
sting_vs_meso <- lm(Sting ~ Mesosoma, data = stings)
sting_vs_meso
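The equation printed by sting_vs_meso has the form predicted y = b0 + b1 * x. As a check on where those two numbers come from, the sketch below computes them from summary statistics with the standard least squares formulas, b1 = r * sy / sx and b0 = mean(y) - b1 * mean(x), and compares them with lm(). It uses the built-in cars data as a stand-in (an assumption, so the block runs on its own).

```r
# Stand-in data: built-in `cars` (model stopping distance from speed)
x <- cars$speed
y <- cars$dist

# Least squares slope and intercept from summary statistics
b1 <- cor(x, y) * sd(y) / sd(x)  # slope = r * (s_y / s_x)
b0 <- mean(y) - b1 * mean(x)     # the line passes through (mean(x), mean(y))

c(intercept = b0, slope = b1)

# Compare with lm(); coef() returns the intercept first, then the slope
fit <- lm(dist ~ speed, data = cars)
coef(fit)
```

The same coef() call on sting_vs_meso returns the intercept and slope you need for the written equation.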
  6. Produce a scatterplot of sting length versus mesosomal length and add the least squares regression line. Do not display the standard error band.
base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Alternative: ggPredict() from ggiraphExtra plots the fitted model directly
ggPredict(sting_vs_meso) +
  labs(x = "Mesosomal Length (mm)", y = "Sting Length (mm)")
  7. Interpret the slope in the context of the problem.

  8. Give the literal interpretation of the intercept in the context of the problem.

  9. By hand, compute the residual for Atillum jucundum (the first entry in the dataset).

  10. In the Environment pane, click the arrow next to the name of our model object (sting_vs_meso) and browse through its components. Find the components that store the fitted values and residuals for each observation in the dataset. Add an R chunk and type the commands to view the fitted values and residuals for all species in the dataset.

# Extract the predictions (fitted values)
sting_vs_meso$fitted.values # prints all fitted values

sting_vs_meso$fitted.values[1] # extracts just the first value

sting_vs_meso$fitted.values[c(1, 4, 9)] # extracts the 1st, 4th, and 9th values

sting_vs_meso$fitted.values[1:7] # extracts the first seven values

# Extract the residuals (all, and just for the first insect)
sting_vs_meso$residuals
sting_vs_meso$residuals[1]
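To check a by-hand residual, the arithmetic is residual = observed - fitted, where the fitted value comes from plugging the observation's x value into the fitted line. The sketch below does this for the first row of the built-in cars data, used as a stand-in (an assumption, so it runs on its own); with the sting model you would use coef(sting_vs_meso) and the first row of stings. It also uses fitted() and residuals(), base R extractor functions that return the same values as $fitted.values and $residuals.

```r
# Stand-in data and model: built-in `cars`
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)  # b[1] = intercept, b[2] = slope

# Fitted value for the first observation: plug its x into the line
x1 <- cars$speed[1]
y1 <- cars$dist[1]
fitted_1 <- unname(b[1] + b[2] * x1)

# Residual = observed - fitted
resid_1 <- y1 - fitted_1

resid_1
fitted(fit)[1]     # should match fitted_1
residuals(fit)[1]  # should match resid_1
```

A positive residual means the observation sits above the fitted line; a negative one means it sits below.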
  11. What is the “size of a ‘typical’ error” (a.k.a. the standard error of regression or residual standard error)?
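# close to, but not exactly, the residual standard error:
# sd() uses n - 1 in the denominator, while the residual standard error uses n - 2
sd(sting_vs_meso$residuals)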
sting_vs_meso_summary <- summary(sting_vs_meso) 

sting_vs_meso_summary # gives lots of info

sting_vs_meso_summary$sigma # extract just the residual standard error
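The residual standard error is the square root of the sum of squared residuals divided by the residual degrees of freedom, n - 2 in simple linear regression because two coefficients were estimated. The sketch below computes it by hand and shows why sd() of the residuals comes out slightly smaller. It uses the built-in cars data as a stand-in (an assumption, so the block runs on its own); base R's sigma() returns the same value as summary(fit)$sigma.

```r
# Stand-in data and model: built-in `cars`
fit <- lm(dist ~ speed, data = cars)
e <- residuals(fit)
n <- length(e)

# Residual standard error: sqrt(SSE / (n - 2))
sse <- sum(e^2)
rse_by_hand <- sqrt(sse / (n - 2))

rse_by_hand
summary(fit)$sigma  # should match rse_by_hand
sigma(fit)          # same value

# sd() divides by n - 1 (the residuals average to 0), so it is slightly smaller
sd(e)
sd(e) * sqrt((n - 1) / (n - 2))  # rescaled, this matches the residual standard error
```

With a moderately large n the two versions are close, which is why sd() of the residuals is a reasonable quick approximation.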
  12. What are the model assumptions for simple linear regression?

  13. Produce plots to check the assumptions for the linear model we used in this application. Do they seem reasonably well met?

plot(sting_vs_meso) # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
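If you prefer ggplot2 graphics, two of the standard diagnostic plots can be built by hand from fitted() and residuals(). This is a sketch, not a replacement for plot(); it uses the built-in cars data as a stand-in (an assumption, so it runs on its own), and geom_qq_line() requires ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Stand-in data and model: built-in `cars`
fit <- lm(dist ~ speed, data = cars)

# Collect fitted values and residuals in one data frame for plotting
diag_df <- data.frame(fitted = fitted(fit), resid = residuals(fit))

# Residuals vs fitted: curvature suggests nonlinearity; a fan shape suggests nonconstant variance
resid_plot <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals")

# Normal Q-Q plot: points close to the line suggest approximately normal errors
qq_plot <- ggplot(diag_df, aes(sample = resid)) +
  geom_qq() +
  geom_qq_line() +
  labs(x = "Theoretical quantiles", y = "Sample quantiles of residuals")

resid_plot
qq_plot
```

Independence of the observations cannot be checked from these plots alone; it depends on how the data were collected.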

3.2 Your Turn: Sting Length versus Pain

For practice, you will consider using sting length to model and predict pain. You will repeat many of the steps from the last example. I’ve provided an outline of the analysis you will conduct. Please insert the appropriate R chunks to conduct the analysis. Collaboration with your neighbors is encouraged. (Side note: I don’t actually think this is a great model, given the highly discrete nature of the pain variable, but it’s good enough for extra practice!)

  1. Produce a scatterplot of pain versus sting length. Add a smoother. Does a linear trend seem reasonable?

  2. Fit the least squares regression line for modeling pain from sting length. Use a different name for this model, such as “mod_pain”. Write down the fitted equation below.

  3. Produce a scatterplot of pain versus sting length and add the least squares regression line. Do not display the standard error band.

  4. Interpret the slope in the context of the problem.

  5. What is the predicted pain of a sting by Sigilla dorsata? What is the residual for this species?

  6. What is the “size of a ‘typical’ error” (a.k.a. the standard error of regression or residual standard error)?

  7. Produce plots to check the assumptions for the linear model we have used in this application. Do they seem reasonably met?
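For item 5, the predicted value for a species already in the dataset is its fitted value, and its residual is observed minus fitted. For an x value that is not in the data, predict() with a newdata data frame whose column is named like the predictor gives the prediction. The sketch below uses the built-in cars data as a stand-in (an assumption, so it runs on its own); row 49 and speed 21 are arbitrary choices for illustration, not the row for Sigilla dorsata. With your pain model you would first locate Sigilla dorsata's row in stings (for example with View(stings)).

```r
# Stand-in data and model: built-in `cars`
fit <- lm(dist ~ speed, data = cars)

# Prediction for a new x value: the newdata column must be named like the predictor
new_x <- data.frame(speed = 21)
pred_new <- predict(fit, newdata = new_x)
pred_new

# For a case already in the data, the prediction is its fitted value,
# and its residual is observed minus fitted
row_id <- 49  # hypothetical row choice for illustration
fitted(fit)[row_id]
cars$dist[row_id] - fitted(fit)[row_id]
residuals(fit)[row_id]  # same residual, extracted from the model
```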

When we are done for the day, save your R Markdown file and close your R project to save it (choose “Save” when prompted).