7 Confidence and Prediction Intervals for Individual Values

Getting Started

As always, insert a chunk for the R packages we will need. This chunk is also a good place to source the CIPI_ggplot.R script used later in this section.

library(tidyverse)
library(ggiraphExtra)
source("~/rstudioshared/IRamler/scripts/CIPI_ggplot.R") # used below

Initial Analysis

We have data (HomesNNY.csv) on 48 recently sold homes in Canton and Potsdam.

HomesNNY <- read_csv("data/HomesNNY.csv")

We are interested in building a model to predict the selling price of area homes using the size of the home (in thousand square feet). Let’s fit that model here:

mod1 = lm(Price_thousands ~ Size_sqft_1k, data = HomesNNY)

ggPredict(mod1, se = TRUE) + labs(x="Size of Home (in thousand square feet)", y="Selling Price (in thousands of dollars)")
# Total sum of squares: (n - 1) * s^2, with n = 48 homes
47 * sd(HomesNNY$Price_thousands)^2
summary(mod1)
  1. Here is the ANOVA table that you constructed by hand from the available information.
anova(mod1)
  1. You used the ANOVA table to compute \(R^2\) by hand. As a reminder, you can find that quantity in the summary of the model output.
summary(mod1)
  1. Use the provided plots to check the simple linear regression model assumptions. Do we have any reason to be concerned?
plot(mod1, pch=16)
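
The by-hand ANOVA and \(R^2\) computations above can be checked in R. The following is a minimal sketch, not the course's own solution: it uses the built-in `cars` dataset as a stand-in for HomesNNY (an assumption, so the chunk runs without the CSV file), and the names `fit`, `ss_total`, `ss_error`, and `ss_model` are illustrative.

```r
# Stand-in data: the built-in `cars` dataset (NOT HomesNNY)
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

# Total sum of squares: (n - 1) times the sample variance of the response
ss_total <- (n - 1) * sd(cars$dist)^2

# Error sum of squares: sum of the squared residuals
ss_error <- sum(residuals(fit)^2)

# Model sum of squares, obtained by subtraction
ss_model <- ss_total - ss_error

# R^2 is the share of the total variation explained by the model
r_squared <- ss_model / ss_total

# Compare with R's own output
anova(fit)
c(by_hand = r_squared, from_summary = summary(fit)$r.squared)
```

Swapping in `HomesNNY`, `Price_thousands`, and `Size_sqft_1k` should reproduce the quantities in the ANOVA table for `mod1`.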

Estimation and Prediction

  1. We can use the predict function to obtain the desired predictions and their intervals.
  1. The predict function REQUIRES that you give it a data frame containing the values to predict at (the column names MUST match those in the original dataset!)
newx = data.frame(Size_sqft_1k = 2.3) # look at the newx object in your environment
  1. Use the predict function to predict the price of a 2,300 square foot home.
predict(mod1, newx)
  1. Use the predict function to obtain a 90% confidence interval for the mean price of all 2,300 square foot homes. Interpret the result.
predict(mod1, newx, interval = "confidence", level = 0.9)
  1. Use the predict function to obtain a 90% prediction interval for an individual 2,300 square foot home. Interpret the result.
predict(mod1, newx, interval = "prediction", level = 0.9)
  1. You can plot the two types of intervals on a scatterplot by first sourcing some ggplot code that Dr. Ramler has written.
# this is already sourced at the top
# source("/rstudioshared/IRamler/scripts/CIPI_ggplot.R")
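
To see why the prediction interval is wider than the confidence interval, it can help to rebuild both by hand from the standard simple linear regression formulas. This is a rough sketch under stated assumptions: it again uses the built-in `cars` dataset as a stand-in for HomesNNY, and `x_new`, `se_mean`, and `se_pred` are illustrative names. The only difference between the two standard errors is the extra `1 +` under the square root, which accounts for the variability of an individual observation around the mean.

```r
# Stand-in data: the built-in `cars` dataset (NOT HomesNNY)
fit <- lm(dist ~ speed, data = cars)
x_new <- 15    # hypothetical predictor value to predict at
level <- 0.90

n <- nrow(cars)
x <- cars$speed
s <- sigma(fit)                  # residual standard error
sxx <- sum((x - mean(x))^2)      # spread of the predictor
t_star <- qt(1 - (1 - level) / 2, df = n - 2)

# Point prediction from the fitted line
y_hat <- unname(coef(fit)[1] + coef(fit)[2] * x_new)

# Standard error for the MEAN response at x_new
se_mean <- s * sqrt(1 / n + (x_new - mean(x))^2 / sxx)
# Standard error for an INDIVIDUAL response at x_new (note the extra 1)
se_pred <- s * sqrt(1 + 1 / n + (x_new - mean(x))^2 / sxx)

ci_by_hand <- y_hat + c(-1, 1) * t_star * se_mean
pi_by_hand <- y_hat + c(-1, 1) * t_star * se_pred

# Compare with predict()
newx <- data.frame(speed = x_new)
ci_predict <- predict(fit, newx, interval = "confidence", level = level)
pi_predict <- predict(fit, newx, interval = "prediction", level = level)
rbind(ci_by_hand, pi_by_hand)
ci_predict
pi_predict
```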

Make the plot. The arguments after the first line are all optional (see the file for defaults).

CIPI_ggplot(mod1, 
  conf.level = 0.9, 
  xlab = "Size (Thousands of Sq Ft)", 
  ylab = "Price (Thousands)")
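
If the sourced script is not available, a similar picture can be drawn directly with ggplot2. The sketch below is not Dr. Ramler's `CIPI_ggplot` function and makes no claim about how that function works; it simply evaluates `predict` over a grid of predictor values and draws the two bands with `geom_ribbon`. It uses the built-in `cars` dataset as a stand-in for HomesNNY, and the names `grid`, `conf`, `pred`, and `bands` are illustrative.

```r
library(ggplot2)

# Stand-in data: the built-in `cars` dataset (NOT HomesNNY)
fit <- lm(dist ~ speed, data = cars)

# Grid of predictor values spanning the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))

# 90% confidence and prediction intervals at every grid value
conf <- predict(fit, grid, interval = "confidence", level = 0.9)
pred <- predict(fit, grid, interval = "prediction", level = 0.9)

bands <- data.frame(
  speed  = grid$speed,
  fit    = conf[, "fit"],
  ci_lwr = conf[, "lwr"], ci_upr = conf[, "upr"],
  pi_lwr = pred[, "lwr"], pi_upr = pred[, "upr"]
)

# Wider, lighter band = prediction interval; narrower, darker band = confidence interval
p <- ggplot(bands, aes(x = speed)) +
  geom_ribbon(aes(ymin = pi_lwr, ymax = pi_upr), alpha = 0.2) +
  geom_ribbon(aes(ymin = ci_lwr, ymax = ci_upr), alpha = 0.4) +
  geom_line(aes(y = fit)) +
  geom_point(data = cars, aes(x = speed, y = dist)) +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")
p
```

Both bands are narrowest at the mean of the predictor and widen toward the edges of the data, and the prediction band always contains the confidence band.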