7 Confidence and Prediction Intervals for Individual Values
Getting Started
As always, insert a chunk for the R packages we will need. This chunk is also a good place to source the script used below.
library(tidyverse)
library(ggiraphExtra)
source("~/rstudioshared/IRamler/scripts/CIPI_ggplot.R") # used below
Initial Analysis
We have data (HomesNNY.csv) on 48 recently sold homes in Canton and Potsdam.
HomesNNY <- read_csv("data/HomesNNY.csv")
We are interested in building a model to predict the selling price of area homes using the size of the home (in thousand square feet). Let’s fit that model here:
mod1 = lm(Price_thousands ~ Size_sqft_1k, data = HomesNNY)
ggPredict(mod1, se = TRUE) + labs(x="Size of Home (in thousand square feet)", y="Selling Price (in thousands of dollars)")
47 * sd(HomesNNY$Price_thousands)^2
summary(mod1)
- Here is the ANOVA table that you constructed by hand from the available information.
anova(mod1)
- You used the ANOVA table to compute \(R^2\) by hand. As a reminder, you can find that quantity in the summary of the model output.
summary(mod1)
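As a check on the by-hand work, here is a minimal sketch of how the ANOVA sums of squares, the total sum of squares, and \(R^2\) fit together. It uses the built-in `cars` dataset as a stand-in (an assumption, since it is not the HomesNNY data), and the object names `fit`, `aov_tab`, and so on are illustrative only.

```r
# Stand-in data: the built-in `cars` dataset (NOT the HomesNNY data).
fit <- lm(dist ~ speed, data = cars)

n <- nrow(cars)

# SSTotal = (n - 1) * s_y^2, the same idea as 47 * sd(Price_thousands)^2
# for the 48 homes above.
ss_total <- (n - 1) * sd(cars$dist)^2

# Sums of squares from the ANOVA table: row 1 = model, row 2 = residuals.
aov_tab  <- anova(fit)
ss_model <- aov_tab[["Sum Sq"]][1]
ss_error <- aov_tab[["Sum Sq"]][2]

# R^2 is the share of the total variability explained by the model.
r2_by_hand <- ss_model / ss_total
r2_summary <- summary(fit)$r.squared

all.equal(ss_model + ss_error, ss_total)  # the pieces add up to the total
all.equal(r2_by_hand, r2_summary)         # matches the summary() output
```

Swapping in `mod1` and `HomesNNY$Price_thousands` should reproduce the same relationships for the homes data.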
- Use the provided plots to check the simple linear regression model assumptions. Do we have any reason to be concerned?
plot(mod1, pch=16)
Estimation and Prediction
- We can use the predict function to obtain the desired predictions and their intervals.
- The predict function REQUIRES that you give it a data frame containing the values to predict at (the column names MUST be the same as in the original dataset!)
newx = data.frame(Size_sqft_1k = 2.3) # look at the newx object in your environment
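To see why the names must match, here is a small sketch using the built-in `cars` dataset as a stand-in (an assumption, since it is not the HomesNNY data). The objects `good_new` and `bad_new` are hypothetical names chosen for illustration.

```r
# Stand-in model on the built-in `cars` data (NOT the HomesNNY data).
fit <- lm(dist ~ speed, data = cars)

good_new <- data.frame(speed = 15)  # same name as in the model formula
bad_new  <- data.frame(Speed = 15)  # capital S: not the same name

# Works: predict() finds `speed` in the new data frame.
pred_good <- predict(fit, good_new)

# With the wrong name, predict() cannot find `speed` and signals an error
# (provided no object called `speed` exists elsewhere in the workspace).
pred_bad <- tryCatch(predict(fit, bad_new), error = function(e) e)
inherits(pred_bad, "error")
```

R variable names are case sensitive, so even a capitalization difference counts as a different name.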
- Use the predict function to predict the price of a 2,300 square foot home.
predict(mod1, newx)
- Use the predict function to obtain a 90% confidence interval for the mean price of all 2,300 square foot homes. Interpret the result.
predict(mod1, newx, interval = "confidence", level = 0.9)
- Use the predict function to obtain a 90% prediction interval for an individual 2,300 square foot home. Interpret the result.
predict(mod1, newx, interval = "prediction", level = 0.9)
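To see what predict is doing behind the scenes, the sketch below rebuilds both intervals from the usual simple linear regression formulas and compares them with the predict output. It uses the built-in `cars` dataset as a stand-in (an assumption, since it is not the HomesNNY data), and names such as `x_star` and `se_mean` are illustrative only.

```r
# Stand-in model on the built-in `cars` data (NOT the HomesNNY data).
fit    <- lm(dist ~ speed, data = cars)
x_star <- 15     # value of the predictor to predict at
level  <- 0.90   # confidence level

n      <- nrow(cars)
x_bar  <- mean(cars$speed)
sxx    <- sum((cars$speed - x_bar)^2)
s_e    <- summary(fit)$sigma                   # residual standard error
t_star <- qt(1 - (1 - level) / 2, df = n - 2)  # t critical value

# Point prediction from the fitted intercept and slope.
y_hat <- unname(coef(fit)[1] + coef(fit)[2] * x_star)

# Standard errors: the prediction version adds 1 under the square root
# to account for the variability of an individual response.
se_mean <- s_e * sqrt(1 / n + (x_star - x_bar)^2 / sxx)
se_pred <- s_e * sqrt(1 + 1 / n + (x_star - x_bar)^2 / sxx)

ci_by_hand <- c(y_hat, y_hat - t_star * se_mean, y_hat + t_star * se_mean)
pi_by_hand <- c(y_hat, y_hat - t_star * se_pred, y_hat + t_star * se_pred)

# Compare with predict().
newx <- data.frame(speed = x_star)
ci_r <- predict(fit, newx, interval = "confidence", level = level)
pi_r <- predict(fit, newx, interval = "prediction", level = level)

all.equal(ci_by_hand, unname(ci_r[1, ]))
all.equal(pi_by_hand, unname(pi_r[1, ]))
```

Both intervals are centered at the same predicted value. The prediction interval is always the wider of the two because of the extra 1 under the square root.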
- You can plot the two types of intervals on a scatterplot by first sourcing some ggplot code that Dr. Ramler has written.
# this is already sourced at the top
# source("~/rstudioshared/IRamler/scripts/CIPI_ggplot.R")
Make the plot. The arguments after the first line are all optional (see the file for defaults).
CIPI_ggplot(mod1,
conf.level = 0.9,
xlab = "Size (Thousands of Sq Ft)",
ylab = "Price (Thousands)")
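If the sourced script is not available, a similar picture can be sketched with ggplot2 alone. The code below is not Dr. Ramler's CIPI_ggplot function, only a rough stand-in that assumes ggplot2 is installed and uses the built-in `cars` dataset in place of the HomesNNY data. The names `grid`, `bands`, and `p` are illustrative.

```r
library(ggplot2)

# Stand-in model on the built-in `cars` data (NOT the HomesNNY data).
fit <- lm(dist ~ speed, data = cars)

# Evenly spaced predictor values across the observed range.
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 100))

# 90% confidence and prediction intervals at every grid value.
ci_mat <- predict(fit, grid, interval = "confidence", level = 0.90)
pi_mat <- predict(fit, grid, interval = "prediction", level = 0.90)

bands <- data.frame(
  speed  = grid$speed,
  fit    = ci_mat[, "fit"],
  ci_lwr = ci_mat[, "lwr"], ci_upr = ci_mat[, "upr"],
  pi_lwr = pi_mat[, "lwr"], pi_upr = pi_mat[, "upr"]
)

# Wide light band = prediction interval; narrow dark band = confidence interval.
p <- ggplot(bands, aes(x = speed)) +
  geom_ribbon(aes(ymin = pi_lwr, ymax = pi_upr), fill = "grey80", alpha = 0.5) +
  geom_ribbon(aes(ymin = ci_lwr, ymax = ci_upr), fill = "grey50", alpha = 0.5) +
  geom_line(aes(y = fit)) +
  geom_point(data = cars, aes(y = dist)) +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")
p
```

Both bands are narrowest near the mean of the predictor and widen toward the edges of the data.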