16 Nested F Tests
Getting Started As always, insert a chunk for the R packages (and SLU functions) we will need.
library(tidyverse)
source("~/rstudioshared/IRamler/scripts/slunova.R") # for ANOVA test
For today’s topic, we are going to revisit many of the examples from the last several class periods.
16.1 Example 1: Roller Coasters
= read_csv("data/roller_bigger.csv") rc
We previously fit a model that allowed the relationship between length (feet) and top speed (mph) differ for roller coasters with inversions and those without.
ggplot(rc, aes(x=Length, y=Speed, color=Inversions)) +
geom_point() +
geom_smooth(method="lm",se=FALSE) +
labs(x="Length of Track (feet)", y = "Top Speed (mph)")
We defined an indicator variable to indicate presence of inversions
$InversionIND = ifelse(rc$Inversions=="Yes", 1, 0) rc
and proposed a model
\(Speed = \beta_0 +\beta_1 Length + \beta_2 InversionIND + \beta_3 Length*InversionIND + \epsilon\)
and fit the model
= lm(Speed ~ Length + InversionIND + Length*InversionIND, data=rc) modrc_full
Now, we want to conduct a single hypothesis test to determine if there is evidence of different lines for the two types of roller coasters.
- State the hypotheses for this test.
- To conduct this test, we need to know
- the amount of variation in speed that is currently being explained by this full model
- the mean square error for this full model.
slunova(modrc_full)
To test the significance of the terms in the model that create different lines for the two groups, we need to remove those terms. Which terms, specifically, do we need to remove from the model? Also note how many terms you are removing.
Fit a model with the identified terms removed, and determine the amount of variation in speed that is being explained by this reduced model.
= lm(Speed ~ Length, data=rc)
modrc_reduced slunova(modrc_reduced)
How much extra variation did these two terms explain?
Combine our answers to 2 - 5 to obtain the nested F test statistic needed to test our hypotheses.
Use the test statistic to find the p-value associated with this test (hint: F tests are always “Right Tail”).
Make an appropriate conclusion for this test.
16.2 Example 2: Complete Second Order Model for Product Quality
Next, we will revisit the complete second order model for product quality. There are several nested F tests that could be reasonable in this situation. I am not going to walk you through the steps this time.
= read_csv("data/ProductQuality.CSV") pqdata
Recall: Suppose you want to model the quality, y, of a product was a function of the temperature and the pressure. Four inspectors independently assign a quality score (between 0 and 100) to each product, and then the quality, y, is calculated by averaging the four scores. An experiment was conducted by varying the temperature between 80o and 100o F and pressure between 50 and 60 psi (pounds per square inch). The engineers believe that a complete second-order model would adequately describe the relationship between product quality and the two predictors (temperature and pressure).
Proposed Complete Second Order Model:
\(Quality = \beta_0 + \beta_1 Temp + \beta_2 Pressure + \beta_3Temp*Pressure + \beta_4Temp^2 + \beta_5Pressure^2 +\epsilon\)
Fitted Complete Second Order Model:
= lm(Quality ~ Temperature + Pressure + Temperature*Pressure + I(Temperature^2) + I(Pressure^2), data=pqdata) modpq_full
# info from the full model
slunova(modpq_full)
Nested F Test 1: Conduct a single hypothesis test to determine if there is evidence that including at least one of the quadratic terms improves the model.
<- lm(Quality ~ Temperature + Pressure + Temperature * Pressure,
modpq_noquad data = pqdata)
slunova(modpq_noquad)
8402.3 - 2425)/2 ) / 2.82 ( (
Nested F Test 2: Conducte a single hypothesis test to determine if there is evidence that including Pressure improves the model.
= lm(Quality ~ Temperature + I(Temperature^2), data=pqdata)
modpq_temp_only slunova(modpq_temp_only)
anova(modpq_temp_only, modpq_full)
# Don't do this!
# Cubic ("reduced") model is not nested within Full model
# F-test is not valid
<- lm(Quality ~ Temperature + I(Temperature^2) + I(Temperature^3),
modpq_cubic data = pqdata)
anova(modpq_cubic, modpq_full)
# Instead, use other "model building and assessment tools"
# e.g., Adjusted R-Squared
summary(modpq_cubic)
summary(modpq_full)