16 Nested F Tests

Getting Started As always, insert a chunk for the R packages (and SLU functions) we will need.

library(tidyverse)
source("~/rstudioshared/IRamler/scripts/slunova.R")  # for ANOVA test

For today’s topic, we are going to revisit many of the examples from the last several class periods.

16.1 Example 1: Roller Coasters

rc = read_csv("data/roller_bigger.csv")

We previously fit a model that allowed the relationship between length (feet) and top speed (mph) differ for roller coasters with inversions and those without.

ggplot(rc, aes(x=Length, y=Speed, color=Inversions)) + 
  geom_point() + 
  geom_smooth(method="lm",se=FALSE) + 
  labs(x="Length of Track (feet)", y = "Top Speed (mph)")

We defined an indicator variable to indicate presence of inversions

rc$InversionIND = ifelse(rc$Inversions=="Yes", 1, 0)

and proposed a model

$Speed = \beta_0 +\beta_1 Length + \beta_2 InversionIND + \beta_3 Length*InversionIND + \epsilon$

and fit the model

modrc_full = lm(Speed ~ Length + InversionIND + Length*InversionIND, data=rc)

Now, we want to conduct a single hypothesis test to determine if there is evidence of different lines for the two types of roller coasters.

State the hypotheses for this test.
To conduct this test, we need to know

the amount of variation in speed that is currently being explained by this full model
the mean square error for this full model.

slunova(modrc_full)

To test the significance of the terms in the model that create different lines for the two groups, we need to remove those terms. Which terms, specifically, do we need to remove from the model? Also note how many terms you are removing.
Fit a model with the identified terms removed, and determine the amount of variation in speed that is being explained by this reduced model.

modrc_reduced = lm(Speed ~ Length, data=rc)
slunova(modrc_reduced)

How much extra variation did these two terms explain?
Combine our answers to 2 - 5 to obtain the nested F test statistic needed to test our hypotheses.
Use the test statistic to find the p-value associated with this test (hint: F tests are always “Right Tail”).
Make an appropriate conclusion for this test.

16.2 Example 2: Complete Second Order Model for Product Quality

Next, we will revisit the complete second order model for product quality. There are several nested F tests that could be reasonable in this situation. I am not going to walk you through the steps this time.

pqdata = read_csv("data/ProductQuality.CSV")

Recall: Suppose you want to model the quality, y, of a product was a function of the temperature and the pressure. Four inspectors independently assign a quality score (between 0 and 100) to each product, and then the quality, y, is calculated by averaging the four scores. An experiment was conducted by varying the temperature between 80o and 100o F and pressure between 50 and 60 psi (pounds per square inch). The engineers believe that a complete second-order model would adequately describe the relationship between product quality and the two predictors (temperature and pressure).

Proposed Complete Second Order Model:

$Quality = \beta_0 + \beta_1 Temp + \beta_2 Pressure + \beta_3Temp*Pressure + \beta_4Temp^2 + \beta_5Pressure^2 +\epsilon$

Fitted Complete Second Order Model:

modpq_full = lm(Quality ~ Temperature + Pressure + Temperature*Pressure + I(Temperature^2) + I(Pressure^2), data=pqdata)

# info from the full model
slunova(modpq_full)

Nested F Test 1: Conduct a single hypothesis test to determine if there is evidence that including at least one of the quadratic terms improves the model.

modpq_noquad <- lm(Quality ~ Temperature + Pressure + Temperature * Pressure,
                   data = pqdata)
slunova(modpq_noquad)

( (8402.3 - 2425)/2 ) / 2.82

Nested F Test 2: Conducte a single hypothesis test to determine if there is evidence that including Pressure improves the model.

modpq_temp_only = lm(Quality ~ Temperature + I(Temperature^2), data=pqdata)
slunova(modpq_temp_only)

anova(modpq_temp_only, modpq_full)

# Don't do this!
# Cubic ("reduced") model is not nested within Full model
# F-test is not valid

modpq_cubic <- lm(Quality ~ Temperature + I(Temperature^2) + I(Temperature^3),
                  data = pqdata)

anova(modpq_cubic, modpq_full)

# Instead, use other "model building and assessment tools"
# e.g., Adjusted R-Squared
summary(modpq_cubic)
summary(modpq_full)