21 Analysis of Variance

library(tidyverse)
library(ggiraphExtra)

Motivating Question 1: Does your dog help to relieve stress? To examine the effect of pets in stressful situations, researchers recruited 45 women who said they were dog lovers. Fifteen subjects were randomly assigned to each of three groups to do a stressful task alone (control), with a good friend present, or with their dog present. The subject’s average heart rate during the task was reported. The data are available as Stress.csv.

stress = read_csv("data/Stress.csv")
ggplot(stress, aes(x=group, y=rate)) + geom_boxplot(fill="steelblue") + labs(x="Experimental Group",y="Heart Rate (bpm)")

One-Way Analysis of Variance (ANOVA)

  1. Write out the ANOVA model.

  2. Complete the following ANOVA table.

SSTotal = (45 - 1) * sd(stress$rate)**2
SSTotal - 3561.6
2387.4/2
1193.7/84.8
  1. Conduct the analysis of variance F test. Include all of the details of the test.

  2. Use R to verify the entries in the above ANOVA table.

# ANOVA in R
modaov = aov(rate~group,data=stress)
summary(modaov)

model.tables(modaov, type = "means")

model.tables(modaov, type = "effects")
  1. Use lm in R to obtain the same ANOVA information.
# ANOVA as lm
modlm = lm(rate~group, data=stress)
summary(modlm)

modaov$coefficients
model.tables(modaov, type = "means")
model.tables(modaov, type = "effects")

Follow-up Analysis for ANOVA

plot( y =  1 - (1 - 0.01)^(1:10), x = 1:10)

Because we have found evidence of a difference among the three population mean, a follow-up analysis is appropriate.

  1. Use R to obtain the output for the Tukey adjusted follow-up analysis.
TukeyHSD(modaov)

# choose(3, 2)

ggHSD( TukeyHSD(modaov) , no = 1)
  1. Summarize the results from the Tukey adjusted follow-up analysis.

Motivating Example 2: Mussels (and other aquatic invertebrates) are frequently used to assess the level of contamination of various metals in fresh and salt water. Two species of mussels ( quagga and zebra) have been used by (retired) Prof. Carrie Johns (from Environmental Studies) to assess contamination levels at four different sites along the St. Lawrence River. One of the metals of interest is copper. She wants to know if the mean copper concentration differs for the two types of mussels and if there are differences in the mean concentration levels among the four sites (she suspects that both are true). Note that the data for this example are simulated (because her paper on this research project is not yet published).

mussels = read_csv("data/FakeMussels.csv")

Model 1: One-Way ANOVA for Site

  1. Make a plot to visually look for evidence of any differences in average copper levels among the sites. What do you notice, and how you can fix it?
ggplot(mussels, aes(x=SiteFactor, y=Fake.Copper)) + geom_boxplot(fill="steelblue") + labs(x="Site",y="Copper (ppm)")
mussels$lnCopper = log(mussels$Fake.Copper)
ggplot(mussels, aes(x=SiteFactor, y=lnCopper)) + geom_boxplot(fill="steelblue") + labs(x="Site",y="ln(Copper)")
  1. Fit a one-way ANOVA model to look for evidence of any differences in average ln copper levels among the sites. Write down the ANOVA table and add a “Total” row.
# ANOVA in R
modaovSite = aov(lnCopper~SiteFactor,data=mussels)
summary(modaovSite)
plot(modaovSite, pch=16)

TukeyHSD(modaovSite)

Model 2: One-Way ANOVA for Species

  1. Fit a one-way ANOVA model to look for evidence of any differences in average ln copper levels among the species. Write down the ANOVA table and add a “Total” row.
# ANOVA in R
modaovSpecies = aov(lnCopper~SpeciesFactor,data=mussels)
summary(modaovSpecies)
plot(modaovSpecies, pch=16)

Model 3: Two-Way ANOVA

  1. Fit a two-way ANOVA model to look for evidence of any differences in average ln copper levels among either the sites and/or the species.
mod2aov = aov(lnCopper ~ SiteFactor + SpeciesFactor, data=mussels)
summary(mod2aov)
plot(mod2aov,pch=16)
  1. Write out the population two-way ANOVA model.

  2. Add a “Total” row to the resulting ANOVA table.

  1. What do you notice about how this ANOVA table compares to the two one-way ANOVA tables?

  2. Is there evidence of a significant site effect? Include all details of the test.

  3. Is there evidence of a significant species effect? Include all details of the test.

  4. Estimate the site effects.

  5. Estimate all of the species effects.

model.tables(mod2aov, type = "means")
model.tables(mod2aov, type = "effects")
  1. Estimate all of the site effects.

Follow-up Analysis 8. Use the Tukey adjustment to summarize the significant differences among the sites and significant differences among the species.

TukeyHSD(mod2aov) # if both main effects are significant
TukeyHSD(mod2aov, "SiteFactor") # if only one main effect is significant (just an example)
ggHSD(TukeyHSD(mod2aov), no = 1 )

ggHSD(TukeyHSD(mod2aov), no = 2 )

Model 4: Two-Way ANOVA with Interactions

  1. Start by constructing plots to visually detect any obvious interaction(s).
ggplot(mussels,
       aes(y = lnCopper, x = SiteFactor, fill = SiteFactor)
       ) + 
  geom_boxplot() +
  facet_wrap(~SpeciesFactor)

interaction.plot(x.factor = mussels$SiteFactor, 
                 trace.factor = mussels$SpeciesFactor, 
                 response = mussels$lnCopper, 
                 lwd=2)

# make plot again by flipping x.factor and trace.factor
  1. Fit the Interaction model in R and report the ANOVA table.
mod2aov_int = aov(lnCopper ~ SiteFactor + SpeciesFactor + 
                    SiteFactor * SpeciesFactor, data=mussels)
summary(mod2aov_int)
  1. Is there a statistically significant interaction between Site and Species? Provide the details of the appropriate hypothesis test.

Motivating Example 3 Hamisi Babusa, former SLU professor and Kenyan scholar, administered a survey to 480 students from Pwani and Nairobi provinces about their attitudes towards the Swahili language. From each province, the students were from 6 schools (3 girls schools and 3 boys schools) with 40 students sampled at each school (so half of the students are male and half are female). The survey instrument contained 40 statements about attitudes towards Swahili, and students rated their level of agreement to each. Of these statements, 30 were “positive” statements and the remaining 10 statements were “negative”. On an individual question, the most positive response was assigned a value of 5 while the most negative response was assigned a 1. By adding the responses for each statement, we computed an Attitude Score for each student. The highest possible score was 200 and the lowest was 40.

Import SwahiliAttitudes.csv.

attitudes = read_csv(here::here("data/SwahiliAttitudes.csv"), name_repair = make.names)
ggplot(attitudes,aes(y = Attitude.Score, x=Province, fill=Sex)) + geom_boxplot()
interaction.plot(trace.factor = attitudes$Sex, 
                 x.factor = attitudes$Province, 
                 response = attitudes$Attitude.Score, 
                 lwd=2)
  1. What evidence present in these plots suggests potential for an interaction between Province and Sex?

  2. Write out a two-way ANOVA model that allows for an interaction between Province and Sex.

modinteraction = aov(Attitude.Score~Province + Sex + Province*Sex, data=attitudes)
summary(modinteraction)
  1. Is there evidence of a significant interaction between Province and Sex? Perform the appropriate hypothesis test.

  2. Since we have found a significant interaction, carry out a follow-up analysis to determine which pairs of “treatments” (levels of the interaction) are significantly different. Use a 5% significance level.

TukeyHSD(modinteraction, "Province:Sex")
ggHSD(TukeyHSD(modinteraction, "Province:Sex")) + 
  theme(axis.text.y = element_text(angle = 0)) # this can be useful for the axis labels