---
title: "Project 1 - Key"
output:
  word_document:
    reference_docx: WordforProject1.docx
  html_document: default
---

```{r echo=FALSE, warning=FALSE}
library(readr)
```

```{r echo=FALSE}
#put a path to wherever student datafiles are stored
path="../../Data/Project1/"
#edit "Mystudent.csv" in the chunk below for a specific student's data
#agestar (the age used for the prediction in #10) must also be defined for each student
```

```{r message=FALSE}
cars=read_csv(paste(path,"Mystudent.csv",sep=""),show_col_types = FALSE)
```

#### #2. Summary statistics

```{r echo=F, results=F}
age=cars$age
price=cars$price
n=length(age)
#sums of squares and cross-products about the means
ssx=sum((age-mean(age))^2)
ssy=sum((price-mean(price))^2)
ssxy=sum((age-mean(age))*(price-mean(price)))
#split SSY into the part explained by the line and the error part
SSModel=(ssxy^2)/ssx
SSE=ssy-SSModel
```

$\overline{age}=`r round(mean(age),2)`$, $s_{age}=`r round(sd(age),2)`$, $\qquad \overline{price}=`r round(mean(price),2)`$, $s_{price}=`r round(sd(price),2)`$

$SSX=`r round(ssx,1)`$, $SSY=`r round(ssy,1)`$, $SSXY=`r round(ssxy,1)`$, $SSModel=`r round(SSModel,1)`$, $SSE=`r round(SSE,1)`$

#### #3. Least squares line

```{r echo=F, results=F}
b1=ssxy/ssx
b0=mean(price)-b1*mean(age)
Se=sqrt(SSE/(n-2))
```

$\hat{\beta}_1=\frac{`r round(ssxy,1)`}{`r round(ssx,1)`}=`r round(b1,3)`$ and $\hat{\beta}_0=\overline{price}-\hat{\beta}_1\overline{age}=`r round(mean(price),2)`-(`r round(b1,3)`)(`r round(mean(age),2)`)=`r round(b0,2)`$, so $\widehat{price}=`r round(b0,2)` `r sprintf("%+.3f",b1)`\cdot age$

#### #4. Std. dev of error

$\hat{\sigma}_\epsilon=\sqrt{\frac{`r round(SSE,1)`}{`r n`-2}}=\sqrt{`r round(Se^2,2)`}=`r round(Se,2)`$

#### #5. Scatterplot with regression line & residual plots

```{r fig.width=7, fig.height=2, echo=F}
par(mfrow=c(1,3))   #three panels side by side
par(mar=c(4,4,2,2))
model=lm(price~age)
resid=model$residuals
plot(price~age,main="Price vs. Age")
abline(model)
abline(h=0,lty=2)   #dashed reference line at price = 0 (see #11)
plot(resid~fitted(model),main="Residuals vs. Fits",xlab="Fitted values")
abline(h=0)
# hist(resid)
qqnorm(resid)
qqline(resid)
```

#### #6. Largest residual

```{r echo=F, results=F}
pos=which.max(abs(resid))   #case with the largest absolute residual
```

Case #`r pos` has residual `r round(resid[pos],2)`, studentized residual `r round(rstudent(model)[pos],2)`, leverage $h_i$=`r round(hatvalues(model)[pos],3)`, and Cook's D=`r round(cooks.distance(model)[pos],3)`.
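The studentized residual, leverage, and Cook's D for the largest residual come from R's `rstudent()`, `hatvalues()`, and `cooks.distance()`. As a sketch of the simple-regression formulas behind those numbers, the code below recomputes them on a small made-up data set (the `age` and `price` values are hypothetical, not from any student's file) and compares them with the built-ins.

```r
# Made-up data for illustration only (not from any student's file)
age = c(1, 2, 3, 4, 5, 6, 7, 8)
price = c(20.5, 18.9, 17.2, 16.8, 14.1, 13.5, 11.0, 10.2)

model = lm(price ~ age)
n = length(age)
ssx = sum((age - mean(age))^2)
res = residuals(model)
SSE = sum(res^2)
Se = sqrt(SSE / (n - 2))

# Leverage in simple regression: h_i = 1/n + (x_i - xbar)^2 / SSX
h = 1/n + (age - mean(age))^2 / ssx

# Internally standardized residual (uses Se from the full fit)
r_std = res / (Se * sqrt(1 - h))

# Externally studentized residual: Se is replaced by s_(i), the residual
# standard error with case i deleted, s_(i)^2 = (SSE - e_i^2/(1 - h_i)) / (n - 3)
s_del = sqrt((SSE - res^2 / (1 - h)) / (n - 3))
t_stud = res / (s_del * sqrt(1 - h))

# Cook's distance with p = 2 coefficients (intercept and slope)
cooks = r_std^2 * h / (2 * (1 - h))

pos = which.max(abs(res))   # case with the largest absolute residual
round(c(residual = res[[pos]], studentized = t_stud[[pos]],
        leverage = h[[pos]], cooks_D = cooks[[pos]]), 3)

# Compare with R's built-in diagnostics; each should print TRUE
all.equal(unname(h), unname(hatvalues(model)))
all.equal(unname(t_stud), unname(rstudent(model)))
all.equal(unname(cooks), unname(cooks.distance(model)))
```

Because $h_i$ grows with $(x_i-\overline{age})^2$, the cars with the most extreme ages always have the highest leverage in a one-predictor model.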
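Criteria for high leverage are 4/n=`r round(4/n,3)` (mild) and 6/n=`r round(6/n,3)` (severe).

#### #7. CI for slope

```{r echo=F, results=F}
#qt(0.95, n-2) is the t multiplier for a 90% two-sided interval
tstar=qt(0.95,n-2)
SEslope=Se/sqrt(ssx)
lslope=b1-tstar*SEslope
uslope=b1+tstar*SEslope
```

$`r round(b1,3)` \pm `r round(tstar,3)`\cdot`r round(SEslope,3)`=(`r round(lslope,3)`, `r round(uslope,3)`)$

#### #8. Three tests for a relationship

```{r echo=F, results=F}
#named Fstat rather than F so that F (FALSE) still works in later chunk options
Fstat=SSModel/(SSE/(n-2))
#upper tail directly; 1-pf() can round a very small p-value to 0
pvalue=pf(Fstat,1,n-2,lower.tail=FALSE)
r=cor(price,age)
```

The t-test for the slope and the t-test for the correlation give the same statistic,

$t=\frac{`r round(b1,3)`}{`r round(SEslope,3)`}=\frac{`r round(r,3)`\sqrt{`r n`-2}}{\sqrt{1-(`r round(r,3)`)^2}}=`r round(b1/SEslope,3)`$

and the F-test statistic is $F=t^2$:

$F=\frac{`r round(SSModel,1)`}{`r round(SSE/(n-2),1)`}=`r round(Fstat,2)`$, with p-value=`r signif(pvalue,3)` for all three tests.

#### #9. R-squared

$r^2=(`r round(r,3)`)^2=\frac{`r round(SSModel,1)`}{`r round(ssy,1)`}=`r round(r^2,3)`$

#### #10. Prediction for age=`r agestar`

The predicted price is $\widehat{price}=`r round(b0+b1*agestar,2)`$.

```{r echo=F, results=F}
yhat=b0+b1*agestar
d=1/n+(agestar-mean(age))^2/ssx   #1/n + (agestar - mean age)^2 / SSX
lmean=yhat-tstar*Se*sqrt(d)       #CI for the mean price at agestar
umean=yhat+tstar*Se*sqrt(d)
lpred=yhat-tstar*Se*sqrt(1+d)     #PI for a single car at agestar
upred=yhat+tstar*Se*sqrt(1+d)
```

Confidence interval for the mean price of cars this age:

$$CI=`r round(yhat,2)`\pm`r round(tstar,3)`\cdot`r round(Se,3)`\sqrt{\frac{1}{`r n`}+\frac{(`r agestar`-`r round(mean(age),2)`)^2}{`r round(ssx,1)`}}=(`r round(lmean,2)`, `r round(umean,2)`)$$

Prediction interval for the price of a single car this age:

$$PI=`r round(yhat,2)`\pm`r round(tstar,3)`\cdot`r round(Se,3)`\sqrt{1+\frac{1}{`r n`}+\frac{(`r agestar`-`r round(mean(age),2)`)^2}{`r round(ssx,1)`}}=(`r round(lpred,2)`, `r round(upred,2)`)$$

#### #11. Free car

The fitted line predicts a price of zero (a free car) at age $=-\hat{\beta}_0/\hat{\beta}_1=`r round(-b0/b1,1)`$.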
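The hand formulas for the slope interval, the t and F statistics, the intervals at `agestar`, and the zero-price age can be cross-checked against R's `confint()`, `anova()`, `summary()`, and `predict()`. The sketch below does this on made-up data with a hypothetical `agestar` of 4.5 (neither comes from a student's file), using the 90% level that `qt(0.95, n-2)` implies; each `all.equal()` call should print `TRUE`.

```r
# Made-up data for illustration only (not from any student's file)
age = c(1, 2, 3, 4, 5, 6, 7, 8)
price = c(20.5, 18.9, 17.2, 16.8, 14.1, 13.5, 11.0, 10.2)
agestar = 4.5   # hypothetical prediction age

model = lm(price ~ age)
n = length(age)
ssx = sum((age - mean(age))^2)
b1 = sum((age - mean(age)) * (price - mean(price))) / ssx
b0 = mean(price) - b1 * mean(age)
SSE = sum(residuals(model)^2)
Se = sqrt(SSE / (n - 2))
tstar = qt(0.95, n - 2)   # 90% two-sided multiplier

# CI for the slope, by hand and with confint()
SEslope = Se / sqrt(ssx)
slope_ci = b1 + c(-1, 1) * tstar * SEslope
ci_lm = confint(model, "age", level = 0.90)

# Slope t, correlation t, and F: t^2 = F and all three share one p-value
r = cor(price, age)
tstat = b1 / SEslope
SSModel = sum((fitted(model) - mean(price))^2)
Fstat = SSModel / (SSE / (n - 2))
pvalue = pf(Fstat, 1, n - 2, lower.tail = FALSE)

# CI for the mean and PI for one car at agestar, by hand and with predict()
newdata = data.frame(age = agestar)
d = 1/n + (agestar - mean(age))^2 / ssx
yhat = b0 + b1 * agestar
ci_hand = yhat + c(-1, 1) * tstar * Se * sqrt(d)
pi_hand = yhat + c(-1, 1) * tstar * Se * sqrt(1 + d)
ci_lm_mean = predict(model, newdata, interval = "confidence", level = 0.90)
pi_lm = predict(model, newdata, interval = "prediction", level = 0.90)

# Age at which the fitted line predicts a price of zero
free_age = -b0 / b1

all.equal(slope_ci, unname(ci_lm[1, ]))
all.equal(tstat, r * sqrt(n - 2) / sqrt(1 - r^2))
all.equal(tstat^2, anova(model)[["F value"]][1])
all.equal(pvalue, summary(model)$coefficients["age", "Pr(>|t|)"])
all.equal(ci_hand, unname(ci_lm_mean[1, c("lwr", "upr")]))
all.equal(pi_hand, unname(pi_lm[1, c("lwr", "upr")]))
```

Computing the p-value as an upper tail with `lower.tail = FALSE` keeps very small p-values from printing as exactly 0, which `1 - pf()` can do when the relationship is strong.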