---
title: "Check all Cars data"
author: "RLock"
output:
  word_document: default
  html_document: default
---

This document reads the car data for all students and finds the mean and standard deviation of price for each car model. It also produces boxplots to compare the distributions of the samples. These are useful for selecting a set of four car models for each student whose means show some differences, but not extreme ones, and whose variability is similar.

```{r message=FALSE, warning=FALSE}
library(readr)
library(mosaic)
```

```{r}
#path to wherever student datasets are stored
#assumes the dataset names are of the form Studentname.csv
datapath="C:/Users/rlock/Dropbox/Robin/Courses/!Stat213/Data/Project1/"
```

Get the dataset names

```{r message=FALSE}
#Project5Names.csv has two columns, Name=student name, Model=name of their car model
#Be sure each model name is unique, e.g. use CamryA and CamryB for duplicates
students=read_csv("Project5Names.csv",show_col_types = FALSE)
datanames=as.character(students$Name)
models=as.character(students$Model)
```

Read all datasets in together

```{r message=FALSE}
#start with the first student's dataset; it must match the first row of Project5Names.csv
fulldata=read_csv(paste(datapath,"Baer.csv",sep=""),show_col_types = FALSE)
fulldata$model=models[1]
#loop to get each other student's dataset and add it to the previous ones
for (i in 2:length(datanames)){
  path=paste(datapath,datanames[i],".csv",sep="")
  cars=read_csv(path,show_col_types = FALSE)
  cars$model=models[i]
  fulldata=rbind(fulldata,cars)
}
```

Get the mean and standard deviation of price for each model

```{r}
round(tapply(fulldata$price,fulldata$model,mean),1)
round(tapply(fulldata$price,fulldata$model,sd),1)
```

Split up the boxplots for larger classes to keep them reasonable.
Adjust the "fulldata" row limits as needed.

Boxplots for all models (first half)

```{r fig.height=10,fig.width=10}
boxplot(price~model,las=2,data=fulldata[1:1400,])
```

Boxplots for all models (second half)

```{r fig.height=10,fig.width=10}
boxplot(price~model,las=2,data=fulldata[1401:2800,])
```

Horizontal boxplots for all models (first half)

```{r fig.height=10,fig.width=10}
#widen the left margin to fit the model names, then restore it
par(mar=c(5,7,2,2))
boxplot(price~model,las=2,horizontal=TRUE,data=fulldata[1:1350,])
par(mar=c(5,4,2,2))
```

Horizontal boxplots for all models (second half)

```{r fig.height=10,fig.width=10}
#widen the left margin to fit the model names, then restore it
par(mar=c(5,7,2,2))
boxplot(price~model,las=2,horizontal=TRUE,data=fulldata[1351:2700,])
par(mar=c(5,4,2,2))
```
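
The fixed row limits depend on how many rows each student's file has, so a cut can fall in the middle of a model. A minimal sketch of one alternative, splitting by model name rather than by row, is given here. It assumes the combined data has `price` and `model` columns and at least two models; `toy`, `modelnames`, `half`, `firsthalf`, and `secondhalf` are illustrative names that are not part of the original code, and the toy prices are made up only so the chunk runs on its own.

```{r}
#toy stand-in for fulldata: made-up prices for four hypothetical models
toy=data.frame(
  price=c(10,12,11,20,22,21,30,32,31,40,42,41),
  model=rep(c("ModelA","ModelB","ModelC","ModelD"),each=3),
  stringsAsFactors=FALSE
)
#unique model names in sorted order
modelnames=sort(unique(toy$model))
#number of models to show in the first plot
half=ceiling(length(modelnames)/2)
#subset by model name so every model stays whole in one plot
firsthalf=toy[toy$model %in% modelnames[1:half],]
secondhalf=toy[toy$model %in% modelnames[-(1:half)],]
boxplot(price~model,las=2,data=firsthalf)
boxplot(price~model,las=2,data=secondhalf)
```

To try this on the class data, replace `toy` with `fulldata`; the split then no longer needs adjusting when the number of rows per student changes.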