1. Read in the Stat 113 data from http://myslu.stlawu.edu/~iramler/stat201/datasets/Stat113Fall2016.csv.

    1. Find the mean GPA and number of students by section. Try using piping.

    2. Select the SAT columns using the select function (see the “Helper” part of the sheet). Print the rows with the top 10 Math SAT scores.

    3. Count how many students are missing Math SAT scores.

    4. Count how many students are missing at least one of the SAT scores. (Hint: Use | for “or”.)

    5. Calculate BMI for students. Append this new variable to the Stat 113 data and create a new object called statBMI to store all the results.

    6. Keep only the BMI and sport question columns. Call this new object sportBMI.

    7. Keep only rows where people answered the sport question and replace sportBMI with this cleaned data.

      1. Start by determining what choices there were for the sport question. (Hint: use the table function, which is not part of the dplyr package.)
      2. Now filter out the blanks.
      3. Check your result with the head function.
      4. FYI, if you mess something up, sportBMI is messed up (since you are saving over the old object). If this happens, go back to part 6 to remake it and start fresh.
    8. If you have not done so already, combine parts 5, 6, and 7 into one long string of piped commands. (Be sure to drop the NA BMIs as well.)

    9. Compare BMI for athletes vs non-athletes.
      1. Graphically - See if you can use a violin plot and put a boxplot inside of it.
      2. Numerically - Get the 5 number summary and the number of students in each group. (Hint: Check out the quantile function. You’ll need the probs option.)
    10. Assuming that the Stat 113 students in this data represent a random (or at least representative) sample of SLU students, is there a statistically significant difference in average BMI values between athletes and non-athletes?

    11. Using the 5 number summary, compare athletes vs non-athletes by gender. Hint: You’ll need to “remake” the data and possibly clean it some first.

    12. Using ggplot2, graphically compare athletes vs non-athletes by gender. Hint: You have two factors here, so if googling for help, be sure to include that in your search.

2. Starting with the full Stat 113 dataset:
    1. Check out the “Data Visualization” sheet and pick a few plots to try to make.
      • Be sure to think about what type of data each plot needs.
      • It might be helpful to search Google for more details or examples on a command.
    2. Add numerical summaries and/or inference to the research questions you’ve come up with.
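
As a warm-up before working with the real file, here is a minimal sketch of the dplyr verbs these exercises lean on, run on a small made-up data frame. Every column name and value below (Section, GPA, SATM, SATV, Hgt, Wgt, Sport) is an assumption for illustration only, as are the units (inches and pounds) behind the BMI formula. The actual Stat 113 data may use different names, units, and answer codes, so inspect it first (for example with names and head) and adapt accordingly.

```r
library(dplyr)

# Toy data: all names and values are hypothetical, NOT taken from the real file.
toy <- tibble(
  Section = c("A", "A", "B", "B", "B"),
  GPA     = c(3.2, 3.6, 2.9, NA, 3.4),
  SATM    = c(610, NA, 550, 700, NA),
  SATV    = c(580, 600, NA, 650, NA),
  Hgt     = c(70, 64, 68, 72, 66),       # assumed to be inches
  Wgt     = c(160, 130, 150, 190, 140),  # assumed to be pounds
  Sport   = c("Yes", "No", "", "Yes", "No")
)

# Mean GPA and number of students by section, using the pipe.
by_section <- toy %>%
  group_by(Section) %>%
  summarise(mean_gpa = mean(GPA, na.rm = TRUE), n = n())

# Select columns with a helper, then count rows missing at least one score.
sat <- toy %>% select(starts_with("SAT"))
n_missing_any <- sat %>%
  filter(is.na(SATM) | is.na(SATV)) %>%
  nrow()

# Base R table() shows which answers (including blanks) appear.
sport_choices <- table(toy$Sport)

# Add BMI, keep two columns, and drop blanks and NA BMIs in one chain.
sport_bmi <- toy %>%
  mutate(BMI = 703 * Wgt / Hgt^2) %>%    # BMI from pounds and inches
  select(BMI, Sport) %>%
  filter(Sport != "", !is.na(BMI))

# Five-number summary and group size for each answer.
bmi_summary <- sport_bmi %>%
  group_by(Sport) %>%
  summarise(
    min    = min(BMI),
    q1     = quantile(BMI, probs = 0.25),
    median = median(BMI),
    q3     = quantile(BMI, probs = 0.75),
    max    = max(BMI),
    n      = n()
  )

print(by_section)
print(bmi_summary)
```

The same chain should carry over to the real data once the column names are swapped in. If the heights and weights there are recorded in metric units, the constant 703 in the BMI line would need to change.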