10 Communication with R Markdown

10.1 Reproducibility

We’ve been using R Markdown for a while now, but have not yet talked about most of its features or how to do anything except insert a new code chunk. By the end of this section, we want to be able to use some of the R Markdown options to make a nice-looking document (so that you can implement some of these options in your first mini-project).

Reproducibility is a concept that has recently gained popularity in the sciences for describing analyses that another researcher is able to repeat. That is, an analysis is reproducible if you provide enough information that the person sitting next to you can obtain identical results as long as they follow your procedures. An analysis is not reproducible if this isn’t the case. R Markdown makes it easy for you to make your analysis reproducible for a couple of reasons:

  • an R Markdown file will not knit unless all of your code runs, meaning that you won’t accidentally give someone code that doesn’t work. (Note: You can override this feature, but it is not recommended unless you are intentionally providing someone with broken code.)

  • R Markdown combines the “coding” steps with the “write-up” steps into one coherent document that contains the code, all figures and tables, and any explanations.

10.1.1 Spell-Checking

If using R Markdown for communication, you probably want to utilize its spell-check feature. Go to Edit -> Check Spelling, and you’ll be presented with a spell-checker that lets you change the spelling of any words you may have misspelled.

10.2 Exercises

  1. Your friend Chaz is doing a data analysis project in Excel to compare the average GPA of student athletes with the average GPA of non-athletes. He has two variables: whether or not a student is a student athlete and GPA. He decides that a two-sample t-test is an appropriate procedure for this data (recall from Intro Stat that this procedure is appropriate for comparing a quantitative response (GPA) across two groups). Here are the steps of his analysis.
  1. He writes the null and alternative hypotheses in words and in statistical notation.

  2. He uses Excel to make a set of side-by-side boxplots. He changes the labels and the limits on the y-axis using Point-and-Click Excel operations.

  3. From his boxplots, he sees that there are 3 outliers in the non-athlete group. These three students have GPAs of 0 because they were suspended for repeatedly refusing to wear masks indoors. Chaz decides that these 3 students should be removed from the analysis because, if they had stayed enrolled, their GPAs would have been different than 0. He deletes these 3 rows in Excel.

  4. Chaz uses the t.test function in Excel to run the test. He writes down the degrees of freedom, the T-stat, and the p-value.

  5. Chaz copies his graph to Word and writes a conclusion in context of the problem.

State 2 aspects of Chaz’s analysis that are not reproducible.

10.3 R Markdown Files

Let’s talk a bit more about the components of the R Markdown file used to make the reproducible analysis shown in class.

First, open a new R Markdown file by clicking File -> New File -> R Markdown and keep the new file so that it knits to HTML for now.

10.3.1 YAML

The first six lines at the top of the file make up the YAML header (YAML now officially stands for “YAML Ain’t Markup Language,” though it originally stood for “Yet Another Markup Language”). We’ll come back to this at the end, as it’s the more frustrating part to learn.

Lines 8-10 are the set-up chunk. Again, we’ll come back to this in a bit. For now, just delete lines 12-30 and copy and paste the following code chunks to your clean .Rmd file:

library(tidyverse)
head(cars)
ggplot(data = cars, aes(x = speed, y = dist)) +
  geom_point()
summary(cars)

The cars data set is built into R so there’s no need to do anything to read it in (it already exists in R itself).

10.3.2 Code Chunk Options

We’ve seen a few code chunk options earlier this semester, but this document revisits them.

First, knit your new file (and give it a name when prompted). You should see some code, a couple of results tables, and a scatterplot.

Chunk options allow you to have some control over what gets printed to the file that you knit. For example, you may or may not want: the code to be printed, the figure to be printed, the results to be printed, the tidyverse message to be printed, etc. See page 2 as a reference for R chunk options: https://rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf. There’s a ton of them! We are going to just focus on a few that are more commonly used.

  • echo. This is set to either TRUE to print the code or FALSE to not print the code. After the r in the first line of your first code chunk, add a , echo = FALSE inside the curly braces and reknit to see what happens!

You can keep adding other options, each separated by a comma. Some other options include:

  • message. This is set to either TRUE to print messages or FALSE to not print messages. When you load in the tidyverse, a message automatically prints out. In that same code chunk, add a , message = FALSE to get rid of the message. Re-knit to make sure the message is actually gone.

  • warning. This is set to either TRUE to print warnings or FALSE to not print warnings. We don’t have any warnings so changing this in our current code chunks won’t do anything.

  • results. By default, this is set to 'markup', which prints the results of the code (tables, for example). Change this to 'hide' to not print the results. Practice adding a , results = 'hide' to the code chunk in your R Markdown file with summary(cars) and re-knit to make sure the results from summary(cars) are gone.

  • fig.keep. Add fig.keep = 'none' to not print a figure. Practice adding a , fig.keep = 'none' to the code chunk options with the scatterplot and re-knit to make sure the figure is gone. fig.keep can also be set to 'last', in which case R will only keep the last figure created in a code chunk.

  • fig.height and fig.width control the height and width of figures, in inches. knitr’s own default for both is 7 (some output formats set different defaults), but I often change the fig.height to make figures shorter (fig.height = 5, for example).

  • fig.cap adds a figure caption to your figure. Try inserting , fig.cap = "Figure 1: caption text blah blah blah" to your chunk options.

  • include, eval, and collapse are also sometimes useful: check these out in the reference guide! (Note, Dr. Ramler uses the eval option frequently. It allows you to print the code chunk without evaluating it. He finds it useful when he posts solutions containing only the code.)
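
As a rough sketch of how several of these options can be combined, the scatterplot chunk might look like the following. The options are shown in a comment because they belong inside the curly braces on the chunk's first line, not in the R code itself. The particular option values, the caption text, and the name speed_plot are only illustrative assumptions.

```r
# Options go inside the curly braces on the first line of the chunk, e.g.
#   {r, echo = FALSE, message = FALSE, fig.height = 5, fig.cap = "Figure 1: Stopping distance versus speed"}
# (illustrative values only; pick whichever options suit your document)
library(tidyverse)

# With echo = FALSE and message = FALSE, the knitted file shows the figure
# and its caption, but hides this code and the tidyverse start-up message.
speed_plot <- ggplot(data = cars, aes(x = speed, y = dist)) +
  geom_point()

# Printing the plot object is what makes the figure appear when knitting.
speed_plot
```

Removing echo = FALSE from the braces and re-knitting should bring the code back while leaving the other options in effect.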

Finally, you’ll notice that each time you make a new document, there is a code chunk at the beginning called “setup.” By default, setup has echo = TRUE as a global option. A global option is something that gets applied to all code chunks in the entire document. So, having echo = TRUE means that all code chunks will have their code printed, except in chunks where you have specifically set echo = FALSE. You can add other options to the setup chunk, like fig.height = 5, to make all figures in all chunks have a height of 5 instead of adding this option to each and every chunk.
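
A minimal sketch of what the inside of the setup chunk could look like with a second global option added. knitr::opts_chunk$set() is the call that RStudio's default template already places in the setup chunk; fig.height = 5 is simply the example value from the paragraph above.

```r
# Contents of the "setup" chunk near the top of the .Rmd file.
# Every option set here becomes the default for all later chunks.
knitr::opts_chunk$set(
  echo = TRUE,     # print the code from every chunk ...
  fig.height = 5   # ... and make every figure 5 inches tall
)
```

An individual chunk can still override either default by setting the option in its own curly braces, for example echo = FALSE.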

10.3.3 Figures and Tables

We’ve already seen that figures will pop up automatically (unless we set fig.keep = 'none'), which is quite convenient. Making tables that look nice requires one extra step.

Delete the results = 'hide' option that you added earlier. When you knit your .Rmd file now, results tables from head(cars) and summary(cars) look kind of ugly. We will focus on using the kable() function from the knitr package to make these tables much more aesthetically pleasing. Another option is to use the pander() function in the pander package. Both pander() and kable() are very simple functions to generate tables but will be more than sufficient for our purposes. To generate more complicated tables, see the xtable package.

To use these functions, simply add a %>% pipe followed by the name of the table function you want to use. head(cars) %>% kable() will make a nice-looking table with kable() and head(cars) %>% pander() will use pander(). Before using kable(), you’ll need to load its library by adding the line library(knitr) above head(cars) %>% kable(). Before using pander(), you’ll need to load its library by adding the line library(pander) above head(cars) %>% pander(). Try these out in your R Markdown file.
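
Put together, a chunk that tries both table functions might look roughly like the sketch below. It assumes the knitr and pander packages are installed, and the name cars_kable is only an illustrative choice for holding the table.

```r
library(tidyverse)  # provides the %>% pipe
library(knitr)      # provides kable()
library(pander)     # provides pander()

# First six rows of cars, formatted with kable()
cars_kable <- head(cars) %>% kable()
cars_kable

# The same six rows, formatted with pander() instead
head(cars) %>% pander()
```

In practice you would probably keep only one of the two tables in a report; both are shown here so that they can be compared after knitting.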

Which table do you like better in this case?

There are plenty of options for making tables look presentable, which we will discuss in the Exercises. Keep in mind that you probably wouldn’t use these when making tables for yourself. They’re much more useful when you’re writing a report that you want to share with others.

10.3.4 Non-Code Options

R Markdown combines R (in the code chunks, which we’ve already discussed) with the Markdown syntax, which comprises the stuff outside the code chunks, like what you’re reading right now!

There are so many Markdown options, but most of the time, if you want to do something specific, you can just Google it. The purpose of what follows is just to get us familiar with the very basics and things you will probably use most often.

Bullet Points and Sub-bullet Points: Denoted with a * and -, respectively. The sub bullets should be indented by 4 spaces. Note that bullet points are not code and should not appear in a code chunk.

* Bullet 1
* Bullet 2
    - Sub bullet 1
    - Sub bullet 2
    - Sub bullet 3

Note: Everything in Markdown is very particular with spacing. Things often have to be very precise. It takes some getting used to and can be frustrating sometimes. For example, indenting a sub-bullet by 3 spaces instead of 4 spaces will not make a sub-bullet.

* Bullet 1
   - Sub bullet 1

Numbered Lists are the same as bulleted ones, except * is replaced with numbers 1., 2., etc.

Bold, Italics, Code: Surround text with double underscores, as in __bold text__, to make it bold; with single underscores, as in _italic text_, to make it italic; and with backticks to make it look like code.

Links: The simplest way to create a link to something on the web is to surround it with < > as in <https://www.youtube.com/watch?v=gJf_DDAfDXs>

If you want to name your link something other than the web address, use [name of link](https://www.youtube.com/watch?v=gJf_DDAfDXs), which should show up in your knitted document as “name of link” and, when clicked on, take you to the YouTube video.

Headers: Headers are created with #, with fewer hashtags resulting in a bigger header. Typing # Big Header at the beginning of a line would make a big header, ### Medium Header would make a medium header, and ##### Small Header would make a small header. Be sure to leave a space between the last # and the header text. Headers are important because they get mapped to a table of contents.

There’s a lot of other stuff to explore: https://rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf.

But, if you want to do something other than the basics, Google will definitely help.

10.3.5 YAML Revisited

Finally, we can return to what’s given at the top of every .Rmd file: a YAML header. The YAML header is the most frustrating part to change because it’s the most particular with spacing.

The biggest thing that you can take advantage of with YAML is themes that other people have written for R Markdown. By default, we’re just using R Markdown’s default theme, which looks okay.

One package that you can use to make a “pretty” themed document easily is the rmdformats package using the readthedown theme. This does a lot of the heavy lifting in making the resulting .html file super nice to look at.

For example, paste the following lines over your YAML header in your current .Rmd file.

---
title: "Communication with `R Markdown` and `ggplot2`"
author: "Your friend Chaz"
output: 
  rmdformats::readthedown:
    toc_depth: 5
---

The toc_depth: 5 line sets the deepest header level that will appear in the Table of Contents (toc): headers with up to five hashtags (#####) are included. The document you just knitted won’t have anything in the table of contents because you haven’t included any headers with 5 or fewer hashtags.

As mentioned before, the spacing with these YAML headers is extremely important. For example, delete one of the spaces at the beginning of the line for toc_depth: 5 in the R Markdown file you created. The file should no longer knit, all because we deleted a single space.

Another package that you could use to create a pretty .html file is the prettydoc package (you might recall using this package for reports if you took STAT 213 with Professor Higham). Try copying and pasting the following over your .Rmd YAML header in the R Markdown file you created:

---
title: "Title"
author: "Name"
date: "Put Today's Date"
output: 
  prettydoc::html_pretty:
    theme: hpstr
    toc: true
---

prettydoc has 5 themes for you to choose from. The YAML header above uses hpstr. Other choices are cayman, tactile, architect, and leonids.

10.4 Exercises

For the rest of this section, we will use the built-in R data set mtcars, which has observations on makes and models of cars. The variables we will be using are:

  • cyl, the number of cylinders a car has
  • mpg, the mileage of the car, in miles per gallon

Because the data set is loaded every time R is started up, there is no need to have a line that reads in the data set. We can examine the first few observations with

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
  1. * Create a table showing the mean mpg for each cyl group (cyl stands for cylinder and can be 4-cylinder, 6-cylinder, or 8-cylinder) with both kable() and pander(). Hint: remember to load the knitr library and the pander library.

  2. * Type ?kable into your console window and scroll through the Help file. Change the rounding of the mean so that it only displays one number after the decimal. Then, add a caption to the table that says “My First Table Caption!!”

  3. * Google “How to Change Column Names in kable” and replace the column names with “Cylinder Numb.” and “Mean Mileage.”

  4. Change the YAML header in your Exercise 1 .Rmd file (or, really, just any R Markdown file that you know knits) so that you are the author and so that the file uses either one of the prettydoc themes or the readthedown theme.