1 Getting Started with R and R Studio
Goals:
* Use R Studio on the server
* Use R Markdown and code chunks
* Load in data to R Studio
* Run code and change a few things within that code
1.1 Intro to R and R Studio
A. Open R Studio on the SLU R Studio server at http://rstudio.stlawu.local:8787 and create a folder called STAT_234 (or another name that is meaningful to you).
Note that you must be on campus to use the R Studio server unless you use a VPN. Directions for setting up the VPN are at https://infotech.stlawu.edu/support/content/11269 for Macs and https://stlawu.teamdynamix.com/TDClient/1805/Portal/KB/ArticleDet?ID=55118 for Windows.
B. Next, create a subfolder within your STAT_234 folder and title it notes (or whatever you like, really). Tip: try not to include spaces in folder names, since they can occasionally cause some annoying errors.
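If you prefer typing to clicking, the folders from steps A and B can also be created from the R Studio console. This is a minimal sketch that assumes the console's working directory is your home folder on the server (the usual default); the folder names simply match the ones suggested above.

```r
# Create STAT_234 and its notes subfolder in one call.
# recursive = TRUE also creates STAT_234 if it does not exist yet;
# showWarnings = FALSE keeps R quiet if the folders are already there.
dir.create(file.path("STAT_234", "notes"), recursive = TRUE, showWarnings = FALSE)

# Check that the folder now exists (should print TRUE)
dir.exists(file.path("STAT_234", "notes"))
```

Using file.path() instead of pasting slashes by hand keeps the path readable and avoids typos in the separators.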
C. Then, create an R Project by clicking File -> New Project -> Existing Directory, navigate to the notes folder, and click Create Project.
Within the notes folder, click the New Folder button in the Files pane (the bottom-right window) and name the new folder data (for consistency, I recommend keeping it all lowercase). Then, upload the data.zip file (located on the T drive) to the server by clicking “Upload” in the same bottom-right panel. In the dialog box that appears, click “Choose File” and navigate to the folder where the zip file is located. The zip file will automatically expand once uploaded. It includes a few of the data sets that we will use throughout the course (we’ll be adding more throughout the semester).
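The same step can be sketched in code. This assumes the notes project is open, so R's working directory is the notes folder and the relative path "data" means notes/data. It is an illustration only; the upload itself still happens through the “Upload” button.

```r
# With the notes project open, this should print the path to the notes folder
getwd()

# Code alternative to the New Folder button
# (showWarnings = FALSE keeps R quiet if data already exists)
dir.create("data", showWarnings = FALSE)

# After uploading data.zip, list what ended up inside the data folder;
# character(0) means the folder is still empty
list.files("data")
```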
Finally, to create a new R Markdown file, click File -> New File -> R Markdown. You can give your new R Markdown file a title if you want, and then click OK.
Before moving on, click the Knit button in the menu bar at the top of the top-left window (look for the knitting needle icon). R Studio will ask you to save the file first: name it 01-getting-started.Rmd. Make sure that the file knits to a pretty-looking .html file. The newly knitted .html file can now be found in your folder with your R project.
1.2 What are R, R Studio, and R Markdown?
The distinction between the three will become clearer later on. For now:
* R is a statistical programming language used heavily for data analysis and statistical procedures.
* R Studio is a nice IDE (Integrated Development Environment) for R that has a lot of convenient features. Think of it as just a convenient user interface for R.
* R Markdown allows users to mix regular Microsoft-Word-style text with code. The .Rmd file ending denotes an R Markdown file. R Markdown has many options that we will use throughout the semester, but there’s no need to worry about these now.
1.2.1 R Packages and the tidyverse
You can think of R packages as add-ons to R that let you do things that R on its own would not be able to do. If you’re into video games, you can think of R packages as extra Downloadable Content (DLC). But, unlike most gaming DLC, R packages are always free, and we will make very heavy use of them.
The tidyverse is a collection of R packages that are useful for data science. In the order that we will encounter them in this class, the core tidyverse packages are:
* readr for data import
* ggplot2 for plotting data
* dplyr for data wrangling and summarizing
* forcats for factor (categorical) data
* tidyr for data tidying and reshaping
* tibble for how data is stored
* stringr for text data
* purrr for functional programming (we won’t use this in our intro class)
We will use packages outside of the core tidyverse as well, but the tidyverse is the main focus.
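As a quick check, the sketch below asks R whether each core package listed above is installed, using base R's requireNamespace(). Unlike library(), it does not attach the packages, so it is safe to run at any time. The variable names core_pkgs and installed are just illustrative, and the assumption that these packages are already installed on the course server is worth verifying with this very check.

```r
# The eight core tidyverse packages listed above
core_pkgs <- c("readr", "ggplot2", "dplyr", "forcats",
               "tidyr", "tibble", "stringr", "purrr")

# requireNamespace() returns TRUE if a package is installed and can be
# loaded, FALSE otherwise; vapply() applies it to every name in core_pkgs
installed <- vapply(core_pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Report anything missing (install.packages() is the function that would
# install it, but on the server ask your instructor first)
if (!all(installed)) {
  message("Missing: ", paste(core_pkgs[!installed], collapse = ", "))
}
```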
We are going to change one option before proceeding. In the top file menu, click Tools -> Global Options -> R Markdown and then uncheck the box that says “Show output inline for all R Markdown documents”. Don’t worry much about this for now; changing this option just means that code results will appear in the bottom-left window and graphs will appear in the bottom-right window of R Studio.
1.3 Putting Code in a .Rmd File
In most analyses, the first thing that we will do that involves code is to load a package into R with the library() function. A package is just an R add-on that lets you do more than you could with R on its own. To create a code chunk, click the Insert Code button (it looks like a green C with a + on it) -> R. If you prefer keyboard shortcuts, use “Ctrl + Alt + I” on a PC or “Cmd + Option + I” on a Mac. Within this code chunk, type library(tidyverse) and run the code by either
* clicking the “Run” button in the menu bar of the top-left window of R Studio, or
* (Recommended) pressing “Command + Enter” on a Mac or “Control + Enter” on a PC.
Note that, in the .Rmd file, code chunks appear in grey boxes that open with three backticks followed by {r} and close with three backticks, while normal text has a different colour background and no backticks.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'stringr' was built under R version 4.2.2
## Warning: package 'forcats' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
When you run the previous line, some text will appear in the bottom-left window. We won’t worry too much about what this text means now, but we also won’t ignore it completely. You should be able to spot the 8 core tidyverse packages listed above (plus lubridate, for dates, which tidyverse 2.0.0 also attaches), each followed by some numbers. The numbers correspond to the package version. There are some other things too, but as long as this text does not start with “Error:”, you’re good to go!
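The version numbers and the “masks” lines in that output can also be explored from code. This is only an illustrative sketch in base R: R.version.string, packageVersion(), and the package::function() syntax are standard R, but using stats::filter() as the demo and the name moving_avg are our own choices, not part of the course code. The “built under R version” warnings just mean a package was built with a slightly newer R than the one running, which is usually harmless.

```r
# Version of R that is currently running; the "built under R version ..."
# warnings compare each package's build version against this one
R.version.string

# Installed version of one package; this should match the number printed
# next to it above. The if () guard skips it if dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(packageVersion("dplyr"))
}

# After library(tidyverse), a plain filter() call means dplyr::filter().
# Writing package::function() picks a specific version explicitly; here
# stats::filter() takes a moving average of 3 neighbouring values.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg  # values: NA 2 3 4 NA (no full 3-value window at either end)
```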
We have now run R code using an R chunk. In your R chunk, on a new line, try typing in a basic calculation, like 71 + 9 or 4 / 3, then run the line and observe the result.
R can perform basic calculations, but you could just use a calculator or Excel for that. In order to look at things that are a bit more interesting, we need some data.
71 + 9
## [1] 80
4 / 3
## [1] 1.333333
4 - 3
## [1] 1
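A few other arithmetic operators come up often, and any result can be saved under a name with <- so it can be reused. This is a short sketch; the name total is only an example, and each line should print the value noted in its comment.

```r
2^3        # exponent: 8
7 %/% 2    # integer division: 3
7 %% 2     # remainder (modulo): 1
sqrt(16)   # square root: 4

# Save a result under a name with <- and reuse it later
total <- 71 + 9
total / 4  # 20
```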