2.2 R and RStudio
Throughout this book, we are going to use the software R to conduct statistical analysis. We will use RStudio to interact with R; you can think of RStudio as a graphical user interface for R. R/RStudio organizes data much like Excel does, but instead of showing the spreadsheet all the time, it keeps the spreadsheet with your data in the background. R/RStudio requires that the columns be your variables and the rows be your observations. For convenience, we are going to use Excel for data setup. There are great online resources for R/RStudio:
- ETH Zurich: If you are looking for documentation on the various functions in R, this is the website to check. Note that in most cases, a Google search for an R function leads to the ETHZ page.
- StatMethods: This webpage contains a lot of tutorials and introduces you to the basic functionality of R.
- Statistical tools for high-throughput data analysis (STHDA): This site is a great place to learn data visualization with the R package ggplot2.
R/RStudio has several advantages over Excel:
- R/RStudio is set up to do statistical analysis. Excel is easy to use but has very limited capabilities for statistical analysis.
- It is important for your future employer to know that you have been exposed to modern statistical software besides Excel. You might be the one who introduces specialized statistical software to your workplace. The advantage of R/RStudio is that it is free and very powerful. R requires some programming and an understanding of computer languages, but there are almost no limits to what you can do.
- Getting a graduate degree should go beyond simple Excel job training and should expose you to something new.
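To illustrate the data layout described at the start of this section, where columns are variables and rows are observations, here is a minimal sketch in R. The data frame name cars_example and all of its values are made up for illustration; they are not taken from honda.csv.

```r
# Hypothetical data: each row is one car (an observation),
# each column is one variable measured on that car.
cars_example <- data.frame(
  model = c("Accord", "Civic", "Pilot"),
  year  = c(2015, 2018, 2020),
  price = c(14500, 17900, 31000)
)

# Number of observations (rows) and variables (columns).
n_obs  <- nrow(cars_example)
n_vars <- ncol(cars_example)

# Variable names and a compact overview of the data.
names(cars_example)
str(cars_example)

# A single variable is accessed with the $ operator.
mean(cars_example$price)
```

Unlike in Excel, the table is not displayed permanently; functions such as str() or head() let you look at it whenever needed.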
2.2.1 Preparation for R/RStudio
The next lecture will introduce you to the use of R and RStudio. To focus on using R and RStudio during the lecture, you need to complete some easy preparatory steps before class. These mostly involve installing R and RStudio and loading sample data into the software. Along with this document, you should have downloaded the small dataset honda.csv. In preparation for the lecture, you will load honda.csv into R/RStudio.
2.2.2 Installing R and RStudio
You must first install R on your computer by doing the following:
- Go to The R Project for Statistical Computing and download the R version that is appropriate for your computer. This is either the “base” version if you have a Windows computer or the “Latest release” .pkg file if you are using Mac OS. Once you have downloaded the file, install R on your computer.
- Go to RStudio and download the RStudio version that is appropriate for your computer. Note that the various “Installers for Supported Platforms” are at the bottom of the page. Once you have downloaded the file, install RStudio on your computer.
Note that we will only be using RStudio, which runs R in the background. You cannot use RStudio without having R installed first. Throughout the lecture, I will refer to the two together as R/RStudio.
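Once both programs are installed, a quick check in the RStudio console can confirm that RStudio has found R. The following is a minimal sketch using only base R; the output depends on the version you installed.

```r
# Store and print the version of R that is running behind RStudio.
r_version <- R.version.string
print(r_version)

# Show more detail: R version, operating system, and loaded packages.
print(sessionInfo())
```

If these commands run without an error, R is installed and RStudio is able to use it.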
2.2.3 Locating Files on your Computer
To import data into R/RStudio, you must know (1) where files are located on your computer and (2) what the current working directory of R/RStudio is. On a Windows computer, the directory where files are stored looks like C:\Users\Jerome\Documents\R Lecture, and on Mac OS it looks like /Users/Jerome/R Lecture.
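The following sketch shows how such paths can be written as text in R, using the example folders from the paragraph above. It only builds and inspects the path strings and does not require the folders to exist on your computer. The variable names are chosen for illustration.

```r
# Windows-style location from the text, written with forward slashes.
win_path <- "C:/Users/Jerome/Documents/R Lecture"

# The same location with backslashes; inside an R string,
# every backslash has to be typed twice.
win_path_bs <- "C:\\Users\\Jerome\\Documents\\R Lecture"

# Converting backslashes to forward slashes gives the first version.
win_path_fixed <- gsub("\\", "/", win_path_bs, fixed = TRUE)

# file.path() joins folder names with forward slashes.
mac_path <- file.path("/Users", "Jerome", "R Lecture")

# basename() returns the last part of a path, dirname() the rest.
basename(win_path)
dirname(mac_path)

# file.exists() tells you whether a location exists on your computer.
file.exists(mac_path)
```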
Think of the working directory as the folder on your computer in which R/RStudio looks for files by default. After opening R/RStudio, you can type getwd() in the console window and R/RStudio will return the current working directory. Usually, you have project-specific working directories. For this class, create a directory on your computer in which you are going to store all the files associated with this class. You should download the dataset honda.csv into the directory you have created for this class.
You can use the command setwd() to change the working directory of R/RStudio to the new directory. Note, and this is an oddity of R/RStudio, that you have to replace each backslash with a forward slash if you are a Windows user, i.e., use setwd("C:/Users/Jerome/Documents/R Lecture"). Assuming the honda.csv file is in the directory you have set above, use honda = read.csv("honda.csv") to load the file into R/RStudio. The data should appear in the Environment tab on the right side in R/RStudio.
It is important that you can do the steps described above before the lecture to avoid any issues at the beginning. A good video explaining the concept of file path can be found here.
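The steps above can be sketched as a short script. So that it runs on any computer, this sketch makes two assumptions that differ from what you will do for class: it uses a temporary folder in place of your class directory, and it writes a tiny stand-in honda.csv with made-up columns and values. For class, set class_dir to the directory you created and use the real honda.csv instead.

```r
# Assumption: a temporary folder stands in for your class directory.
class_dir <- file.path(tempdir(), "R Lecture")
dir.create(class_dir, showWarnings = FALSE)

# Assumption: a stand-in honda.csv with made-up values.
# Skip this step when you have the real file in your class directory.
write.csv(
  data.frame(year = c(2010, 2012), price = c(9500, 11200)),
  file.path(class_dir, "honda.csv"),
  row.names = FALSE
)

# Remember the current working directory, then switch to the class folder.
old_wd <- getwd()
setwd(class_dir)
getwd()

# Because the file is in the working directory, its name alone is enough.
honda <- read.csv("honda.csv")

# Inspect what was loaded.
str(honda)
head(honda)

# Restore the previous working directory.
setwd(old_wd)
```

The sketch assigns with <-, which is the more common convention in R; honda = read.csv("honda.csv") works the same way here.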