R
This lab introduces R and RStudio, as well as a few procedures we’ll use in future class sessions.
Directions (read before starting)
Most labs will begin with a short “onboarding” section that we’ll cover as a class. After that, you’re expected to work through the lab with your group following the “paired programming” paradigm. This framework entails:
Keep an R script (a .R file) open to try out ideas and record your group’s final responses. You should aim to switch roles on a regular basis, with less experienced coders spending more time in the “driver” role.
When you first open RStudio, you’ll need a place to write and store your code. The simplest solution is to create a new “R Script” via:
File -> New File -> R Script
This opens a blank page in the upper-left of the RStudio interface. At this point you should see four panels, described below.
An R Script is like a text file that stores your code while you work on it. You can execute some or all of your code by sending it to the console in either of the following ways: click the “Run” button at the top of the script panel, or press Ctrl+Enter (Cmd+Enter on a Mac) to run the current line.
If you highlight a segment of code before running, only the highlighted code will be sent to the console for execution. Try this out by running the following code:
log2(4)
## [1] 2
You should notice that the Console echoes any code you run, and beneath the echo it prints any textual/numeric output that the code produces.
When code you submit to the console cannot be executed because of an error, you will receive a red-colored message describing the problem. For example, try running the following code:
log2(x) # Produces an error since 'x' hasn't been defined
Next, the Environment (upper-right) shows information on any objects that have been loaded into your work session.
To illustrate the features of this panel, try running the following code:
college_majors = read.csv("https://remiller1450.github.io/data/majors.csv")
This uses the function read.csv() to load data from the file “majors.csv”, which is housed at the URL provided in quotations. This data set is stored in R as an object named college_majors, which you should see in the Environment.
Clicking on college_majors in the Environment will open a viewer page, allowing you to inspect the data as if it were a spreadsheet. Clicking the blue arrow icon next to college_majors will display a list of variable names contained in the data set.
Finally, the Files/Plots/Help Viewer (lower-right) will display graphics generated by your code, help documentation, and a file explorer tree.
To demonstrate this panel, try running the following code:
?sum
This opens the help documentation for the function/object name given after the question mark (the sum() function in this example). If you encounter an error when working with a function, you should try reading its help documentation.
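The information the Environment panel shows can also be obtained with code, and help(sum) is equivalent to ?sum. The sketch below uses a small hypothetical data frame, majors_demo, as a stand-in for college_majors so that it runs without downloading anything. Its values are made up for illustration.

```r
# Hypothetical stand-in for college_majors (made-up values, no download needed)
majors_demo = data.frame(Major = c("Biology", "History"), Total = c(100, 80))

names(majors_demo)  # variable names, like the list shown in the Environment
head(majors_demo)   # first few rows, a text version of the spreadsheet viewer
```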
At this point you should begin working independently with your assigned partner(s) using a paired programming framework. Remember that you should read the lab’s content, not just the questions, and you should all agree on an answer before moving on. Despite being graded, labs aren’t intended to be formal assessments, and you are welcome to ask questions.
In R, objects are nameable structures used to store data. Data are assigned into an object using either <- or =. After assignment, an object’s name is used to reference the data it holds.
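The two assignment operators store the same thing. In this sketch, the object names a and b are chosen only for illustration:

```r
a <- 5  # assignment with the arrow operator
b = 5   # assignment with the equals sign

identical(a, b)  # TRUE: both objects hold the value 5
```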
The simplest type of data storage object is a single-element vector, which is sometimes called a “scalar”:
x = 5 # assigns the value '5' to an object named 'x'
We can now use the name x to reference this stored value:
x^2 # squares the value stored in 'x'
## [1] 25
More generally, we’ll encounter vectors containing multiple elements:
y = 1:3 # the sequence {1, 2, 3} is assigned to the vector called 'y'
print(y)
## [1] 1 2 3
This motivates us to understand indices, or the positions where specific pieces of data, known as elements or atomic units, are located within an object. In R, vectors have a single index that begins counting at 1 (many other programming languages begin counting at 0).
The example below uses square brackets ([ and ]) to access the second element of y, which is the number 2. This element is then stored as a new object named y2, which is printed using the print() function:
y2 = y[2] # access the element in position 2 and store as y2
print(y2) # print to confirm we extracted "2"
## [1] 2
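Square brackets accept more than a single position. The following sketch uses the same y. It shows that a vector of positions selects several elements, that a negative position drops an element, and that 1-based indexing means position 0 does not refer to the first element:

```r
y = 1:3

y[c(1, 3)]  # elements in positions 1 and 3: 1 3
y[-1]       # every element except position 1: 2 3
y[5]        # position 5 doesn't exist, so R returns NA
y[0]        # an empty vector, not the first element
```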
The most common data structure in R is the data frame, which is a collection of one or more named vectors that each contain the same number of elements. While we’ll primarily work with data frames created by the read.csv() function, there is value in creating one ourselves.
The example below takes two vectors, x1 and x2, and assembles them into a data frame object named df. You should notice how x1 and x2 are given the names “ID” and “value” when the data frame is created.
x1 = c(209, 230, 310) # First vector
x2 = c(5, 6, 8) # Second vector
df = data.frame(ID = x1, value = x2) # Put together into a data frame, giving each a name
print(df)
##    ID value
## 1 209     5
## 2 230     6
## 3 310     8
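The requirement that the vectors have matching lengths is enforced when the data frame is built. In the sketch below, x2 has been shortened to two elements for illustration. tryCatch() captures the resulting error message so it can be printed instead of stopping the script:

```r
x1 = c(209, 230, 310)  # three elements
x2 = c(5, 6)           # only two elements

result = tryCatch(
  data.frame(ID = x1, value = x2),
  error = function(e) conditionMessage(e)  # keep the error text
)
print(result)  # a message about a differing number of rows
```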
A useful property of data frames is that the individual vectors they contain can be accessed using the $ operator. The code below demonstrates this by accessing the vector named “value” in the data frame df, then using the median() function to find its median (the midpoint of the values when they’re put in ascending order):
median(df$value)
## [1] 6
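A vector extracted with $ can be passed to any function that expects a numeric vector. Double square brackets with the vector’s name in quotes select the same vector. The sketch below rebuilds df so it runs on its own:

```r
df = data.frame(ID = c(209, 230, 310), value = c(5, 6, 8))

df$value        # 5 6 8
df[["value"]]   # the same vector, selected by name
mean(df$value)  # (5 + 6 + 8) / 3, about 6.333
```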
We can also use indices to access data stored in df, but we’ll need to recognize that data frames use two dimensions (rows and columns) to organize their elements. Consider the following examples:
## Access a single element
df[2,1]
## [1] 230
## Access an entire column
df[ ,1]
## [1] 209 230 310
## Access an entire row
df[2, ]
##    ID value
## 2 230     6
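The column position can also be replaced by the column’s name in quotes, and either position can be a vector of several indices. A short sketch using the same df:

```r
df = data.frame(ID = c(209, 230, 310), value = c(5, 6, 8))

df[2, "value"]  # row 2 of the column named "value": 6
df[ , "ID"]     # the entire ID column: 209 230 310
df[c(1, 3), ]   # rows 1 and 3, all columns
```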
Question #2: For the parts that follow you should use the happy_planet_data data frame you previously created in Question 1.
Vectors can store values of various different types. The example below creates a vector, z, whose type is “character”:
z = c("X", "Y", "Z")
Notice that this vector was created using the c() function to concatenate three different character strings that are separated by commas. We will frequently use the c() function to create our own vectors throughout the semester.
There are many types of vectors in R, but there are only three that you need to be aware of for this course:
Numeric: x = c(1,2,3)
Character: x = c("A","B","C")
Logical: x = c(TRUE, FALSE, TRUE)
The class() function provides information about an object, and it can be used to check the type of a vector:
class(z)
## [1] "character"
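Applying class() to each of the three types listed above confirms their names. The last line shows that c() converts mixed inputs to a single common type. Here the number 1 becomes the string "1":

```r
class(c(1, 2, 3))            # "numeric"
class(c("A", "B", "C"))      # "character"
class(c(TRUE, FALSE, TRUE))  # "logical"
class(c(1, "A"))             # "character": mixing types coerces to character
```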
It is important to be familiar with different data types because many functions only work as intended when they receive inputs of the proper type.
For example, the mean() function cannot average an input that isn’t numeric (or logical). Rather than throwing an error, it issues a warning and returns NA:
mean(z) # Recall that z is character, so it has no average value
## Warning in mean.default(z): argument is not numeric or logical: returning NA
## [1] NA
Sometimes we can successfully coerce an object into a different type so that it can be properly handled by functions that expect a certain type. Below are a few examples of coercion:
x = c("10", "20") # Example vector
class(x) # Notice that it's a character vector, so mean() doesn't work
## [1] "character"
new_x = as.numeric(x) # Coerce x to be numeric using as.numeric(), then store as new_x
mean(new_x) # This now works!
## [1] 15
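Coercion does not always succeed. In the sketch below, the string "twenty" is an illustrative value that cannot be read as a number, so as.numeric() returns NA in its place. Normally R also prints the warning “NAs introduced by coercion”, which suppressWarnings() hides here. Logical values, by contrast, coerce cleanly to 1 (TRUE) and 0 (FALSE):

```r
as.numeric(c(TRUE, FALSE, TRUE))  # 1 0 1

bad = suppressWarnings(as.numeric(c("10", "twenty")))  # "twenty" isn't a number
bad  # 10 NA
```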
Question #3: The as.character() function will coerce a numeric or logical type into a character type. For example, 1 will be coerced into "1". Using an R comment, briefly describe why this might be desirable for one of the variables contained in happy_planet_data, the data frame you’ve worked with in Questions 1 and 2.
At this point we’re ready to cover a few commonly used functions that help us work with data structures in R. Below are a few functions that provide the dimensions of data objects:
## Example data frame
college_majors = read.csv("https://remiller1450.github.io/data/majors.csv")
## Find the number of elements in a vector (recall that college_majors$Major is the vector Major in college_majors)
length(college_majors$Major)
## Find the number of rows (cases) in a data frame
nrow(college_majors)
## Find the number of columns (variables) in a data frame
ncol(college_majors)
## Find the dimensions (rows, columns) of a data frame
dim(college_majors)
These functions are useful in finding the number of cases and variables in an object so that other quantities can be computed. Soon we’ll need to calculate proportions, which use the number of cases/elements as a denominator. It’s preferable to have code that finds and stores this denominator rather than using a magic number you got from the Environment panel or your own knowledge of the data.
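Here is a minimal sketch of finding a proportion without a magic number. It reuses the small df from earlier so that it runs offline, and the cutoff of 5 is arbitrary. The comparison df$value > 5 produces a logical vector, and sum() counts its TRUE values because they coerce to 1:

```r
df = data.frame(ID = c(209, 230, 310), value = c(5, 6, 8))

n = nrow(df)              # denominator found by code, not typed in by hand
n_big = sum(df$value > 5) # number of cases with value above 5 (arbitrary cutoff)
prop_big = n_big / n      # 2 / 3
dim(df)                   # 3 2
```

If rows are later added to df, n updates automatically, whereas a hard-coded 3 would silently become wrong.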
Question #4: For this question you will use the data stored at the URL:
https://remiller1450.github.io/data/congress_2024.csv
which contains information on the 118th US Congress. Using an R comment, briefly describe what a “case” is for this data set.
Comments
Computer code never lives in isolation. You should always prepare your code to be read and reused by your future self and others. Annotating your code with comments, or text that appears alongside the code but is not executed by the console, is an important aspect of coding.
In R, the character “#” is used to start a comment. Everything appearing on the same line to the right of the “#” will not be executed when that line is submitted to the console.
Question #1:
happy_planet_data = read.csv("https://remiller1450.github.io/data/happy_planet_2025.csv")
R comment.