Introduction to the R programming language
Matthew DeHaven
February 2, 2026
. . .
Why is it called R?
. . .
Created as a programming language for statistics and graphics.
. . .
IMO, easier to learn than Python or Julia
. . .
Taken from Economics and R Blog of Sebastian Kranz. Calculated from file extensions used in replication packages for published papers in economics.
Order of operations as expected
. . .
Modulo
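As a quick illustration of the arithmetic operators, here is a minimal sketch. The specific numbers are chosen for illustration and are not from the original slides.

```r
# Multiplication binds tighter than addition
precedence <- 2 + 3 * 4      # 14, not 20
grouped <- (2 + 3) * 4       # 20, parentheses are evaluated first

# Exponentiation binds tighter than multiplication
power <- 2 * 3^2             # 18

# Modulo (%%) returns the remainder; %/% is integer division
remainder <- 7 %% 3          # 1
quotient <- 7 %/% 3          # 2
```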
Booleans are objects that are either TRUE or FALSE
They are very useful.
. . .
. . .
& (AND)
| (OR)
. . .
! (NOT)
Value Matching %in%
Checks whether the object on the left is “in” the object on the right.¹
. . .
. . .
What is the : operator? “Sequence”
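The logical operators, `%in%`, and `:` can be tried together in a short sketch. The values and variable names below are illustrative assumptions, not taken from the slides.

```r
# Logical operators combine TRUE/FALSE values
both <- TRUE & FALSE         # FALSE: AND needs both sides to be TRUE
either <- TRUE | FALSE       # TRUE: OR needs at least one side to be TRUE
negated <- !TRUE             # FALSE: NOT flips the value

# The sequence operator builds consecutive whole numbers
one_to_five <- 1:5           # 1 2 3 4 5

# %in% asks whether the left-hand value appears on the right
three_found <- 3 %in% one_to_five    # TRUE
seven_found <- 7 %in% one_to_five    # FALSE
```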
Computers handle decimals in odd ways!
? ? ?
. . .
Because computers use binary, cannot represent 0.1 exactly
Same as we cannot represent 1/3 in base 10 exactly
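A common way to see this is to compare 0.1 + 0.2 with 0.3. The sketch below uses that example, which is an assumption about what the slide showed, and uses `all.equal()` as one way to compare with a tolerance.

```r
# Exact comparison fails because neither side is stored exactly
sum_is_exact <- (0.1 + 0.2) == 0.3                   # FALSE

# Printing more digits reveals the stored approximation
print(0.1 + 0.2, digits = 17)

# all.equal() compares within a small tolerance instead
sum_is_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE
```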
. . .
In R, we use a special “arrow” operator for assignment: <-
. . .
This lets us declare new variables or objects.
. . .
Why not use =?
Technically you can in R, but <- is preferred because = is used to assign values to arguments in a function call.
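A minimal sketch of the two roles, with illustrative variable names: `<-` creates objects, while `=` names the arguments inside a function call.

```r
# <- assigns a value to a new object
x <- 5
y <- x * 2                                 # 10

# = matches values to the function's named arguments
evens <- seq(from = 2, to = 10, by = 2)    # 2 4 6 8 10
```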
We have seen the “sequence” operator :
Which it turns out is just a shortcut for seq()
. . .
But the function gives more options
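For example, `seq()` accepts `by` and `length.out` arguments that `:` cannot express. The values here are illustrative.

```r
# The shortcut and the function give the same sequence
short_form <- 1:5
long_form <- seq(from = 1, to = 5)

# by sets the step size
odds <- seq(from = 1, to = 10, by = 2)               # 1 3 5 7 9

# length.out sets how many evenly spaced values to return
quarters <- seq(from = 0, to = 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
```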
How do we know all of the options for a function?
Documentation!
. . .
All R functions have a help file explaining them, which can be accessed using ?name or help(name), for example ?seq or help(seq).
. . .
This documentation is also hosted online.
Never use the following:
. . .
All of this is important information!
But it shouldn’t be stored as a comment.
Instead we should use a task manager, variables declared at the top of a script, git for version control, etc.
You may be tempted to add this sort of comment
. . .
But what happens when you decide to change your code?
. . .
If someone else reads your code, do they trust the comment or the code?
. . .
Code forces you to do exactly what you say (i.e. square, not cube). But comments do not, so they tend to get out of sync with the code.
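A hypothetical illustration of a comment drifting out of sync: the code was changed from squaring to cubing, but the comment was not updated. The variable names are assumptions for this sketch.

```r
x <- 3

# Square x   (stale: this comment no longer matches the code below)
y <- x^3     # the code actually cubes x, giving 27

# A descriptive name carries the same information and cannot go stale as easily
x_cubed <- x^3
```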
“Good code does not need comments”
This is the goal.
Your code should be readable without any comments.
. . .
But that’s probably unrealistic for most of us.
Some good rules:
. . .
These and more from: Best Practices for Writing Code Comments
I like to use comments to give sections to my code
I find this useful as a way to structure my code and make it more readable later on.
Character
Logical (“boolean”)
. . .
Integer
Numeric
. . .
Complex (imaginary numbers)
Raw (bytes)
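One way to check which of these types a value has is `typeof()`. The example values are illustrative. Note that `typeof()` reports the numeric type as "double".

```r
typeof("hello")       # "character"
typeof(TRUE)          # "logical"
typeof(5L)            # "integer"
typeof(5)             # "double" (the numeric type)
typeof(2 + 3i)        # "complex"
typeof(as.raw(12))    # "raw"
```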
. . .
Characters (a.k.a. “strings”) store text information
. . .
You can create a character variable with either '' or ""
We saw logical types before. Stored as TRUE or FALSE.
. . .
A special logical value in R is the missing value, stored as NA
. . .
Missing values almost always create more missing values
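A short sketch of how NA propagates, using illustrative values. `is.na()` is the usual way to test for a missing value, since comparing with `==` also returns NA.

```r
missing_value <- NA
typeof(missing_value)                   # "logical"

# Arithmetic and comparisons involving NA return NA
missing_sum <- missing_value + 1        # NA
missing_compare <- missing_value > 0    # NA

# Use is.na() to detect missing values
is_missing <- is.na(missing_value)      # TRUE

# One exception: the answer is already determined by the FALSE side
known_false <- NA & FALSE               # FALSE
```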
All whole numbers (no decimal): (…, -2, -1, 0, 1, 2, …)
An exact number storage, compared to the approximate “numeric” type.
. . .
To create an integer value, add an L at the end of the number
. . .
Can be useful for setting ID values,
but usually we will store numbers as “numeric” type instead.
Numeric is a class that stores numbers as floating point values.
. . .
In R, “double” is the only numeric type.
Equivalent to “float64” in other languages.
. . .
There used to be a “single” precision. Equivalent to “float32”.
. . .
R has a full set of as.___() functions for each type.
[1] "12"
[1] 12
[1] 0c
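The calls that produced the output above are not preserved in this text. The following is a guess at conversions that print exactly those three results.

```r
as.character(12)    # [1] "12"  (number to character)
as.numeric("12")    # [1] 12    (character to number)
as.raw(12)          # [1] 0c    (number to a byte, printed in hexadecimal)
```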
. . .
Sometimes the conversion is not as expected.
. . .
Sometimes will return missing.
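A few illustrative cases (not from the original slides) where a conversion may surprise you or return NA:

```r
# The decimal part is dropped, not rounded
truncated <- as.integer(3.9)                              # 3

# TRUE converts to 1
from_logical <- as.numeric(TRUE)                          # 1

# Text that is not a number becomes NA (R also issues a warning,
# suppressed here to keep the output clean)
not_a_number <- suppressWarnings(as.numeric("twelve"))    # NA

# Text that R does not recognise as a logical becomes NA
not_a_logical <- as.logical("yes")                        # NA
```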
Some languages are very strict about data types, R is not.
This is convenient, but somewhat dangerous.
. . .
paste() takes multiple strings and pastes them together
. . .
R will try to convert other types to a string to paste.
. . .
This also happens for math operations. Can be unexpected.
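A minimal sketch of both kinds of implicit conversion, with illustrative strings and values:

```r
# paste() joins strings, separated by a space by default
greeting <- paste("Hello", "world")       # "Hello world"

# Numbers and logicals are converted to strings before pasting
labelled <- paste("Value:", 12)           # "Value: 12"
flagged <- paste("Is missing:", TRUE)     # "Is missing: TRUE"

# In math operations, TRUE is treated as 1 and FALSE as 0
true_plus_one <- TRUE + 1                 # 2
count_true <- TRUE + TRUE + FALSE         # 2
```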
Vectors are an ordered set of values all of the same type.
They are created with the c() function (short for “concatenate”).
. . .
Technically, everything we have seen are vectors of length 1.
Vectors all have lengths.
. . .
Vectors can have names for each element.
NameOne NameTwo NameThree NameFour
"a" "d" "b" "z"
. . .
Vectors can have NA values, but otherwise, no mixing types.
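The points above can be sketched as follows. The named vector reuses the names and values shown in the output above; the other values are illustrative.

```r
# Create a vector with c()
letters_vec <- c("a", "d", "b", "z")
vec_length <- length(letters_vec)     # 4

# A single value is a vector of length 1
single_length <- length(5)            # 1

# Attach a name to each element
names(letters_vec) <- c("NameOne", "NameTwo", "NameThree", "NameFour")

# NA is allowed alongside other values
with_missing <- c(1, NA, 3)

# Mixed types are converted to one common type (here, character)
mixed <- c(1, "a", TRUE)              # "1" "a" "TRUE"
```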
Vector elements can be accessed by their position or name, using square brackets [].
. . .
. . .
You can select multiple elements if you wish.
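A short sketch of these access patterns, using a hypothetical named vector:

```r
x <- c(first = 10, second = 20, third = 30)

by_position <- x[2]           # element 2 (value 20)
by_name <- x["third"]         # element named "third" (value 30)

# Pass a vector of positions to select several elements
first_and_last <- x[c(1, 3)]  # values 10 and 30
last_two <- x[2:3]            # values 20 and 30
```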
A lot of functions are “vectorized” to apply to each element.
. . .
. . .
. . .
Some functions instead take in a vector.
. . .
. . .
We will learn later how to vectorize any function.
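To illustrate the difference, here is a sketch with an illustrative vector: `sqrt()` and `*` work element by element, while `sum()` and `mean()` take the whole vector and return one value.

```r
x <- c(1, 4, 9)

# Vectorized: one result per element
roots <- sqrt(x)      # 1 2 3
doubled <- x * 2      # 2 8 18

# Take in a vector: one result for the whole vector
total <- sum(x)       # 14
average <- mean(x)    # 14 / 3
```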
Vectors are only one dimensional.
What if I need to store a mix of data types? - Use a list!
Each element has preserved its type!
We can again access list elements by their index position.
. . .
Note: x[3] returns a list with one element
. . .
Use double square brackets to return the element instead.
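A minimal sketch of a mixed-type list and the two kinds of brackets. The list contents are illustrative assumptions.

```r
x <- list(1.5, "text", TRUE)

# Single brackets return a list containing the selected element
as_list <- x[3]
is.list(as_list)        # TRUE
length(as_list)         # 1

# Double brackets return the element itself
as_element <- x[[3]]    # TRUE
is.list(as_element)     # FALSE
```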
Lists can also have names for each element. We can assign them using names() or at construction.
. . .
$p
[1] "Providence"
$b
[1] "Boston"
$nyc
[1] "New York"
. . .
We can access list elements using the dollar sign $, followed by the element name.
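The output above is consistent with a list like the one below. The variable names `cities` and `cities_alt` are assumptions, since the original code is not preserved here.

```r
# Names given at construction
cities <- list(p = "Providence", b = "Boston", nyc = "New York")

# The same list, with names assigned afterwards using names()
cities_alt <- list("Providence", "Boston", "New York")
names(cities_alt) <- c("p", "b", "nyc")

# Access by name with $ or with double brackets
cities$nyc       # "New York"
cities[["b"]]    # "Boston"
```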
We saw earlier that 1 element objects are actually vectors.
This means that we can have lists of multiple element vectors.
We can also have lists of lists!
$l1
$l1[[1]]
[1] 5 6 7
$l1[[2]]
[1] "A" "D" "E"
$l2
$l2[[1]]
[1] 1 2 3 4 5
$l2[[2]]
[1] TRUE
. . .
This can be as many list layers deep as you want.
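A sketch of a nested list that would print like the output above. The name `nested` is hypothetical. Access operators can be chained to reach inner elements.

```r
nested <- list(
  l1 = list(c(5, 6, 7), c("A", "D", "E")),
  l2 = list(1:5, TRUE)
)

# Second element of the inner list l1
inner_vector <- nested$l1[[2]]       # "A" "D" "E"

# Third value of that vector
inner_value <- nested$l1[[2]][3]     # "E"

# Fifth value of the first element of l2
last_number <- nested$l2[[1]][5]     # 5
```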
Lists are much more general than vectors.
So why use vectors?
. . .
. . .
<simpleError in x^2: non-numeric argument to binary operator>
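One reason, sketched here with illustrative names: math operations work directly on vectors but fail on lists. `tryCatch()` is used only to capture the error so the script keeps running.

```r
x_vec <- c(1, 2, 3)
x_list <- list(1, 2, 3)

# Works: the vector is squared element by element
squares <- x_vec^2          # 1 4 9

# Fails: ^ is not defined for lists, so R raises an error
result <- tryCatch(x_list^2, error = function(e) e)
conditionMessage(result)    # "non-numeric argument to binary operator"
```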
Think of data frames as “tables” of elements.
. . .
. . .
Behind the scenes, they are a list of vectors.
Data.frame values can be accessed using index values:
x[row, col]
. . .
You can leave one index blank to get a whole row or column.
. . .
Or use the column names (remember, they are just lists).
The function str() will return information about the data structure of the passed object.
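A minimal sketch pulling these together. The data frame `df` and its columns are hypothetical, made up for illustration.

```r
df <- data.frame(
  id = 1:3,
  city = c("Providence", "Boston", "New York"),
  stringsAsFactors = FALSE    # keep text as character on older R versions
)

one_value <- df[2, 2]         # row 2, column 2: "Boston"
first_row <- df[1, ]          # whole first row (a one-row data.frame)
second_col <- df[, 2]         # whole second column (a character vector)
by_name <- df$city            # same column, selected by name

# A data.frame is a list of columns underneath
is.list(df)                   # TRUE

# str() summarises the structure: dimensions, column names, and types
str(df)
```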
A specific type of vector.
. . .
Details to be covered in the problem set!
If statements evaluate a condition,
and then execute code if the condition is TRUE.
. . .
If we give another value for x…
. . .
Nothing is printed. Because print() was never run.
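A sketch of both cases, with a hypothetical condition and message:

```r
x <- 10
if (x > 5) {
  print("x is greater than 5")    # runs, because the condition is TRUE
}

x <- 2
if (x > 5) {
  print("x is greater than 5")    # skipped, because the condition is FALSE
}
```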
Sometimes you want to check a series of conditions,
[1] "X is a character."
. . .
This code,
else
To catch any cases that do not pass any condition, you can use else.
[1] "I'm not sure what X is."
logi NA
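A sketch of a chain that could produce the messages shown above. The numeric branch and its message are assumptions, since the original code is not preserved in this text.

```r
x <- NA

if (is.character(x)) {
  print("X is a character.")
} else if (is.numeric(x)) {
  print("X is a number.")             # hypothetical branch
} else {
  print("I'm not sure what X is.")    # NA is logical, so it lands here
}

str(x)    # logi NA
```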
. . .
If (and if-else) statements are the basics of controlling the flow of your program.
. . .
You can make sections of code that only execute for one dataset, or a robustness check that runs on only one model.
Loops are another key component for controlling your program flow.
Two basic loops are:
for(){}
while(){}
For loops execute code a defined number of times.
. . .
. . .
The construction here is
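A for loop generally takes the form `for (variable in vector) { code }`. The sketch below is an illustrative example, not the loop from the original slide.

```r
total <- 0

# i takes each value of 1:5 in turn, so the body runs five times
for (i in 1:5) {
  print(i)
  total <- total + i
}

total    # 15
```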
While loops execute code repeatedly until a condition is met.
. . .
Here we emulated the function of the for loop from before.
. . .
But while loops only require one thing:
It is easy to write a while loop that will run forever.
. . .
This one is inane, but you can inadvertently construct them.
While loops let you execute code for an unspecified duration.
If you are going to use a while loop, it’s a good idea to set a “safety” option to limit the maximum number of iterations.
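A sketch of both ideas, with hypothetical values and names: a while loop that counts like a for loop, and one with a `max_iterations` safety limit.

```r
# Emulating a for loop: the counter must be updated by hand
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

# Halve a value until it drops to 1 or below, with a safety limit
value <- 100
iterations <- 0
max_iterations <- 1000

while (value > 1 && iterations < max_iterations) {
  value <- value / 2
  iterations <- iterations + 1
}

iterations    # 7
```

Forgetting the `i <- i + 1` line in the first loop would make it run forever, which is the kind of mistake the safety limit guards against.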
. . .
There are some disadvantages to loops,
. . .
. . .
. . .
One alternative is to use one of the family of apply() functions.
lapply()
We will see how to use the l + apply() function.
. . .
l stands for “list”, which is what the function returns.
. . .
We can rewrite our prior for(){} loop as,
. . .
The construction is…
lapply()
One nice thing about lapply() is that it returns the values as a list.
. . .
It also works with lists as the input,
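The general form is `lapply(X, FUN)`: apply the function `FUN` to each element of `X` and collect the results in a list. The loop and inputs below are illustrative, not the ones from the original slides.

```r
# A for loop that stores each result in a list
squares_loop <- list()
for (i in 1:3) {
  squares_loop[[i]] <- i^2
}

# The same computation with lapply()
squares <- lapply(1:3, function(i) i^2)

# A list as input: names are carried over to the result
inputs <- list(a = 1:3, b = c("x", "y"))
sizes <- lapply(inputs, length)    # list(a = 3, b = 2)
```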
Apply functions assume that each of your elements can be operated on separately.
. . .
For loops operate on each element sequentially.
. . .
. . .
¹ Purposefully vague for now; we will talk about lists soon.
Comments
Commenting code can be useful to yourself and others.
In R, comments are any line (or part of a line) that begins with #.
. . .
In VS Code, ⌘K ⌘C comments out the selected lines and ⌘K ⌘U uncomments them.
. . .