R Data Wrangling:
the tidyverse

Matthew DeHaven

Course Home Page

2024-01-01

Lecture Goals

Introduction to environments
Introduction to our first package: dplyr
Using the rest of the tidyverse

What is a package?

A package is

a set of functions
defined outside the current project (usually by someone else).

For R, most packages are hosted on CRAN (Comprehensive R Archive Network), which makes installing packages easy.

You can also easily install packages directly from github.

Packages are incredibly useful and will be a part of any project you work on.

Installing and Loading Packages

The base command for installing packages is

install.packages("nameOfPackage")

This will download the package from CRAN, and store it on your computer.

Installing and Loading Packages

If you then want to use a function from the package, you have two options:

Explicit reference for each function

nameOfPackage::some_function()

Loading the whole package

library(nameOfPackage)

some_function()

Neither of these options will work if the package isn’t installed!

Package Versions

Packages are functions managed by someone else.

Sometimes those people update the functions.

Great! Probably the new functions are better!

But sometimes this is catastrophic for you.

If your code relies entirely on v0.9 of some_function() which always returned a boolean,

and now v1.0 returns an integer,

all of your code using some_function() just broke.

Environments

Environments will let us be explicit about which package versions we are using.

Allow you to keep track of what packages you use, and their versions.
Load those specific versions of packages at any later date.

This makes them incredibly important for reproducibility.

R environments: renv

We will be using renv to handle all of our packages.

Environments

Whenver you start a project,

Install renv
Initialize the environment
Install the rest of your packages

Setting up `renv`

If you have never installed renv before,

install.packages(renv)

Then, we tell renv to start tracking packages for our project.

This is as simple as,

renv::init()

Setting up `renv`

renv::init()

To start with, the only packge being tracked is renv itself, and the version of R we are using.

Go ahead and restart the R Session. You should get a new message from renv at the start of the new session.

VS Code can’t find `jsonlite`, `rlang`

We also get a warning message at the start of the new session:

VSCode R Session Watcher requires jsonlite, rlang. Please install manually in order to use VSCode-R.

Why is this happening?

renv has created an isolated environment for our project.

The only package our system can see is renv itself!

All of the other packages we installed when setting up R and VS Code still exist on our machine, but they cannot be accessed.

Fixing VS Code `renv` issue

To fix this, we just need to tell renv to install the packages.

renv::install("languageserver")  ## Takes a few minutes the first time
renv::install("httpgd")

If you’ve used renv to install these packages before,

you should see the message “linked from cache”.

renv is smart enough to know it has installed these packages before and instead of redownloading them,

simply provides a link to the right version on our machine.

Reload the session. No more VS Code warning message!

`renv` Files

.Rprofile

This activates renv at the start of any R session.

renv/

Where packages for the isolated environment are installed.

renv.lock

A small file listing packages and their version numbers.

`renv` Files

renv.lock is the only piece needed to recreate your environment!

So renv/ doesn’t get commited to git, which is good, because it can be large.

The `tidyverse`

Introducing the `tidyverse`

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.¹

Developed in part by Hadley Wickham (a name we will see again) and supported by the company behind RStudio.

The most popular collection of R packages.

Specifically designed for readable data science.

Installing the `tidyverse`

renv::install("tidyverse")

Once it is installed for our environment, we can call,

library(tidyverse)

Which prints a nice message showing the packages loaded.

Some `tidyverse` packages

tibble

A replacement for data.frame object, with better printing

dplyr

Data science functions for adding new columns, filtering data, summarizing, etc.

lubridate

Makes it easier to work with dates.

stringr

Better regular expression and string matching.

readr

Import and export functions (for CSVs)

`tibble` printing

data.frame(
  x = 1:10,
  y = rnorm(10)
)

    x           y
1   1  0.09589645
2   2  1.58105376
3   3 -0.44869387
4   4  1.14401658
5   5 -0.21688373
6   6 -0.33656141
7   7  0.50193575
8   8 -0.76144252
9   9  2.16781377
10 10 -1.98231945

tibble(
  x = 1:25,
  y = rnorm(25)
)

# A tibble: 25 × 2
       x        y
   <int>    <dbl>
 1     1  1.42   
 2     2  0.180  
 3     3  0.838  
 4     4 -0.0936 
 5     5  1.63   
 6     6  0.00293
 7     7  0.633  
 8     8  0.369  
 9     9 -0.383  
10    10  0.667  
# ℹ 15 more rows

Shows variable types, dimensions.

Limits printing to not flood console.

`tibble` list columns

A powerful feature of the tibble type is being able to store any object in a tibble column.

tibble(
  x = c(1, 2),
  example = list(
    c(a = 5, b = 5),
    c(a = 4, b = 10)
  )
)

# A tibble: 2 × 2
      x example  
  <dbl> <list>   
1     1 <dbl [2]>
2     2 <dbl [2]>

`tibble` list columns

This means you can have a tibble with

a column of data sets (nested tibbles)
a column of fitted models
a column of model summary stats (R^2, etc.)

`dplyr`

A set of verbs to work with tibbles.

We will work with a built in dataset: mtcars.

tb <- as_tibble(mtcars)
tb

# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows

You can see a list of the built in datasets by calling data().

`dplyr` filter

Filtering datasets allows you to select rows based off of a logical condition.

The first argument is the tibble, the second the condition.

filter(tb, cyl == 6)

# A tibble: 7 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
4  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
5  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
6  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
7  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6

`dplyr` filter

You can filter on multiple conditions using the & operator.

filter(tb, cyl == 6 & gear == 5)

# A tibble: 1 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  19.7     6   145   175  3.62  2.77  15.5     0     1     5     6

Piping

dplyr verbs are designed to be used with a pipe.

A pipe in programming takes the output from the left, and plugs it in to the first argument on the right.

In R, the pipe operation is: |>

tb |> filter(cyl == 6)

# A tibble: 7 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
4  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
5  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
6  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
7  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6

You may also see %>% used as a pipe. This is from the magrittr package, and was used before R had a base pipe.

Chaining pipes

tb |> filter(cyl == 6) |> filter(gear == 5)

# A tibble: 1 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  19.7     6   145   175  3.62  2.77  15.5     0     1     5     6

This is even more readable with a return after each pipe.

tb |>
  filter(cyl == 6) |>
  filter(gear == 5)

The idea behind piping is that you can read them from

left-to-right, or top-to-down, which is more natural.

Rather than usual function calls, which are read inside-to-out.

filter(filter(tb, cyl == 6), gear == 5)

`dplyr` mutate

Possibly the most useful data science verb.

Mutate lets you create new columns in your tibble.

tb |>
  mutate(cyl_sq = cyl ^ 2)

# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb cyl_sq
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     36
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     36
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     16
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     36
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     64
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     36
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     64
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     16
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     16
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     36
# ℹ 22 more rows

`dplyr` multiple mutates

If you want to do multiple mutates, you can either

add another mutate function
add another declaration separated by a comma

tb |>
  mutate(cyl_sq = cyl ^ 2) |>
  mutate(cyl_cb = cyl ^ 3)

tb |>
  mutate(
    cyl_sq = cyl ^ 2,
    cyl_cb = cyl ^ 3
  )

`dplyr` mutate functions

Mutate works with any function that takes a returns a vector of the same length.

tb |>
  mutate(cyl_sqrt = sqrt(cyl))

# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb cyl_sqrt
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     2.45
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     2.45
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     2   
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     2.45
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     2.83
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     2.45
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     2.83
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     2   
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     2   
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     2.45
# ℹ 22 more rows

This is very useful, as it can work with custom functions you write as well.

`dplyr` select

We may not need all of the columns of our data.

select() lets you choose which columns to keep.

tb |>
  mutate(cyl_sq = cyl ^ 2) |>
  select(mpg, cyl, cyl_sq)

# A tibble: 32 × 3
     mpg   cyl cyl_sq
   <dbl> <dbl>  <dbl>
 1  21       6     36
 2  21       6     36
 3  22.8     4     16
 4  21.4     6     36
 5  18.7     8     64
 6  18.1     6     36
 7  14.3     8     64
 8  24.4     4     16
 9  22.8     4     16
10  19.2     6     36
# ℹ 22 more rows

`dplyr` summarize

Summarize lets you collapse your data using functions that take a vector and return a single value.

tb |>
  summarize(mpg_avg = mean(mpg))

# A tibble: 1 × 1
  mpg_avg
    <dbl>
1    20.1

We could summarize multiple columns or operations at once.

tb |>
  summarize(
    mpg_avg = mean(mpg),
    mpg_max = max(mpg),
    mpg_min = min(mpg),
    n = n()
    )

# A tibble: 1 × 4
  mpg_avg mpg_max mpg_min     n
    <dbl>   <dbl>   <dbl> <int>
1    20.1    33.9    10.4    32

`dplyr` group_by

What if we want summaries for each quantity of cylinders (cyl)?

We could filter the data first, then summarise, and repeat for each cylinder value…

… or we tell dplyr to group the data by cyl.

tb |>
  group_by(cyl) |>
  summarise(
    mpg_avg = mean(mpg),
    mpg_max = max(mpg),
    mpg_min = min(mpg),
    n = n()
    )

# A tibble: 3 × 5
    cyl mpg_avg mpg_max mpg_min     n
  <dbl>   <dbl>   <dbl>   <dbl> <int>
1     4    26.7    33.9    21.4    11
2     6    19.7    21.4    17.8     7
3     8    15.1    19.2    10.4    14

`dplyr` main verbs

filter()
mutate()
select()
summarize()
group_by

the pipe: |>

With all of these, you can do most data wrangling!

Other useful `dplyr` verbs

arrange() sort your data by a column

distinct() keep only unique rows

rename() rename a column

Non-tibble functions

if_else() vectorized if-else, useful within mutates

case_when() use instead of nested if_else statements

lag() and lead() shift values forward or back

dplyr documentation

Merging

`dplyr` _join functions

dplyr has a family of functions for merging two datasets.

Here are two example data.frames that we will use to understand the differences.

tb1 <- tibble(
  id = c("A", "B", "C", "D", "E"),
  x = c(4, 9, 2, 3, 8)
)

tb2 <- tibble(
  id = c("D", "E", "F", "G", "H"),
  y = c(7, 4, 1, 2, 6)
)

All of the join functions have the same basic structure.

[]_join(x, y, by = ___)

Here are all of the options:

full_join(tb1, tb2, by = "id")
left_join(tb1, tb2, by = "id")
right_join(tb1, tb2, by = "id")
inner_join(tb1, tb2, by = "id")
semi_join(tb1, tb2, by = "id")
anti_join(tb1, tb2, by = "id")

`dplyr` full_join()

Keeps all observations from both the x (left) and y (right) datasets.

full_join(tb1, tb2, by = "id")

# A tibble: 8 × 3
  id        x     y
  <chr> <dbl> <dbl>
1 A         4    NA
2 B         9    NA
3 C         2    NA
4 D         3     7
5 E         8     4
6 F        NA     1
7 G        NA     2
8 H        NA     6

`dplyr` left_join()

Keeps all observations from the x (left) dataset,

but only the matches from the y (right) dataset.

left_join(tb1, tb2, by = "id")

# A tibble: 5 × 3
  id        x     y
  <chr> <dbl> <dbl>
1 A         4    NA
2 B         9    NA
3 C         2    NA
4 D         3     7
5 E         8     4

`dplyr` right_join()

Keeps all observations from the y (right) dataset,

but only the matches from the x (left) dataset.

right_join(tb1, tb2, by = "id")

# A tibble: 5 × 3
  id        x     y
  <chr> <dbl> <dbl>
1 D         3     7
2 E         8     4
3 F        NA     1
4 G        NA     2
5 H        NA     6

`dplyr` inner_join()

Keeps only matches between the two datasets.

inner_join(tb1, tb2, by = "id")

# A tibble: 2 × 3
  id        x     y
  <chr> <dbl> <dbl>
1 D         3     7
2 E         8     4

`dplyr` filtering joins

semi_join()

Keep observations in x (left) with a possible match in y (right).

semi_join(tb1, tb2, by = "id")

# A tibble: 2 × 2
  id        x
  <chr> <dbl>
1 D         3
2 E         8

Note: only observations in x are returned, no values from y.

`dplyr` filtering joins

anti_join()

Keep observations in x (left) without a possible match in y (right).

anti_join(tb1, tb2, by = "id")

# A tibble: 3 × 2
  id        x
  <chr> <dbl>
1 A         4
2 B         9
3 C         2

Note: only observations in x are returned, no values from y.

Merging by different column names

Often you will have slightly different identifying column names between two datasets.

tb1 <- tibble(
  id = c("A", "B", "C", "D", "E"),
  x = c(4, 9, 2, 3, 8)
)

tb2 <- tibble(
  ID = c("D", "E", "F", "G", "H"),
  y = c(7, 4, 1, 2, 6)
)

We can handle this using join_by() in the “by” argument.

inner_join(tb1, tb2, by = join_by(id == ID))

# A tibble: 2 × 3
  id        x     y
  <chr> <dbl> <dbl>
1 D         3     7
2 E         8     4

With the join_by() function you can do inequality, rolling, or overlapping joins (see the documentation).

tidy Data

“tidy” data

Each variable is a column
Each observation is a row

This sounds almost tautological.

But a lot of data is not stored this way.

Which of the following tables of stock prices is tidy?

# A tibble: 3 × 3
  stock `2024-01-01` `2024-01-02`
  <chr>        <dbl>        <dbl>
1 A              100          105
2 B               98           95
3 C               99          103

# A tibble: 6 × 3
  stock date       price
  <chr> <date>     <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

Why tidy?

Imagine we want to add another variable—volume—to the stock price tables.

# A tibble: 6 × 3
  stock   `2024-01-01` `2024-01-02`
  <chr>          <dbl>        <dbl>
1 A_price          100          105
2 B_price           98           95
3 C_price           99          103
4 A_vol             50           70
5 B_vol             30           55
6 C_vol             23           60

# A tibble: 6 × 4
  stock date       price volume
  <chr> <date>     <dbl>  <dbl>
1 A     2024-01-01   100     50
2 A     2024-01-02   105     70
3 B     2024-01-01    98     30
4 B     2024-01-02    95     55
5 C     2024-01-01    99     23
6 C     2024-01-02   103     60

This is very easy for tidy data.

But it is hard for nontidy data.

Do we add a “_price” and “_vol” to each stock row?
Or to each date column?
Or store them as two tables?

Making everything tidy

The tidyverse obviously has strong support for tidy data.

In particular, the package tidyr.

We will cover two crucial functions:

tidyr::pivot_longer()
tidyr::pivot_wider()

`pivot_longer`

Pivot functions let you transform a data.frame between wide and long formats.

Let’s look at the stock price data from earlier.

`pivot_longer`

nontidy_tb

# A tibble: 3 × 3
  stock `2024-01-01` `2024-01-02`
  <chr>        <dbl>        <dbl>
1 A              100          105
2 B               98           95
3 C               99          103

Main argument is “cols =”, which takes the columns to lengthen.

nontidy_tb |>
  pivot_longer(cols = c("2024-01-01", "2024-01-02"))

# A tibble: 6 × 3
  stock name       value
  <chr> <chr>      <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

`pivot_longer`

nontidy_tb

# A tibble: 3 × 3
  stock `2024-01-01` `2024-01-02`
  <chr>        <dbl>        <dbl>
1 A              100          105
2 B               98           95
3 C               99          103

Instead, we could list the columns not to lengthen.

nontidy_tb |>
  pivot_longer(cols = -c("stock"))

# A tibble: 6 × 3
  stock name       value
  <chr> <chr>      <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

`pivot_longer` names

pivot_longer defaults to naming new columns “name” and “value”, but you can override these.

nontidy_tb |>
  pivot_longer(cols = -c("stock"), names_to = "date", values_to = "price")

# A tibble: 6 × 3
  stock date       price
  <chr> <chr>      <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

`pivot_longer` data types

We should also convert the date column to the date-type.

nontidy_tb |>
  pivot_longer(cols = -c("stock"), names_to = "date", values_to = "price") |>
  mutate(date = as.Date(date))

# A tibble: 6 × 3
  stock date       price
  <chr> <date>     <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

`pivot_wider`

Sometimes, you will want your data in wide format.

While this isn’t always “tidy” it can be useful for some calculations, and then you can go back to long.

`pivot_wider`

tidy_tb

# A tibble: 6 × 3
  stock date       price
  <chr> <date>     <dbl>
1 A     2024-01-01   100
2 A     2024-01-02   105
3 B     2024-01-01    98
4 B     2024-01-02    95
5 C     2024-01-01    99
6 C     2024-01-02   103

The two key arguments: “names_from =” and “values_from =”.

tidy_tb |>
  pivot_wider(names_from = "stock", values_from = "price")

# A tibble: 2 × 4
  date           A     B     C
  <date>     <dbl> <dbl> <dbl>
1 2024-01-01   100    98    99
2 2024-01-02   105    95   103

Pivoting

With

pivot_longer
pivot_wider

it’s easy to convert from long to wide data foramts.

With these tools, you can “tidy” most any dataset.

Once data is “tidy” it’s easy to lengthen or widen the data for any calculations.

Summary

renv for managing packages and versions
The tidyverse for easy data science
- tibbles as better data.frames
- dplyr for working with tibbles
- |> for piping
- tidyr for pivoting data long or wide

Live Session

Practice reading in a csv
Tidying it with dplyr and tidyr
Creating summary tables

R Data Wrangling:the tidyverse

Lecture Goals

What is a package?

Installing and Loading Packages

Installing and Loading Packages

Package Versions

Environments

Environments

Setting up renv

Setting up renv

VS Code can’t find jsonlite, rlang

Fixing VS Code renv issue

renv Files

renv Files

The tidyverse

Introducing the tidyverse

Installing the tidyverse

Some tidyverse packages

tibble printing

tibble list columns

tibble list columns

dplyr

dplyr filter

dplyr filter

Piping

Chaining pipes

dplyr mutate

dplyr multiple mutates

dplyr mutate functions

dplyr select

dplyr summarize

dplyr group_by

dplyr main verbs

Other useful dplyr verbs

Merging

dplyr _join functions

dplyr full_join()

dplyr left_join()

dplyr right_join()

dplyr inner_join()

dplyr filtering joins

dplyr filtering joins

Merging by different column names

tidy Data

“tidy” data

Why tidy?

Making everything tidy

pivot_longer

pivot_longer

pivot_longer

pivot_longer names

pivot_longer data types

pivot_wider

pivot_wider

Pivoting

Summary

Live Session

R Data Wrangling:
the tidyverse

Setting up `renv`

Setting up `renv`

VS Code can’t find `jsonlite`, `rlang`

Fixing VS Code `renv` issue

`renv` Files

`renv` Files

The `tidyverse`

Introducing the `tidyverse`

Installing the `tidyverse`

Some `tidyverse` packages

`tibble` printing

`tibble` list columns

`tibble` list columns

`dplyr`

`dplyr` filter

`dplyr` filter

`dplyr` mutate

`dplyr` multiple mutates

`dplyr` mutate functions

`dplyr` select

`dplyr` summarize

`dplyr` group_by

`dplyr` main verbs

Other useful `dplyr` verbs

`dplyr` _join functions

`dplyr` full_join()

`dplyr` left_join()

`dplyr` right_join()

`dplyr` inner_join()

`dplyr` filtering joins

`dplyr` filtering joins

`pivot_longer`

`pivot_longer`

`pivot_longer`

`pivot_longer` names

`pivot_longer` data types

`pivot_wider`

`pivot_wider`