Assignment 3: R Data Wrangling

Author

Matthew DeHaven

Due

February 12, 2024

Modified

January 13, 2025

1 Accept Assignment on Github Classroom

  1. Accept Assignment 3 on Github Classroom.

  2. Clone this assignment to your computer.

2 Restore the Package Environment

  1. Open up the project folder and launch an R terminal 1

  2. Restore the renv packages

1 You may get a message at this point that renv has started downloading and installing itself. If you get an error in this step, try to run renv::activate() in the terminal afterwards, and then restart the terminal.

  • Tell renv that “yes” you want to restore all of the packages. 2

2 If you tell it “no” by accident or don’t see a prompt, you can run renv::restore() in the terminal instead.

You now have the exact packages needed for this project (which is just tidyverse and data.table).

  1. In the R terminal, run renv::install("languageserver"); renv::install("jsonlite"). 3

3 We have to do this to link these packages to this new renv project library.

3 Data Wrangling

We are going to work with a real dataset downloaded from the Bureau of Labor Statistics’ “Job Openings and Labor Turnover Survey (JOLTS)”.

I have already placed the files you need in “raw-data”.

Your goal is to recreate the following summary table, using first tidyverse packages and then again using the data.table package. (Note, I rounded the values to 2 decimals places for displaying here).

Key: <state_code, state_text>
    state_code           state_text    HI    JO    LD    OS    QU    TS    UO
        <char>               <char> <num> <num> <num> <num> <num> <num> <num>
 1:         00             Total US  3.76  4.27  1.29  0.23  2.10  3.64  1.70
 2:         01              Alabama  3.96  4.21  1.22    NA  2.27  3.75  1.70
 3:         02               Alaska  5.66  5.75  2.10    NA  3.23  5.75  1.31
 4:         04              Arizona  4.24  4.62  1.29    NA  2.53  4.05  1.78
 5:         05             Arkansas  4.07  4.19  1.26    NA  2.39  3.91  1.56
 6:         06           California  3.38  3.99  1.24    NA  1.83  3.31  2.37
 7:         08             Colorado  4.27  4.54  1.44    NA  2.47  4.16  1.48
 8:         09          Connecticut  3.11  3.73  1.22    NA  1.53  2.98  2.10
 9:         10             Delaware  4.51  4.56  1.46    NA  2.55  4.29  1.35
10:         11 District of Columbia  2.72  3.61  0.92    NA  1.53  2.69  1.08
11:         12              Florida  3.86  4.27  1.28    NA  2.24  3.76  1.85
12:         13              Georgia  4.30  4.66  1.39    NA  2.51  4.18  1.84
13:         15               Hawaii  3.50  3.96  1.20    NA  1.97  3.39  1.39
14:         16                Idaho  4.61  4.68  1.48    NA  2.59  4.35  1.32
15:         17             Illinois  3.69  4.13  1.29    NA  1.96  3.47  2.05
16:         18              Indiana  4.08  4.13  1.43    NA  2.33  3.98  1.65
17:         19                 Iowa  3.51  4.14  1.19    NA  2.01  3.42  1.16
18:         20               Kansas  3.61  4.18  1.19    NA  2.11  3.55  1.29
19:         21             Kentucky  4.29  4.55  1.36    NA  2.49  4.11  1.59
20:         22            Louisiana  4.24  4.32  1.39    NA  2.53  4.20  1.64
21:         23                Maine  3.84  4.49  1.53    NA  1.94  3.76  1.34
22:         24             Maryland  3.47  4.56  1.23    NA  1.94  3.44  1.49
23:         25        Massachusetts  3.15  4.29  1.21    NA  1.54  2.95  1.47
24:         26             Michigan  3.77  4.37  1.38    NA  2.08  3.68  1.99
25:         27            Minnesota  3.33  4.32  1.14    NA  1.84  3.20  1.23
26:         28          Mississippi  4.05  4.23  1.31    NA  2.44  4.02  2.06
27:         29             Missouri  3.83  4.25  1.22    NA  2.26  3.71  1.51
28:         30              Montana  5.03  5.08  1.78    NA  2.84  4.93  1.09
29:         31             Nebraska  3.65  4.25  1.19    NA  2.06  3.50  0.86
30:         32               Nevada  4.43  4.52  1.64    NA  2.45  4.34  2.27
31:         33        New Hampshire  3.88  4.45  1.53    NA  1.96  3.77  1.12
32:         34           New Jersey  3.32  3.94  1.43    NA  1.62  3.30  2.09
33:         35           New Mexico  3.86  4.69  1.30    NA  2.26  3.83  1.65
34:         36             New York  3.01  3.62  1.24    NA  1.44  2.91  1.96
35:         37       North Carolina  4.29  4.62  1.39    NA  2.43  4.09  1.77
36:         38         North Dakota  4.30  4.58  1.65    NA  2.36  4.27  0.62
37:         39                 Ohio  3.68  4.28  1.28    NA  2.07  3.58  1.69
38:         40             Oklahoma  4.17  4.43  1.26    NA  2.47  4.00  1.17
39:         41               Oregon  3.96  4.38  1.41    NA  2.22  3.91  1.80
40:         42         Pennsylvania  3.35  4.30  1.35    NA  1.74  3.31  1.69
41:         44         Rhode Island  3.78  4.34  1.48    NA  1.85  3.59  2.03
42:         45       South Carolina  4.39  4.70  1.35    NA  2.51  4.12  1.61
43:         46         South Dakota  3.91  4.35  1.34    NA  2.19  3.77  0.88
44:         47            Tennessee  4.32  4.44  1.36    NA  2.50  4.10  1.58
45:         48                Texas  4.17  4.24  1.27    NA  2.51  4.00  1.54
46:         49                 Utah  4.29  4.35  1.35    NA  2.49  4.09  1.04
47:         50              Vermont  4.12  4.62  1.63    NA  2.05  4.03  1.01
48:         51             Virginia  3.71  4.71  1.19    NA  2.09  3.52  1.17
49:         53           Washington  3.51  4.06  1.22    NA  1.87  3.33  1.84
50:         54        West Virginia  4.34  4.91  1.41    NA  2.55  4.26  1.53
51:         55            Wisconsin  3.56  4.37  1.27    NA  1.96  3.46  1.37
52:         56              Wyoming  4.85  4.83  1.75    NA  2.84  4.89  1.07
53:         MW       Midwest region  3.74  4.31  1.30  0.23  2.09  3.62    NA
54:         NE     Northeast region  3.23  4.00  1.31  0.23  1.60  3.14    NA
55:         SO         South region  4.07  4.43  1.29  0.25  2.37  3.92    NA
56:         WE          West region  3.71  4.19  1.30  0.24  2.07  3.61    NA
    state_code           state_text    HI    JO    LD    OS    QU    TS    UO
Data Dictionary

HI - “Hires”, JO - “Job Openings”, LD - “Layoffs and Discharges”, OS - “Other Separations”, QU - “Quits”, TS - “Total Separations”, UO - “Unemployed Persons Per Job Opening Ratio”

3.1 Data Wrangling Steps

  1. Read in the following data files
  • “jt.series.txt”
  • “jt.state.txt”
  • “jt.data.0.Current.txt”
Tip

For data.table use fread(). For tidyverse use read_tsv(). The data is “tab” delimited.

  1. Merge the three datasets together by the correct columns.

  2. Filter the data by the following three conditions:

  • seasonal == "S"
  • industry_code == "000000"4
  • ratelevel_code == "R"

4 You’ll need to adjust this one slightly when using data.table.

This filters us to only seasonally adjusted data, for the “total” industry, and measured as a rate, not a level.

  1. Keep only the following columns in the data:
  • state_code, state_text, year, period, dataelement_code, value
  1. Summarize the data, calculating the average value by each…
  • state
  • data element
  1. Cast or pivot the data to wide so the data is tidy 5

  2. Save the output to disk with the name:

5 Each row is an observation, each column a variable.

  • “tidyverse_summary.csv” or
  • “datatable_summary.csv”

3.2 tidyverse

  1. Open up the R script, “data-wrangling-tidyverse.R”.

  2. Do the “Data Wrangling Steps” using only tidyverse functions.

3.3 data.table

  1. Open up the R script, “data-wrangling-datatable.R”.

  2. Do the “Data Wrangling Steps” using only data.table functions.

4 Sneak Peek: Plotting

Open up the “sneak-peek-plotting.R” file.

  1. Add the following code to the file:
library(ggplot2)
library(ggthemes)

data |>
  ggplot(aes(y = state_text, x = JO)) +
  geom_point() +
  theme_economist()
  1. In the terminal, run renv::install("ggthemes")

  2. Run the whole plotting file, you should get the following plot:

Looks like we should all move to Alaska if we want a job!

5 Snapshotting your environment

You just added a new package to the library, and used it in a script. This means that we should update the lockfile to reflect the new dependency on ggthemes. 6

6 ggplot2 comes with tidyverse, so it should already be in the lockfile.

  1. In the terminal, run renv::status() and note the discrepency.

  2. In the terminal, run renv::snapshot() and select “yes”.

This will update the lockfile.

  1. In the terminal, run renv::status() to make sure everything is up-to-date.

6 Submit to Github

If you haven’t already, commit everything and push to Github!