Testing our code in a systematic way
January 13, 2025
Unit tests…
Unit tests are a programming methodology/framework where each test runs on the smallest possible portion of the code, so the errors tell you exactly where something went wrong.
Remember how, at the start of the course, we motivated writing scripts as a replacement for point-and-click software?
Unit tests take this one step further.
I want to emphasize this: when you find yourself checking by hand for the third time whether one of your merges worked, you should consider writing a unit test for it instead.
Unit tests are most often used for packages.
But I think unit tests are incredibly useful for research projects as well: you can easily rerun all of your tests whenever you update your raw data or change a step in the analysis.
This is a preview of the next lecture.
GitHub can run your unit tests automatically.
If one of your tests doesn’t pass, it sends you an email.
This is an example of what programmers call “Continuous Integration”, where your tests run as you develop the code.
In R, the best package for unit tests is testthat, created by (you guessed it) Hadley Wickham.
Notice that we set the “edition” of the package after loading it. This is because so many packages relied on testthat edition 2 that they couldn’t deprecate all of the functions they wanted to change, so they made an edition 3 (which is what we will use).
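Here is a minimal sketch of that setup step. `edition_get()` is called only to confirm which edition is active:

```r
library(testthat)

# Opt in to testthat's 3rd edition for the rest of this script
local_edition(3)

edition_get()
```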
The basic elements of testthat unit tests are the expect_() family of functions.
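Below is a sketch of one passing and one failing expectation. The failing one is wrapped in `tryCatch()` so the script keeps running after the error. That wrapper is an addition for this example and not part of testthat itself:

```r
library(testthat)
local_edition(3)

# A passing expectation prints nothing (it returns its input invisibly)
expect_equal(2 + 2, 4)

# A failing expectation throws an error; tryCatch() captures its message
msg <- tryCatch(expect_equal(2 + 2, 5), error = conditionMessage)
cat(msg, "\n")
```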
Which didn’t print anything. An expectation stays silent when it passes and only raises an error when it fails.
And this is the goal of unit tests: throw a helpful error when the result isn’t what we expected.
A couple of very useful expectations are expect_true() and expect_false().
These work with logical conditions, which makes it easy to write your own expectations.
When you do this, the error messages are less helpful, so it’s better to use a pre-built expect_() function if you can.
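The next sketch shows both the convenience and the drawback. `expect_length()` stands in as an example of a purpose-built expectation:

```r
library(testthat)
local_edition(3)

x <- c(1, 2, 3)

# expect_true()/expect_false() accept any logical condition you can write
expect_true(all(x > 0))
expect_false(anyNA(x))

# On failure, expect_true() can only report that the condition was not TRUE ...
msg_true <- tryCatch(expect_true(length(x) == 4), error = conditionMessage)
# ... while a purpose-built expectation also reports the actual length of x
msg_length <- tryCatch(expect_length(x, 4), error = conditionMessage)

cat(msg_true, "\n\n")
cat(msg_length, "\n")
```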
If you want to check whether two numbers are equal (up to a small numerical tolerance), you can use expect_equal().
If you want to check whether two numbers are exactly equal, you can use expect_identical().
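This sketch contrasts the two using floating-point arithmetic, where the difference matters. In R, `0.1 + 0.2` is not exactly `0.3`:

```r
library(testthat)
local_edition(3)

# expect_equal() allows a tiny numerical tolerance, so floating-point noise is ignored
expect_equal(0.1 + 0.2, 0.3)

# expect_identical() demands an exact match, so the same comparison fails
msg <- tryCatch(expect_identical(0.1 + 0.2, 0.3), error = conditionMessage)
cat(msg, "\n")

# A larger tolerance can be requested explicitly
expect_equal(3.14159, pi, tolerance = 0.001)
```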
It can be useful to expect a certain data type.
Error: "hello" has type 'character', not 'double'.
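Here is a sketch of how `expect_type()` behaves. It compares against `typeof()`, so R's ordinary numbers have type "double", not "numeric". The failing call is wrapped in `tryCatch()` so the script can continue. The exact wording of the message may vary between testthat versions:

```r
library(testthat)
local_edition(3)

# expect_type() checks typeof(), so ordinary R numbers are "double"
typeof(1.5)
expect_type(1.5, "double")
expect_type(2L, "integer")
expect_type("hello", "character")

# A mismatch fails; tryCatch() captures the message so the script continues
msg <- tryCatch(expect_type("hello", "double"), error = conditionMessage)
cat(msg, "\n")
```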
Sometimes you will want to expect an error.
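This sketch uses `expect_error()`. Its optional second argument is a regular expression that the error message must match:

```r
library(testthat)
local_edition(3)

# Passes: log() of a string throws an error
expect_error(log("a"))

# Passes: the error message matches the regular expression "negative"
expect_error(stop("negative values are not allowed"), "negative")

# Fails: sqrt(4) runs without any error
msg <- tryCatch(expect_error(sqrt(4)), error = conditionMessage)
cat(msg, "\n")
```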
Let’s write a very basic function and some tests for it.
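The traceback later in these notes shows `mean((actual - predicted)^2)` inside `rmse()`, so the function is likely along these lines. This is a reconstruction, not necessarily the exact original:

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

rmse(c(1, 2, 3), c(2, 2, 2))
```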
And some things we could test:
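Here is a sketch of these expectations. It redefines a root-mean-squared-error `rmse(actual, predicted)` so the block runs on its own. The expectations match the ones used in the unit test below, and all of them pass:

```r
library(testthat)
local_edition(3)

rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# sqrt(2/3) is about 0.8165, so compare to 0.816 with a loose tolerance
expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01)
# Every prediction is off by exactly 10
expect_equal(rmse(c(0, 0, 0), c(10, 10, 10)), 10)
# A missing value propagates through mean(), so the result is NA
expect_true(is.na(rmse(c(1, 2, NA), c(1, 2, 3))))
```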
And for a test that does not pass:
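Here is a sketch of the failing expectation, again with a self-contained `rmse()`. The `tryCatch()` is there only to capture the failure message:

```r
library(testthat)
local_edition(3)

rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Different lengths: R recycles c(1, 2) to c(1, 2, 1) and only warns,
# so no error is thrown and expect_error() fails
msg <- tryCatch(
  expect_error(rmse(c(1, 2), c(1, 2, 3))),
  error = conditionMessage
)
cat(msg, "\n")
```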
We’d expect an error when given vectors of different lengths, but R tries to fix this for us: it recycles the shorter vector to match the length of the longer one and only throws a warning.
We now have a group of expectations we would like to run for our function.
Let’s make our first “unit” test.
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1,2,3), c(2,2,2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0,0,0), c(10,10,10)), 10)
  expect_true(is.na(rmse(c(1,2,NA), c(1,2,3))))
})
Test passed
And we passed!
Let's go ahead and add our expectation that failed.
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1,2,3), c(2,2,2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0,0,0), c(10,10,10)), 10)
  expect_true(is.na(rmse(c(1,2,NA), c(1,2,3))))
  expect_error(rmse(c(1,2), c(1,2,3)))
})
-- Warning: rmse works for various vectors -------------------------------------
longer object length is not a multiple of shorter object length
1. +-testthat::expect_error(rmse(c(1, 2), c(1, 2, 3)))
2. | \-testthat:::quasi_capture(...) at testthat/R/expect-condition.R:126:5
3. | +-testthat (local) .capture(...) at testthat/R/quasi-label.R:54:3
4. | | \-base::withCallingHandlers(...) at testthat/R/deprec-condition.R:23:5
5. | \-rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo)) at testthat/R/quasi-label.R:54:3
6. \-global rmse(c(1, 2), c(1, 2, 3)) at rlang/R/eval.R:96:3
7. \-base::mean((actual - predicted)^2)
-- Failure: rmse works for various vectors -------------------------------------
`rmse(c(1, 2), c(1, 2, 3))` did not throw an error.
! Test failed
Next, we will try to pass this test.
Let’s fix our rmse() function to throw an error for mismatched vectors.
Now we can rerun our test.
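Here is a sketch of the fix and the rerun. The wording of the error message passed to `stop()` is an assumption:

```r
library(testthat)
local_edition(3)

# Refuse mismatched lengths instead of letting R silently recycle
rmse <- function(actual, predicted) {
  if (length(actual) != length(predicted)) {
    stop("`actual` and `predicted` must have the same length.")
  }
  sqrt(mean((actual - predicted)^2))
}

test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0, 0, 0), c(10, 10, 10)), 10)
  expect_true(is.na(rmse(c(1, 2, NA), c(1, 2, 3))))
  expect_error(rmse(c(1, 2), c(1, 2, 3)))
})
```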
Overview
We have seen how to write individual unit tests using the testthat package.
- Each unit test consists of one or more expect_() calls.
Now we will look at two ways to store and run all of our unit tests in…
We will look at a package first, as this is the most natural location for unit tests.
We saw last week how you can use the tools from devtools and usethis to quickly create an R package.
For instance, you can create a package in the current folder using,
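Here is a sketch using `usethis::create_package()`. To keep it safe to run, it creates the package in a temporary folder rather than the current folder. The package is named after `prepUnitTests`, the name that appears in the output below:

```r
# create_package() expects the parent folder to exist already
parent <- tempfile("demo")
dir.create(parent)
path <- file.path(parent, "prepUnitTests")

usethis::create_package(path, open = FALSE, rstudio = FALSE)
list.files(path)
```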
Now, if we want to use testthat for our package, simply run
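Here is a sketch of that call with edition 3 passed explicitly, which matches the "Adding '3' to Config/testthat/edition" message below. The throwaway package in a temporary folder exists only so the block can run on its own:

```r
parent <- tempfile("demo")
dir.create(parent)
path <- file.path(parent, "prepUnitTests")
usethis::create_package(path, open = FALSE, rstudio = FALSE)

# Make the throwaway package the active project while use_testthat() runs;
# from inside your own package folder, usethis::use_testthat(3) is enough
usethis::with_project(path, usethis::use_testthat(3))
```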
✔ Adding 'testthat' to Suggests field in DESCRIPTION
✔ Adding '3' to Config/testthat/edition
✔ Creating 'tests/testthat/'
✔ Writing 'tests/testthat.R'
• Call `use_test()` to initialize a basic test file and open it for editing.
What files and folders were added to our package?
DESCRIPTION: lists testthat as a suggested package and sets the edition number to 3
Let’s add the rmse() function we wrote earlier to the “R/” folder:
Now that we have a function, we can run the following in our terminal:
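Here is a sketch of the call, again run inside a throwaway package so it is self-contained. `open = FALSE` just stops usethis from opening the new file in an editor:

```r
parent <- tempfile("demo")
dir.create(parent)
path <- file.path(parent, "prepUnitTests")
usethis::create_package(path, open = FALSE, rstudio = FALSE)

usethis::with_project(path, {
  usethis::use_testthat(3)
  # Creates tests/testthat/test-rmse.R with a placeholder test inside
  usethis::use_test("rmse", open = FALSE)
})
```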
✔ Writing 'tests/testthat/test-rmse.R'
• Modify 'tests/testthat/test-rmse.R'
This creates a new test file, “test-rmse.R”, and opens it for us to modify.
The test file comes with a basic test for us to edit.
We want to change this placeholder test so that we can test our function rmse().
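For reference, the placeholder that usethis writes into a new test file has looked like this. The exact template may differ between usethis versions:

```r
library(testthat)
local_edition(3)

# Placeholder test from a new test file: replace it with tests of rmse()
test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})
```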
Let’s reuse the test we wrote earlier.
The only difference from earlier is that we have now saved our function in one file and our test in another, and both are part of an R package.
You now have a few options to run your package tests from the terminal.
For writing a package, options 1 and 2 are most useful.
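Below is a self-contained sketch of the package workflow so far. It assumes devtools, usethis and testthat are installed. It builds a throwaway prepUnitTests package in a temporary folder, writes `R/rmse.R` and `tests/testthat/test-rmse.R`, and runs the tests with `devtools::test()`. From inside a real package folder you would simply call `devtools::test()` or `devtools::check()` with no arguments:

```r
parent <- tempfile("demo")
dir.create(parent)
path <- file.path(parent, "prepUnitTests")
usethis::create_package(path, open = FALSE, rstudio = FALSE)
usethis::with_project(path, usethis::use_testthat(3))

# R/rmse.R: the function under test
writeLines(c(
  "rmse <- function(actual, predicted) {",
  "  if (length(actual) != length(predicted)) {",
  "    stop(\"`actual` and `predicted` must have the same length.\")",
  "  }",
  "  sqrt(mean((actual - predicted)^2))",
  "}"
), file.path(path, "R", "rmse.R"))

# tests/testthat/test-rmse.R: the unit test from earlier
writeLines(c(
  "test_that(\"rmse works for various vectors\", {",
  "  expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01)",
  "  expect_equal(rmse(c(0, 0, 0), c(10, 10, 10)), 10)",
  "  expect_true(is.na(rmse(c(1, 2, NA), c(1, 2, 3))))",
  "  expect_error(rmse(c(1, 2), c(1, 2, 3)))",
  "})"
), file.path(path, "tests", "testthat", "test-rmse.R"))

# Quick: load the package and run only the tests
results <- devtools::test(path)

# Thorough but slower: build the package and run R CMD check (tests included)
# devtools::check(path)
```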
Let’s see what the output of each looks like.
Output
ℹ Updating prepUnitTests documentation
ℹ Loading prepUnitTests
══ Building ═══════════════════════════════════════════════════════════════════════════════
Setting env vars:
• CFLAGS : -Wall -pedantic -fdiagnostics-color=always
• CXXFLAGS : -Wall -pedantic -fdiagnostics-color=always
• CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
• CXX14FLAGS: -Wall -pedantic -fdiagnostics-color=always
• CXX17FLAGS: -Wall -pedantic -fdiagnostics-color=always
• CXX20FLAGS: -Wall -pedantic -fdiagnostics-color=always
── R CMD build ────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/Users/matthewdehaven/Research/Courses/course-applied-economics-analysis-templates/prepUnitTests/DESCRIPTION’
─ preparing ‘prepUnitTests’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Removed empty directory ‘prepUnitTests/man’
─ building ‘prepUnitTests_0.0.0.9000.tar.gz’
══ Checking ═══════════════════════════════════════════════════════════════════════════════
Setting env vars:
• NOT_CRAN : true
── R CMD check ────────────────────────────────────────────────────────────────────────────
─ using log directory ‘/private/var/folders/wp/_szmdb513bxd6dqzmkgl5zrc0000gn/T/Rtmpc9QN1o/file2a624d0b73ee/prepUnitTests.Rcheck’
─ using R version 4.3.2 (2023-10-31)
─ using platform: aarch64-apple-darwin23.0.0 (64-bit)
─ R was compiled by
Apple clang version 15.0.0 (clang-1500.0.40.1)
GNU Fortran (Homebrew GCC 13.2.0) 13.2.0
─ running under: macOS Sonoma 14.2.1
─ using session charset: UTF-8
─ using options ‘--no-manual --as-cran’
✔ checking for file ‘prepUnitTests/DESCRIPTION’ ...
─ this is package ‘prepUnitTests’ version ‘’
─ package encoding: UTF-8
✔ checking package namespace information ...
✔ checking package dependencies (2s)
✔ checking if this is a source package ...
✔ checking if there is a namespace
✔ checking for executable files ...
✔ checking for hidden files and directories
✔ checking for portable file names
✔ checking for sufficient/correct file permissions ...
✔ checking serialization versions
✔ checking whether package ‘prepUnitTests’ can be installed (601ms)
✔ checking installed package size ...
✔ checking package directory ...
✔ checking for future file timestamps ...
✔ checking DESCRIPTION meta-information ...
✔ checking top-level files ...
✔ checking for left-over files
✔ checking index information
✔ checking package subdirectories ...
✔ checking R files for non-ASCII characters ...
✔ checking R files for syntax errors ...
✔ checking whether the package can be loaded ...
✔ checking whether the package can be loaded with stated dependencies ...
✔ checking whether the package can be unloaded cleanly ...
✔ checking whether the namespace can be loaded with stated dependencies ...
✔ checking whether the namespace can be unloaded cleanly ...
✔ checking loading without being on the library search path ...
✔ checking dependencies in R code ...
✔ checking S3 generic/method consistency ...
✔ checking replacement functions ...
✔ checking foreign function calls ...
✔ checking R code for possible problems (1.1s)
✔ checking for missing documentation entries ...
─ checking examples ... NONE
✔ checking for unstated dependencies in ‘tests’ ...
─ checking tests ...
✔ Running ‘testthat.R’ (351ms)
✔ checking for non-standard things in the check directory
✔ checking for detritus in the temp directory
── R CMD check results ────────────────────────────────────── prepUnitTests ────
Duration: 5.9s
0 errors ✔ | 0 warnings ✔ | 0 notes ✔
Output
✔ | F W S OK | Context
✔ | 4 | rmse
══ Results ═════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 4 ]
Which gives us nice formatting showing our successes.
The same summary is included if you do have any failing tests.
Let’s add another test for our rmse() function:
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1,2,3), c(2,2,2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0,0,0), c(10,10,10)), 10)
  expect_true(is.na(rmse(c(1,2,NA), c(1,2,3))))
  expect_error(rmse(c(1,2), c(1,2,3)))
})

test_that("rmse works for a fitted model", {
  m <- lm(Petal.Length ~ Petal.Width, data = iris)
  fit <- fitted(m)
  act <- iris$Petal.Length
  x <- rmse(fit, act)
  expect_type(x, "character")
  expect_lt(x, 0.01) ## Expect "less than"
})
Now if we run our tests…
ℹ Testing prepUnitTests
✔ | F W S OK | Context
✖ | 2 4 | rmse
Failure (test-rmse.R:15:3): rmse works for a fitted model
`x` has type 'double', not 'character'.
Failure (test-rmse.R:16:3): rmse works for a fitted model
`x` is not strictly less than 0.01. Difference: 0.465
══ Results ════════════════════════════════════════════════════════════════════════════════
── Failed tests ───────────────────────────────────────────────────────────────────────────
Failure (test-rmse.R:15:3): rmse works for a fitted model
`x` has type 'double', not 'character'.
Failure (test-rmse.R:16:3): rmse works for a fitted model
`x` is not strictly less than 0.01. Difference: 0.465
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 4 ]
Having got some failures, we’d either fix our test or our function.
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1,2,3), c(2,2,2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0,0,0), c(10,10,10)), 10)
  expect_true(is.na(rmse(c(1,2,NA), c(1,2,3))))
  expect_error(rmse(c(1,2), c(1,2,3)))
})

test_that("rmse works for a fitted model", {
  m <- lm(Petal.Length ~ Petal.Width, data = iris)
  fit <- fitted(m)
  act <- iris$Petal.Length
  x <- rmse(fit, act)
  expect_type(x, "double") ## typeof() of an R number is "double", not "numeric"
  expect_lt(x, 0.5) ## Expect "less than"
})
Unit tests are well supported in R package development.
Run them with devtools::check() or devtools::test().
Now imagine that we don’t want to make an R package, but instead have a research project.
We can still set up unit tests with the testthat package.
We also need to have saved our rmse() function somewhere.
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1,2,3), c(2,2,2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0,0,0), c(10,10,10)), 10)
  expect_true(is.na(rmse(c(1,2,NA), c(1,2,3))))
  expect_error(rmse(c(1,2), c(1,2,3)))
})

test_that("rmse works for a fitted model", {
  m <- lm(Petal.Length ~ Petal.Width, data = iris)
  fit <- fitted(m)
  act <- iris$Petal.Length
  x <- rmse(fit, act)
  expect_type(x, "double") ## typeof() of an R number is "double", not "numeric"
  expect_lt(x, 0.5) ## Expect "less than"
})
Since there is no package structure, we have to run our tests with
✔ | F W S OK | Context
✖ | 2 0 | rmse
Error (test-rmse.r:2:3): rmse works for various vectors
Error in `rmse(c(1, 2, 3), c(2, 2, 2))`: could not find function "rmse"
1. └─testthat::expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01) at test-rmse.r:2:3
2. └─testthat::quasi_label(enquo(object), label, arg = "object") at testthat/R/expect-equality.R:62:3
3. └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
Error (test-rmse.r:13:3): rmse works for a fitted model
Error in `rmse(fit, act)`: could not find function "rmse"
══ Results ═════════════════════════════════════════════════════════════════════════
── Failed tests ────────────────────────────────────────────────────────────────────
Error (test-rmse.r:2:3): rmse works for various vectors
Error in `rmse(c(1, 2, 3), c(2, 2, 2))`: could not find function "rmse"
1. └─testthat::expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01) at test-rmse.r:2:3
2. └─testthat::quasi_label(enquo(object), label, arg = "object") at testthat/R/expect-equality.R:62:3
3. └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
Error (test-rmse.r:13:3): rmse works for a fitted model
Error in `rmse(fit, act)`: could not find function "rmse"
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 0 ]
Error: Test failures
The tests fail because R cannot find rmse(): we need to load our custom function first.
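Here is a self-contained sketch of that workflow. The folder names `code/` and `tests/` are assumptions about the project layout. The files are written to a temporary folder so the block can run anywhere:

```r
# Hypothetical layout: code/rmse.R defines rmse(), tests/test-rmse.R tests it
proj <- tempfile("project")
dir.create(file.path(proj, "code"), recursive = TRUE)
dir.create(file.path(proj, "tests"))

writeLines(c(
  "rmse <- function(actual, predicted) {",
  "  if (length(actual) != length(predicted)) {",
  "    stop(\"`actual` and `predicted` must have the same length.\")",
  "  }",
  "  sqrt(mean((actual - predicted)^2))",
  "}"
), file.path(proj, "code", "rmse.R"))

writeLines(c(
  "test_that(\"rmse works for various vectors\", {",
  "  expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01)",
  "  expect_error(rmse(c(1, 2), c(1, 2, 3)))",
  "})"
), file.path(proj, "tests", "test-rmse.R"))

# From the project root: load the function first, then run every test file
old_wd <- setwd(proj)
source("code/rmse.R")
results <- testthat::test_dir("tests")
setwd(old_wd)
```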
✔ | F W S OK | Context
✔ | 6 | rmse
══ Results ═════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
And we pass all of our tests!
This is not recommended for packages, but in a project I would consider sourcing the rmse() function at the top of our test file.
We have to go up one folder because the tests are run from inside the “tests” folder, so the relative path to the code folder needs “..” to go up a level first.
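Here is a sketch of the test file doing its own sourcing. It uses an assumed `code/` and `tests/` layout, built in a temporary folder. `test_dir()` runs each test file from inside the tests folder, which is why the path starts with `..`:

```r
proj <- tempfile("project")
dir.create(file.path(proj, "code"), recursive = TRUE)
dir.create(file.path(proj, "tests"))

writeLines(c(
  "rmse <- function(actual, predicted) {",
  "  if (length(actual) != length(predicted)) {",
  "    stop(\"`actual` and `predicted` must have the same length.\")",
  "  }",
  "  sqrt(mean((actual - predicted)^2))",
  "}"
), file.path(proj, "code", "rmse.R"))

# The test file now loads the function itself; ".." steps up out of tests/
writeLines(c(
  "source(\"../code/rmse.R\")",
  "",
  "test_that(\"rmse works for various vectors\", {",
  "  expect_equal(rmse(c(0, 0, 0), c(10, 10, 10)), 10)",
  "  expect_error(rmse(c(1, 2), c(1, 2, 3)))",
  "})"
), file.path(proj, "tests", "test-rmse.R"))

# No separate source() call is needed before running the tests
old_wd <- setwd(proj)
results <- testthat::test_dir("tests")
setwd(old_wd)
```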
Now that our unit test file sources the rmse() function itself, we can simply run
✔ | F W S OK | Context
✔ | 6 | rmse
══ Results ═════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
I think for a project this is good.
I think a good general form for a “main.R” script is:
# Restore renv environment (should happen automatically)
## Data Cleaning
## Analysis
## Figures & Tables
## Run Tests
This way, every time you run “main.R”, your tests will run at the end.
Here’s a list of things I would test in that project:
Hopefully you already see how you could write a test using testthat for each of those items.
Then, as your project evolves, you’ll notice whenever something changes from what you expect.
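As one hypothetical illustration, here is a test that a merge neither drops nor duplicates rows. The data frames `firms` and `sales` are made up:

```r
library(testthat)
local_edition(3)

# Made-up data standing in for a project's cleaned datasets
firms <- data.frame(firm_id = 1:3, industry = c("A", "B", "A"))
sales <- data.frame(firm_id = 1:3, sales = c(10, 20, 30))

# Left join: keep every firm, attach its sales
merged <- merge(firms, sales, by = "firm_id", all.x = TRUE)

test_that("merging sales onto firms keeps one row per firm", {
  expect_equal(nrow(merged), nrow(firms))          # no rows dropped or added
  expect_equal(anyDuplicated(merged$firm_id), 0L)  # no firm appears twice
  expect_false(anyNA(merged$sales))                # every firm found a match
})
```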
There is one more package that is useful for this situation: testdat.
- testthat has many basic expect_() functions that you can use to write any custom test you want.
- testdat has custom expect_() functions, written for data.frames, that make it easier to validate data.
testdat example
Let’s load the mtcars data into a data.table.
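Here is a sketch of the loading step using the data.table package. Setting `keep.rownames = TRUE` is what turns the car names into the `rn` column seen below:

```r
library(data.table)

# keep.rownames = TRUE stores the car names in a new first column called "rn"
mtdt <- as.data.table(mtcars, keep.rownames = TRUE)
mtdt
```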
                     rn   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:           Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
 2:       Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
 3:          Datsun 710  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
 4:      Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
 5:   Hornet Sportabout  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
 6:             Valiant  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
 7:          Duster 360  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
 8:           Merc 240D  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
 9:            Merc 230  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
10:            Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
11:           Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     1     0     4     4
12:          Merc 450SE  16.4     8 275.8   180  3.07 4.070 17.40     0     0     3     3
13:          Merc 450SL  17.3     8 275.8   180  3.07 3.730 17.60     0     0     3     3
14:         Merc 450SLC  15.2     8 275.8   180  3.07 3.780 18.00     0     0     3     3
15:  Cadillac Fleetwood  10.4     8 472.0   205  2.93 5.250 17.98     0     0     3     4
16: Lincoln Continental  10.4     8 460.0   215  3.00 5.424 17.82     0     0     3     4
17:   Chrysler Imperial  14.7     8 440.0   230  3.23 5.345 17.42     0     0     3     4
18:            Fiat 128  32.4     4  78.7    66  4.08 2.200 19.47     1     1     4     1
19:         Honda Civic  30.4     4  75.7    52  4.93 1.615 18.52     1     1     4     2
20:      Toyota Corolla  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4     1
21:       Toyota Corona  21.5     4 120.1    97  3.70 2.465 20.01     1     0     3     1
22:    Dodge Challenger  15.5     8 318.0   150  2.76 3.520 16.87     0     0     3     2
23:         AMC Javelin  15.2     8 304.0   150  3.15 3.435 17.30     0     0     3     2
24:          Camaro Z28  13.3     8 350.0   245  3.73 3.840 15.41     0     0     3     4
25:    Pontiac Firebird  19.2     8 400.0   175  3.08 3.845 17.05     0     0     3     2
26:           Fiat X1-9  27.3     4  79.0    66  4.08 1.935 18.90     1     1     4     1
27:       Porsche 914-2  26.0     4 120.3    91  4.43 2.140 16.70     0     1     5     2
28:        Lotus Europa  30.4     4  95.1   113  3.77 1.513 16.90     1     1     5     2
29:      Ford Pantera L  15.8     8 351.0   264  4.22 3.170 14.50     0     1     5     4
30:        Ferrari Dino  19.7     6 145.0   175  3.62 2.770 15.50     0     1     5     6
31:       Maserati Bora  15.0     8 301.0   335  3.54 3.570 14.60     0     1     5     8
32:          Volvo 142E  21.4     4 121.0   109  4.11 2.780 18.60     1     1     4     2
                     rn   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
failure example
Test Values
We can expect certain values using…
expect_values(data = mtdt, cyl, c(4, 6, 8))
expect_values(data = mtdt, cyl, c(4, 6, 8, 100))
expect_values(data = mtdt, cyl, c(4, 6))
Error: `mtdt` has 14 records failing value check on variable `cyl`.
Variable set: `cyl`
Filter: None
Arguments: `<dbl: 4, 6>, miss = <chr: NA, "">`
You can also test columns for character values c("A", "B", "C")
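Here is a sketch with a made-up `grades` data frame. It assumes `expect_values()` accepts character vectors the same way it accepts numbers:

```r
library(testdat)

# Made-up data with a character column
grades <- data.frame(student = 1:4, grade = c("A", "C", "B", "A"))

# Passes: every grade is one of the allowed values
expect_values(data = grades, grade, c("A", "B", "C"))
```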
Test Ranges of Values
And we can expect a range of values instead of specific ones…
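Here is a sketch assuming testdat's `expect_range(var, min, max, data = )` form. Check the testdat reference for the full signature. The bounds are chosen to contain every `mpg` value in mtcars:

```r
library(testdat)
library(data.table)

mtdt <- as.data.table(mtcars, keep.rownames = TRUE)

# Passes: every mpg value lies between 10 and 35
expect_range(mpg, 10, 35, data = mtdt)
```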
Test a Condition
And we can test an if-then condition…
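Here is a sketch assuming testdat's `expect_cond(cond1, cond2, data = )` form, which checks that `cond2` holds wherever `cond1` holds. Verify the signature against the testdat reference before relying on it. In mtcars every 4-cylinder car has under 120 horsepower, so this passes:

```r
library(testdat)
library(data.table)

mtdt <- as.data.table(mtcars, keep.rownames = TRUE)

# If a car has 4 cylinders, then it should have fewer than 120 horsepower
expect_cond(cyl == 4, hp < 120, data = mtdt)
```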
Overview
Many useful expect_() functions for working with data.frames.
Very useful in an economics research project setting.
Not as useful for testing a package (unless it’s a data package).