Testing code systematically and automatically
April 13, 2026
Unit tests…
Unit tests are a programming methodology/framework where each test runs on the smallest possible portion of the code, so the errors tell you exactly where something went wrong.
Remember back at the start of the course, when we motivated writing scripts as a replacement for point-and-click software?
Unit tests take this one step farther.
I want to emphasize this:
When you find yourself checking by hand for the third time whether one of your merges worked, you should consider writing a unit test for it instead.
Unit tests are most often used for packages.
But I think unit tests are incredibly useful for research projects as well:
You can then easily rerun all of your tests whenever you update your raw data, or change a step in the analysis.
In R, the best package for unit tests is testthat.
Notice that after loading the package we set its "edition". This is because so many packages relied upon testthat edition 2 that its authors couldn't deprecate all the functions they wanted to change, so they made an edition 3 (which is what we will use).
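A sketch of that setup at the top of a testing script (local_edition() is the testthat function for opting in to an edition):

```r
library(testthat)
local_edition(3)  # use testthat edition 3 for this script
```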
The basic elements of testthat unit tests are the expect_() family of functions.
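For example, a sketch of a passing expectation:

```r
library(testthat)
expect_equal(1 + 1, 2)  # passes: prints nothing, returns invisibly
```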
Notice that it didn't return anything; an expectation only produces output when something fails.
And that is the goal of unit tests: throw a helpful error when the result isn't what we expected.
A couple of very useful expectations are expect_true() and expect_false().
These work with logical conditions, which make it easy to write your own expectations.
When you do this the error messages are less helpful, so it’s better to use a pre-built expect_() function if you can.
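For instance, expect_true() and expect_false() check any logical condition you build yourself (a sketch):

```r
library(testthat)
x <- c(1, 2, 3)
expect_true(all(x > 0))      # passes silently
expect_false(any(is.na(x)))  # passes silently
```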
If you want to check whether two numbers are equal (within a small numerical tolerance), you can use expect_equal().
If you want to check whether two values are exactly equal, you use expect_identical().
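A sketch of the difference: expect_equal() allows a small numerical tolerance, while expect_identical() does not:

```r
library(testthat)
expect_equal(0.1 + 0.2, 0.3)       # passes: equal within tolerance
expect_identical(1L + 1L, 2L)      # passes: exactly equal
# expect_identical(0.1 + 0.2, 0.3) # would fail: floating point values differ
```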
It can be useful to expect a certain data type.
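For example, expect_type() checks the result of typeof() (a sketch; the failing call produces the error shown below):

```r
library(testthat)
expect_type(1.5, "double")      # passes silently
expect_type("hello", "double")  # fails: "hello" has type "character"
```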
Error:
! Expected "hello" to have type "double".
Actual type: "character"
Sometimes you will want to expect an error.
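A sketch with expect_error(), which passes only when the code inside throws an error:

```r
library(testthat)
expect_error(stop("something went wrong"))  # passes: an error was thrown
# expect_error(sqrt(4))                     # would fail: no error occurred
```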
Let’s write a very basic function and some tests for it.
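The function itself isn't shown here, so here is a sketch consistent with the tests and output below (root mean squared error):

```r
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

rmse(c(1, 2, 3), c(2, 2, 2))  # about 0.816
```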
And some things we could test: known results for simple inputs, how NA values propagate, and what happens with vectors of different lengths.
We'd expect an error when given vectors of different lengths, but R tries to fix this for us: it recycles values to make the vectors the same length and only throws a warning.
We now have a group of expectations we would like to run for our function.
Let’s make our first “unit” test.
Test passed with 3 successes 🎊.
And we passed!
Let's go ahead and add our expectation that failed.
── Warning: rmse works for various vectors ─────────────────────────────────────
longer object length is not a multiple of shorter object length
Backtrace:
▆
1. ├─testthat::expect_error(rmse(c(1, 2), c(1, 2, 3)))
2. │ └─testthat:::quasi_capture(...)
3. │ ├─testthat (local) .capture(...)
4. │ │ └─base::withCallingHandlers(...)
5. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
6. └─global rmse(c(1, 2), c(1, 2, 3))
7. └─base::mean((actual - predicted)^2)
── Failure: rmse works for various vectors ─────────────────────────────────────
Expected `rmse(c(1, 2), c(1, 2, 3))` to throw a error.
Error:
! Test failed with 1 failure and 3 successes.
Let’s fix our rmse() function to throw an error for mismatched vectors.
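A sketch of the fix, checking the lengths before computing anything:

```r
rmse <- function(actual, predicted) {
  if (length(actual) != length(predicted)) {
    stop("`actual` and `predicted` must be the same length")
  }
  sqrt(mean((actual - predicted)^2))
}
```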
Now we can rerun our test.
🎉
We have seen how to write individual unit tests using the testthat package.
Now we will look at two ways to store and run all of our unit tests.
We will set up unit tests with the testthat package.
Add a folder called tests/ to the project.
Add your testing files in there (e.g. ./tests/test-rmse.R)
We also need to have saved our rmse() function somewhere.
Our project now has two relevant files: code/rmse.R and tests/test-rmse.R.
test_that("rmse works for various vectors", {
  expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01)
  expect_equal(rmse(c(0, 0, 0), c(10, 10, 10)), 10)
  expect_true(is.na(rmse(c(1, 2, NA), c(1, 2, 3))))
  expect_error(rmse(c(1, 2), c(1, 2, 3)))
})

test_that("rmse works for a fitted model", {
  m <- lm(Petal.Length ~ Petal.Width, data = iris)
  fit <- fitted(m)
  act <- iris$Petal.Length
  x <- rmse(fit, act)
  expect_type(x, "numeric")
  expect_lt(x, 0.5) ## Expect "less than"
})

We can run our tests with testthat::test_dir("tests"):
✔ | F W S OK | Context
✖ | 2 0 | rmse
────────────────────────────────────────────────────────────────────────────────────
Error (test-rmse.r:2:3): rmse works for various vectors
Error in `rmse(c(1, 2, 3), c(2, 2, 2))`: could not find function "rmse"
Backtrace:
▆
1. └─testthat::expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01) at test-rmse.r:2:3
2. └─testthat::quasi_label(enquo(object), label, arg = "object") at testthat/R/expect-equality.R:62:3
3. └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
Error (test-rmse.r:13:3): rmse works for a fitted model
Error in `rmse(fit, act)`: could not find function "rmse"
────────────────────────────────────────────────────────────────────────────────────
══ Results ═════════════════════════════════════════════════════════════════════════
── Failed tests ────────────────────────────────────────────────────────────────────
Error (test-rmse.r:2:3): rmse works for various vectors
Error in `rmse(c(1, 2, 3), c(2, 2, 2))`: could not find function "rmse"
Backtrace:
▆
1. └─testthat::expect_equal(rmse(c(1, 2, 3), c(2, 2, 2)), 0.816, tolerance = 0.01) at test-rmse.r:2:3
2. └─testthat::quasi_label(enquo(object), label, arg = "object") at testthat/R/expect-equality.R:62:3
3. └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
Error (test-rmse.r:13:3): rmse works for a fitted model
Error in `rmse(fit, act)`: could not find function "rmse"
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 0 ]
Error: Test failures
We got failures because we need to load our custom function first.
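A sketch of the fix, assuming the function lives in code/rmse.R and we run from the project root:

```r
source("code/rmse.R")        # load our custom function first
testthat::test_dir("tests")  # then run every test file in tests/
```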
✔ | F W S OK | Context
✔ | 6 | rmse
══ Results ═════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
And then we pass all of our tests!
In this setup, I would consider sourcing the rmse() function in our testing file.
We have to go up one folder because tests are run in the “tests” folder, so the relative path to the code folder requires the “..” to go up a folder first.
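Concretely, the first line of tests/test-rmse.R would be a source() call like this (a sketch):

```r
# tests/test-rmse.R
source("../code/rmse.R")  # ".." steps up from tests/ into the project root
```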
Now that our unit test sources the rmse() function itself, we can run:
✔ | F W S OK | Context
✔ | 6 | rmse
══ Results ═════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 6 ]
I think for a project this is good.
I think a good general form for a “main.R” script is:
main.r
# Restore renv environment (should happen automatically)
renv::restore()
## Data Cleaning
source("code/clean-data.r")
source("code/transform-data.r")
## Analysis
source("code/run-regressions.r")
## Figures & Tables
source("code/figures/make-scatter-plot.r")
source("code/tables/make-summ-table.r")
source("code/tables/make-reg-table.r")
## Run Tests
testthat::test_dir("tests")

Unit tests are well supported in R package development.
Create a test file with usethis::use_test("testname"), then run your tests with devtools::check() or devtools::test(). See R Packages (2e): Testing Basics for more details.
The validate package

The validate package lets us write domain rules for our data and "confront" a dataset with them. Confronting mtcars with four rules gives:

name items passes fails nNA error warning expression
1 V1 32 32 0 0 FALSE FALSE mpg - 0 >= -1e-08
2 V2 32 32 0 0 FALSE FALSE cyl - 0 >= -1e-08
3 V3 32 26 6 0 FALSE FALSE mpg/wt <= 10
4 V4 1 0 1 0 FALSE FALSE cor(mpg, cyl) >= 0.2
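A sketch of how a summary like this is produced, assuming the validate package is installed:

```r
library(validate)

rules <- validator(
  mpg >= 0,
  cyl >= 0,
  mpg / wt <= 10,
  cor(mpg, cyl) >= 0.2
)

summary(confront(mtcars, rules))  # one row per rule, as shown above
```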
test_that("Raw survey data conforms to domain rules", {
  rules <- validator(
    mpg >= 0,
    cyl >= 0,
    mpg/wt <= 10,
    cor(mpg, cyl) >= 0.2
  )
  evaluation <- confront(mtcars, rules)
  eval_summary <- summary(evaluation)
  total_failures <- sum(eval_summary$fails)
  expect_equal(
    total_failures,
    0,
    info = paste(
      "Data validation failed! Check rules:",
      paste(eval_summary$name[eval_summary$fails > 0], collapse = ", "), "\n",
      paste(capture.output(print(eval_summary)), collapse = "\n")
    )
  )
})

── Failure: Raw survey data conforms to domain rules ───────────────────────────
Expected `total_failures` to equal 0.
Differences:
1/1 mismatches
[1] 7 - 0 == 7
Data validation failed! Check rules: V3, V4
name items passes fails nNA error warning expression
1 V1 32 32 0 0 FALSE FALSE mpg - 0 >= -1e-08
2 V2 32 32 0 0 FALSE FALSE cyl - 0 >= -1e-08
3 V3 32 26 6 0 FALSE FALSE mpg/wt <= 10
4 V4 1 0 1 0 FALSE FALSE cor(mpg, cyl) >= 0.2
Error:
! Test failed with 1 failure and 0 successes.
Automate, customize, and execute your software development workflows right in your repository with GitHub Actions. You can discover, create, and share actions to perform any job you’d like, including CI/CD, and combine actions in a completely customized workflow. - Github Action documentation
Allow you to execute code on a remote server hosted by Github.
There is a tab for Github Actions for every repository.
You are running code on someone else’s server, so there is a limit to how much you can run. Github Action Billing.
But, Github Actions are free for public repositories.
And, you should have 2,000 minutes of run time for private repositories for a free account.
So practically, you can run most things without a worry.
By default, Github Actions will run on a server with
Which is to say, these server resources are not super big.
But they also should be big enough to run most projects.
You can upgrade to Github Actions running on servers that allocate more resources to you.
Up to :
But this will start costing actual money to do (Github Larger Runners). You should probably be looking at running your code on Brown’s HPC if you need something close to this size.
Github Actions is a service that allows you to run workflows.
from the Github Actions documentation
Events are what trigger a workflow to run.
Could be a Git event
Can be triggered manually
Or set to run at specified time intervals (e.g. once a day)
A runner is a server—hosted by Github—that will run your jobs.
There is always one runner for each job.
They are virtual machines that can have Ubuntu Linux, Microsoft Windows, or macOS operating systems.
They default to a small amount of computing power, but can be upgraded.
Jobs are a set of steps to be run.
One job gets assigned to each runner.
Jobs can be run in parallel (default) or in sequence.
A workflow could have one or more jobs.
Steps are the actual commands given to the runner (the server).
Steps can be either a shell command to run (run:) or a pre-built action to use (uses:).
Steps are where we will say “run this code” or “execute this R script”
This is not to be confused with Github Actions which is the name of the whole service.
An action is…
Basically, an action performs many steps (kind of like a function call).
Github provides some default actions, and you can use actions written by other users.
Github Workflows live in a specific folder in your repository:
“.github/workflows/”
Each workflow is defined by a “yaml” file.
Once you define this folder, and a “yaml” file in it, Github will launch a workflow for you defined by the file.
How do we get GitHub Actions to run R code?
We first have to tell it to install R, then give it R code to run.
This is a very basic Github workflow that runs print("hello world") in R.
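A sketch of such a workflow file (e.g. .github/workflows/hello.yaml; the r-lib/actions/setup-r action installs R for us):

```yaml
on: [push]

name: Hello World

jobs:
  hello:
    runs-on: ubuntu-latest
    steps:
      # Step 1: install R on the virtual machine
      - uses: r-lib/actions/setup-r@v2
      # Step 2: execute a one-line R command
      - name: Say hello
        run: Rscript -e 'print("hello world")'
```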
First, we had to install R on the virtual machine.
Then, we simply executed our R command.
Rscript -e 'command' allows you to run any one-line R command. But what if we want to run more than one line?
Usually we will have R scripts written in our repository that we want to run.
Let's assume we have a "main.r" file; we can run it with…
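A sketch of that step, assuming the repository has already been checked out and R installed:

```yaml
- name: Run main.r
  run: |
    Rscript main.r
```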
This is still a single step.
The name: line just names the step and is optional.
The run: line is broken up into multiple lines with the | symbol.
And then an R script can be run by calling Rscript name-of-file.r.
Remember, these virtual machines come with nothing installed.
Which means we don’t have access to any packages.
A couple of options:

1. Install the packages at the start of the workflow with install.packages() or renv::install().
2. Use renv to create a lockfile, and then simply run renv::restore().

Option 2 is far better than option 1. In fact, there is an r-lib action that will restore a renv environment for us.
This will restore our environment to the state of the packages in the lockfile.
It also caches them, so next time our workflow runs, it’s much faster.
Here is a workflow I have used. Writing it made me realize some of my packages depended on system libraries that weren't installed on certain operating systems, so there is a step to install those dependencies.
on: [push]

name: Run Main.R

jobs:
  RunMain:
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.os }} (${{ matrix.config.r }})
    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'release'}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes
      PKG_SYSREQS: false
    steps:
      - name: dependencies on Linux
        if: runner.os == 'Linux'
        run: |
          sudo apt-get update
          sudo apt-get install -y make libicu-dev libxml2-dev libssl-dev pandoc librdf0-dev libnode-dev libcurl4-gnutls-dev libgsl-dev
          sudo apt install libharfbuzz-dev libfribidi-dev
      - name: dependencies on MacOS
        if: runner.os == 'Macos'
        run: |
          brew install harfbuzz fribidi openssl@1.1
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-renv@v2
      - name: Run main.R
        run: |
          Rscript main.R

Pre-built Github Actions for R
They also have a set of example workflows:
Continuous integration (CI) is a programming practice / framework.
The idea is that a team of developers write their own sections of code separately, but continually integrate their code into a common repository.
This is in contrast to a system where developers each write their code on their own machine, then everyone merges their code together at the end and tries to fix any errors all at once.
You will sometimes see CI/CD, for Continuous Integration/Continuous Deployment, which adds that the main repository of code is automatically shipped/deployed so customers/other people can use it.
Github allows users to effectively have a CI practice for their code.
If working with multiple users, they can all share a common repository.
And Github Actions can build the code and run tests automatically.
How could CI be useful for us?
Here are a few Github Actions I have used for projects:
devtools::check() on an R package I was writing
styler and lintr, which check your code's style and formatting