R Functions and Packages

Writing your own functions

Matthew DeHaven

January 1, 2024

Lecture Summary

  • What is a function?

    • Syntax
    • Variable scope
  • Anonymous functions

  • purrr::map() family

  • Packages

    • What is a package?
    • Making your own package

Functions

Functions…

  1. take input(s)
  2. do something
  3. return an output

R is a Functional Language

“R, at its heart, is a functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions. In particular, R has what’s known as first class functions. You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.” (Hadley Wickham, Advanced R)
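
To make the idea of first class functions concrete, here is a minimal sketch in base R. The names make_power(), apply_twice(), and powers are hypothetical and only for illustration; they do not come from Advanced R.

```r
# make_power() is a hypothetical helper: it creates a function
# inside a function and returns it as the result.
make_power <- function(exponent) {
  function(x) x ^ exponent
}

# Assign functions to variables.
square <- make_power(2)
cube_fn <- make_power(3)

# Store functions in a list.
powers <- list(square = square, cube = cube_fn)

# Pass a function as an argument to another function.
apply_twice <- function(f, x) f(f(x))

square(4)               # 16
powers$cube(2)          # 8
apply_twice(square, 3)  # (3 ^ 2) ^ 2 = 81
```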

Why use functions?

Functions make your code

  • more flexible
  • less repetitive
  • more readable (potentially)

A Basic Function

Here is a basic, if not exciting, function:

cube <- function(input) {
  cubed <- input ^ 3        
  return(cubed)             
}
cube(2)
[1] 8
  1. Take inputs
  2. Do something
  3. Return output

Syntax

Functions are declared using the function keyword, written function(),

usually followed by curly braces { } that mark out what the function actually does.

Most functions will end with a return() call, though this is not strictly required.
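
A small sketch of why return() is optional: when there is no return() call, an R function returns the value of the last expression it evaluates. The names cube_explicit() and cube_implicit() are made up for this comparison.

```r
# Explicit return, as in the cube() example above.
cube_explicit <- function(input) {
  cubed <- input ^ 3
  return(cubed)
}

# No return(): the value of the last evaluated expression is returned.
cube_implicit <- function(input) {
  input ^ 3
}

cube_explicit(2)  # 8
cube_implicit(2)  # 8
```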

“Degenerate” functions

myfunc1 <- function() {print("Just prints something")}
myfunc1()
[1] "Just prints something"

This just shows that, technically, inputs and outputs are not necessary.

myfunc2 <- function() { }
myfunc2()
NULL

In fact, we could make a function that does nothing…not very useful.

Variable Scope

Functions have their own “environment” for variables.

cubed <- "Outside function"
cube <- function(x) {
  cubed <- x ^ 3
  return(cubed)
}
cubed
[1] "Outside function"

Even if we call our function, the value for cubed is not overwritten in our R session environment.

cube(3)
[1] 27
cubed
[1] "Outside function"

Breaking Function Scope

You can break the scope of a function and affect variables outside it, using <<-

cubed <- "Outside function"
cube <- function(x) {
  cubed <<- x ^ 3
  return(cubed)
}
cube(3)
[1] 27
cubed
[1] 27

But this is a really bad idea.

You should only allow functions to affect your session by returning values.

Otherwise, it can become very confusing to tell what is changing a variable.
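
As a sketch of the recommended pattern, the function below only returns a value, and any change to the session happens through an assignment at the call site. The name cube_pure() is hypothetical, chosen to keep it apart from the cube() defined above.

```r
# A version of cube() with no side effects: it only returns a value.
cube_pure <- function(x) {
  x ^ 3
}

cubed <- "Outside function"

cube_pure(3)           # calling the function leaves `cubed` untouched
unchanged <- cubed     # still "Outside function"

cubed <- cube_pure(3)  # the change is visible right where it happens
cubed                  # 27
```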

Limiting Function Scope

Technically, you can use a variable defined outside a function inside a function.

beta <- 0.99
utility <- function(x) {
  u <- beta * log(x)
  return(u)
}
utility(10)
[1] 2.279559

But this is also a bad idea.

  • Functions are meant to be flexible and portable

  • By relying on a session variable, we’ve made this function dependent on the current state of the session.
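
A short sketch of how this dependence can bite: the same call gives a different answer once beta is changed somewhere else in the session. The name utility_global() and the value 0.5 are arbitrary choices for the illustration.

```r
beta <- 0.99

# `beta` is not an argument, so it is looked up outside the function.
utility_global <- function(x) {
  beta * log(x)
}

before <- utility_global(10)

beta <- 0.5  # some other part of the script changes beta

after <- utility_global(10)

before == after  # FALSE: same call, different result
```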

Adding Additional Arguments

Instead, we should pass beta as an additional argument for our function.

utility <- function(x, beta) {
  u <- beta * log(x)
  return(u)
}

Now we can call our function by passing two values.

utility(10, 0.99)
[1] 2.279559
  • This makes our function more general, flexible, and reusable.

Setting a Default Argument

When declaring a function, you can set a default value for an argument.

utility <- function(x, beta = 0.99) {
  u <- beta * log(x)
  return(u)
}
utility(10)
[1] 2.279559

That default value will be used unless you specify a new value to overwrite it.

utility(10, beta = 0.78)
[1] 1.796016

Ordering of Arguments

Functions assume unlabelled arguments are given in the order they were declared.

For example, in our utility function the first argument is our consumption value (x), and the second is our beta.

utility(10, 0.98)
[1] 2.256533

You can always be explicit about function arguments as well.

utility(x = 10, beta = 0.98)
[1] 2.256533

And if you are explicit, you can change the order of arguments.

utility(beta = 0.99, x = 10)
[1] 2.279559

Printing out a function

We can always inspect the actual code for a function by calling it without the parentheses ().

cube
function(x) {
  cubed <<- x ^ 3
  return(cubed)
}

This works for functions from other packages too.

dplyr::lag
function(x, n = 1L, default = NULL, order_by = NULL, ...) {
  if (inherits(x, "ts")) {
    abort("`x` must be a vector, not a <ts>, do you want `stats::lag()`?")
  }
  check_dots_empty0(...)

  check_number_whole(n)
  if (n < 0L) {
    abort("`n` must be positive.")
  }

  shift(x, n = n, default = default, order_by = order_by)
}
<bytecode: 0x128c65278>
<environment: namespace:dplyr>

Anonymous Functions

Anonymous functions are functions that do not have a name.

This means they are not stored as a variable that you can use over again, but exist only for a moment.

function(x) {x ^ 3}
function(x) {x ^ 3}

For functions that only take one line, you can drop the curly brackets.

function(x) x ^ 3
function(x) x ^ 3

But why would we ever want to do this?

  • Sometimes, you want to pass functions as an argument to another function…
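
As a small sketch of passing a function as an argument, here are two base R functions that each take an anonymous function. The choice of sapply() and integrate() is only for illustration.

```r
# sapply() calls the anonymous function on each element of the vector.
cubes <- sapply(c(1, 4, 6), function(x) x ^ 3)
cubes  # 1 64 216

# integrate() takes the function to integrate as its first argument.
area <- integrate(function(x) x ^ 3, lower = 0, upper = 1)
area$value  # approximately 0.25
```

In both cases the function is only needed for a single call, so there is little reason to give it a name.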

purrr::map() family

Map Family of Functions

We saw briefly the *apply() family of base R functions (when we were looking at loops).

The map() family is the tidyverse equivalent and is nicer to use.

library(purrr)
some_vector <- c(1, 4, 6)
map(some_vector, cube)
[[1]]
[1] 1

[[2]]
[1] 64

[[3]]
[1] 216

Map Function

The map() function always

  • takes a vector (or list) for input
  • calls a function on each element
  • returns a list as the result
map(list(1, 4, 6), cube)
[[1]]
[1] 1

[[2]]
[1] 64

[[3]]
[1] 216
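
For comparison, here is a sketch of the same three steps using base R's lapply(), which also takes a vector or list, calls a function on each element, and returns a list. The name cube_fn() is a stand-in for the cube() function above, written without <<- so that it has no side effects.

```r
# Stand-in for cube(), without side effects.
cube_fn <- function(x) x ^ 3

# Base R counterpart to map(list(1, 4, 6), cube).
base_result <- lapply(list(1, 4, 6), cube_fn)

is.list(base_result)  # TRUE
unlist(base_result)   # 1 64 216
```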

Using an Anonymous Function with map()

This is an example where we may want to declare an anonymous function.

map(1:3, function(x) {x ^ 3 + 2})
[[1]]
[1] 3

[[2]]
[1] 10

[[3]]
[1] 29

And again, because our function is only one line, we could drop the curly braces.

map(1:3, function(x) x ^ 3 + 2)
[[1]]
[1] 3

[[2]]
[1] 10

[[3]]
[1] 29

Multiline Anonymous Functions

Conversely, you could have a multiline anonymous function.

datasets <- list(mtcars, iris)
map(datasets, function(x) {
  x |>
    summary()
})
[[1]]
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

[[2]]
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

map Return Types

By default, map() will always return a list.

It does not know the datatype of the objects you are returning,

  • and a list works with any data type and with a mix of data types.

If you do know your datatype, you could use…

  • map_lgl()
  • map_int()
  • map_dbl()
  • map_chr()

Each of these returns a vector of that data type.

Example with map_dbl()

Using our prior map example,

map_dbl(1:3, function(x) x ^ 3 + 2)
[1]  3 10 29

If we use the wrong one, however, we get an error.

map_lgl(1:3, function(x) x ^ 3 + 2)
Error in `map_lgl()`:
ℹ In index: 1.
Caused by error:
! Can't coerce from a number to a logical.
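
Base R's vapply() does a similar kind of type check, which makes it a convenient way to sketch the idea without purrr. This is an analogue only, not how purrr is implemented, and the name add_two_to_cube() is hypothetical.

```r
add_two_to_cube <- function(x) x ^ 3 + 2

# Declaring the expected type as one double per element works.
dbl_result <- vapply(1:3, add_two_to_cube, FUN.VALUE = numeric(1))
dbl_result  # 3 10 29

# Declaring the wrong type raises an error, which we catch here
# so that the script keeps running.
outcome <- tryCatch(
  vapply(1:3, add_two_to_cube, FUN.VALUE = logical(1)),
  error = function(e) "type mismatch"
)
outcome  # "type mismatch"
```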

map_dfr() for data.frames

We saw before that map_dfr() was convenient for working with the fredr package.

library(fredr)
map_dfr(c("UNRATE", "GDP", "FEDFUNDS"), fredr)
# A tibble: 2,062 × 5
   date       series_id value realtime_start realtime_end
   <date>     <chr>     <dbl> <date>         <date>      
 1 1948-01-01 UNRATE      3.4 2024-03-31     2024-03-31  
 2 1948-02-01 UNRATE      3.8 2024-03-31     2024-03-31  
 3 1948-03-01 UNRATE      4   2024-03-31     2024-03-31  
 4 1948-04-01 UNRATE      3.9 2024-03-31     2024-03-31  
 5 1948-05-01 UNRATE      3.5 2024-03-31     2024-03-31  
 6 1948-06-01 UNRATE      3.6 2024-03-31     2024-03-31  
 7 1948-07-01 UNRATE      3.6 2024-03-31     2024-03-31  
 8 1948-08-01 UNRATE      3.9 2024-03-31     2024-03-31  
 9 1948-09-01 UNRATE      3.8 2024-03-31     2024-03-31  
10 1948-10-01 UNRATE      3.7 2024-03-31     2024-03-31  
# ℹ 2,052 more rows

This map function returns data.frames (tibbles) and then combines them by rbind-ing them.

  • rbind() takes two data.frames and stacks them on top of each other
  • cbind() takes two data.frames and stacks them beside one another
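
A minimal sketch of the difference, using small made-up data.frames. The column names and values are invented for the illustration and are not FRED data.

```r
first_rows <- data.frame(series_id = c("A", "A"), value = c(3.4, 3.8))
more_rows  <- data.frame(series_id = c("B", "B"), value = c(4.0, 3.9))
extra_cols <- data.frame(flag = c(TRUE, FALSE))

# rbind() stacks rows: the column names must match.
stacked <- rbind(first_rows, more_rows)
dim(stacked)       # 4 rows, 2 columns

# cbind() places columns side by side: the row counts must match.
side_by_side <- cbind(first_rows, extra_cols)
dim(side_by_side)  # 2 rows, 3 columns
```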

purrr Suggests a Different Function

The purrr documentation notes that map_dfr() has been superseded.

Instead, we should write

map(c("UNRATE", "GDP", "FEDFUNDS"), fredr) |>
 list_rbind()
# A tibble: 2,062 × 5
   date       series_id value realtime_start realtime_end
   <date>     <chr>     <dbl> <date>         <date>      
 1 1948-01-01 UNRATE      3.4 2024-03-31     2024-03-31  
 2 1948-02-01 UNRATE      3.8 2024-03-31     2024-03-31  
 3 1948-03-01 UNRATE      4   2024-03-31     2024-03-31  
 4 1948-04-01 UNRATE      3.9 2024-03-31     2024-03-31  
 5 1948-05-01 UNRATE      3.5 2024-03-31     2024-03-31  
 6 1948-06-01 UNRATE      3.6 2024-03-31     2024-03-31  
 7 1948-07-01 UNRATE      3.6 2024-03-31     2024-03-31  
 8 1948-08-01 UNRATE      3.9 2024-03-31     2024-03-31  
 9 1948-09-01 UNRATE      3.8 2024-03-31     2024-03-31  
10 1948-10-01 UNRATE      3.7 2024-03-31     2024-03-31  
# ℹ 2,052 more rows

Which has the same result.

Back to Functions

When Should You Use a Function?

Rule of Three

  • When you duplicate some code three times, you should write it as a function.

This is obviously a rule-of-thumb, but it’s a useful starting point.

You should also consider for your projects,

  • will I need to iterate over this step?
  • will I need to run robustness checks on this step?
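
As a sketch of the Rule of Three, here is the same calculation written out three times and then wrapped in a function. The standardize() name and the choice of mtcars columns are only for illustration.

```r
# The same code, duplicated three times.
mpg_z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
hp_z  <- (mtcars$hp  - mean(mtcars$hp))  / sd(mtcars$hp)
wt_z  <- (mtcars$wt  - mean(mtcars$wt))  / sd(mtcars$wt)

# Written once as a function instead.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

mpg_z2 <- standardize(mtcars$mpg)
hp_z2  <- standardize(mtcars$hp)
wt_z2  <- standardize(mtcars$wt)
```

With the function, a change to the calculation (say, for a robustness check) only has to be made in one place.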

Where Do You Write a Function?

A function has to be declared before you can use it.

So the simplest spot to put them is at the top of your R script.

## My functions
my_func <- function(x) {
  ## Does something
}
my_func2 <- function(x) {
  ## Does something else
}

## The rest of my code
data <- my_func(x)
my_func2(data)

But this can get overcrowded very quickly.

Sourcing Helper Scripts

A better solution is to store your functions in their own separate scripts.

Then you source() them into the main script, or wherever you need them.

helpers.r
## My functions
my_func <- function(x) {
  ## Does something
}
my_func2 <- function(x) {
  ## Does something else
}
main.r
source("helpers.r")

## The rest of my code
data <- my_func(x)
my_func2(data)
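
Here is a self-contained sketch of the same workflow that you can run as is. It writes a tiny helper script to a temporary folder and then sources it; the add_one() function and the use of tempdir() are assumptions made so that the example does not touch your project files.

```r
# Write a small helper script to a temporary location.
helper_path <- file.path(tempdir(), "helpers.r")
writeLines(
  c(
    "add_one <- function(x) {",
    "  x + 1",
    "}"
  ),
  helper_path
)

# source() runs the script, which defines add_one() in our session.
source(helper_path)

add_one(41)  # 42
```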

Or, you could put your functions into their own package…

Packages

What are Packages?

We have used many different R packages throughout the course.

Packages provide

  • R functions
  • documentation for those functions
  • possibly some sample data

We have seen how you can install packages remotely from CRAN using renv::install() or install.packages().

Other Package Sources

You can also install packages that are stored either

  • on GitHub
  • or locally, on your machine.

This can be great for installing packages in development, or for packages you write.

Writing Your Own Package

Why would you want to write a package?

  • You have functions that you would like to add documentation to.
  • You have functions you plan to use across multiple projects.
  • You have functions you would like to share with others.

Two Possible Package Structures

I suggest the following two ways to structure a package:

  1. Create a GitHub repository devoted to this package.
  2. Create a subfolder within a project devoted to this package.

Option (1) is great for sharing a package with others and using it for multiple projects.

Option (2) is for helper functions within a project that you want to have more documentation than a usual source() file.

I will show you (2) as the coding example for today. (1) will be your assignment for the week.

The Best Resource for Writing R Packages

If you want to write a package, read R Packages (2e) by Hadley Wickham and Jennifer Bryan.

It starts with a simple example package.

It then goes into detail about every element of an R package.

I will try to give you the highlights today, but we won’t cover everything.

Writing an R Package Overview

First, we will install the devtools package.

renv::install("devtools")

This package has a bunch of useful “tools” for “developing” packages.

It will also install a package, usethis, which helps create some templates for us.

Create a Package

First, navigate to where you would like your package to live.

For creating a subfolder package,

usethis::create_package("packageName")

or for creating a package for your current folder/repository,

usethis::create_package(".")

"." is a filepath that means “here”, which is your current working directory (folder).

Create a Package Output

The create_package() function will create some folders and files needed to structure a package.

  • .Rbuildignore lists files to ignore when building the package
  • .gitignore
  • DESCRIPTION file for metadata about your package
  • NAMESPACE file listing dependencies and functions exported
  • R/ a folder where we will put all of our functions
  • packageName.Rproj for RStudio projects, we won’t use it

Your working directory will be changed to the package folder! This is good for what we are doing (developing the package) but could be a surprise. I’d suggest having two terminals open, one for developing the package, one for your project.

Adding Functions to Our Package

To add a function to our package, we simply declare it in the R/ folder in a new R script.

myfunc.r
myfunc <- function(a, b) {
  result <- a + b * a
  return(result)
}

In general, you should have one .r file for each function, or at least for each family of very similar functions.

Then we can save the file.

Accessing Our Package

Installing and loading our package depends on whether it lives in a GitHub repository or in a subfolder.

  1. We put our package in a GitHub repository
renv::install("my-github-username/packageName")
# library(packageName)
  2. We created our package as a subfolder
devtools::install("packageName")
# library(packageName)

Note

Both of these assume we are now back in our project working directory, not in the package working directory.

Using Our New Function

We just loaded our own package and our function myfunc()!

We can now use it in all of our scripts.

But how do we add documentation?

Adding Documentation to Your Function

We will be using the devtools::document() function, which relies on the roxygen2 package.

myfunc.r
#' My first function
#'
#' @param a A numerical vector.
#' @param b Also a numerical vector.
#'
#' @return A numerical vector of a + b * a.
#' @export
#'
#' @examples
#' myfunc(3, 5)
myfunc <- function(a, b) {
  result <- a + b * a
  return(result)
}

roxygen2

roxygen2 takes the comments we write just before our function and translates them into documentation files for us.

This is a lot easier than writing the documentation files ourselves.

After we’ve added the comments to myfunc.r, we simply call

devtools::document()

Reloading Our Package

Now, if we go back to our project, and reload the package, we can use

?myfunc

And we should be able to see the documentation we wrote!

Adding Package Metadata

Especially if you are publishing your package for others, you will want to add some metadata for your package.

  • Edit the DESCRIPTION file

You can add things like,

  • description of what the package does
  • authors
  • package title

To add a license, call usethis::use_mit_license() (or a similar license function).

Package Check

CRAN has a lot of standards that packages have to pass.

You can (and you should) check to see if your package passes by calling devtools::check().

It is good to check your package early and often.

Even if you never plan to submit it to CRAN, they are good standards to follow.

Adding a Dependency to Your Package

If you want to use another package’s functions in your package, first call usethis::use_package("thatPackage").

  • This adds thatPackage to the list of “Imports” in the DESCRIPTION file.

Then you can use that package’s functions by writing,

thatPackage::some_function()
  • Always use the explicit function call when writing packages.
  • Otherwise devtools::check() will usually flag the unqualified function calls.
    • Being explicit ensures our package behavior remains the same regardless of package loading order.
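
A small sketch of why the explicit call is more robust. The function center_by_median() is hypothetical, and the stats package stands in for thatPackage so that the example runs without installing anything. The masking of median() is simulated by hand here, where in practice it would come from another package or script being loaded later.

```r
# A package-style function that uses an explicit namespace call.
center_by_median <- function(x) {
  x - stats::median(x)
}

# Simulate something else in the session masking median().
median <- function(x) stop("masked median() was called")

# The explicit stats::median() call is unaffected by the masking.
centered <- center_by_median(c(1, 2, 6))
centered  # -1 0 4

# Clean up the simulated mask.
rm(median)
```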

Lecture Summary

  • Functions
    • Anonymous functions
    • map functions
    • source()
  • Packages
    • devtools

Coding Example

  • Writing a package as a subfolder of a project