Writing your own functions
January 1, 2024
What is a function?
Anonymous functions
purrr::map() family
Packages
Functions…
“R, at its heart, is a functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions. In particular, R has what’s known as first class functions. You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.”
Hadley Wickham, Advanced R
Functions make your code
Here is a basic, if not exciting, function:
Functions are declared using the function function(), followed by curly braces { } which demarcate what the function actually does.
Most functions will end with a return() call, though this is not strictly required: without one, the function returns the value of the last expression it evaluated.
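As a minimal sketch of the pieces just described (the name cube_it and its body are assumptions chosen for illustration, not necessarily the function shown in the lecture):

```r
# cube_it is a hypothetical example function.
cube_it <- function(x) {
  cubed <- x^3    # the body sits between the curly braces
  return(cubed)   # explicit return; the last expression would be returned anyway
}

cube_it(2)   # returns 8
```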
Technically, inputs and outputs are not necessary.
In fact, we could make a function that does nothing, though that is not very useful.
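A small sketch of both cases, with hypothetical function names: one function that takes no arguments and only prints a message, and one with an empty body.

```r
# Takes no input; cat() prints a message and returns NULL.
say_hello <- function() {
  cat("Hello!\n")
}

# Empty body: calling this function simply returns NULL.
do_nothing <- function() {}

say_hello()
do_nothing()
```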
Functions have their own “environment” for variables.
[1] "Outside function"
Even if we call our function, the value for cubed is not overwritten in our R session environment.
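A sketch of this behaviour, assuming a hypothetical cube_it() function that uses a local variable also named cubed:

```r
cubed <- "Outside function"   # variable in the session environment

cube_it <- function(x) {
  cubed <- x^3   # a new, local cubed that exists only inside the function
  return(cubed)
}

cube_it(3)   # returns 27
cubed        # still "Outside function"
```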
You can break the scope of a function and affect variables outside it, using <<-
[1] 27
[1] 27
But this is a really bad idea.
You should only allow functions to affect your session by returning values.
Otherwise, it can become very confusing to tell what is changing a variable.
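For reference only, a sketch of how <<- leaks out of a function. The name cube_global is hypothetical, and this is the pattern to avoid:

```r
cubed <- "Outside function"

cube_global <- function(x) {
  # <<- looks for cubed in the enclosing environments and overwrites it there
  cubed <<- x^3
  return(cubed)
}

cube_global(3)   # returns 27
cubed            # has been overwritten and is now 27
```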
Technically, you can use a variable defined outside a function inside a function.
But this is also a bad idea.
Functions are meant to be flexible and portable.
By relying on a session variable we’ve made this function dependent on the current setting.
Instead, we should pass beta as an additional argument for our function.
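A sketch of the difference, assuming a discounted log-utility form purely for illustration (the functional form used in the lecture is not shown here):

```r
# Assumed form: beta * log(consumption). beta is passed in explicitly,
# so the function no longer depends on anything in the session.
utility <- function(consumption, beta) {
  beta * log(consumption)
}

utility(10, 0.95)
```

Because every input arrives through an argument, the function gives the same answer in any script or session.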
When declaring a function, you can set a default value for an argument.
That default value will be used unless you specify a new value to overwrite it.
Functions assume unlabelled arguments are given in the order they were declared.
i.e. for our utility function, the first is our consumption value, and the second is our beta.
You can always be explicit about function arguments as well.
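A sketch of default values and argument matching, again assuming a hypothetical log-utility form and a default of beta = 0.95:

```r
# beta has a default, so it can be omitted when calling the function.
utility <- function(consumption, beta = 0.95) {
  beta * log(consumption)
}

utility(10)                             # uses the default beta = 0.95
utility(10, 0.5)                        # positional: consumption = 10, beta = 0.5
utility(beta = 0.5, consumption = 10)   # named: order no longer matters
```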
We can always inspect the actual code for a function by typing its name without the parentheses ().
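A short sketch using a hypothetical function add_one(); body() and formals() are base R helpers that return the individual pieces of a function:

```r
add_one <- function(x) {
  x + 1
}

add_one            # no parentheses: prints the function definition
body(add_one)      # just the body, as an unevaluated expression
formals(add_one)   # the argument list
```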
This works for functions from other packages too.
function(x, n = 1L, default = NULL, order_by = NULL, ...) {
  if (inherits(x, "ts")) {
    abort("`x` must be a vector, not a <ts>, do you want `stats::lag()`?")
  }
  check_dots_empty0(...)
  check_number_whole(n)
  if (n < 0L) {
    abort("`n` must be positive.")
  }
  shift(x, n = n, default = default, order_by = order_by)
}
<bytecode: 0x122304120>
<environment: namespace:dplyr>
Anonymous functions are functions that do not have a name.
This means they are not stored as a variable that you can use over again, but exist only for a moment.
For functions that only take one line, you can drop the curly brackets.
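A minimal sketch of anonymous functions in base R. The \(x) shorthand assumes R 4.1 or later:

```r
# Define and call in one go; the function is never given a name.
(function(x) x^2)(4)   # returns 16

# The same function written with the backslash shorthand (R >= 4.1).
(\(x) x^2)(4)

# More commonly, an anonymous function is passed straight to another function.
squares <- sapply(1:3, function(x) x^2)
squares
```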
But why would we ever want to do this?
purrr::map() family
We saw briefly the *apply() family of base R functions (when we were looking at loops).
The map() family is the tidyverse equivalent and is nicer to use.
The map() function always
map()
This is an example where we may want to declare an anonymous function.
Conversely, you could have a multiline anonymous function.
[[1]]
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
[[2]]
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
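A minimal sketch of both styles, using a hypothetical list of integer vectors rather than the data frames summarised above:

```r
library(purrr)

numbers <- list(a = 1:3, b = 4:6)

# One-line anonymous function: the curly braces can be dropped.
sums_of_squares <- map(numbers, function(x) sum(x^2))

# Multi-line anonymous function: braces group several steps,
# and the last expression is the value returned for each element.
ranges <- map(numbers, function(x) {
  lo <- min(x)
  hi <- max(x)
  hi - lo
})

ranges
```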
map Return Types
By default, map() will always return a list.
It does not know the datatype of the objects you are returning.
If you do know your datatype, you could use…
map_lgl()
map_int()
map_dbl()
map_chr()
map_dbl()
Using our prior map example,
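A sketch of the typed variants on a hypothetical list of integer vectors (not the lecture's original example). Each variant returns an atomic vector of the named type instead of a list:

```r
library(purrr)

numbers <- list(a = 1:3, b = 4:6)

means   <- map_dbl(numbers, mean)                                  # double vector
lens    <- map_int(numbers, length)                                # integer vector
labels  <- map_chr(numbers, function(x) paste(x, collapse = "-"))  # character vector
all_pos <- map_lgl(numbers, function(x) all(x > 0))                # logical vector

means
```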
map_dfr() for data.frames
We saw before that map_dfr() was convenient for working with the fredr package.
# A tibble: 2,085 × 5
date series_id value realtime_start realtime_end
<date> <chr> <dbl> <date> <date>
1 1948-01-01 UNRATE 3.4 2025-01-10 2025-01-10
2 1948-02-01 UNRATE 3.8 2025-01-10 2025-01-10
3 1948-03-01 UNRATE 4 2025-01-10 2025-01-10
4 1948-04-01 UNRATE 3.9 2025-01-10 2025-01-10
5 1948-05-01 UNRATE 3.5 2025-01-10 2025-01-10
6 1948-06-01 UNRATE 3.6 2025-01-10 2025-01-10
7 1948-07-01 UNRATE 3.6 2025-01-10 2025-01-10
8 1948-08-01 UNRATE 3.9 2025-01-10 2025-01-10
9 1948-09-01 UNRATE 3.8 2025-01-10 2025-01-10
10 1948-10-01 UNRATE 3.7 2025-01-10 2025-01-10
# ℹ 2,075 more rows
This map function returns data.frames (tibbles) and then combines them by rbind-ing them.
rbind() takes two data.frames and stacks them on top of each other
cbind() takes two data.frames and stacks them beside one another
purrr Suggests a Different Function
purrr notes in their documentation that map_dfr() has been superseded.
Instead, we should write
# A tibble: 2,085 × 5
date series_id value realtime_start realtime_end
<date> <chr> <dbl> <date> <date>
1 1948-01-01 UNRATE 3.4 2025-01-10 2025-01-10
2 1948-02-01 UNRATE 3.8 2025-01-10 2025-01-10
3 1948-03-01 UNRATE 4 2025-01-10 2025-01-10
4 1948-04-01 UNRATE 3.9 2025-01-10 2025-01-10
5 1948-05-01 UNRATE 3.5 2025-01-10 2025-01-10
6 1948-06-01 UNRATE 3.6 2025-01-10 2025-01-10
7 1948-07-01 UNRATE 3.6 2025-01-10 2025-01-10
8 1948-08-01 UNRATE 3.9 2025-01-10 2025-01-10
9 1948-09-01 UNRATE 3.8 2025-01-10 2025-01-10
10 1948-10-01 UNRATE 3.7 2025-01-10 2025-01-10
# ℹ 2,075 more rows
Which has the same result.
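A sketch of the map() plus list_rbind() pattern that does not need an API key. The helper make_rows() and the series ids are hypothetical stand-ins for the per-series downloads, and list_rbind() assumes purrr 1.0.0 or later:

```r
library(purrr)

# Hypothetical stand-in for a download: one small data frame per series id.
series_ids <- c("A", "B")
make_rows <- function(id) {
  data.frame(series_id = id, value = c(1, 2))
}

# map() returns a list of data frames; list_rbind() stacks them row-wise.
combined <- map(series_ids, make_rows) |> list_rbind()
combined
```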
Rule of Three
This is obviously a rule-of-thumb, but it’s a useful starting point.
You should also consider for your projects,
A function has to be declared before you can use it.
So the simplest spot to put them is at the top of your R script.
But this can get overcrowded very quickly.
A better solution is to store your functions in their own separate scripts.
Then you source() them into the main script, or wherever you need them.
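A self-contained sketch of source(). The script is written to a temporary file here only so the example runs anywhere; in a project it would be an ordinary file, and add_one() is a hypothetical function:

```r
# Create a tiny script that defines one function.
script_path <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script_path)

# source() runs the script, which defines add_one() in the global environment.
source(script_path)

add_one(2)   # returns 3
```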
Or, you could put your functions into their own package…
We have used many different R packages throughout the course.
Packages provide
We have seen how you can install packages remotely from CRAN using renv::install() or install.packages().
You can also install packages that are stored either
This can be great for installing packages in development, or for packages you write.
Why would you want to write a package?
source() file.
I suggest the following two ways to structure a package:
(1) Create a GitHub repository devoted to this package.
(2) Create a subfolder within a project devoted to this package.
I will show you (2) as the coding example for today. (1) will be your assignment for the week.
If you want to write a package, read R Packages (2e) by Hadley Wickham and Jennifer Bryan.
It starts with a simple example package.
Then goes into detail about every element of an R package.
I will try to give you the highlights today, but we won’t cover everything.
First, we will install a package, devtools.
This package has a bunch of useful “tools” for “developing” packages.
It will also install a package, usethis, which helps create some templates for us.
First, navigate to where you would like your package to live.
"." is a filepath that means “here”, which is your current working directory (folder).
The create_package() function will create some folders and files needed to structure a package.
.Rbuildignore: lists files to ignore when building the package
.gitignore
DESCRIPTION: file for metadata about your package
NAMESPACE: file listing dependencies and functions exported
R/: a folder where we will put all of our functions
packageName.Rproj: for RStudio projects, we won’t use it
Your working directory will be changed to the package folder! This is good for what we are doing (developing the package) but could be a surprise. I’d suggest having two terminals open, one for developing the package, one for your project.
To add a function to our package, we simply declare it in the R/ folder in a new R script.
In general, you should have one .r file for each function, or at least for each family of very similar functions.
Then we can save the file.
Installing and loading our package depends a bit on whether it lives in a subfolder or on GitHub.
Note
Both of these assume we are now back in our project working directory, not in the package working directory.
We just loaded our own package and our function myfunc()!
We can now use it in all of our scripts.
But how do we add documentation?
We will be using the devtools::document() function, which relies on the roxygen2 package.
roxygen2
roxygen2 takes the comments we write just before our function and translates them into documentation files for us.
This is a lot easier than writing the documentation files ourselves.
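A minimal sketch of roxygen2 comments. The body of myfunc() here is an assumption for illustration; the #' prefix and the @param, @return, @examples, and @export tags are standard roxygen2 syntax:

```r
#' Cube a number
#'
#' @param x A numeric vector.
#' @return The elements of `x` raised to the third power.
#' @examples
#' myfunc(3)
#' @export
myfunc <- function(x) {
  x^3
}
```

To R itself these lines are ordinary comments, so the file still runs as a normal script; roxygen2 only reads them when the documentation is generated.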
Now, if we go back to our project, and reload the package, we can use
And we should be able to see the documentation we wrote!
Especially if you are publishing your package for others, you will want to add some metadata for your package.
DESCRIPTION file
You can add things like,
To add a license, call usethis::use_mit_license() (or a similar license function).
CRAN has a lot of standards that packages have to pass.
You can (and you should) check to see if your package passes by calling devtools::check().
It is good to check your package early and often.
Even if you never plan to submit it to CRAN, they are good standards to follow.
If you want to use another package’s functions in your package, first call usethis::use_package("thatPackage").
This adds thatPackage to the list of “Imports” in the DESCRIPTION file.
devtools::check()
map functions
source()
devtools