May 2016

Make it reproducible

Or how to write your own R packages


So you've been using R for a little while now. You learned your way around the base library and you found a couple of useful packages you like to keep at hand. Inevitably, you've also written a few general-purpose functions you keep reusing. The question now becomes whether to keep copying the files into the folder of every new project, or to build them into your very own R package that you can simply load when needed.

As with almost anything in life, the answer is not straightforward and there are advantages and disadvantages to both approaches. Copying your functions into every project folder gives you a lot of flexibility, but it also means a lot of unnecessary code duplication. Consolidating that code into a centralized package, on the other hand, keeps it neatly organized and pushes you towards a cleaner design, but it can be difficult to ensure backwards compatibility: a change you make for a new project may break older projects that rely on the same package.

When I started coding in R, I went with the decentralized approach every time, not because it was the right choice but because writing a package takes some amount of learning, while copying files around is something anyone can do. This post will hopefully get you to the point where you can make a more informed choice.

Reusing your own code is only one reason to put it in a package, though arguably the most common one for beginners. The main reason to write a package is probably to share your code with collaborators or with the wider R-using world.

Formal guides to writing R packages

If you've already started writing your own packages and are looking for the official guide, you can find it at Writing R Extensions. You will be better off sticking to Hadley Wickham's wisdom, though, so just go straight to his definitive guide: R packages. If you don't know who Hadley Wickham is, then let me say I'm impressed: you haven't even been using R long enough to have come across ggplot2 or dplyr and you're already thinking of developing your own R package. Kudos to you!

Irrespective of whether or not you've been touched by the 'Hadleyverse' before, if what you're looking for is a quick guide to get you up and running, you've come to the right place. This post will give you all you need to write your first R package. We won't be using Hadley's book on the subject, but we'll still be using devtools, staticdocs and roxygen2, three of his packages. By the end of this short read, you'll be ready to write your first R package, documentation website included!

Create your package

The first thing you need to do is get Hadley Wickham's R package development and documentation helper packages (that's right, packages written to help develop other packages): devtools and roxygen2. Next, set your working directory to the folder where the package should live and create the package folder with devtools::create(). This generates the required files, such as DESCRIPTION and NAMESPACE, plus a folder called 'R' to host your functions. I won't go into more detail here, but you will definitely want to learn more about the generated files as your packages get more complex.

        
# Install the helper packages (only needed once)
install.packages(c('devtools', 'roxygen2'))

# Move to the directory that will contain the package, then create it
setwd('location/of/your/package')
devtools::create('Rpckg')
		

With this you're done creating your package and won't need the previous steps until you want to create a new one. Now you need to add some functions.
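
If you're curious about what those generated files contain, the sketch below writes a minimal equivalent by hand, using base R only, into a temporary directory. It is an approximation, not what devtools::create() actually produces: the DESCRIPTION values (title, author, e-mail, license) are placeholders, and devtools writes its own, slightly different, set of files and fields.

```r
# Hypothetical minimal package layout, written by hand with base R only.
# Assumption: the DESCRIPTION values are placeholders; devtools::create()
# writes its own, slightly different, set of files and fields.
pkg_dir <- file.path(tempdir(), "Rpckg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: the package metadata, one "Field: value" per line
writeLines(c(
  "Package: Rpckg",
  "Title: Personal Helper Functions",
  "Version: 0.0.0.9000",
  "Author: First Last",
  "Maintainer: First Last <first.last@example.com>",
  "Description: A few general purpose functions reused across projects.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: lists what the package exports (roxygen2 writes it from @export)
writeLines("export(triple)", file.path(pkg_dir, "NAMESPACE"))

# Inspect the result: DESCRIPTION, NAMESPACE and the (still empty) R folder,
# where the function files go in the development cycle below
print(list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE))
print(read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = c("Package", "Version")))
```

DESCRIPTION uses the simple "Field: value" layout that base R's read.dcf() can parse, which is handy for quickly checking what you wrote.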

The development cycle

1. Add function(s)

Adding functions is simple: save your code in a file with the extension .R in the 'R' folder of your new package. I like to keep every function in its own separate file, but it will work just as well if you have more than one function per file. Now comes a really cool part: writing the documentation. Yes, I did say 'cool' and that's thanks to roxygen2. Add the documentation to each function using roxygen comments (lines starting with #') placed directly above the function definition, as illustrated below. This will be compiled into standard R documentation and can also be used to easily make a nice documentation website.

		
#' Compute the triple of input number
#'
#' This function multiplies an input number by three.
#' @param input Number to triple. No default.
#' @param verbose Logical indicating whether to display a message. Defaults to \code{TRUE}.
#' @return Triple of input number as numeric value.
#' @examples
#' triple(23457)
#' triple(342, verbose = FALSE)
#' @export
triple <- function(input, verbose = TRUE) {
    if(verbose) {
        cat('Computing triple of', input, '\n')
    }
    return(input * 3)
}
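
When you run devtools::document() in the next step, roxygen2 turns a comment block like this one into an Rd file, man/triple.Rd. The sketch below writes an approximation of that file by hand (the exact formatting varies between roxygen2 versions) and renders it as plain text with base R's tools package, which is roughly what you will see when you type ?triple. It uses raw strings, which assume R 4.0.0 or later.

```r
# Approximation of the man/triple.Rd file roxygen2 generates from the
# comment block above (exact output varies by roxygen2 version).
# Raw strings r"(...)" need R >= 4.0.0.
rd_text <- r"(
% Generated by roxygen2: do not edit by hand
\name{triple}
\alias{triple}
\title{Compute the triple of input number}
\usage{
triple(input, verbose = TRUE)
}
\arguments{
\item{input}{Number to triple. No default.}

\item{verbose}{Logical indicating whether to display a message. Defaults to \code{TRUE}.}
}
\value{
Triple of input number as numeric value.
}
\description{
This function multiplies an input number by three.
}
\examples{
triple(23457)
triple(342, verbose = FALSE)
}
)"
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# Parse the Rd file and render it as plain text, much like the help viewer
rd <- tools::parse_Rd(rd_file)
tools::Rd2txt(rd)
```

Each roxygen tag maps onto one Rd section (@param to \arguments, @return to \value, @examples to \examples), which is why you never have to write Rd markup by hand.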
		

2. Compile documentation and test

You are now ready to compile the documentation file and load the development version of your function. You can check that the documentation looks good and test the function. In this stage you will make any required changes and reload to check again, going through as many iterations as you need. Here's the code.

		
# Compile documentation
devtools::document()
# Load development version
devtools::load_all()

# Check documentation
?triple

# Test function
a <- triple(2476937509)
b <- triple(2476937509, FALSE)
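
Checking results by eye works for a first pass, but it pays to turn those checks into assertions you can rerun after every load_all(). Below is a minimal sketch using base R's stopifnot(); triple() is copied from step 1 so the snippet runs on its own, whereas in your package devtools::load_all() would provide it. For anything beyond a handful of checks, Hadley Wickham's testthat package is the usual next step.

```r
# Copy of triple() from step 1 so this snippet runs standalone;
# in your package, devtools::load_all() provides it instead.
triple <- function(input, verbose = TRUE) {
    if (verbose) {
        cat('Computing triple of', input, '\n')
    }
    return(input * 3)
}

# Values: scalars, large doubles and (since * is vectorized) vectors
stopifnot(triple(2, verbose = FALSE) == 6)
stopifnot(triple(2476937509, verbose = FALSE) == 7430812527)
stopifnot(identical(triple(1:3, verbose = FALSE), c(3, 6, 9)))

# Messages: printed only when verbose = TRUE
# (assignments keep capture.output() from also capturing the printed value)
msg <- capture.output(res <- triple(5))
stopifnot(res == 15, msg == "Computing triple of 5 ")
stopifnot(length(capture.output(res <- triple(5, verbose = FALSE))) == 0)
```

If any check fails, stopifnot() stops with an error naming the failing expression, so a silent run means everything passed.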
		

3. Install the package

That's it, your package is ready to use. You can install it with devtools, as follows, and then use it as you would any other R package. Whenever you want to add functions to your package, just go through steps 1 and 2 above again, as many times as you want, and then reinstall.

			
# From the package's parent directory, install it into your R library
setwd('location/of/your/package')
devtools::install('Rpckg')

# Load in standard way
library('Rpckg')
a <- triple(2476937509)

# Or keep your namespaces clean
a <- Rpckg::triple(2476937509)
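
If you ever need to install a package without devtools, base R can install straight from a source directory. The sketch below is self-contained: it writes a throwaway copy of a hypothetical Rpckg (placeholder DESCRIPTION, a hand-written NAMESPACE and the triple() function) into a temporary directory, installs it into a temporary library with install.packages(..., repos = NULL, type = "source") and loads it from there, so your real library is left untouched.

```r
# Build a throwaway copy of the package in a temporary directory
# (DESCRIPTION values are placeholders)
pkg_dir <- file.path(tempdir(), "Rpckg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: Rpckg",
  "Title: Personal Helper Functions",
  "Version: 0.0.0.9000",
  "Author: First Last",
  "Maintainer: First Last <first.last@example.com>",
  "Description: A few general purpose functions reused across projects.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(triple)", file.path(pkg_dir, "NAMESPACE"))
writeLines(c(
  "triple <- function(input, verbose = TRUE) {",
  "    if (verbose) cat('Computing triple of', input, '\\n')",
  "    input * 3",
  "}"
), file.path(pkg_dir, "R", "triple.R"))

# Install from the source directory into a temporary library
lib_dir <- file.path(tempdir(), "Rlib")
dir.create(lib_dir, showWarnings = FALSE)
install.packages(pkg_dir, lib = lib_dir, repos = NULL, type = "source")

# Load it from that library and call the exported function
library("Rpckg", lib.loc = lib_dir)
Rpckg::triple(2476937509, verbose = FALSE)
```

Pointing lib and lib.loc at a temporary folder is a convenient way to try an installation without touching the packages you use every day.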
			

Extras

Easy publishing

Even if you're not planning on submitting it to CRAN, you can still publish and distribute your package for easy installation by others. You are probably using some kind of version control (if you are not, you should start now) and you can leverage that to easily publish your package. Git is currently the most common system, and devtools provides functions to install packages directly from GitHub or Bitbucket, among other options. It's as simple as running devtools::install_github('github_username/repo'). You will see it in action in the next section.

Documentation website

Hopefully you have found the documentation easy to write with roxygen2 and roxygen comments. If you're not convinced, here's a bonus for you: you can now use the same documentation you have already built in your package to create a nice documentation website with three lines of code. All you need to do is install Hadley Wickham's staticdocs and it will do all the work for you. The result is a cool documentation website, just like the ones you may be used to seeing for Hadley Wickham's packages.

		
# As of this writing, staticdocs is not yet on CRAN; grab the development version
devtools::install_github("hadley/staticdocs")
# Navigate to package's parent directory and build site
# (use 'examples = FALSE' if you don't want your examples to run)
setwd('location/of/your/package')
staticdocs::build_site('Rpckg', '/website/location/', examples = FALSE)
	

Update

Hadley Wickham has since developed an alternative to staticdocs, called pkgdown. It is just as easy to use and more powerful than staticdocs. Be sure to check out all the options it provides, but here's the minimal usage example. It is very similar to the staticdocs one, but by default it builds the site for the package in the current working directory and puts the output in a docs folder (created if it doesn't exist) inside the package directory.

		
# Grab the development version from GitHub
devtools::install_github("hadley/pkgdown")
# Navigate to the package directory itself (not its parent) and build site
# (use 'examples = FALSE' if you don't want your examples to run)
setwd('location/of/your/package/Rpckg')
pkgdown::build_site()
	

That's it. You have now created a fully functional R package and a slick documentation website to go with it. There are a few other quick, informal guides to writing your own R package out there, just like this one. Two blog posts I found really interesting are Hilary Parker's Writing an R package from scratch and Andrew Brooks' DIY building an R package. I encourage you to check them out.

On a final note, RStudio, the de facto R IDE, includes great functionality for package development. You may want to check it out as an alternative to this more manual approach.