Writing Vignettes

The basic workflow for creating vignettes is the following:

  1. Initialize vignette with usethis::use_vignette(name = "vignette_name",title = "Vignette Title") in the Console.

  2. Edit the contents of the vignette. Preview the vignette by clicking Knit.

  3. Once you’re happy with the vignette’s contents, build the package website with pkgdown::build_site().

  4. Push the local changes to GitHub.

The rest of this section will detail vignette-writing within the dspgWork R package.

Vignettes are helpful to demonstrate or explain R code. Writing vignettes in .Rmd files allow the author to combine prose (like what you’re reading right now) with executable R code. An author can include additional exposition to help others understand the purpose of a piece of code.

For example: “the code below will install (if you haven’t already) three important R packages for R package development: usethis, devtools, and pkgdown.” Note that pkgdown is smart enough to automatically link in-line code to the appropriate online documentation page: for example, clicking this -> dplyr::mutate will take you to the the mutate function’s doc page.

# install.packages("usethis")
library(usethis)
# install.packages("devtools")
library(devtools)
# install.packages("pkgdown")
library(pkgdown)

The usethis package includes a function usethis::use_vignette that will automatically initialize a vignette folder (if one doesn’t already exist) and an .Rmd file. Run the following code in your Console to initialize an example_vignette.Rmd file. NOTE: make sure that your working directory is set to the package’s root directory (containing the R, data_raw, data_clean, and other folders). Use the setwd function to change your working directory if needed.

usethis::use_vignette(name = "example_vignette",title = "Example Vignette")

Writing vignettes is somewhat of an art. The following section shows an example of a vignette, but you are highly encouraged to read Hadley Wickhams’ take on vignettes in his R Packages book available here.

Vignette contents example

Suppose we want to use this vignette to explain a particular plot. As a boring example, we will consider the mtcars data set available in RStudio. We not only want to show the code used to create the actual plot, but also any code used beforehand to “clean” the data. You can use text to describe any choices you made while cleaning data. For example, if you decide to aggregate or filter data beforehand, you can use the text to explain your motivation.

It is often vignette writing best practice to load any packages you use in the beginning of the document.

library(tidyverse)
#> -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
#> v ggplot2 3.3.3     v purrr   0.3.4
#> v tibble  3.1.2     v dplyr   1.0.6
#> v tidyr   1.1.2     v stringr 1.4.0
#> v readr   1.4.0     v forcats 0.5.1
#> -- Conflicts ------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
library(ggrepel)

We want to plot the relationship between miles/gallon and # of cylinders for all Mercedes cars. In the mtcars data set, the Make/Model of the car is stored as row names (as opposed to in its own column). We will add two columns containing the Make and Model information, respectively, instead. The tidyr::separate function allows use to split a column containing strings by a given separator (in this case, an empty space " ").

pltData <- mtcars %>%
  mutate(make_model = row.names(mtcars)) %>%
  tidyr::separate(col = make_model,into = c("make","model"),sep = " ",remove = TRUE) %>%
  filter(make == "Merc")
#> Warning: Expected 2 pieces. Additional pieces discarded in 3 rows [2, 4, 29].
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [6].

pltData
#>              mpg cyl  disp  hp drat   wt qsec vs am gear carb make  model
#> Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2 Merc   240D
#> Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2 Merc    230
#> Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4 Merc    280
#> Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4 Merc   280C
#> Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3 Merc  450SE
#> Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3 Merc  450SL
#> Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3 Merc 450SLC

Now that the data have been cleaned to our specification, we can create the desired scatter plot.

pltData %>%
  ggplot(aes(x = cyl,y = mpg)) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = model)) +
  theme_bw() +
  labs(x = "# of cylinders",
       y = "Miles per gallon",
       title = "MPG vs. # of cylinders",
       subtitle = "Mercedes models")

To preview the vignette, you can click the Knit button in the toolbar above the .Rmd editing window. The result will be an .html document containing text, R code, and the R code’s output.

Updating package website

We’re happy with the contents of the above vignette and want to display the vignette on the package website. The pkgdown package contains various tools to create a package website. Run the following in your Console (assuming your working directory is still the project’s root directory) to build the package website. A local version of the website should automatically open in your default browser once it is done building. You will then need to push this updated local version to GitHub.

pkgdown::build_site()

Miscellaneous

The rest of the vignette is for individuals who are tasked with maintaining the package. It discusses into what goes on “behind-the-scenes” to generate the pkgdown site and other package idiosyncrasies.

Data Documentation

The method by which data are documented in the dspgWork package is different than “normal” R packages. Typically, for any package that contains loaded data (e.g., for function demonstrations), there is a single folder called data that houses the .rda or .RData files. Associated with this folder is normally a script called data.R. This is the file structure oftentimes assumed by R packages that aid in the development of other R packages.

To maintain a separation between “raw” and “processed” data, we have split the saved data into data_raw and data_clean folders. This unorthodox file structure requires a few tricks to be pulled to, for example, build data documentation files from both data_raw and data_clean and have those files display nicely on the pkgdown site. For instance, it is assumed that the each data file name will begin with dataRaw_ or dataClean_ (this is so that the pkgdown site builder organizes the data sets correctly - see below).

pkgdown Site & _pkgdown.yml

Discuss custom changes to _pkgdown.yml