1 Program


This chapter includes the following recipes. To manipulate vectors with purrr, see Transform Lists and Vectors.


What you should know before you begin

The tidyverse is a collection of R packages that are designed to work well together. There are about 25 packages in the tidyverse. An R package is a bundle of functions, documentation, and data sets. R has over 13,000 packages. These are not installed with R, but are archived online for when you need them.

To use an R package, you must:

  1. Install the package on your local machine with install.packages(). You only need to do this once per machine.

  2. Load the package into your R session with library(). You need to do this each time you start a new R session (if you wish to use the package in that session).

You cannot use the contents of a package until you load the package in your current R session. You should update your packages from time to time to receive the latest improvements from package authors.

Tidyverse functions are designed to be used with the %>% operator. %>% links R functions together to create a “pipe” of functions that are run in sequence: %>% passes the output of one function to the input of the next. %>% comes with the dplyr package, which imports it from the magrittr package.

1.1 Install a tidyverse package

You’d like to install a package that is in the tidyverse.

Solution

install.packages("dplyr")

Discussion

Tidyverse packages can be installed in the normal way with install.packages(). See ?install.packages for installation details.

By default, install.packages() will download packages from https://cran.r-project.org, or one of its mirrors—so be sure you are connected to the internet when you run it.
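Because a package only needs to be installed once per machine, a script can check for it before calling install.packages(). The sketch below is one way to do that; install_if_missing() is a hypothetical helper written for this example, not a function from the tidyverse or from base R.

```r
# Hypothetical helper: install a package only when it is not already available.
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed; nothing to download
  }
  install.packages(pkg)       # needs an internet connection
  invisible(TRUE)
}

# "stats" ships with R, so this call does not trigger a download.
install_if_missing("stats")
```

Calling install_if_missing("dplyr") would download dplyr only on a machine that does not already have it.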

1.2 Install all of the tidyverse packages

You’d like to install all of the packages in the tidyverse with a single command.

Solution

install.packages("tidyverse")

Discussion

The tidyverse package provides a shortcut for downloading all of the packages in the tidyverse. tidyverse purposefully lists every package in the tidyverse as one of its dependencies. This causes R to install all of the packages in the tidyverse when R installs tidyverse.

install.packages("tidyverse") will install the following packages:

##  [1] "broom"       "cli"         "crayon"      "dplyr"       "dbplyr"     
##  [6] "forcats"     "ggplot2"     "haven"       "hms"         "httr"       
## [11] "jsonlite"    "lubridate"   "magrittr"    "modelr"      "purrr"      
## [16] "readr"       "readxl"      "reprex"      "rlang"       "rstudioapi" 
## [21] "rvest"       "stringr"     "tibble"      "tidyr"       "xml2"       
## [26] "tidyverse"

1.3 Load a tidyverse package

You want to load a package that is in the tidyverse, so that you can use its contents. You’ve already installed the package on your computer.

Solution

library("dplyr")

Discussion

You can load individual tidyverse packages with library(). The package will stay loaded until you end your R session or run detach() on the package. If you begin a new R session, you will need to reload the package in the new session with library().

library() cannot load packages that have not been installed on your machine.

You must place quotation marks around a package name when you use install.packages(), but the same is not true for library(). The commands below will both load the dplyr package if it is installed on your computer.

library(dplyr)
library("dplyr")
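The loading and detaching behavior described above can be checked with search(), which lists what is currently attached. The sketch below uses splines, a package that ships with R, so that it runs without installing anything; the variable names are chosen for this example only.

```r
library(splines)                                   # attach a package
attached_after_library <- "package:splines" %in% search()
attached_after_library                             # TRUE while it is attached

detach("package:splines")                          # detach it again
attached_after_detach <- "package:splines" %in% search()
attached_after_detach                              # FALSE once it is detached

# When the package name is stored in a variable, library() needs
# character.only = TRUE to treat the variable's value as the name.
pkg <- "splines"
library(pkg, character.only = TRUE)
attached_by_variable <- "package:splines" %in% search()
```

The same pattern applies to tidyverse packages, for example detach("package:dplyr") after library(dplyr).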

1.4 Load the core set of tidyverse packages

You would like to load the most used packages in the tidyverse with a single command. You’ve already installed these packages on your computer.

Solution

library("tidyverse")

Discussion

When you load the tidyverse package, R will also load the following packages:

  • ggplot2
  • dplyr
  • tidyr
  • readr
  • purrr
  • tibble
  • stringr
  • forcats

These eight packages are considered the “core” of the tidyverse because:

  1. They are the most used tidyverse packages.
  2. They are often used together as a set (when you use one of the packages, you tend to also use the others).

You can still load each of these packages individually with library().

Notice that library("tidyverse") does not load every package installed by install.packages("tidyverse"). You must use library() to individually load the “non-core” tidyverse packages.

1.5 Update a tidyverse package

You want to check that you have the latest version of a package that is in the tidyverse.

Solution

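update.packages(oldPkgs = "dplyr")

Discussion

update.packages() compares the version number of your local copy of a package to the version number of the newest version available on CRAN. If your local copy is older than the newest version, update.packages() will offer to download and install the newest version from CRAN (set ask = FALSE to skip the prompt). Otherwise, update.packages() will do nothing.

Name the package with the oldPkgs argument. The first argument of update.packages() is lib.loc, the library directory to search, so an unnamed package name would be read as a directory path.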

Be sure that you are connected to the internet when you run update.packages().
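The version comparison described above can be tried by hand with base R's package_version() and packageVersion(). This is only an illustration of how version numbers compare; the two version numbers below are made up for the example and do not refer to real dplyr releases.

```r
installed <- package_version("1.0.10")  # hypothetical local version
available <- package_version("1.1.0")   # hypothetical newest version on CRAN

# TRUE: the local copy is older, so an update would be offered
needs_update <- installed < available

# Components are compared as numbers, so 1.0.10 is newer than 1.0.9
newer_patch <- package_version("1.0.10") > package_version("1.0.9")

# Version of a package that is actually installed on this machine
stats_version <- packageVersion("stats")
```

To see which installed packages have newer versions available without installing anything, base R also provides old.packages(), which needs an internet connection.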

1.6 Update all of the tidyverse packages

You want to check that you have the latest version of every package that is in the tidyverse.

Solution

tidyverse_update()

Discussion

tidyverse_update() checks whether or not each of your tidyverse packages is up-to-date. If every package is up-to-date, tidyverse_update() will return the message: All tidyverse packages up-to-date. Otherwise, tidyverse_update() will return a piece of code that you can copy and run to selectively update only those packages that are out-of-date.

1.7 List all of the tidyverse packages

You want to generate a vector that contains the names of every package in the tidyverse.

Solution

tidyverse_packages()
##  [1] "broom"       "cli"         "crayon"      "dplyr"       "dbplyr"     
##  [6] "forcats"     "ggplot2"     "haven"       "hms"         "httr"       
## [11] "jsonlite"    "lubridate"   "magrittr"    "modelr"      "purrr"      
## [16] "readr"       "readxl"      "reprex"      "rlang"       "rstudioapi" 
## [21] "rvest"       "stringr"     "tibble"      "tidyr"       "xml2"       
## [26] "tidyverse"

Discussion

tidyverse_packages() returns a character vector that contains the names of every package that was in the tidyverse when you installed the tidyverse package. These are the packages that were installed onto your machine along with the tidyverse package. They are also the packages that tidyverse_update() will check.

Update the tidyverse package before running tidyverse_packages() to receive the most current list.

1.8 Combine functions into a pipe

You want to chain multiple functions together to be run in sequence, with each function operating on the preceding function’s output.

Solution

starwars %>% 
  group_by(species) %>% 
  summarise(avg_height = mean(height, na.rm = TRUE)) %>% 
  arrange(avg_height)
## # A tibble: 38 x 2
##    species        avg_height
##    <chr>               <dbl>
##  1 Yoda's species         66
##  2 Aleena                 79
##  3 Ewok                   88
##  4 Vulptereen             94
##  5 Dug                   112
##  6 Xexto                 122
##  7 Toydarian             137
##  8 Droid                 140
##  9 Sullustan             160
## 10 <NA>                  160
## # … with 28 more rows

Discussion

The %>% operator (pronounced “pipe operator”) evaluates the code on its left hand side (LHS) and then passes the result to the code on its right hand side (RHS), which should be a function call. By default, %>% will pass the result of the LHS to the first unnamed argument of the function on the RHS.

So starwars %>% group_by(species) is the equivalent of group_by(starwars, species), and the above solution is the equivalent of the nested code:

arrange(
  summarise(
    group_by(starwars, species), 
    avg_height = mean(height, na.rm = TRUE)
  ), 
  avg_height
)

or the equivalent of:

x1 <- starwars 
x2 <- group_by(x1, species)
x3 <- summarise(x2, avg_height = mean(height, na.rm = TRUE))
arrange(x3, avg_height)

The chunk of functions connected by %>% is called a “pipe.” To read a pipe as a sequence of steps, mentally pronounce %>% as “then.”
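To make the mechanics concrete, the sketch below builds a toy operator in base R that inserts its left hand side as the first argument of the call on its right hand side. The operator %then% is hypothetical and written only for this illustration; it is a simplified stand-in and not how magrittr implements %>%.

```r
# Hypothetical operator for illustration only; not magrittr's implementation.
`%then%` <- function(lhs, rhs) {
  caller <- parent.frame()       # where the pipe was written
  rhs_call <- substitute(rhs)    # capture the RHS call without running it
  if (!is.call(rhs_call)) {
    stop("the right hand side must be a function call")
  }
  args <- as.list(rhs_call)[-1]  # arguments already written in the RHS call
  # Rebuild the call with the LHS value inserted as the first argument
  new_call <- as.call(c(list(rhs_call[[1]], quote(.lhs)), args))
  eval(new_call, list(.lhs = lhs), caller)
}

# Read as: take the vector, then take square roots, then sum them
total <- c(1, 4, 9) %then% sqrt() %then% sum()

# The nested equivalent gives the same value
nested <- sum(sqrt(c(1, 4, 9)))
identical(total, nested)

# Arguments written on the RHS are kept after the inserted LHS
rounded <- 3.14159 %then% round(digits = 2)   # same as round(3.14159, digits = 2)
```

The real %>% handles more cases than this sketch, but the core idea is the same: the call on the right is rewritten so that the value on the left becomes its first argument.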

The %>% operator is loaded with the dplyr package, which imports it from the magrittr package. Tidyverse functions facilitate using %>% by

  1. accepting a data frame or tibble as their first argument
  2. returning a data frame or tibble as their result

%>% is easy to type in the RStudio IDE with the keyboard shortcuts

  • Command + Shift + M (macOS)
  • Control + Shift + M (Windows and Linux)

1.9 Pipe a result to a specific argument

You want to use %>% to pass the result of the left hand side to an argument that is not the first argument of the function on the right hand side.

Solution

starwars %>% 
  lm(mass ~ height, data = .)
## 
## Call:
## lm(formula = mass ~ height, data = .)
## 
## Coefficients:
## (Intercept)       height  
##    -13.8103       0.6386

Discussion

By default, %>% passes the result of the left hand side to the first unnamed argument of the function on the right hand side. To override this default, use . as a placeholder within the function call on the right hand side. %>% will evaluate . as the result of the left hand side, instead of passing the result to the first unnamed argument.

The solution code is the equivalent of

lm(mass ~ height, data = starwars)
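A smaller example can make the difference between the default and the placeholder easier to see. The sketch below assumes the magrittr package is installed (it is installed along with the tidyverse) and loads %>% from it directly; the variable names are chosen for this example only.

```r
library(magrittr)  # provides %>%; assumed to be installed

# Default: the LHS becomes the first argument, so this is seq(3, 10)
from_three <- 3 %>% seq(10)

# Placeholder: the LHS goes where . appears, so this is seq(1, 3)
up_to_three <- 3 %>% seq(1, .)

from_three    # the integers 3 through 10
up_to_three   # the integers 1 through 3
```

Because . appears as an argument in the second call, %>% does not also insert the left hand side as the first argument.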