1 Program


This chapter includes the recipes below. To manipulate vectors with purrr, see the chapter Transform Lists and Vectors.


What you should know before you begin

The tidyverse is a collection of R packages that are designed to work well together. There are about 25 packages in the tidyverse. An R package is a bundle of functions, documentation, and data sets. More than 13,000 R packages are available. They are not installed with R; instead, they are archived online so that you can download them when you need them.

To use an R package, you must:

  1. Install the package on your local machine with install.packages(). You only need to do this once per machine.

  2. Load the package into your R session with library(). You need to do this each time you start a new R session (if you wish to use the package in that session).

You cannot use the contents of a package until you load the package in your current R session. You should update your packages from time to time to receive the latest improvements from package authors.

Tidyverse functions are designed to be used with the %>% operator. %>% links R functions together to create a “pipe” of functions that run in sequence: %>% passes the output of one function to the input of the next. %>% comes with the dplyr package, which imports it from the magrittr package.

1.1 Install a tidyverse package

You’d like to install a package that is in the tidyverse.

Discussion

Tidyverse packages can be installed in the normal way with install.packages(). See ?install.packages for installation details.

By default, install.packages() will download packages from https://cran.r-project.org, or one of its mirrors—so be sure you are connected to the internet when you run it.
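A minimal sketch, using dplyr as the example package. The call installs dplyr only when requireNamespace() reports that it is not already installed, and it passes the CRAN URL above to the repos argument explicitly (without that argument, install.packages() uses the repository stored in getOption("repos")).

```r
# Install dplyr from CRAN, but only if it is not already installed.
# requireNamespace() quietly returns FALSE when the package is missing.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cran.r-project.org")
}
```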

1.2 Install all of the tidyverse packages

You’d like to install all of the packages in the tidyverse with a single command.

Discussion

The tidyverse package provides a shortcut for downloading all of the packages in the tidyverse. tidyverse purposefully lists every package in the tidyverse as one of its dependencies. This causes R to install all of the packages in the tidyverse when R installs tidyverse.

install.packages("tidyverse") will install the following packages:

##  [1] "broom"      "cli"        "crayon"     "dbplyr"     "dplyr"     
##  [6] "forcats"    "ggplot2"    "haven"      "hms"        "httr"      
## [11] "jsonlite"   "lubridate"  "magrittr"   "modelr"     "pillar"    
## [16] "purrr"      "readr"      "readxl"     "reprex"     "rlang"     
## [21] "rstudioapi" "rvest"      "stringr"    "tibble"     "tidyr"     
## [26] "xml2"       "tidyverse"

1.3 Load a tidyverse package

You want to load a package that is in the tidyverse, so that you can use its contents. You’ve already installed the package on your computer.

Solution

Discussion

You can load individual tidyverse packages with library(). The package will stay loaded until you end your R session or run detach() on the package. If you begin a new R session, you will need to reload the package in the new session with library().
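To show library() and detach() without depending on a particular tidyverse package, this sketch uses splines, a package that ships with R but is not attached by default; the same calls work for dplyr or any other installed package. search() lists the packages attached to the current session.

```r
# Attach the package for this session.
library(splines)
"package:splines" %in% search()   # TRUE: the package is attached

# Detach it again; it is no longer on the search path.
detach("package:splines")
"package:splines" %in% search()   # FALSE
```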

library() cannot load packages that have not been installed on your machine.
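A short sketch of what happens when a package is missing. notARealPackage is a hypothetical name chosen so that it is not installed. library() signals an error, whereas requireNamespace() simply returns FALSE, which makes it handy for checking before you load.

```r
# library() stops with an error when the package is not installed;
# tryCatch() captures the error message instead of halting.
result <- tryCatch(
  library(notARealPackage),
  error = function(e) conditionMessage(e)
)
result   # a message saying there is no package with that name

# requireNamespace() reports the same situation without an error.
requireNamespace("notARealPackage", quietly = TRUE)   # FALSE
```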

You must place quotation marks around a package name when you use install.packages(), but the same is not true for library(). The commands below will both load the dplyr package if it is installed on your computer.

library(dplyr)
library("dplyr")
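Because library() does not evaluate a bare name, passing it a variable that holds a package name does not work as you might expect. A sketch of the workaround: set character.only = TRUE so that library() uses the value stored in the variable. splines again stands in for a tidyverse package.

```r
pkg <- "splines"   # the package name is stored in a variable

# library(pkg) would look for a package literally called "pkg";
# character.only = TRUE tells library() to use the value of pkg instead.
library(pkg, character.only = TRUE)
"package:splines" %in% search()   # TRUE
```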

1.4 Load the core set of tidyverse packages

You would like to load the most used packages in the tidyverse with a single command. You’ve already installed these packages on your computer.

Discussion

When you load the tidyverse package, R will also load the following packages:

  • ggplot2
  • dplyr
  • tidyr
  • readr
  • purrr
  • tibble
  • stringr
  • forcats

These eight packages are considered the “core” of the tidyverse because:

  1. They are the most used tidyverse packages.
  2. They are often used together as a set (when you use one of the packages, you tend to also use the others).

You can still load each of these packages individually with library().

Notice that library("tidyverse") does not load every package installed by install.packages("tidyverse"). You must use library() to individually load the “non-core” tidyverse packages.
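A sketch that loads the tidyverse and then uses search() to confirm that each of the eight core packages listed above was attached. Since tidyverse 2.0.0, lubridate is attached as part of the core as well. readxl, which install.packages("tidyverse") installs but library(tidyverse) does not attach, is loaded on its own.

```r
library(tidyverse)   # attaches the core tidyverse packages

core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "purrr", "tibble", "stringr", "forcats")
# TRUE for each core package that library(tidyverse) attached
paste0("package:", core) %in% search()

# Non-core packages, such as readxl, must be loaded individually.
library(readxl)
```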

1.5 Update a tidyverse package

You want to check that you have the latest version of a package that is in the tidyverse.

Discussion

update.packages() compares the version number of your local copy of a package to the version number of the newest version available on CRAN. If your local copy is older than the newest version, update.packages() will download and install the newest version from CRAN. Otherwise, update.packages() will do nothing.

Be sure that you are connected to the internet when you run update.packages().
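A sketch for a single package, dplyr. packageVersion() reports the version installed on your machine. The oldPkgs argument restricts update.packages() to the packages you name (without it, update.packages() considers every installed package), and ask = FALSE skips the confirmation prompt. The update call is commented out here because it needs an internet connection.

```r
# Version of dplyr currently installed on this machine.
packageVersion("dplyr")

# Compare against CRAN and update dplyr only if it is out of date.
# Requires an internet connection, so it is commented out in this sketch.
# update.packages(oldPkgs = "dplyr", ask = FALSE)
```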

1.6 Update all of the tidyverse packages

You want to check that you have the latest version of every package that is in the tidyverse.

Discussion

tidyverse_update() checks whether or not each of your tidyverse packages is up-to-date. If every package is up-to-date, tidyverse_update() will return the message: All tidyverse packages up-to-date. Otherwise, tidyverse_update() will return a piece of code that you can copy and run to selectively update only those packages that are out-of-date.

1.7 List all of the tidyverse packages

You want to generate a vector that contains the names of every package in the tidyverse.

Solution

##  [1] "broom"      "cli"        "crayon"     "dbplyr"     "dplyr"     
##  [6] "forcats"    "ggplot2"    "haven"      "hms"        "httr"      
## [11] "jsonlite"   "lubridate"  "magrittr"   "modelr"     "pillar"    
## [16] "purrr"      "readr"      "readxl"     "reprex"     "rlang"     
## [21] "rstudioapi" "rvest"      "stringr"    "tibble"     "tidyr"     
## [26] "xml2"       "tidyverse"

Discussion

tidyverse_packages() returns a character vector that contains the names of every package that was in the tidyverse when you installed the tidyverse package. These are the packages that were installed onto your machine along with the tidyverse package. They are also the packages that tidyverse_update() will check.

Update the tidyverse package before running tidyverse_packages() to receive the most current list.
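The list in the Solution ends with "tidyverse" because tidyverse_packages() includes the tidyverse package itself by default. A sketch that also uses its include_self argument to leave that entry out:

```r
library(tidyverse)

# Every tidyverse package, including "tidyverse" itself
all_pkgs <- tidyverse_packages()
all_pkgs

# The same list without the tidyverse package
tidyverse_packages(include_self = FALSE)
```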

1.8 Create an object

You want to create an object that stores content to use later.

Solution

Discussion

The assignment operator is made of a less-than sign, <, followed by a minus sign, -. Together they look like an arrow, <-. R assigns the contents on the right-hand side of the operator to the name on the left-hand side. Afterwards, you can use the name in your code to refer to the contents, e.g. log(x).

Names cannot begin with a number or with a special character such as ^, !, $, @, +, -, /, or *.

If you assign to a name that is already in use, the new content will overwrite the previous object; if the previous object came from a package, your new object will mask it instead.
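A short sketch of these points; the name x and its values are arbitrary. It also shows masking: assigning to pi, a constant that base R provides, creates your own pi that hides the built-in one, which remains reachable as base::pi.

```r
# Assign the value 10 to the name x.
x <- 10

# Use the name to refer to the stored value.
log(x)

# Assigning to x again overwrites the previous value.
x <- 100
x              # 100

# Assigning to a name that base R already uses masks the original:
# your pi hides the built-in constant, which remains available as base::pi.
pi <- 3
pi             # 3
base::pi       # 3.141593
```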

1.9 Create a vector

You want to create a one-dimensional, ordered set of values.

Solution

## [1] 3 1 7 5

Discussion

The combine function, c(), combines its arguments into a vector that can be passed to a function or assigned to an object. To name each value, supply the values to c() as named arguments. The result is a named vector:

##  mu chi psi  pi 
##   3   1   7   5
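Calls that produce the two outputs shown in this recipe might look like the sketch below; the object name greek is arbitrary. names() retrieves the names of a named vector, and a name can be used inside [ ] to select a value.

```r
# An unnamed numeric vector
c(3, 1, 7, 5)

# A named vector: each argument name becomes the name of its value
greek <- c(mu = 3, chi = 1, psi = 7, pi = 5)
greek
names(greek)      # "mu" "chi" "psi" "pi"
greek["psi"]      # select a value by its name
```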

1.10 See an object

You want to see the contents of an object.

Solution

## [1] 1

Discussion

R displays the contents of an object when you call the bare object name, that is, the name without quotation marks around it and without parentheses after it. This also works for functions: calling a bare function name, e.g. lm, displays the code assigned to that name.
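A sketch contrasting a bare name with a quoted one; x is an arbitrary example object, and sd is used as an example function because its code is short.

```r
x <- 1

# Typing the bare name prints the object...
x
# ...which is the same as calling print() explicitly.
print(x)

# Quoting the name creates a character string instead of looking up x.
"x"

# A bare function name displays the function's code.
sd
```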

1.11 Remove an object

You want to remove an object that you have created.

Solution

Discussion

The remove function, rm(), removes an object from R’s memory. rm() will not remove an object that is loaded from an R package.
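A sketch with arbitrary object names. It removes single objects and several objects at once, then shows that removing your own copy of a name that a package also uses leaves the package's object intact.

```r
x <- 1
rm(x)
exists("x")   # FALSE: x is gone

# Remove several objects at once by passing their names to list.
a <- 1
b <- 2
rm(list = c("a", "b"))

# rm() only removes your own copy; base R's pi is untouched.
pi <- 3
rm(pi)
pi            # 3.141593 again, from the base package
```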

1.12 Call a function

You want to call a function, i.e. you want R to execute a function.

Solution

## [1] 3.14

Discussion

To run a function, type its name followed by an opening and a closing parenthesis. If the function requires arguments (i.e. inputs) to do its job, place the arguments between the parentheses, separated by commas.

It is good practice to name every argument after the first. This rule is sometimes relaxed to every argument after the second, if the second argument is also obvious and required.
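The output 3.14 in the Solution is what round(pi, digits = 2) returns. A sketch using round() and the built-in constant pi that follows the naming practice just described:

```r
# round() with one argument rounds to the nearest whole number.
round(pi)                 # 3

# The first argument is unnamed (it is obvious); the second is named.
round(pi, digits = 2)     # 3.14

# Naming every argument also works.
round(x = pi, digits = 2)
```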

1.13 See a function’s arguments

You want to see which arguments a function uses/recognizes.

Solution

## function (x, digits = 0) 
## NULL

Discussion

args() displays the argument names of a function. Ignore the word function, the parentheses, and the NULL line in the output; each argument name is listed between the parentheses.

If an argument has a default value, the value will appear next to its name, as in digits = 0. Arguments that have default values are optional: the function will use the default value when you do not supply a different value for that argument. Arguments without default values must be supplied when you run the function to avoid an error.
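The Solution output corresponds to args(round). Besides args(), base R's formals() returns the arguments as a list, with each default as the list value, which is handy when you want to inspect defaults in code. round() is a primitive function, for which formals() returns NULL directly, so this sketch applies formals() to the function that args() builds.

```r
# Show the arguments of round()
args(round)

# formals() returns the arguments as a list of default values.
arg_list <- formals(args(round))
names(arg_list)     # includes "x" and "digits"
arg_list$digits     # the default value, 0
```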

1.14 Open a function’s help page

You want to open the help page for a function.

Solution

Discussion

To open a function’s help page, run a ? followed by the bare function name—no parentheses or quotes. A function’s help page is the technical documentation for the function and its arguments. Often the most useful section of a help page is the last, which is an examples section of code that uses the function.

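A function’s help page is installed with its package. ? finds the page by bare name only if the package that contains the function is loaded; for a package that is installed but not loaded, use ?pkg::name or help("name", package = "pkg").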
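A sketch of the function-call equivalents of ?. help() takes the topic as a string, its package argument reaches the help page of an installed package whether or not it is loaded (dplyr's filter() is used as the example), and example() runs the code from a help page's Examples section.

```r
# Open the help page for round() (same as ?round)
help("round")

# Open a help page from an installed package without loading it first
help("filter", package = "dplyr")

# Run the code in the Examples section of round()'s help page
example("round")
```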

1.15 Combine functions into a pipe

You want to chain multiple functions together to be run in sequence, with each function operating on the preceding function’s output.

Solution

## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 38 x 2
##    species        avg_height
##    <chr>               <dbl>
##  1 Yoda's species        66 
##  2 Aleena                79 
##  3 Ewok                  88 
##  4 Vulptereen            94 
##  5 Dug                  112 
##  6 Xexto                122 
##  7 Droid                131.
##  8 Toydarian            137 
##  9 Sullustan            160 
## 10 Toong                163 
## # … with 28 more rows

Discussion

The %>% operator (pronounced “pipe operator”) evaluates the code on its left-hand side (LHS) and then passes the result to the code on its right-hand side (RHS), which should be a function call. By default, %>% passes the result of the LHS to the first unnamed argument of the function on the RHS.

So starwars %>% group_by(species) is the equivalent of group_by(starwars, species). In the same way, the whole solution can be rewritten as nested function calls, read from the inside out, or as a sequence of steps that each save an intermediate object.
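A sketch of the three styles side by side. The solution code itself is not reproduced in this section, so the pipeline below is an assumption reconstructed from the output above (average height per species, shortest first). It needs dplyr, which provides both %>% and the starwars data.

```r
library(dplyr)

# Piped version: each step's output becomes the next step's input.
piped <- starwars %>%
  group_by(species) %>%
  summarise(avg_height = mean(height, na.rm = TRUE)) %>%
  arrange(avg_height)

# Nested version: read from the inside out.
nested <- arrange(
  summarise(group_by(starwars, species),
            avg_height = mean(height, na.rm = TRUE)),
  avg_height
)

# Intermediate-object version: one named object per step.
by_species <- group_by(starwars, species)
heights    <- summarise(by_species, avg_height = mean(height, na.rm = TRUE))
stepwise   <- arrange(heights, avg_height)
```

All three produce the same tibble; the piped version avoids both the inside-out reading order of the nested calls and the throwaway names of the stepwise version.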

The chunk of functions connected by %>% is called a “pipe.” To read a pipe as a sequence of steps, mentally pronounce %>% as “then.”

The %>% operator is loaded with the dplyr package, which imports it from the magrittr package. Tidyverse functions make %>% easy to use by:

  1. accepting a data frame or tibble as their first argument
  2. returning a data frame or tibble as their result

%>% is easy to type in the RStudio IDE with the keyboard shortcuts

  • Command + Shift + M (Mac OS)
  • Control + Shift + M (Windows)

1.16 Pipe a result to a specific argument

You want to use %>% to pass the result of the left hand side to an argument that is not the first argument of the function on the right hand side.

Solution

## 
## Call:
## lm(formula = mass ~ height, data = .)
## 
## Coefficients:
## (Intercept)       height  
##    -13.8103       0.6386

Discussion

By default, %>% passes the result of the left-hand side to the first unnamed argument of the function on the right-hand side. To override this default, use . as a placeholder within the function call on the right-hand side. %>% will then evaluate . as the result of the left-hand side, instead of passing the result to the first unnamed argument.
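A sketch consistent with the Call line printed in the Solution, lm(formula = mass ~ height, data = .). lm() takes the formula as its first argument, so the data must go to its data argument, which is where the . placeholder sends starwars.

```r
library(dplyr)

# The . placeholder routes starwars to lm()'s data argument
# instead of its first argument (the formula).
fit <- starwars %>%
  lm(mass ~ height, data = .)

coef(fit)
```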

The solution code is the equivalent of