1 Program
This chapter includes the following recipes. To manipulate vectors with purrr, see [Transform Lists and Vectors].
- Install a tidyverse package
- Install all of the tidyverse packages
- Load a tidyverse package
- Load the core set of tidyverse packages
- Update a tidyverse package
- Update all of the tidyverse packages
- List all of the tidyverse packages
- Create an object
- Create a vector
- See an object
- Remove an object
- Call a function
- See a function’s arguments
- Open a function’s help page
- Combine functions into a pipe
- Pipe a result to a specific argument
What you should know before you begin
The tidyverse is a collection of R packages that are designed to work well together. There are about 25 packages in the tidyverse. An R package is a bundle of functions, documentation, and data sets. R has over 13,000 packages. These are not installed with R, but are archived online for when you need them.
To use an R package, you must:
- Install the package on your local machine with install.packages(). You only need to do this once per machine.
- Load the package into your R session with library(). You need to do this each time you start a new R session (if you wish to use the package in that session).
The %>% operator links R functions together to create a “pipe” of functions that are run in sequence: %>% passes the output of one function to the input of the next. %>% comes with the dplyr package, which imports it from the magrittr package.
1.1 Install a tidyverse package
You’d like to install a package that is in the tidyverse.
Solution
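One way to do this, sketched under the assumption that dplyr is the package you want and that the CRAN cloud mirror is reachable (both are illustrative choices):

```r
# Install dplyr from CRAN, but only if it is not already installed.
# "dplyr" and the mirror URL are example choices, not requirements.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
```

The requireNamespace() guard is optional; it only avoids downloading a package you already have.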
Discussion
Tidyverse packages can be installed in the normal way with install.packages(). See ?install.packages for installation details.
install.packages() will download packages from https://cran.r-project.org, or one of its mirrors, so be sure you are connected to the internet when you run it.
1.2 Install all of the tidyverse packages
You’d like to install all of the packages in the tidyverse with a single command.
Solution
Discussion
The tidyverse package provides a shortcut for downloading all of the packages in the tidyverse. tidyverse purposefully lists every package in the tidyverse as one of its dependencies. This causes R to install all of the packages in the tidyverse when R installs tidyverse.
install.packages("tidyverse") will install the following packages:
## [1] "broom" "cli" "crayon" "dbplyr" "dplyr"
## [6] "forcats" "ggplot2" "haven" "hms" "httr"
## [11] "jsonlite" "lubridate" "magrittr" "modelr" "pillar"
## [16] "purrr" "readr" "readxl" "reprex" "rlang"
## [21] "rstudioapi" "rvest" "stringr" "tibble" "tidyr"
## [26] "xml2" "tidyverse"
1.3 Load a tidyverse package
You want to load a package that is in the tidyverse, so that you can use its contents. You’ve already installed the package on your computer.
Solution
Discussion
You can load individual tidyverse packages with library(). The package will stay loaded until you end your R session or run detach() on the package. If you begin a new R session, you will need to reload the package in the new session with library().
library() cannot load packages that have not been installed on your machine.
You must place quotation marks around a package name when you use install.packages(), but the same is not true for library(). The commands below will both load the dplyr package if it is installed on your computer.
library(dplyr)
library("dplyr")
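The load and detach cycle described above can be sketched with tools, a package that ships with R. It is used here only as a stand-in so that the example runs without installing anything; the variable names are illustrative.

```r
# Attach the package for the current session
library(tools)
attached_after_load <- "package:tools" %in% search()    # TRUE while attached

# Detach the package without ending the session
detach("package:tools")
attached_after_detach <- "package:tools" %in% search()  # FALSE once detached

attached_after_load
attached_after_detach
```

search() lists the packages currently attached to the session, so it offers a quick way to confirm what library() and detach() did.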
1.4 Load the core set of tidyverse packages
You would like to load the most used packages in the tidyverse with a single command. You’ve already installed these packages on your computer.
Solution
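A minimal sketch, assuming the tidyverse package is already installed. The check at the end is an illustrative addition that confirms the eight core packages are attached.

```r
# Load the core tidyverse packages with one command
library(tidyverse)

# Optional check: each core package should now appear on the search path
core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "purrr", "tibble", "stringr", "forcats")
paste0("package:", core) %in% search()
```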
Discussion
When you load the tidyverse package, R will also load the following packages:
- ggplot2
- dplyr
- tidyr
- readr
- purrr
- tibble
- stringr
- forcats
These eight packages are considered the “core” of the tidyverse because:
- They are the most used tidyverse packages.
- They are often used together as a set (when you use one of the packages, you tend to also use the others).
You can still load each of these packages individually with library().
library("tidyverse") does not load every package installed by install.packages("tidyverse"). You must use library() to individually load the “non-core” tidyverse packages.
1.5 Update a tidyverse package
You want to check that you have the latest version of a package that is in the tidyverse.
Solution
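A minimal sketch, assuming dplyr is the package you want to check. The oldPkgs argument restricts the check to the named package and ask = FALSE skips the confirmation prompt; the mirror URL is an illustrative choice.

```r
# Update dplyr only if CRAN has a newer version than the local copy.
# "dplyr" and the mirror URL are example choices.
update.packages(
  oldPkgs = "dplyr",
  ask = FALSE,
  repos = "https://cloud.r-project.org"
)
```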
Discussion
update.packages() compares the version number of your local copy of a package to the version number of the newest version available on CRAN. If your local copy is older than the newest version, update.packages() will download and install the newest version from CRAN. Otherwise, update.packages() will do nothing.
1.6 Update all of the tidyverse packages
You want to check that you have the latest version of every package that is in the tidyverse.
Solution
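A minimal sketch, assuming the tidyverse package is installed and you are connected to the internet:

```r
library(tidyverse)

# Reports which tidyverse packages are out of date, if any
tidyverse_update()
```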
Discussion
tidyverse_update() checks whether or not each of your tidyverse packages is up-to-date. If every package is up-to-date, tidyverse_update() will return the message: All tidyverse packages up-to-date. Otherwise, tidyverse_update() will return a piece of code that you can copy and run to selectively update only those packages that are out-of-date.
1.7 List all of the tidyverse packages
You want to generate a vector that contains the names of every package in the tidyverse.
Solution
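A minimal sketch, assuming the tidyverse package is installed. The exact names returned depend on the version of tidyverse you have, so your result may differ from the output printed below.

```r
library(tidyverse)

# Returns a character vector of tidyverse package names
tidyverse_packages()
```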
## [1] "broom" "cli" "crayon" "dbplyr" "dplyr"
## [6] "forcats" "ggplot2" "haven" "hms" "httr"
## [11] "jsonlite" "lubridate" "magrittr" "modelr" "pillar"
## [16] "purrr" "readr" "readxl" "reprex" "rlang"
## [21] "rstudioapi" "rvest" "stringr" "tibble" "tidyr"
## [26] "xml2" "tidyverse"
Discussion
tidyverse_packages() returns a character vector that contains the names of every package that was in the tidyverse when you installed the tidyverse package. These are the packages that were installed onto your machine along with the tidyverse package. They are also the packages that tidyverse_update() will check.
Update the tidyverse package before running tidyverse_packages() to receive the most current list.
1.8 Create an object
You want to create an object that stores content to use later.
Solution
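A minimal sketch; the name x and the value 1 are illustrative choices:

```r
# Assign the value 1 to the name x
x <- 1

# Use the name to refer to the stored value
log(x)  # returns 0, because log(1) is 0
```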
Discussion
The assignment operator is made by a less than sign, <, followed by a minus sign, -. The result looks like an arrow, <-. R will assign the contents on the right hand side of the operator to the name on the left hand side. Afterwards, you can use the name in your code to refer to the contents, e.g. log(x).
Names cannot begin with a number, nor a special character like ^, !, $, @, +, -, /, or *.
1.9 Create a vector
You want to create a one-dimensional, ordered set of values.
Discussion
The concatenate function, c(), combines its arguments into a vector that can be passed to a function or assigned to an object. To name each value, provide names to the arguments of c(), e.g. c(mu = 3, chi = 1, psi = 7, pi = 5) returns:
## mu chi psi pi
## 3 1 7 5
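The named vector printed above can be built as follows. This is a minimal sketch that also shows the unnamed form for contrast; the object names plain and greek are illustrative.

```r
# Unnamed vector: values only
plain <- c(3, 1, 7, 5)

# Named vector: each argument name becomes the name of that value
greek <- c(mu = 3, chi = 1, psi = 7, pi = 5)

greek         # prints the names above the values
names(greek)  # the names on their own
```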
1.10 See an object
You want to see the contents of an object.
Discussion
R displays the contents of an object when you call the bare object name, i.e. the name not surrounded with quotation marks and not followed by parentheses. This works for functions too, in which case R displays the code assigned to the function name, e.g. lm.
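A minimal sketch of the behavior described above; x is an illustrative object:

```r
x <- c(1, 2, 3)

x    # the bare name prints the stored values
"x"  # with quotation marks, R prints the string "x" instead

lm   # a bare function name prints the function's code
```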
1.11 Remove an object
You want to remove an object that you have created.
Solution
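A minimal sketch; scratch is an illustrative object created only so that it can be removed:

```r
scratch <- 42
existed <- exists("scratch")       # TRUE: the object is in memory

rm(scratch)                        # remove the object
still_exists <- exists("scratch")  # FALSE: the name no longer refers to anything

existed
still_exists
```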
Discussion
The remove function, rm(), removes an object from R’s memory. rm() will not remove an object that is loaded from an R package.
1.12 Call a function
You want to call a function, i.e. you want R to execute a function.
Discussion
To run a function, type its name followed by an open and a closed parenthesis. If the function requires arguments (i.e. inputs) to do its job, place the arguments between the parentheses.
It is a best practice to name every argument after the first. This is sometimes relaxed to every argument after the second argument, if the second argument is also obvious and required.
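The calls below sketch these conventions with base R functions chosen for illustration:

```r
# No arguments: the name followed by empty parentheses
Sys.Date()

# First argument unnamed, every later argument named
rounded <- round(3.14159, digits = 2)
rounded

# First two arguments unnamed because both are obvious and required
steps <- seq(1, 10, by = 3)
steps
```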
1.13 See a function’s arguments
You want to see which arguments a function uses/recognizes.
Discussion
args() displays the argument names of a function. Ignore the function, (, ), and NULL in the output. Each argument name will be listed between the parentheses.
If an argument has a default value, the value will appear next to its name, as in digits = 0. Arguments that have default values are optional: the function will use the default value when you do not supply a different value for that argument. Arguments without default values must be supplied when you run the function to avoid an error.
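A minimal sketch, assuming round() is the function of interest (its digits argument has the default value 0 mentioned above). The formals() call is an illustrative extra that retrieves the same information as a list.

```r
# Display the arguments of round()
args(round)

# The same arguments as a list, which is handy for checking defaults
round_args <- formals(args(round))
names(round_args)
round_args$digits
```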
1.14 Open a function’s help page
You want to open the help page for a function.
Solution
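A minimal sketch, assuming round() is the function you want to read about:

```r
# Shorthand: a question mark followed by the bare function name
?round

# Equivalent function call
help("round")
```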
Discussion
To open a function’s help page, run a ? followed by the bare function name, with no parentheses or quotes. A function’s help page is the technical documentation for the function and its arguments. Often the most useful section of a help page is the last, which is an examples section of code that uses the function.
1.15 Combine functions into a pipe
You want to chain multiple functions together to be run in sequence, with each function operating on the preceding function’s output.
Solution
library(dplyr)  # provides %>% and the starwars data set

starwars %>%
  group_by(species) %>%
  summarise(avg_height = mean(height, na.rm = TRUE)) %>%
  arrange(avg_height)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 38 x 2
## species avg_height
## <chr> <dbl>
## 1 Yoda's species 66
## 2 Aleena 79
## 3 Ewok 88
## 4 Vulptereen 94
## 5 Dug 112
## 6 Xexto 122
## 7 Droid 131.
## 8 Toydarian 137
## 9 Sullustan 160
## 10 Toong 163
## # … with 28 more rows
Discussion
The %>% operator (pronounced “pipe operator”) evaluates the code on its left hand side (LHS) and then passes the result to the code on its right hand side (RHS), which should be a function call. By default %>% will pass the result of the LHS to the first unnamed argument of the function on the RHS.
So starwars %>% group_by(species) is the equivalent of group_by(starwars, species), and the above solution is the equivalent of the nested code:
arrange(
  summarise(
    group_by(starwars, species),
    avg_height = mean(height, na.rm = TRUE)
  ),
  avg_height
)
or the equivalent of:
x1 <- starwars
x2 <- group_by(x1, species)
x3 <- summarise(x2, avg_height = mean(height, na.rm = TRUE))
arrange(x3, avg_height)
The chunk of functions connected by %>% is called a “pipe.” To read a pipe as a sequence of steps, mentally pronounce %>% as “then.”
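The “then” reading can be tried on a small vector. This is a minimal sketch that assumes the magrittr package is installed; the numbers and the name total are illustrative.

```r
library(magrittr)  # provides %>%

# Read as: take the vector, then take square roots, then sum
total <- c(1, 4, 9) %>%
  sqrt() %>%
  sum()
total  # 6

# The same computation written as nested calls
sum(sqrt(c(1, 4, 9)))
```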
The %>% operator is loaded with the dplyr package, which imports it from the magrittr package. Tidyverse functions facilitate using %>% by
- accepting a data frame or tibble as their first argument
- returning a data frame or tibble as their result
%>% is easy to type in the RStudio IDE with the keyboard shortcuts
- Command + Shift + M (Mac OS)
- Control + Shift + M (Windows)
1.16 Pipe a result to a specific argument
You want to use %>% to pass the result of the left hand side to an argument that is not the first argument of the function on the right hand side.
Solution
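The call recorded in the output below, lm(formula = mass ~ height, data = .), suggests a solution along these lines. This is a minimal sketch that assumes the left hand side is the starwars data set from dplyr; the name fit is illustrative.

```r
library(dplyr)  # provides %>% and the starwars data set

# The . placeholder sends starwars to the data argument of lm()
fit <- starwars %>%
  lm(mass ~ height, data = .)
fit
```

Under the same assumption, lm(mass ~ height, data = starwars) fits the same model without the pipe.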
##
## Call:
## lm(formula = mass ~ height, data = .)
##
## Coefficients:
## (Intercept) height
## -13.8103 0.6386
Discussion
By default %>% passes the result of the left hand side to the first unnamed argument of the function on the right hand side. To override this default, use . as a placeholder within the function call on the right hand side. %>% will evaluate . as the result of the left hand side, instead of passing the result to the first unnamed argument.
The solution code is the equivalent of calling lm(mass ~ height, data = x), where x is the result of the left hand side.