10 S3
You may have noticed that your slot machine results do not look the way I promised they would. I suggested that the slot machine would display its results like this:
play()
## 0 0 DD
## $0
But the current machine displays its results in a less pretty format:
play()
## "0" "0" "DD"
## 0
Moreover, the slot machine uses a hack to display symbols (we call print
from within play
). As a result, the symbols do not follow your prize output if you save it:
one_play <- play()
## "B" "0" "B"
one_play
## 0
You can fix both of these problems with R’s S3 system.
10.1 The S3 System
S3 refers to a class system built into R. The system governs how R handles objects of different classes. Certain R functions will look up an object’s S3 class, and then behave differently in response.
The print
function is like this. When you print a numeric vector, print
will display a number:
num <- 1000000000
print(num)
## 1000000000
But if you give that number the S3 class POSIXct
followed by POSIXt
, print
will display a time:
class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"
If you use objects with classes—and you do—you will run into R’s S3 system. S3 behavior can seem odd at first, but is easy to predict once you are familiar with it.
R’s S3 system is built around three components: attributes (especially the class
attribute), generic functions, and methods.
10.2 Attributes
In Attributes, you learned that many R objects come with attributes, pieces of extra information that are given a name and appended to the object. Attributes do not affect the values of the object, but stick to the object as a type of metadata that R can use to handle the object. For example, a data frame stores its row and column names as attributes. Data frames also store their class, "data.frame"
, as an attribute.
You can see an object’s attributes with attribute
. If you run attribute
on the deck
data frame that you created in Project 2: Playing Cards, you will see:
attributes(deck)
## $names
## [1] "face" "suit" "value"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## [20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
R comes with many helper functions that let you set and access the most common attributes used in R. You’ve already met the names
, dim
, and class
functions, which each work with an eponymously named attribute. However, R also has row.names
, levels
, and many other attribute-based helper functions. You can use any of these functions to retrieve an attribute’s value:
row.names(deck)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
## [14] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
## [27] "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
## [40] "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51" "52"
or to change an attribute’s value:
row.names(deck) <- 101:152
or to give an object a new attribute altogether:
levels(deck) <- c("level 1", "level 2", "level 3")
attributes(deck)
## $names
## [1] "face" "suit" "value"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## [18] 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
## [35] 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
## [52] 152
##
## $levels
## [1] "level 1" "level 2" "level 3"
R is very laissez faire when it comes to attributes. It will let you add any attributes that you like to an object (and then it will usually ignore them). The only time R will complain is when a function needs to find an attribute and it is not there.
You can add any general attribute to an object with attr
; you can also use attr
to look up the value of any attribute of an object. Let’s see how this works with one_play
, the result of playing our slot machine one time:
one_play <- play()
one_play
## 0
attributes(one_play)
## NULL
attr
takes two arguments: an R object and the name of an attribute (as a character string). To give the R object an attribute of the specified name, save a value to the output of attr
. Let’s give one_play
an attribute named symbols
that contains a vector of character strings:
attr(one_play, "symbols") <- c("B", "0", "B")
attributes(one_play)
## $symbols
## [1] "B" "0" "B"
To look up the value of any attribute, give attr
an R object and the name of the attribute you would like to look up:
attr(one_play, "symbols")
## "B" "0" "B"
If you give an attribute to an atomic vector, like one_play
, R will usually display the attribute beneath the vector’s values. However, if the attribute changes the vector’s class, R may display all of the information in the vector in a new way (as we saw with POSIXct
objects):
one_play
## [1] 0
## attr(,"symbols")
## [1] "B" "0" "B"
R will generally ignore an object’s attributes unless you give them a name that an R function looks for, like names
or class
. For example, R will ignore the symbols
attribute of one_play
as you manipulate one_play
:
one_play + 1
## 1
## attr(,"symbols")
## "B" "0" "B"
play
to return a prize that contains the symbols associated with it as an attribute named symbols
. Remove the redundant call to print(symbols)
:
play <- function() {
symbols <- get_symbols()
print(symbols)
score(symbols)
}
play
by capturing the output of score(symbols)
and assigning an attribute to it. play
can then return the enhanced version of the output:
play <- function() {
symbols <- get_symbols()
prize <- score(symbols)
attr(prize, "symbols") <- symbols
prize
}
Now play
returns both the prize and the symbols associated with the prize. The results may not look pretty, but the symbols stick with the prize when we copy it to a new object. We can work on tidying up the display in a minute:
play()
## [1] 0
## attr(,"symbols")
## [1] "B" "BB" "0"
two_play <- play()
two_play
## [1] 0
## attr(,"symbols")
## [1] "0" "B" "0"
You can also generate a prize and set its attributes in one step with the structure
function. structure
creates an object with a set of attributes. The first argument of structure
should be an R object or set of values, and the remaining arguments should be named attributes for structure
to add to the object. You can give these arguments any argument names you like. structure
will add the attributes to the object under the names that you provide as argument names:
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols)
}
three_play <- play()
three_play
## 0
## attr(,"symbols")
## "0" "BB" "B"
Now that your play
output contains a symbols
attribute, what can you do with it? You can write your own functions that lookup and use the attribute. For example, the following function will look up one_play
’s symbols
attribute and use it to display one_play
in a pretty manner. We will use this function to display our slot results, so let’s take a moment to study what it does:
slot_display <- function(prize){
# extract symbols
symbols <- attr(prize, "symbols")
# collapse symbols into single string
symbols <- paste(symbols, collapse = " ")
# combine symbol with prize as a character string
# \n is special escape sequence for a new line (i.e. return or enter)
string <- paste(symbols, prize, sep = "\n$")
# display character string in console without quotes
cat(string)
}
slot_display(one_play)
## B 0 B
## $0
The function expects an object like one_play
that has both a numerical value and a symbols
attribute. The first line of the function will look up the value of the symbols
attribute and save it as an object named symbols
. Let’s make an example symbols
object so we can see what the rest of the function does. We can use one_play
’s symbols
attribute to do the job. symbols
will be a vector of three-character strings:
symbols <- attr(one_play, "symbols")
symbols
## "B" "0" "B"
Next, slot_display
uses paste
to collapse the three strings in symbols
into a single-character string. paste
collapses a vector of character strings into a single string when you give it the collapse
argument. paste
will use the value of collapse
to separate the formerly distinct strings. Hence, symbols
becomes B 0 B
the three strings separated by a space:
symbols <- paste(symbols, collapse = " ")
symbols
## "B 0 B"
Our function then uses paste
in a new way to combine symbols
with the value of prize
. paste
combines separate objects into a character string when you give it a sep
argument. For example, here paste
will combine the string in symbols
, B 0 B
, with the number in prize
, 0. paste
will use the value of sep
argument to separate the inputs in the new string. Here, that value is \n$
, so our result will look like "B 0 B\n$0"
:
prize <- one_play
string <- paste(symbols, prize, sep = "\n$")
string
## "B 0 B\n$0"
The last line of slot_display
calls cat
on the new string. cat
is like print
; it displays its input at the command line. However, cat
does not surround its output with quotation marks. cat
also replaces every \n
with a new line or line break. The result is what we see. Notice that it looks just how I suggested that our play
output should look in Programs:
cat(string)
## B 0 B
## $0
You can use slot_display
to manually clean up the output of play
:
slot_display(play())
## C B 0
## $2
slot_display(play())
## 7 0 BB
## $0
This method of cleaning the output requires you to manually intervene in your R session (to call slot_display
). There is a function that you can use to automatically clean up the output of play
each time it is displayed. This function is print
, and it is a generic function.
10.3 Generic Functions
R uses print
more often than you may think; R calls print
each time it displays a result in your console window. This call happens in the background, so you do not notice it; but the call explains how output makes it to the console window (recall that print
always prints its argument in the console window). This print
call also explains why the output of print
always matches what you see when you display an object at the command line:
print(pi)
## 3.141593
pi
## 3.141593
print(head(deck))
## face suit value
## king spades 13
## queen spades 12
## jack spades 11
## ten spades 10
## nine spades 9
## eight spades 8
head(deck)
## face suit value
## king spades 13
## queen spades 12
## jack spades 11
## ten spades 10
## nine spades 9
## eight spades 8
print(play())
## 5
## attr(,"symbols")
## "B" "BB" "B"
play()
## 5
## attr(,"symbols")
## "B" "BB" "B"
You can change how R displays your slot output by rewriting print
to look like slot_display
. Then R would print the output in our tidy format. However, this method would have negative side effects. You do not want R to call slot_display
when it prints a data frame, a numerical vector, or any other object.
Fortunately, print
is not a normal function; it is a generic function. This means that print
is written in a way that lets it do different things in different cases. You’ve already seen this behavior in action (although you may not have realized it). print
did one thing when we looked at the unclassed version of num
:
num <- 1000000000
print(num)
## 1000000000
and a different thing when we gave num
a class:
class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"
Take a look at the code inside print
to see how it does this. You may imagine that print looks up the class attribute of its input and then uses an +if+ tree to pick which output to display. If this occurred to you, great job! print
does something very similar, but much more simple.
10.4 Methods
When you call print
, print
calls a special function, UseMethod
:
print
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x7ffee4c62f80>
## <environment: namespace:base>
UseMethod
examines the class of the input that you provide for the first argument of print
, and then passes all of your arguments to a new function designed to handle that class of input. For example, when you give print
a POSIXct object, UseMethod
will pass all of print
’s arguments to print.POSIXct
. R will then run print.POSIXct
and return the results:
print.POSIXct
## function (x, ...)
## {
## max.print <- getOption("max.print", 9999L)
## if (max.print < length(x)) {
## print(format(x[seq_len(max.print)], usetz = TRUE), ...)
## cat(" [ reached getOption(\"max.print\") -- omitted",
## length(x) - max.print, "entries ]\n")
## }
## else print(format(x, usetz = TRUE), ...)
## invisible(x)
## }
## <bytecode: 0x7fa948f3d008>
## <environment: namespace:base>
If you give print
a factor object, UseMethod
will pass all of print
’s arguments to print.factor
. R will then run print.factor
and return the results:
print.factor
## function (x, quote = FALSE, max.levels = NULL, width = getOption("width"),
## ...)
## {
## ord <- is.ordered(x)
## if (length(x) == 0L)
## cat(if (ord)
## "ordered"
## ...
## drop <- n > maxl
## cat(if (drop)
## paste(format(n), ""), T0, paste(if (drop)
## c(lev[1L:max(1, maxl - 1)], "...", if (maxl > 1) lev[n])
## else lev, collapse = colsep), "\n", sep = "")
## }
## invisible(x)
## }
## <bytecode: 0x7fa94a64d470>
## <environment: namespace:base>
print.POSIXct
and print.factor
are called methods of print
. By themselves, print.POSIXct
and print.factor
work like regular R functions. However, each was written specifically so UseMethod
could call it to handle a specific class of print
input.
Notice that print.POSIXct
and print.factor
do two different things (also notice that I abridged the middle of print.factor
—it is a long function). This is how print
manages to do different things in different cases. print
calls UseMethod
, which calls a specialized method based on the class of print
’s first argument.
You can see which methods exist for a generic function by calling methods
on the function. For example, print
has almost 200 methods (which gives you an idea of how many classes exist in R):
methods(print)
## [1] print.acf*
## [2] print.anova
## [3] print.aov*
## ...
## [176] print.xgettext*
## [177] print.xngettext*
## [178] print.xtabs*
##
## Nonvisible functions are asterisked
This system of generic functions, methods, and class-based dispatch is known as S3 because it originated in the third version of S, the programming language that would evolve into S-PLUS and R. Many common R functions are S3 generics that work with a set of class methods. For example, summary
and head
also call UseMethod
. More basic functions, like c
, +
, -
, <
and others also behave like generic functions, although they call .primitive
instead of UseMethod
.
The S3 system allows R functions to behave in different ways for different classes. You can use S3 to format your slot output. First, give your output its own class. Then write a print method for that class. To do this efficiently, you will need to know a little about how UseMethod
selects a method function to use.
10.4.1 Method Dispatch
UseMethod
uses a very simple system to match methods to functions.
Every S3 method has a two-part name. The first part of the name will refer to the function that the method works with. The second part will refer to the class. These two parts will be separated by a period. So for example, the print method that works with functions will be called print.function
. The summary method that works with matrices will be called summary.matrix
. And so on.
When UseMethod
needs to call a method, it searches for an R function with the correct S3-style name. The function does not have to be special in any way; it just needs to have the correct name.
You can participate in this system by writing your own function and giving it a valid S3-style name. For example, let’s give one_play
a class of its own. It doesn’t matter what you call the class; R will store any character string in the class attribute:
class(one_play) <- "slots"
Now let’s write an S3 print method for the +slots+ class. The method doesn’t need to do anything special—it doesn’t even need to print one_play
. But it does need to be named print.slots
; otherwise UseMethod
will not find it. The method should also take the same arguments as print
; otherwise, R will give an error when it tries to pass the arguments to print.slots
:
args(print)
## function (x, ...)
## NULL
print.slots <- function(x, ...) {
cat("I'm using the print.slots method")
}
Does our method work? Yes, and not only that; R uses the print method to display the contents of one_play
. This method isn’t very useful, so I’m going to remove it. You’ll have a chance to write a better one in a minute:
print(one_play)
## I'm using the print.slots method
one_play
## I'm using the print.slots method
rm(print.slots)
Some R objects have multiple classes. For example, the output of Sys.time
has two classes. Which class will UseMethod
use to find a print method?
now <- Sys.time()
attributes(now)
## $class
## [1] "POSIXct" "POSIXt"
UseMethod
will first look for a method that matches the first class listed in the object’s class vector. If UseMethod
cannot find one, it will then look for the method that matches the second class (and so on if there are more classes in an object’s class vector).
If you give print
an object whose class or classes do not have a print method, UseMethod
will call print.default
, a special method written to handle general cases.
Let’s use this system to write a better print method for the slot machine output.
Exercise 10.2 (Make a Print Method) Write a new print method for the slots class. The method should call slot_display
to return well-formatted slot-machine output.
print.slots
method because we’ve already done all of the hard work when we wrote slot_display
. For example, the following method will work. Just make sure the method is named print.slots
so UseMethod
can find it, and make sure that it takes the same arguments as print
so UseMethod
can pass those arguments to print.slots
without any trouble:
print.slots <- function(x, ...) {
slot_display(x)
}
Now R will automatically use slot_display
to display objects of class +slots+ (and only objects of class “slots”):
one_play
## B 0 B
## $0
Let’s ensure that every piece of slot machine output has the slots
class.
play
function so it assigns slots
to the class
attribute of its output:
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols)
}
class
attribute of the output at the same time that you set the +symbols+ attribute. Just add class = "slots"
to the structure
call:
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols, class = "slots")
}
Now each of our slot machine plays will have the class slots
:
class(play())
## "slots"
As a result, R will display them in the correct slot-machine format:
play()
## BB BB BBB
## $5
play()
## BB 0 0
## $0
10.5 Classes
You can use the S3 system to make a robust new class of objects in R. Then R will treat objects of your class in a consistent, sensible manner. To make a class:
- Choose a name for your class.
- Assign each instance of your class a +class+ attribute.
- Write class methods for any generic function likely to use objects of your class.
Many R packages are based on classes that have been built in a similar manner. While this work is simple, it may not be easy. For example, consider how many methods exist for predefined classes.
You can call methods
on a class with the class
argument, which takes a character string. methods
will return every method written for the class. Notice that methods
will not be able to show you methods that come in an unloaded R package:
methods(class = "factor")
## [1] [.factor [[.factor
## [3] [[<-.factor [<-.factor
## [5] all.equal.factor as.character.factor
## [7] as.data.frame.factor as.Date.factor
## [9] as.list.factor as.logical.factor
## [11] as.POSIXlt.factor as.vector.factor
## [13] droplevels.factor format.factor
## [15] is.na<-.factor length<-.factor
## [17] levels<-.factor Math.factor
## [19] Ops.factor plot.factor*
## [21] print.factor relevel.factor*
## [23] relist.factor* rep.factor
## [25] summary.factor Summary.factor
## [27] xtfrm.factor
##
## Nonvisible functions are asterisked
This output indicates how much work is required to create a robust, well-behaved class. You will usually need to write a class
method for every basic R operation.
Consider two challenges that you will face right away. First, R drops attributes (like class
) when it combines objects into a vector:
play1 <- play()
play1
## B BBB BBB
## $5
play2 <- play()
play2
## 0 B 0
## $0
c(play1, play2)
## [1] 5 0
Here, R stops using print.slots
to display the vector because the vector c(play1, play2)
no longer has a “slots” +class+ attribute.
Next, R will drop the attributes of an object (like class
) when you subset the object:
play1[1]
## [1] 5
You can avoid this behavior by writing a c.slots
method and a [.slots
method, but then difficulties will quickly accrue. How would you combine the symbols
attributes of multiple plays into a vector of symbols attributes? How would you change print.slots
to handle vectors of outputs? These challenges are open for you to explore. However, you will usually not have to attempt this type of large-scale programming as a data scientist.
In our case, it is very handy to let slots
objects revert to single prize values when we combine groups of them together into a vector.
10.6 S3 and Debugging
S3 can be annoying if you are trying to understand R functions. It is difficult to tell what a function does if its code body contains a call to UseMethod
. Now that you know that UseMethod
calls a class-specific method, you can search for and examine the method directly. It will be a function whose name follows the <function.class>
syntax, or possibly <function.default>
. You can also use the methods
function to see what methods are associated with a function or a class.
10.7 S4 and R5
R also contains two other systems that create class specific behavior. These are known as S4 and R5 (or reference classes). Each of these systems is much harder to use than S3, and perhaps as a consequence, more rare. However, they offer safeguards that S3 does not. If you’d like to learn more about these systems, including how to write and use your own generic functions, I recommend the book Advanced R Programming by Hadley Wickham.
10.8 Summary
Values are not the only place to store information in R, and functions are not the only way to create unique behavior. You can also do both of these things with R’s S3 system. The S3 system provides a simple way to create object-specific behavior in R. In other words, it is R’s version of object-oriented programming (OOP). The system is implemented by generic functions. These functions examine the class attribute of their input and call a class-specific method to generate output. Many S3 methods will look for and use additional information that is stored in an object’s attributes. Many common R functions are S3 generics.
R’s S3 system is more helpful for the tasks of computer science than the tasks of data science, but understanding S3 can help you troubleshoot your work in R as a data scientist.
You now know quite a bit about how to write R code that performs custom tasks, but how could you repeat these tasks? As a data scientist, you will often repeat tasks, sometimes thousands or even millions of times. Why? Because repetition lets you simulate results and estimate probabilities. Loops will show you how to automate repetition with R’s for
and while
functions. You’ll use for
to simulate various slot machine plays and to calculate the payout rate of your slot machine.