HW 08 - Wrap up!

Photo by Kari Shea on Unsplash

It’s almost time to wrap up the course! In this three-part assignment you get to practice what we learned this week, try something new, and get creative!

Getting started

By now you should be familiar with instructions for getting started with a new assignment in RStudio Cloud and setting up your git configuration. If not, you can refer to one of the earlier assignments.

Part 1 - Bootstrapping the GSS

In this part we continue our exploration of the 2016 GSS dataset from last week. Remember that this dataset can be found in the dsbox package, and is called gss. Also remember that the GSS asked respondents how many hours and minutes they spend on email weekly. The responses to these questions are recorded in the emailhr and emailmin variables. For example, if the response is 2.5 hrs, this would be recorded as emailhr = 2 and emailmin = 30.

Yes, this exercise is a repeat of what you did last week!

  1. Create a new variable called email that combines these two variables to report the number of minutes the respondents spend on email weekly.

  2. Filter the data for only those who have non-NA entries for email. Do not overwrite the data frame (you’ll need the full data later). Instead, save the resulting data frame with a new name.

  3. Describe how bootstrapping can be used to estimate the mean amount of time all Americans spend on email weekly.

In the following questions you will use the infer package to construct intervals rather than writing for loops.
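As a rough sketch of what an infer workflow looks like, the example below builds a bootstrap percentile interval for a mean. It deliberately uses the built-in mtcars dataset and its mpg column rather than the GSS data, so the dataset, the seed, and the object names (boot_dist, ci_95) are illustrative assumptions and not part of the assignment. It also assumes R 4.1 or later for the native pipe.

```r
library(infer)

set.seed(2016)  # arbitrary seed so the resampling is reproducible

# Bootstrap distribution of the sample mean of mtcars$mpg:
# resample the rows with replacement 1000 times and
# compute the mean of mpg in each resample.
boot_dist <- mtcars |>
  specify(response = mpg) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# Percentile method: the interval is the middle 95% of the bootstrap means.
ci_95 <- get_ci(boot_dist, level = 0.95, type = "percentile")
ci_95
```

Changing the level argument reuses the same bootstrap distribution for a different confidence level, and changing stat in calculate() switches the statistic being bootstrapped.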

  4. Calculate a 95% bootstrap confidence interval for the mean amount of time Americans spend on email weekly. Interpret this interval in the context of the data, reporting its endpoints in “humanized” units (e.g. instead of 108 minutes, report 1 hr and 48 minutes). If you get a result that seems a bit odd, discuss why you think this might be the case.

  5. Would you expect a 99% confidence interval to be wider or narrower than the interval you calculated above? Explain your reasoning.

  6. Using the bootstrap distribution from Exercise 4, calculate a 99% bootstrap confidence interval for the mean amount of time Americans spend on email weekly. Once again, use humanized units.

  7. Finally, construct and interpret a 90% confidence interval for the median amount of time Americans spend on email weekly. Once again, use humanized units.

  8. What does the “90%” mean in your interpretation of the above interval?
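The “humanized” units asked for above come down to integer division and a remainder. The helper below is one possible sketch in base R; the function name humanize_minutes is hypothetical, and rounding to the nearest whole minute is an assumption about how much precision you want to report.

```r
# Convert a number of minutes into an "H hr and M min" string.
# humanize_minutes is a hypothetical helper, not part of any package.
humanize_minutes <- function(total_minutes) {
  total_minutes <- round(total_minutes)  # assumption: report whole minutes
  hours <- total_minutes %/% 60          # integer division gives full hours
  minutes <- total_minutes %% 60         # remainder gives leftover minutes
  paste0(hours, " hr and ", minutes, " min")
}

humanize_minutes(150)    # "2 hr and 30 min"
humanize_minutes(417.3)  # "6 hr and 57 min"
```

The first call matches the encoding described earlier: 2.5 hrs is 150 minutes, or 2 hr and 30 min.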

Part 2 - You gotta pick a package or two

But really, one is enough. Pick a package from the list below, and use it to do something. If you want to use a package not on this list, that’s also OK, but run it by me first by posting a question about it on Piazza. That way I can confirm it’s not one we have already introduced in class; the goal is to work with a new package.

Remember, you install the package in the Console, not in your R Markdown document since you don’t want to keep reinstalling it every time you knit the document.

Your task is to install the package you pick. How you install it depends on where it comes from:

  - If the package is on CRAN (Comprehensive R Archive Network), you can install it with install.packages.
  - If the package is only on GitHub (most likely because it is still under development), you need to use the install_github function.

See the sections below for details.

Then, load the package. Regardless of how you installed the package, you can load it with the library function.

Finally, do something with the package. It doesn’t have to be complicated. In fact, keep it simple. The goal is to try to read and understand the package documentation to be able to carry out a simple task.

  1. Which package are you using? State the name of the package, whether it was on CRAN or GitHub, and include the code for loading it.

  2. What are you doing with the package? Give me a brief narrative including code and output.

Packages on CRAN

These packages can be installed with:

install.packages("PACKAGENAME")

The package manuals are linked below; however, the developers of a package might have additional information in its GitHub repo.

Packages on GitHub only

These packages can be installed with:

library(devtools)
install_github("USERNAME/PACKAGENAME")

USERNAME refers to the user name of the developer of the package. For example, for the first package listed below, USERNAME is hadley and PACKAGENAME is emo.

The package manuals are linked below; however, the developers of a package might have additional information in its GitHub repo.

Part 3 - Mirror, mirror on the wall, who’s the ugliest of them all?

Here is a simple plot using the mpg dataset, which contains information on the fuel economy of cars. We’re plotting highway miles per gallon vs. city miles per gallon, colored by whether the car is front-wheel drive, rear-wheel drive, or four-wheel drive.

library(ggplot2)  # provides ggplot() and the mpg dataset

ggplot(data = mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point()

I realize that “ugly” is subjective, so we’re mostly looking to see if you can figure out how to change the look of a plot using help files of functions you haven’t learned before.

  1. Make this plot as ugly as possible by changing colors, background color, fonts, or anything else you can think of. You will probably want to play around with theme options, but you can do more. You can also search online for other themes, fonts, etc. that you want to tweak.
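To show where these options live, here is a small starting sketch that changes the point shape, the color scale, and a few theme elements. The specific colors, sizes, title, and the object name ugly_plot are arbitrary choices for illustration, not a model answer; the help pages for theme() and scale_color_manual() list many more options.

```r
library(ggplot2)

ugly_plot <- ggplot(data = mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3, shape = 17) +  # large triangles instead of dots
  # drv takes the values "4", "f", and "r" in the mpg dataset
  scale_color_manual(values = c("4" = "yellow", "f" = "magenta", "r" = "cyan")) +
  labs(title = "Highway vs. city mileage") +
  theme(
    panel.background = element_rect(fill = "limegreen"),  # plot area color
    plot.title = element_text(family = "mono", size = 24, color = "red"),
    axis.text = element_text(angle = 45)                  # tilted tick labels
  )

ugly_plot
```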