HW 02 - Bike crashes

Photo by Andhika Soreng on Unsplash Photo by Andhika Soreng on Unsplash

Biking is the environmentally friendly way to commute, and a fun activity for kids. But bike crashes are no joke!

The data for this assignment comes from Durham Open Data. The data contains bike crashes between 2007 and 2014.

Learning goals

The goal of this assignment is to keep you practicing your data visualization skills while also adding on tasks for data manipulation like filtering and transforming.

Getting started

IMPORTANT: If there is no GitHub repo created for you for this assignment, it means I didn’t have your GitHub username as of when I assigned the homework. Please let me know your GitHub username asap, and I can create your repo.

Go to the course GitHub organization and locate your HW 2 repo, which should be named hw-02-bike-crash-YOUR_GITHUB_USERNAME. Grab the URL of the repo, and clone it in RStudio. Refer to Lab 01 if you would like to see step-by-step instructions for cloning a repo into an RStudio project.

First, open the R Markdown document hw-02-bike-crash.Rmd and Knit it. Make sure it compiles without errors. The output will be in the file markdown .md file with the same name.

Packages

We’ll use the tidyverse package for the analysis, as usual and the data lives in the dsbox package. These packages are already installed for you, so you load it as usual by running the following in your Console:

library(tidyverse)
library(dsbox)

Data

The data is called ncbikecrash. You can find descriptions of each of the variables in the help file for the dataset, which you can access by running ?ncbikecrash in your Console.

  1. Run View(ncbikecrash) in your Console to view the data in the data viewer. What does each row in the dataset represent?

Hint: The Markdown Quick Reference sheet has an example of inline R code that might be helpful. You can access it from the Help menu in RStudio. Last week you used nrow() to find the number of rows. Use ncol() for the number of columns.

  1. How many bike crashes were recorded in NC between 2007 and 2014? How many variables are recorded on these crashes? Use inline R code when answering this question.

✅ ⬆️ Now is a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Dimensions of data”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. How many bike crashes occurred in residential development areas where the driver was between 0 and 19 years old?

✅ ⬆️ This is again a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Filter for residential and young driver”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Hint: See the help for the count() function, specifically the sort argument for reporting the frequency table in descending order of counts, i.e. highest on top.

  1. Create a frequency table of the estimated speed of the car (driver_est_speed) involved in the crash. What is the most common estimated speed range in the dataset?

Don’t forget to label your R chunk as well (where it says label-me-1). Your label should be short, informative, and shouldn’t include spaces. It also shouldn’t repeat a previous label, otherwise R Markdown will give you an error about repeated R chunk labels.

✅ ⬆️ Commit and push your changes again with an appropriate commit message (e.g. “Most common estimated speed”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. Recreate the following plot, and describe in context of the data what it shows.

Don’t forget to label your R chunk as well (where it says label-me-2). Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.

Play around with the fig.height and fig.width options in the R chunk definitions until you’re satisfied with the dimensions of the figure.

Hint: To match the colors, you can use scale_fill_viridis_d().

✅ ⬆️ This is a another good place to pause, commit changes with the commit message like “Recreated crash severity and alcohol figure”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Hint: Instead of changing the legend, change how the data are represented in the crash_severity variable with mutate().

  1. Recreate the same figure, but this time change the labels of the crash severity variable such that text like A:, B:, etc. doesn’t show up.

For this question you’ll need to add an R chunk, label it, and define preferences for the figure’s height and width.

Not sure how to use emojis on your computer? Maybe a classmate can help? Or you can ask on Piazza or student hours!

✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Recreated figure with cleaner labels, done with HW 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.