Given below are two data visualizations that violate many data visualization best practices. Improve these visualizations using R and the tips for effective visualizations that we introduced in class. You should produce one visualization per dataset. Your visualization should be accompanied by a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plots and why, and how you addressed them in the visualization you created.
On the due date you will give a brief presentation describing one of your improved visualizations and the reasoning for the choices you made.
Go to the course GitHub organization and locate your homework repo, clone it in RStudio and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
We’ll use the tidyverse package for much of the data wrangling and visualisation, and the data lives in the dsbox package. These packages are already installed for you. You can load them by running the following in your Console:
library(tidyverse)
library(dsbox)
The datasets we’ll use are called instructors and fisheries from the dsbox package. Since the datasets are distributed with the package, we don’t need to load them separately; they become available to us when we load the package. You can find out more about the datasets by inspecting their documentation, which you can access by running ?instructors and ?fisheries in the Console or using the Help menu in RStudio to search for instructors or fisheries. You can also find this information here and here.
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.
Let’s start by loading the data used to create this plot.
staff <- read_csv("data/instructional-staff.csv")
Each row in this dataset represents a faculty type, and the columns are the years for which we have data. Each value is the percentage of hires of that faculty type in that year.
## # A tibble: 5 x 12
## faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Full-Time T… 29 27.6 25 24.8 21.8 20.3 19.3 17.8 17.2
## 2 Full-Time T… 16.1 11.4 10.2 9.6 8.9 9.2 8.8 8.2 8
## 3 Full-Time N… 10.3 14.1 13.6 13.6 15.2 15.5 15 14.8 14.9
## 4 Part-Time F… 24 30.4 33.1 33.2 35.5 36 37 39.3 40.5
## 5 Graduate St… 20.5 16.5 18.1 18.8 18.7 19 20 19.9 19.5
## # … with 2 more variables: `2009` <dbl>, `2011` <dbl>
In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from wide format to long format.
But before we do so, a thought exercise: How many rows will the long-format data have? It will have a row for each combination of year and faculty type. If there are 5 faculty types and 11 years of data, how many rows will we have?
We do the wide-to-long conversion using a new function: pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().
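As a static complement to the animation, here is a minimal sketch of the reshape on a small toy dataset. The object names (toy_wide, toy_long, toy_back), the column names year and percentage, and all values are made up for illustration; they are not the AAUP figures and not necessarily the names used later in this assignment.

```r
library(tidyr)
library(tibble)

# Toy wide-format data (hypothetical values): 2 faculty types x 3 years
toy_wide <- tribble(
  ~faculty_type, ~`1975`, ~`1989`, ~`1993`,
  "Type A",      30,      28,      25,
  "Type B",      20,      22,      24
)

# Wide to long: every column except faculty_type is treated as a year column.
# The old column names go into `year`, the cell values into `percentage`.
toy_long <- pivot_longer(
  toy_wide,
  cols      = -faculty_type,
  names_to  = "year",
  values_to = "percentage"
)

# One row per combination of faculty type and year
nrow(toy_long)

# pivot_wider() reverses the reshape, spreading `year` back into columns
toy_back <- pivot_wider(
  toy_long,
  names_from  = year,
  values_from = percentage
)
```

Note that the year column comes out as character, because it is built from column names; you would likely want to convert it to a number before plotting it on an axis.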