Photo by Daniel Cheung on Unsplash
This week we’ll do some data gymnastics to refresh and review what we learned over the past few weeks.
In this assignment we will work with the tidyverse as usual and the dsbox package for the data.
We have (simulated) data from lego sales in 2018 for a sample of customers who bought legos in the US. The dataset is called lego_sales. You can find descriptions of each of the variables in the help file for the dataset, which you can access by running ?lego_sales in your Console.
Answer the following questions using pipelines. For each question, state your answer in a sentence, e.g. “The three most common first names of purchasers are …”.
What are the three most common first names of purchasers?
What are the three most common themes of lego sets purchased?
Within the most common theme of lego sets purchased, what is the most common subtheme?
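"Most common" questions like the ones above usually come down to counting rows per value and sorting the counts. The sketch below shows that pattern on a small made-up tibble; toy_sales and its values are invented for illustration, and the column names are assumptions, so check ?lego_sales for the real variable names before adapting it.

```r
library(dplyr)

# Hypothetical toy data, NOT the lego_sales dataset.
toy_sales <- tibble(
  first_name = c("Ana", "Ben", "Ana", "Cy", "Ben", "Ana"),
  theme      = c("City", "City", "Star Wars", "City", "Friends", "Star Wars"),
  subtheme   = c("Police", "Fire", "Episode I", "Police", "Animals", "Episode I")
)

# Count rows per first name, sort by frequency, keep the two most common.
top_names <- toy_sales %>%
  count(first_name, sort = TRUE) %>%
  slice_head(n = 2)

# Restrict to one theme first, then count subthemes within it.
top_city_subtheme <- toy_sales %>%
  filter(theme == "City") %>%
  count(subtheme, sort = TRUE) %>%
  slice_head(n = 1)

top_names
top_city_subtheme
```

Note that slice_head() keeps a fixed number of rows and does not account for ties, so it is worth looking at the full sorted counts before writing up an answer.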
Hint: Use the
Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”.
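One possible way to build a grouped variable like age_group is dplyr's case_when() inside mutate(). This is a sketch under assumptions: toy_customers is made-up data, and the column name age is assumed rather than taken from the dataset's help file.

```r
library(dplyr)

# Hypothetical toy data; the column name `age` is an assumption.
toy_customers <- tibble(age = c(16, 18, 19, 25, 30, 36, 50, 51, 70))

# Conditions are checked top to bottom; the first one that is TRUE wins,
# so each later condition only sees ages not matched above it.
toy_customers <- toy_customers %>%
  mutate(
    age_group = case_when(
      age <= 18 ~ "18 and under",
      age <= 25 ~ "19 - 25",
      age <= 35 ~ "26 - 35",
      age <= 50 ~ "36 - 50",
      age >= 51 ~ "51 and over"
    )
  )

toy_customers
```

Writing the last condition explicitly as age >= 51, rather than as a catch-all, means a missing age stays NA instead of being silently placed in the oldest group.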
Hint: You will need to consider quantity of purchases.
Hint: You will need to consider quantity of purchases as well as price of lego sets.
Which age group has spent the most money on legos?
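The two hints above point at the difference between counting rows and summing over them: a row may represent more than one set, and money spent depends on both price and quantity. A minimal sketch on invented data follows; toy_purchases and the column names price and quantity are assumptions for illustration, not the dataset's documented names.

```r
library(dplyr)

# Hypothetical toy data, NOT the lego_sales dataset.
toy_purchases <- tibble(
  age_group = c("19 - 25", "19 - 25", "26 - 35", "26 - 35"),
  price     = c(10, 20, 50, 5),
  quantity  = c(2, 1, 1, 3)
)

summary_by_group <- toy_purchases %>%
  group_by(age_group) %>%
  summarise(
    sets_bought = sum(quantity),          # number of sets, not number of rows
    total_spent = sum(price * quantity)   # money spent on each row, then summed
  ) %>%
  arrange(desc(total_spent))

summary_by_group
```

In this toy example the group with the most rows is not automatically the group with the most sets or the highest spending, which is why the summaries use sum() rather than n().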
Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
The next dataset is about instructional staff employee hiring trends between 1975 and 2011. The dataset is called instructors. You can find descriptions of each of the variables in the help file for the dataset, which you can access by running ?instructors in your Console.
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.
During the lab you had a chance to discuss with your teammates how you would improve upon this visualization if the main objective was to communicate that the proportion of part-time faculty has gone up over time compared to other instructional staff types.
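If the data come with one column per staff type, a common first step toward a trend plot is reshaping to one row per year and staff type, then drawing one line per type. The sketch below illustrates only that mechanic: toy_staff, its column names, and its numbers are all invented, and the layout of the real instructors data is an assumption to verify with ?instructors.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide data with made-up percentages, NOT the instructors dataset.
toy_staff <- tibble(
  year      = c(1975, 1995, 2011),
  full_time = c(70, 60, 50),
  part_time = c(30, 40, 50)
)

# Reshape to long form: one row per year and staff type.
toy_long <- toy_staff %>%
  pivot_longer(cols = -year, names_to = "staff_type", values_to = "percentage")

# One line per staff type makes changes over time directly comparable.
p <- ggplot(toy_long, aes(x = year, y = percentage, colour = staff_type)) +
  geom_line() +
  labs(x = "Year", y = "Percentage of staff", colour = "Staff type")

p
```

Putting time on the x-axis and mapping staff type to colour is one design choice among several; which choice best communicates the part-time trend is the question the exercise asks you to argue.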