HW 05 - Money in US politics

Photo by Sharon McCutcheon on Unsplash

Every election cycle brings its own brand of excitement – and lots of money. Political donations are of particular interest to political scientists and other researchers studying politics and voting patterns. They are also of interest to citizens who want to stay informed of how much money their candidates raise and where that money comes from.

In the United States, “only American citizens (and immigrants with green cards) can contribute to federal politics, but the American divisions of foreign companies can form political action committees (PACs) and collect contributions from their American employees.” (Source: Open Secrets - Foreign Connected PACs.)

In this assignment we will scrape and work with data on foreign-connected PACs that donate to US political campaigns. First, we will get data on foreign-connected PAC contributions in the 2020 election cycle. Then, you will use a similar approach to get data on such contributions from previous years so that we can examine trends over time.

In order to complete this assignment you will need a Chrome browser with the Selector Gadget extension installed.

Packages

In this assignment we will work with the following packages. They should already be installed in your project, and you can load them with the following:

library(tidyverse)
library(robotstxt)
library(rvest)
library(scales)

Data collection via web scraping

The data come from OpenSecrets.org, a “website tracking the influence of money on U.S. politics, and how that money affects policy and citizens’ lives”. This website is hosted by The Center for Responsive Politics, which is a nonpartisan, independent nonprofit that “tracks money in U.S. politics and its effect on elections and public policy.” (Source: Open Secrets - About.)

Before getting started, let’s check that a bot has permission to access pages on this domain.

paths_allowed("https://www.opensecrets.org")
## [1] TRUE

2020 Foreign-connected PAC contributions

The goal of this exercise is to scrape the data from a page like the one shown above and save it as a data frame like the one shown below.

Since the data are already formatted as a table, we can use the html_table() function to extract it from the page. Note that this function has some useful arguments like header (to indicate whether the first row of the table should be used as the header) and fill (to indicate whether rows with fewer than the maximum number of columns should be filled with NA).
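As a rough illustration of html_table() on its own, the sketch below parses a tiny hand-written HTML table built with rvest's minimal_html(), so it runs without touching the network. The table contents are made-up placeholders, not OpenSecrets data, and selecting the element with "table" is an assumption; on the real page you would use the selector you find with SelectorGadget. In rvest 1.0.0 and later, fill is deprecated and short rows are always padded with NA, so only header is passed here.

```r
library(rvest)

# Tiny stand-in page (placeholder rows, not real contribution data)
page <- minimal_html('
  <table>
    <tr><th>PAC Name</th><th>Total</th></tr>
    <tr><td>Example PAC A</td><td>$1,000</td></tr>
    <tr><td>Example PAC B</td><td>$250</td></tr>
  </table>
')

# Select the table node, then convert it to a data frame,
# using the first row as column names
pac_table <- page %>%
  html_element("table") %>%
  html_table(header = TRUE)

pac_table
```

Note that the dollar amounts come back as character strings; converting them to numbers is handled later, in the data cleaning section.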

Complete the following set of steps in the 01-scrape-pac-2020.R file in the scripts folder of your repository. This file already contains some starter code to help you out.

Hint: Take a look at the help for the rename() function to determine whether these new variable names need to be quoted or not.
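A small sketch of how rename() treats names, using a hypothetical toy data frame whose column names mimic what a scraped table might contain. The pattern is new_name = old_name: a syntactic new name such as total needs no quotes, while an old name containing spaces or parentheses has to be wrapped in backticks.

```r
library(dplyr)
library(tibble)

# Hypothetical column names, for illustration only
raw <- tibble(
  `PAC Name (Affiliate)` = c("Example PAC A", "Example PAC B"),
  Total = c("$1,000", "$250")
)

# rename(new_name = old_name): new names are plain, unquoted symbols here;
# the non-syntactic old name needs backticks
renamed <- raw %>%
  rename(
    name  = `PAC Name (Affiliate)`,
    total = Total
  )

names(renamed)
```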

Hint: You already know what these numbers should be!

  1. In your R Markdown document, load pac-2020.csv and report its number of observations and variables using inline code.
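One way to do this, sketched under the assumption that the scraping script wrote the file with readr: read it with read_csv() and get the counts with nrow() and ncol(). To keep the sketch self-contained it writes a two-row placeholder CSV to a temporary file; in your project you would point read_csv() at your own pac-2020.csv instead. In the R Markdown prose, inline code such as `r nrow(pac_2020)` and `r ncol(pac_2020)` inserts these numbers into the text.

```r
library(readr)
library(tibble)

# Placeholder CSV standing in for pac-2020.csv (not real data)
csv_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(name = c("Example PAC A", "Example PAC B"),
         total = c(1000, 250)),
  csv_path
)

pac_2020 <- read_csv(csv_path, show_col_types = FALSE)

n_obs  <- nrow(pac_2020)  # number of observations (rows)
n_vars <- ncol(pac_2020)  # number of variables (columns)
```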

Functionalize!

You can probably guess where we’re headed: we’ll ultimately scrape data for contributions in all election years Open Secrets has data for. Since that means repeating the same task many times, let’s first write a function that works on the first page, confirm that it works on a few others, and then iterate it over the pages for all years.

Complete the following set of steps in the 02-scrape-pac-function.R file in the scripts folder of your repository. This file already contains some starter code to help you out.
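A minimal sketch of how such a function might be organized, not the starter code's version. It splits the work in two: a hypothetical helper, parse_pac_table(), that turns an already-read page into a data frame, and scrape_pac(url), which reads the page and takes the year from the last four characters of the URL (e.g. "...cycle=2020"). The split lets the parsing logic be checked offline. It assumes the first table on the page is the contributions table and that it has exactly five columns in the order name, country/parent, total, dems, repubs; check both against the real page.

```r
library(rvest)
library(dplyr)
library(stringr)

# Hypothetical helper: parse the PAC table from an already-read page.
# Assumes five columns: name, country/parent, total, dems, repubs.
parse_pac_table <- function(page, cycle_year) {
  page %>%
    html_element("table") %>%
    html_table(header = TRUE) %>%
    setNames(c("name", "country_parent", "total", "dems", "repubs")) %>%
    mutate(
      name = str_squish(name),  # collapse stray whitespace and newlines
      year = cycle_year
    )
}

# Read a page and parse it; the year is the URL's last four characters
scrape_pac <- function(url) {
  page <- read_html(url)
  cycle_year <- as.integer(str_sub(url, -4))
  parse_pac_table(page, cycle_year)
}

# Offline check on a one-row placeholder page (not real data)
demo_page <- minimal_html('<table>
  <tr><th>PAC</th><th>Country/Parent</th><th>Total</th><th>Dems</th><th>Repubs</th></tr>
  <tr><td>Example PAC</td><td>Denmark/Novo Nordisk A/S</td><td>$1,000</td><td>$400</td><td>$600</td></tr>
</table>')
demo <- parse_pac_table(demo_page, 2020L)
```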

  2. In your R Markdown file, load these three data frames and report each of their numbers of observations and variables using inline code.

Foreign-connected PAC contributions for all years

Our final task in data scraping is to map the scrape_pac() function over a list of all URLs of web pages containing information on foreign-connected PAC contributions for each year.

Go back to the URLs you defined in the previous exercise. What pattern emerges? They each have the following form:

url_2020 <- "...cycle=2020"
url_2018 <- "...cycle=2018"
url_1998 <- "...cycle=1998"
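Because only the year at the end changes, the URLs can be generated rather than typed out. The sketch below assumes every even-numbered cycle from 1998 through 2020 is wanted; the prefix is left as the same "..." placeholder used above, which you would replace with the actual URL prefix you found. The final mapping step is shown as a comment because it needs a real prefix, network access, and your scrape_pac() function.

```r
# Placeholder prefix: "..." stands for the URL prefix found above
root <- "...cycle="

# Assumption: every even-numbered cycle from 1998 through 2020
years <- seq(from = 1998, to = 2020, by = 2)
urls  <- paste0(root, years)

# With a real prefix and scrape_pac() defined, one option is:
# pac_all <- purrr::map_dfr(urls, scrape_pac)
```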

Complete the following set of steps in the 03-scrape-pac-all.R file in the scripts folder of your repository. This file already contains some starter code to help you out.

  3. In your R Markdown file, load pac-all.csv and report its number of observations and variables using inline code.

✅ ⬆️ If you haven’t yet done so, now is definitely a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Data scraping complete”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Data cleaning

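In this section we clean the pac_all data frame to prepare it for analysis and visualization. We have two goals in data cleaning: splitting the combined country_parent column into separate country and parent columns, and converting the contribution columns (total, dems, and repubs) from character strings to numbers.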

Exercises 4 and 5 walk you through how to make these fixes to the data.

  4. Use the separate() function to separate country_parent into country and parent columns. Note that country and parent company names are separated by / (which will need to be specified in your function) and also note that there are some entries where the / sign appears twice; in these cases we want to split the value only at the first occurrence of /. This can be accomplished by setting the extra argument to "merge" so that the cell is split into only 2 segments, e.g. we want "Denmark/Novo Nordisk A/S" to be split into "Denmark" and "Novo Nordisk A/S". (See help for separate() for more on this.)

  5. Remove the $ and , characters from the total, dems, and repubs columns and convert these columns to numeric. A few hints to help you out:
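The two cleaning steps above might look roughly like this, sketched on two placeholder rows shaped like pac_all (not real data). separate() with sep = "/" and extra = "merge" splits only at the first slash. For the money columns, str_remove_all() with the character class "[$,]" strips both symbols before as.numeric(); readr's parse_number() is an alternative that does both in one call.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Placeholder rows shaped like pac_all (illustrative values only)
pac_toy <- tibble(
  country_parent = c("Denmark/Novo Nordisk A/S", "UK/Example Parent"),
  total  = c("$1,000", "$2,500"),
  dems   = c("$400", "$1,000"),
  repubs = c("$600", "$1,500")
)

pac_clean <- pac_toy %>%
  # Split at the first "/" only; extra = "merge" keeps the remainder together
  separate(country_parent, into = c("country", "parent"),
           sep = "/", extra = "merge") %>%
  # Drop "$" and "," then convert the three money columns to numbers
  mutate(across(c(total, dems, repubs),
                ~ as.numeric(str_remove_all(.x, "[$,]"))))
```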

Data visualization

  6. Create a line plot of total contributions from all foreign-connected PACs in the UK and Canada over the years. Once you have made the plot, write a brief interpretation of what the graph reveals. Make sure to comment on the dip in 2020. A few hints to help you out:
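One possible shape for this plot, sketched on placeholder numbers that are not real contribution totals. It assumes the cleaned data has country, year, and numeric total columns, and that the UK appears as "UK" in the country column. Because each country has many PACs, totals are first summed by country and year; label_dollar() from the scales package formats the y-axis.

```r
library(dplyr)
library(ggplot2)
library(scales)
library(tibble)

# Placeholder data shaped like the cleaned pac_all (NOT real totals)
pac_toy <- tibble(
  country = c("UK", "UK", "UK", "Canada", "Canada"),
  year    = c(2018, 2018, 2020, 2018, 2020),
  total   = c(2, 1, 2, 1, 1)
)

# One total per country per year
totals_by_year <- pac_toy %>%
  filter(country %in% c("UK", "Canada")) %>%
  group_by(country, year) %>%
  summarise(total = sum(total), .groups = "drop")

p <- ggplot(totals_by_year, aes(x = year, y = total, color = country)) +
  geom_line() +
  scale_y_continuous(labels = label_dollar()) +
  labs(x = "Year", y = "Total contributions", color = "Country")
```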

Next, we will walk you through creating the following visualization for contributions from UK-connected PACs to Democratic and Republican parties.
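A plot that compares parties usually needs one column for party and one for amount. The sketch below shows one way to get there, assuming the cleaned data has numeric dems and repubs columns: pivot_longer() stacks the two party columns, and the amounts are then summed by year and party for UK-connected PACs. The rows are placeholders, not real data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Placeholder rows shaped like the cleaned pac_all (NOT real data)
pac_toy <- tibble(
  country = c("UK", "UK", "Canada"),
  year    = c(2020, 2020, 2020),
  dems    = c(10, 5, 7),
  repubs  = c(20, 1, 3)
)

uk_by_party <- pac_toy %>%
  filter(country == "UK") %>%
  # Stack dems and repubs into party / amount columns
  pivot_longer(cols = c(dems, repubs),
               names_to = "party", values_to = "amount") %>%
  mutate(party = if_else(party == "dems", "Democrat", "Republican")) %>%
  group_by(year, party) %>%
  summarise(amount = sum(amount), .groups = "drop")
```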