BRIEF

  1. I have added all relevant R code to make the graph fully reproducible; the one issue the TAs left unresolved is how to replace the faceted graph with an alternative. The process by which the data was wrangled and plotted is now fully documented as well. The TAs did resolve the beyonce palette issue by having me upgrade R to version 3.5.0. They also explained why the colorblindr grid is divided into 4 quadrants that take up so much real estate: each quadrant shows how the graph's chosen color key appears under a different simulated color-vision deficiency.
  2. This is a SCATTER graph which is faceted across two rows.
  3. The DATA was taken from the Federal Register: https://www.federalregister.gov/executive-orders. The dataset has 16 attributes across 890 executive orders; most are text fields, and the page numbers and order number are integers.
  4. The audience is a general one: readers who may be interested in statistics on policy making by the executive branch.
  5. This graph plots the number of pages in an executive order against the president’s length of time in office, across 4 presidencies.
  6. Users can try to deduce whether executive orders get longer or shorter as a president’s time in office progresses.
  7. A negative aspect of this graph is that one has to rely on facets to see how the length of executive orders changed over time across presidencies. It would have been nice to find a way to display all 3 variables on one graph, avoiding the inaccuracies inherent in comparing data across facets (especially facets that occupy multiple rows).
  8. A common variation of this faceted scatter graph is a single scatter graph that uses color to distinguish presidencies in the same panel. However, this wasn’t a good alternative either: as the last graph in this brief shows, the points overlap across presidencies and hide the details for any given president.
  9. The graph required me to first parse the HTML page at the link provided by the instructors to find each president’s JSON file, and then read those files into one data frame. I then created new columns for length of time in office (signing date minus inauguration date) and length of executive order (end page minus start page). I also removed large executive orders, those spanning 20 or more pages, as outliers. I then plotted the graph using geom_point with facets first, and then geom_point with color, to test out the variation.
library(tidyverse)
library(forcats)
library(RColorBrewer)
library(viridis)
library(ggthemes)
library(wesanderson)
library(beyonce)
# Challenge 10 Part 1: Read the data
library(rvest)
library(stringi)
library(pluralize) # devtools::install_github("hrbrmstr/pluralize")
library(hrbrthemes)
#' Retrieve the Federal Register main EO page so we can get the links for each POTUS
pg <- read_html("https://www.federalregister.gov/executive-orders")
#' Find the POTUS EO data nodes, excluding the one for "All"
html_nodes(pg, "ul.bulk-files") %>% 
  html_nodes(xpath = ".//li[span[a[contains(@href, 'json')]] and 
                            not(span[contains(., 'All')])]") -> potus_nodes
#' Turn the POTUS info into a data frame with the POTUS name and EO JSON link,
#' then retrieve the JSON file and make a data frame of individual data elements
tibble(
  potus = html_nodes(potus_nodes, "span:nth-of-type(1)") %>% html_text(),
  eo_link = html_nodes(potus_nodes, "a[href *= 'json']") %>% 
    html_attr("href") %>% 
    sprintf("https://www.federalregister.gov%s", .)
) %>% 
  mutate(eo = map(eo_link, jsonlite::fromJSON)) %>% 
  mutate(eo = map(eo, "results")) %>% 
  unnest(eo) -> eo_df
glimpse(eo_df)
## Observations: 890
## Variables: 16
## $ potus                  <chr> "Donald Trump", "Donald Trump", "Donald...
## $ eo_link                <chr> "https://www.federalregister.gov/docume...
## $ citation               <chr> "82 FR 8351", "82 FR 8657", "82 FR 8793...
## $ document_number        <chr> "2017-01799", "2017-02029", "2017-02095...
## $ end_page               <int> 8352, 8658, 8797, 8803, 8982, 9338, 934...
## $ executive_order_notes  <chr> NA, "See: EO 13807, August 15, 2017", N...
## $ executive_order_number <int> 13765, 13766, 13767, 13768, 13769, 1377...
## $ html_url               <chr> "https://www.federalregister.gov/docume...
## $ pdf_url                <chr> "https://www.gpo.gov/fdsys/pkg/FR-2017-...
## $ publication_date       <chr> "2017-01-24", "2017-01-30", "2017-01-30...
## $ signing_date           <chr> "2017-01-20", "2017-01-24", "2017-01-25...
## $ start_page             <int> 8351, 8657, 8793, 8799, 8977, 9333, 933...
## $ title                  <chr> "Minimizing the Economic Burden of the ...
## $ full_text_xml_url      <chr> "https://www.federalregister.gov/docume...
## $ body_html_url          <chr> "https://www.federalregister.gov/docume...
## $ json_url               <chr> "https://www.federalregister.gov/api/v1...
# Challenge 10 Part 2: Collate the data to answer question
# QUESTION: Did any President get more detailed on their EOs as time in office progressed?
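# Page span of each order: end page minus start page
# (an order that starts and ends on the same page gets 0)
eo_df2 <- eo_df %>% 
  mutate(length_of_order = end_page - start_page)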
# Inauguration date for each president; anything unmatched falls back to 1989-01-20
eo_df3 <- eo_df2 %>% 
  mutate(inauguration_date = case_when(
    potus == "Donald Trump"       ~ "2017-01-20",
    potus == "Barack Obama"       ~ "2009-01-20",
    potus == "George W. Bush"     ~ "2001-01-20",
    potus == "William J. Clinton" ~ "1993-01-20",
    TRUE                          ~ "1989-01-20"
  ))
# Days between inauguration and signing, stored as a plain number
eo_df4 <- eo_df3 %>% 
  mutate(length_in_office = as.numeric(as.Date(signing_date) - as.Date(inauguration_date), units = "days"))
# Remove large executive orders (page span of 20 or more) as outliers
eo_df5 <- eo_df4 %>% 
  filter(length_of_order < 20)
glimpse(eo_df5)
## Observations: 880
## Variables: 19
## $ potus                  <chr> "Donald Trump", "Donald Trump", "Donald...
## $ eo_link                <chr> "https://www.federalregister.gov/docume...
## $ citation               <chr> "82 FR 8351", "82 FR 8657", "82 FR 8793...
## $ document_number        <chr> "2017-01799", "2017-02029", "2017-02095...
## $ end_page               <int> 8352, 8658, 8797, 8803, 8982, 9338, 934...
## $ executive_order_notes  <chr> NA, "See: EO 13807, August 15, 2017", N...
## $ executive_order_number <int> 13765, 13766, 13767, 13768, 13769, 1377...
## $ html_url               <chr> "https://www.federalregister.gov/docume...
## $ pdf_url                <chr> "https://www.gpo.gov/fdsys/pkg/FR-2017-...
## $ publication_date       <chr> "2017-01-24", "2017-01-30", "2017-01-30...
## $ signing_date           <chr> "2017-01-20", "2017-01-24", "2017-01-25...
## $ start_page             <int> 8351, 8657, 8793, 8799, 8977, 9333, 933...
## $ title                  <chr> "Minimizing the Economic Burden of the ...
## $ full_text_xml_url      <chr> "https://www.federalregister.gov/docume...
## $ body_html_url          <chr> "https://www.federalregister.gov/docume...
## $ json_url               <chr> "https://www.federalregister.gov/api/v1...
## $ length_of_order        <int> 1, 1, 4, 4, 5, 5, 2, 1, 2, 1, 1, 1, 2, ...
## $ inauguration_date      <chr> "2017-01-20", "2017-01-20", "2017-01-20...
## $ length_in_office       <dbl> 0, 4, 5, 5, 7, 8, 10, 14, 20, 20, 20, 2...
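The two derived columns can be checked by hand against the first three orders printed in the glimpse() output above. The sketch below uses base R only and retypes those three rows; the names `first_three` and `inauguration` are illustrative and are not part of the original script.

```r
# First three orders, copied from the glimpse() output above
first_three <- data.frame(
  signing_date = c("2017-01-20", "2017-01-24", "2017-01-25"),
  start_page   = c(8351L, 8657L, 8793L),
  end_page     = c(8352L, 8658L, 8797L),
  stringsAsFactors = FALSE
)

# All three orders belong to the presidency that began on 2017-01-20
inauguration <- as.Date("2017-01-20")

# Same subtractions as in the script: page span and days since inauguration
first_three$length_of_order <- first_three$end_page - first_three$start_page
first_three$length_in_office <- as.numeric(
  as.Date(first_three$signing_date) - inauguration,
  units = "days"
)

print(first_three[, c("length_of_order", "length_in_office")])
```

The page spans come out as 1, 1, 4 and the days in office as 0, 4, 5, which agrees with the first entries of `length_of_order` and `length_in_office` in the output above.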
# Challenge 10 part 4: Draw greyscale graph
# Black & White plot
# NOTE: Is there any way to avoid facet wrap here?
eo_df_bw <- ggplot(eo_df5, aes(x = length_in_office, 
                   y = length_of_order, 
                   fill = fct_reorder2(potus, length_in_office, length_of_order))) + 
  geom_point(size = 2, shape = 21) +
  labs(x = "Days in Office", 
       y = "Pages in EO", 
       fill = "potus") +
  scale_fill_grey(start = 0.3, end = 1)  # only fill is mapped, so no color scale is needed
eo_df_bw +
  theme_minimal() +
  facet_wrap(~potus)
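
To put a rough number on the trend that each facet shows, one option is to fit a straight line per president and compare the slopes. This is only a sketch: the helper name `eo_trend_by_potus` is hypothetical, a linear fit is an assumption about the shape of the trend, and the result depends on the outlier filter applied earlier.

```r
# Hypothetical helper: slope of page span against days in office, per president.
# Expects the columns potus, length_in_office and length_of_order.
eo_trend_by_potus <- function(df) {
  # One linear model per president
  fits <- lapply(split(df, df$potus), function(d) {
    lm(length_of_order ~ length_in_office, data = d)
  })
  # Extract the slope (pages per day) from each model
  slopes_per_day <- vapply(
    fits,
    function(f) coef(f)[["length_in_office"]],
    numeric(1)
  )
  # Rescale to pages per 365 days so the numbers are easier to read
  slopes_per_day * 365
}

# Usage with the data frame built earlier in this brief:
# eo_trend_by_potus(eo_df5)
```

A positive value would suggest that a president's orders grew longer over time and a negative value that they grew shorter, subject to how well a straight line describes the scatter.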

# Variation: a single panel with presidents distinguished by color
ggplot(eo_df5, aes(x = length_in_office, y = length_of_order, color = potus)) +
  geom_point()
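
On the open question of avoiding facets, one possible mitigation for the overlap in the single-panel color version is to make the points semi-transparent and overlay one smoothed trend line per president. The sketch below is an assumption about what might help, not the TAs' answer; the helper name `plot_eo_single_panel` and the alpha and size values are illustrative choices.

```r
library(ggplot2)

# Hypothetical helper: all three variables (days, pages, president) in one panel.
# Expects the columns potus, length_in_office and length_of_order.
plot_eo_single_panel <- function(df) {
  ggplot(df, aes(x = length_in_office, y = length_of_order, color = potus)) +
    # Transparent points so overlapping presidencies remain visible
    geom_point(alpha = 0.3, size = 1.5) +
    # One loess trend line per president, without confidence bands
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(x = "Days in Office", y = "Pages in EO", color = "President") +
    theme_minimal()
}

# Usage with the data frame built earlier in this brief:
# plot_eo_single_panel(eo_df5)
```

The trend lines carry the comparison across presidencies while the faded points keep the underlying data in view; whether this reads better than the facets would need to be judged on the actual plot.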