BRIEF
- I have added all relevant R code to make the graph fully reproducible (after the TA fixed the one issue where changing from geom_col to geom_line produced a disjointed graph). The process by which the data was wrangled and plotted is now fully documented.
- This is a LINE graph.
- The DATA was cleaned up by the instructors and given to us as a CSV file titled hot_dog_contest_with_affiliation.csv. It contains 5 attributes covering 37 years' worth of championship information from Nathan’s Hot Dog Eating Competition. It has qualitative data such as the champion’s name, gender, and affiliation status, and quantitative data for the championship year and the number of hot dogs and buns eaten by the champion that year. The gender and affiliation columns were converted into factors. Based on EDA, 11 champions were currently affiliated with IFOCE, 6 were formerly affiliated, and 20 were unaffiliated.
- The audience is general, with no specialized background: anyone who may be interested in fun statistics about Nathan’s hot dog championship over the years.
- This is a graph that depicts the affiliation with IFOCE of male champions of Nathan’s hot-dog eating competition from 1981 to 2017.
- The line graph uses color to quickly show that champions prior to 2005 were unaffiliated but champions have been either former or current members of IFOCE ever since.
- A negative aspect of this line graph is that one cannot tell the years in which the line transitions from one category to another. A tooltip over the line indicating the year, or even a vertical grid with x-axis ticks for each year, would resolve that issue.
- A common variation of this line graph is a line graph with tooltips, which shows where transitions between categories occurred. An alternative would be a bar graph.
- The graph required me to first filter the data for male champions for years after 1980. I then created a new logical column, post_ifoce, flagging whether the contest year was 1997 or later. I then used geom_line’s grouping functionality in ggplot (group = 1) so that the differently colored segments are drawn as one connected line.
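# Packages: readr, dplyr and ggplot2 are all loaded by the tidyverse
library(tidyverse)
# Affiliation data
hot_dogs2 <- read_csv(here::here("data", "hot_dog_contest_with_affiliation.csv"),
  col_types = cols(
    gender = col_factor(levels = NULL),
    affiliated = col_factor(levels = NULL)
  ))
# Keep male champions from 1981 onward and flag contest years from 1997 on
hot_dogs3 <- hot_dogs2 %>%
  filter(year >= 1981, gender == "male") %>%
  mutate(post_ifoce = year >= 1997)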
hot_dogs3
## # A tibble: 37 x 6
##     year gender name           num_eaten affiliated post_ifoce
##    <int> <fct>  <chr>              <dbl> <fct>      <lgl>
##  1  2017 male   Joey Chestnut         72 current    TRUE
##  2  2016 male   Joey Chestnut         70 current    TRUE
##  3  2015 male   Matthew Stonie        62 current    TRUE
##  4  2014 male   Joey Chestnut         61 current    TRUE
##  5  2013 male   Joey Chestnut         69 current    TRUE
##  6  2012 male   Joey Chestnut         68 current    TRUE
##  7  2011 male   Joey Chestnut         62 current    TRUE
##  8  2010 male   Joey Chestnut         54 current    TRUE
##  9  2009 male   Joey Chestnut         68 current    TRUE
## 10  2008 male   Joey Chestnut         59 current    TRUE
## # ... with 27 more rows
# Exploratory Data Analysis (EDA)
hot_dogs3 %>%
  dplyr::distinct(affiliated)
## # A tibble: 3 x 1
##   affiliated
##   <fct>
## 1 current
## 2 former
## 3 not affiliated
hot_dogs3 %>%
  dplyr::count(affiliated, sort = TRUE)
## # A tibble: 3 x 2
##   affiliated         n
##   <fct>          <int>
## 1 not affiliated    20
## 2 current           11
## 3 former             6
# Changed geom from "col" to "line", yet preserving color
# group = 1 keeps the line connected across affiliation categories
affil_plot2 <- ggplot(hot_dogs3, aes(x = year, y = num_eaten, color = affiliated)) +
  geom_line(aes(group = 1)) +
  labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
  ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
  # The mapped aesthetic is color, so the manual scale must be a color scale
  scale_color_manual(values = c('#E9602B', '#2277A0', '#CCB683'),
    name = "IFOCE-affiliation")
affil_plot2
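The brief notes that the transition years are hard to read off this line graph and suggests a vertical grid with a tick for each year. The following is only a rough sketch of that idea, not part of the original submission. The helper name add_year_ticks and the styling choices (grid color, rotated labels) are assumptions made for illustration, and the small data frame reuses just the ten rows printed in the tibble output above so that the example runs without the CSV file.

```r
library(ggplot2)

# The ten most recent rows, copied from the tibble printed earlier
recent <- data.frame(
  year = 2008:2017,
  num_eaten = c(59, 68, 54, 62, 68, 69, 61, 62, 70, 72),
  affiliated = factor("current")
)

# Hypothetical helper: one x-axis tick per year plus vertical grid lines
add_year_ticks <- function(p, from, to) {
  p +
    scale_x_continuous(breaks = seq(from, to, by = 1)) +
    theme(
      panel.grid.major.x = element_line(colour = "grey85"),
      panel.grid.minor.x = element_blank(),
      # Rotate labels so that yearly ticks do not overlap
      axis.text.x = element_text(angle = 90, vjust = 0.5)
    )
}

p <- ggplot(recent, aes(x = year, y = num_eaten, color = affiliated)) +
  geom_line(aes(group = 1))

p_ticks <- add_year_ticks(p, 2008, 2017)
p_ticks
```

With the full data set, the same helper could presumably be applied to the existing plot as add_year_ticks(affil_plot2, 1981, 2017), so that each change of color lines up with a labeled year.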