BRIEF

  1. I have added all relevant R code (after the TA clarified the one issue, where the regression could not be linear) to make the graph fully reproducible. The steps used to wrangle and plot the data are now fully documented as well.
  2. This is a SCATTER graph.
  3. The DATA was cleaned up by the instructors and shared with us via this link: http://bit.ly/cs631-moma. It contains 23 attributes for 2,253 artworks from MoMA. It has qualitative data such as artist, title, and department, as well as quantitative data such as the year each artwork was created and the year it was acquired by MoMA.
  4. The audience is general, with no specialization assumed: people who may be interested in fun statistics about MoMA’s art collection.
  5. This graph depicts how quickly, over time, MoMA has moved to acquire an artwork after its creation.
  6. The scatter graph uses a non-linear regression curve to show that, in its early years, MoMA tended to acquire works long after they were created, and that only recently have its acquisitions become more contemporaneous.
  7. A negative aspect of this scatter graph is that one cannot tell which point corresponds to which artwork. A tooltip-based annotation could resolve that issue.
  8. A common variation of this scatter plot would be another scatter plot whose point size grows or shrinks based on how quickly a painting was acquired after creation. An alternative would be a dot plot that maps acquisition year against the age of the painting at acquisition.
  9. For this ggplot, I used a constant-alpha geom_point with a geom_smooth to draw the non-linear regression curve.
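The tooltip idea in item 7 could be sketched roughly as follows. This is only an illustration under assumptions: it relies on the plotly package, which is not used anywhere else in this brief, and on a two-row sample (`moma_sample`, a hypothetical name) built from the first two titles and years visible in the glimpse() output. The full `moma` data frame would be swapped in for a real plot.

```r
library(ggplot2)
library(plotly)  # assumption: plotly is installed; it is not part of the original code

# Two-row illustrative sample taken from the first values in the glimpse() output.
moma_sample <- data.frame(
  title         = c("Rope and People, I", "Fire in the Evening"),
  year_created  = c(1935, 1929),
  year_acquired = c(1936, 1970)
)

# Map the title to the `text` aesthetic so ggplotly() can show it on hover.
p <- ggplot(moma_sample,
            aes(x = year_created, y = year_acquired, text = title)) +
  geom_point(alpha = 0.5) +
  labs(x = "Year painted", y = "Year acquired")

# Convert the static ggplot into an interactive widget with title-only tooltips.
interactive_plot <- ggplotly(p, tooltip = "text")
interactive_plot
```

Restricting the tooltip to `"text"` keeps the hover label to the artwork title; leaving the argument at its default would also show the x and y values.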
library(tidyverse)  # loads ggplot2, dplyr, and readr, so they need not be attached separately
library(here)
library(ggthemes)
library(extrafont)
moma <- read_csv("http://bit.ly/cs631-moma")
glimpse(moma)
## Observations: 2,253
## Variables: 23
## $ title             <chr> "Rope and People, I", "Fire in the Evening",...
## $ artist            <chr> "Joan Miró", "Paul Klee", "Paul Klee", "Pabl...
## $ artist_bio        <chr> "(Spanish, 1893–1983)", "(German, born Switz...
## $ artist_birth_year <int> 1893, 1879, 1879, 1881, 1880, 1879, 1943, 18...
## $ artist_death_year <dbl> 1983, 1940, 1940, 1973, 1946, 1953, 1977, 19...
## $ num_artists       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ n_female_artists  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ n_male_artists    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ artist_gender     <chr> "Male", "Male", "Male", "Male", "Male", "Mal...
## $ year_acquired     <dbl> 1936, 1970, 1966, 1955, 1939, 1968, 1997, 19...
## $ year_created      <int> 1935, 1929, 1927, 1919, 1925, 1919, 1970, 19...
## $ circumference_cm  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ depth_cm          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ diameter_cm       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ height_cm         <dbl> 104.8, 33.8, 60.3, 215.9, 50.8, 129.2, 200.0...
## $ length_cm         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ width_cm          <dbl> 74.6, 33.3, 36.8, 78.7, 54.0, 89.9, 200.0, 3...
## $ seat_height_cm    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ purchase          <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA...
## $ gift              <lgl> TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE,...
## $ exchange          <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FAL...
## $ classification    <chr> "Painting", "Painting", "Painting", "Paintin...
## $ department        <chr> "Painting & Sculpture", "Painting & Sculptur...
# 6.1 How many paintings?
# How many rows/observations are in moma? 2,253
# How many variables are in moma? 23
# 8.1 Plot year painted vs year acquired (Challenge #3)
# NOTE: Unable to get the RED regression line to work
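ggplot(moma, aes(x = year_created, y = year_acquired)) +
  # semi-transparent points so that overlapping paintings remain visible
  geom_point(na.rm = TRUE, alpha = 0.5) +
  # default (non-linear) smoother in red, without a confidence band
  geom_smooth(na.rm = TRUE, se = FALSE, color = "red") +
  labs(x = "Year painted", y = "Year acquired") +
  ggtitle("MoMA Keeps Its Collection Current")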
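The alternative mentioned in item 8 of the brief (acquisition year against the age of the painting at acquisition) might look something like the sketch below. It is not part of the submitted graph. The column name `years_to_acquire` is hypothetical, and `moma_sample` holds only the first seven year pairs shown in the glimpse() output, so that the snippet runs without downloading the data; the full `moma` data frame would replace it in practice.

```r
library(dplyr)
library(ggplot2)

# First seven year_created / year_acquired pairs from the glimpse() output.
moma_sample <- data.frame(
  year_created  = c(1935, 1929, 1927, 1919, 1925, 1919, 1970),
  year_acquired = c(1936, 1970, 1966, 1955, 1939, 1968, 1997)
)

# Age of each painting when MoMA acquired it (hypothetical derived column).
moma_lag <- moma_sample %>%
  mutate(years_to_acquire = year_acquired - year_created)

# Acquisition year on x, age at acquisition on y.
lag_plot <- ggplot(moma_lag, aes(x = year_acquired, y = years_to_acquire)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  labs(x = "Year acquired", y = "Age of painting at acquisition (years)")

lag_plot
```

Plotting the lag directly puts "how quickly" on the y axis, so a reader does not have to judge distance from a diagonal.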