class: center, middle, inverse, title-slide

# STA Lab
## 09.09.21

---

## Set up

- Today's lab will focus on working on HW 01
- The data include county-level demographic and election information and variables to help with creating maps
- The demographic and election data are from the Economist and were used in the July 2021 article ["In-person voting really did accelerate covid-19’s spread in America"](https://www-economist-com.proxy.lib.duke.edu/graphic-detail/2021/07/10/in-person-voting-really-did-accelerate-covid-19s-spread-in-america)

---

## Goals

- Use regression to describe the relationship between a county's political leanings (measured by % of votes for Trump in 2016) and the percent of votes cast in-person in the 2020 election for counties in North Carolina.
- Use maps to conduct exploratory data analysis and check the model conditions
- Use a random sample of US counties to draw conclusions about the relationship between political leanings and percent of votes cast in-person in the 2020 election

---

## Exercise 1

- What summary statistics best represent the center and spread of the distribution? Why?

---

## Exercise 2 - the code

- **Goal**: We will look at the response variable by county.
- `county_map_data`: contains the latitude and longitude for the counties in NC
- What does the code below do?

```r
election_map_data <- left_join(election_nc, county_map_data)
```

---

## Exercise 2 - the code

[`geom_polygon`](https://ggplot2.tidyverse.org/reference/geom_polygon.html): Function used to plot polygons. The coordinates for each vertex are on a separate row. The vertices for a given `group` are connected and filled in.

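To see how the `group` aesthetic works on something smaller than a county map, here is a minimal sketch using made-up data. The data frame `toy_polygons` and its values are hypothetical and not part of the lab data; the column names `long`, `lat`, and `group` are chosen only to mirror the ones used in the lab code.

```r
library(ggplot2)

# Hypothetical toy data (not the lab data): two unit squares,
# one row per vertex, with `group` identifying which polygon
# each vertex belongs to.
toy_polygons <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4)
)

# The vertices within each group are connected in row order
# and the enclosed area is filled in.
toy_map <- ggplot() +
  geom_polygon(data = toy_polygons,
               mapping = aes(x = long, y = lat, group = group),
               fill = "lightgray", color = "white")

toy_map
```

If `group = group` is left out of `aes()`, all eight vertices are treated as a single polygon, which is why the county map code needs it.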
.small[
```r
ggplot() +
  # make a map of NC
  geom_polygon(data = county_map_data,
               mapping = aes(x = long, y = lat, group = group),
               # fill each polygon (county) light gray &
               # outline in white
               fill = "lightgray", color = "white")
```
]

---

## Exercise 2 - the code

[`geom_polygon`](https://ggplot2.tidyverse.org/reference/geom_polygon.html): Function used to plot polygons. The coordinates for each vertex are on a separate row. The vertices for a given `group` are connected and filled in.

.small[
```r
ggplot() +
  # make a map of NC
  geom_polygon(data = county_map_data,
               mapping = aes(x = long, y = lat, group = group),
               # fill each polygon (county) light gray &
               # outline in white
               fill = "lightgray", color = "white") +
  geom_polygon(data = election_map_data,
               mapping = aes(x = long, y = lat, group = group,
                             # fill each polygon (county) based on inperson_pct
                             fill = inperson_pct))
```
]

---

## Exercise 2 - the code

.small[
```r
ggplot() +
  geom_polygon(data = county_map_data,
               mapping = aes(x = long, y = lat, group = group),
               fill = "lightgray", color = "white") +
  geom_polygon(data = election_map_data,
               mapping = aes(x = long, y = lat, group = group,
                             fill = inperson_pct)) +
  # add labels
  labs(x = "_____",
       y = "_____",
       fill = "_____",
       title = "_____") +
  # use viridis color palette
  # (scale_fill_viridis() comes from the viridis package)
  scale_fill_viridis()
```
]

Read more about the [viridis color palettes](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html).

---

## Exercise 2

- If there is no association between county and percent of in-person voting, what would you expect the color pattern to look like on the map?
- Which plot do you prefer to help understand the response variable - the histogram or the map? Why?