Lab 06: Logistic regression

due Monday, November 08 at 11:59pm

Learning goals

By the end of the lab you will be able to…

Data: General Social Survey

The General Social Survey (GSS) has been used to measure trends in attitudes and behaviors in American society since 1972. In addition to collecting demographic information, the survey includes questions used to gauge attitudes about government spending priorities, confidence in institutions, lifestyle, and many other topics. A full description of the survey may be found here.

The data for this lab are from the 2016 General Social Survey. The original data set contains 2867 observations and 935 variables. We will use an abbreviated data set that includes the following variables:

The data are in gss2016.csv in the data folder.
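As a minimal sketch of one way to read the file, the snippet below loads it with Python's standard `csv` module. The function name `load_gss` is hypothetical, and the default path simply combines the file and folder names given above. Every value is read as text, which is why `age` needs recoding later in the lab.

```python
import csv

def load_gss(path="data/gss2016.csv"):
    """Read the CSV into a list of dicts, one per respondent.

    All values are kept as strings; nothing is converted or cleaned here.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example usage (requires the file to exist at the given path):
# gss = load_gss()
# print(len(gss), "rows;", "columns:", list(gss[0].keys()))
```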

Exercises

The goal of today’s lab is to use the GSS to examine the relationship between US adults’ political views and attitudes towards government spending on mass transportation projects.

Part I: Exploratory data analysis

  1. Let’s begin by making a binary variable for respondents’ views on spending on mass transportation. Create a new variable that is equal to “1” if a respondent said spending on mass transportation is about right and “0” otherwise. Then make a plot of the new variable, using informative labels for each category.

  2. Recode polviews so it is a factor with levels in an order that is consistent with the question on the survey. Note how the categories are spelled in the data.

    • Make a plot of the distribution of polviews.
    • Which political view occurs most frequently in this data set?
  3. Make a plot displaying the relationship between satisfaction with mass transportation spending and political views. Use the plot to describe the relationship between the two variables.

  4. We’d like to use age as a quantitative variable in your model; however, it is currently a character data type because some observations are coded as "89 or older".

    • Recode age so that it is a numeric variable. Note: Before making the variable numeric, you will need to replace the values "89 or older" with a single value.
    • Then plot the distribution of age.
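The recoding steps in Part I can be sketched on a few made-up rows, as below. This is an illustration under stated assumptions, not the lab solution: the column name `natmass`, the response labels, and the three-level `POLVIEWS_ORDER` are all hypothetical, so check the actual spellings and the full set of categories in the data. Replacing "89 or older" with 89 is one possible choice of single value, and the sketch does not handle missing values.

```python
from collections import Counter

# Hypothetical rows: the column name "natmass" and all category labels
# are assumptions for illustration, not values taken from gss2016.csv.
rows = [
    {"natmass": "About right", "polviews": "Moderate", "age": "34"},
    {"natmass": "Too little", "polviews": "Liberal", "age": "89 or older"},
    {"natmass": "Too much", "polviews": "Conservative", "age": "52"},
    {"natmass": "About right", "polviews": "Moderate", "age": "27"},
]

# Assumed ordering of the scale; the real survey question has more levels.
POLVIEWS_ORDER = ["Liberal", "Moderate", "Conservative"]

def recode(row):
    """Return a copy of row with a binary outcome, an ordered rank, and numeric age."""
    out = dict(row)
    # 1 if the respondent said spending is about right, 0 otherwise.
    out["satisfied"] = 1 if row["natmass"] == "About right" else 0
    # Position on the ordered scale, so plots and models respect the order.
    out["polviews_rank"] = POLVIEWS_ORDER.index(row["polviews"])
    # Replace the top-coded text value with a single number before converting.
    out["age_num"] = 89 if row["age"] == "89 or older" else int(row["age"])
    return out

recoded = [recode(r) for r in rows]

# Frequency tables that a bar plot of each variable would display.
satisfied_counts = Counter(r["satisfied"] for r in recoded)
polviews_counts = Counter(r["polviews"] for r in recoded)
most_common_view = polviews_counts.most_common(1)[0][0]
```

Keeping the original text columns alongside the recoded ones makes it easy to check that each category was mapped as intended.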

Part II: Logistic regression model

  1. Briefly explain why we should use a logistic regression model to predict the odds a randomly selected person is satisfied with spending on mass transportation.

  2. Let’s start by fitting a model using the demographic factors - age, sex, sei10, and region - to predict the odds a person is satisfied with spending on mass transportation. Make any necessary adjustments to the variables so the intercept will have a meaningful interpretation. Neatly display the model.

  3. Interpret the intercept in the context of the data.

  4. Consider the relationship between age and one’s opinion about spending on mass transportation. Interpret the coefficient of age in terms of the odds of being satisfied with spending on mass transportation.

  5. Now let’s see whether a person’s political views have a significant impact on their odds of being satisfied with spending on mass transportation, after accounting for the demographic factors.

    Conduct a drop-in-deviance test to determine if polviews is a significant predictor of attitude towards spending on mass transportation. State the null and alternative hypotheses in words, display all relevant code and output, and state your conclusion in the context of the problem.

  6. Use the model selected in the previous exercise to describe the relationship (if any) between one’s political views and their attitude towards spending on mass transportation. Be sure your answer includes the interpretation of the model coefficients and associated hypothesis tests or confidence intervals used to support your response.
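The arithmetic behind the coefficient interpretations and the drop-in-deviance test in Part II can be sketched as follows. It assumes you have already fit the reduced model (demographic factors only) and the full model (demographic factors plus polviews) and read off their residual deviances. The function names and every number here are hypothetical, chosen only to show the calculation. The test statistic is the reduced-model deviance minus the full-model deviance, compared to a chi-square distribution whose degrees of freedom equal the number of extra coefficients in the full model.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting value: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Step up two degrees of freedom at a time using the incomplete-gamma recurrence.
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def drop_in_deviance(dev_reduced, dev_full, extra_params):
    """Return (G, p_value) for comparing two nested logistic regression models."""
    g = dev_reduced - dev_full
    return g, chi2_sf(g, extra_params)

def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a delta-unit increase in a predictor."""
    return math.exp(coef * delta)

# Hypothetical deviances and coefficient, NOT results from the GSS data.
g_stat, p_value = drop_in_deviance(dev_reduced=1250.0, dev_full=1235.0, extra_params=6)
age_or_10yr = odds_ratio(coef=-0.006, delta=10)

print(f"G = {g_stat:.1f}, p-value = {p_value:.4f}")
print(f"Odds multiply by {age_or_10yr:.3f} for each additional 10 years of age")
```

A small p-value is evidence against the null hypothesis that all of the added coefficients are zero. Exponentiating a coefficient turns a change in log-odds into a multiplicative change in odds, which is the scale the interpretation questions ask for.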

Submission

There should only be one submission per team on Gradescope.

Grading (50 pts)

Component                                        Points
Ex 1 - 10                                        40
Model diagnostics activity (individual score)    5
Workflow & formatting                            5
