In today’s lab, you’ll use simple linear regression to analyze the relationship between the admissions rate and total cost for colleges and universities in the United States.
By the end of the lab you will…
Go to the sta210-fa21 organization on GitHub. Click on the repo with the prefix lab-02. It contains the starter documents you need to complete the lab.
Clone the repo and start a new project in RStudio. See the Lab 01 instructions for details on cloning a repo, starting a new R project and configuring git.
The following packages are used in this lab:
library(tidyverse)
library(broom)
library(knitr)
The data for this lab are from the scorecard data set in the rcfss R package. It includes information originally obtained from the U.S. Department of Education’s College Scorecard for 1,753 colleges and universities during the 2018-2019 academic year.
The lab focuses on the following variables:
admrate: Undergraduate admissions rate (from 0-100%)
cost: The average annual total cost of attendance, including tuition and fees, books and supplies, and living expenses
type: Type of college (Public; Private, nonprofit; Private, for-profit)

See the data set's documentation for a full list of variables and definitions.
Use the code below to load the data set.
scorecard <- read_csv("data/scorecard.csv")
Note: Include axis labels and an informative title for all plots. Use the kable function to neatly print tables and regression output.
Create a histogram to examine the distribution of admrate and calculate summary statistics for the center (mean and median) and the spread (standard deviation and IQR).
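A minimal sketch of the histogram-plus-summary-statistics workflow in R, assuming the tidyverse and knitr packages loaded above. To keep it runnable without the lab's CSV file, it uses R's built-in mtcars data, with mpg (miles per gallon) standing in for admrate; the binwidth of 2 is an arbitrary choice for that stand-in variable. The na.rm = TRUE arguments are included in case the variable you analyze has missing values.

```r
library(tidyverse)
library(knitr)

# Stand-in data: mtcars ships with R; mpg plays the role of admrate here
mpg_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2) +  # binwidth chosen arbitrarily for mpg
  labs(
    x = "Miles per gallon",
    y = "Number of cars",
    title = "Distribution of fuel efficiency in the mtcars data"
  )
mpg_hist

# Center (mean, median) and spread (standard deviation, IQR)
mpg_stats <- mtcars %>%
  summarise(
    mean_mpg   = mean(mpg, na.rm = TRUE),
    median_mpg = median(mpg, na.rm = TRUE),
    sd_mpg     = sd(mpg, na.rm = TRUE),
    iqr_mpg    = IQR(mpg, na.rm = TRUE)
  )

kable(mpg_stats, digits = 2)
```

For the lab itself, you would replace mtcars with scorecard and mpg with admrate, and choose a binwidth suited to that variable's scale.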
Use the results from the previous exercise to describe the distribution of admrate. Include the shape, center, spread, and any potential outliers.
Plot the distribution of cost and calculate the appropriate summary statistics. Describe the distribution of cost (shape, center, spread, and potential outliers) using the plot and summary statistics.
The goal of this analysis is to fit a regression model that can be used to understand the variability in the cost of college based on the admissions rate. Before fitting the model, let’s look at the relationship between the two variables. Create a scatterplot to display the relationship between cost and admissions rate, and describe the relationship based on the plot.
Does the relationship between cost and admissions rate differ by type of college? Modify the plot from the previous exercise to visualize the relationship by type of college.
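One way to sketch a grouped scatterplot, again on R's built-in mtcars data as a stand-in: wt (weight, in 1000 lbs) plays the role of admissions rate, mpg plays the role of cost, and cyl (number of cylinders, converted to a factor) plays the role of type. Mapping the grouping variable to color overlays the groups in a single panel; facet_wrap() instead draws a separate panel per group. Either is a reasonable choice, and which reads better depends on how much the groups overlap.

```r
library(tidyverse)

# Stand-in: treat cylinder count as a categorical grouping variable
mtcars_grouped <- mtcars %>%
  mutate(cyl = factor(cyl))

# Option 1: one panel, groups distinguished by color
by_color <- ggplot(mtcars_grouped, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    title = "Fuel efficiency vs. weight, by number of cylinders"
  )

# Option 2: one panel per group, useful when points overlap heavily
by_facet <- ggplot(mtcars_grouped, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel efficiency vs. weight, faceted by number of cylinders"
  )

by_color
by_facet
```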
Describe two new observations from the scatterplot in Exercise 5 that you didn’t see in the scatterplot from Exercise 4.
Fit a linear regression model with admissions rate as the predictor and cost as the response. Display the 95% confidence intervals for the coefficients in the output, and use the kable function to neatly display the results.
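A minimal sketch using the broom and knitr packages loaded above, again with mtcars as a stand-in (mpg as the response, wt as the predictor); for the lab, the formula would instead be cost ~ admrate with data = scorecard. Calling tidy() with conf.int = TRUE adds conf.low and conf.high columns to the coefficient table.

```r
library(broom)
library(knitr)

# Stand-in model: predict mpg from wt (the lab's model uses cost and admrate)
mpg_fit <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient; conf.int = TRUE adds conf.low and conf.high,
# and conf.level sets the interval level (0.95 is also the default)
mpg_coefs <- tidy(mpg_fit, conf.int = TRUE, conf.level = 0.95)

kable(mpg_coefs, digits = 3)
```

The resulting table also includes the estimate, std.error, statistic, and p.value columns you need for the hypothesis test and the interval interpretation.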
Consider the model from the previous exercise.
Does the data provide evidence of a statistically significant linear relationship between cost and admissions rate? Conduct a hypothesis test to answer this question. In your response
Interpret the 95% confidence interval for the slope in context. Then indicate whether or not it is consistent with the results of the hypothesis test from the previous exercise. Briefly explain your response.
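For reference, here are the standard simple linear regression formulas behind the hypothesis test and the confidence interval, written for this lab's model. This is a sketch of the usual textbook setup; the symbols below are introduced for illustration and assume the standard regression conditions hold.

```latex
\begin{aligned}
&\text{Model:} && \text{cost}_i = \beta_0 + \beta_1\,\text{admrate}_i + \varepsilon_i, \quad i = 1, \dots, n \\
&\text{Hypotheses:} && H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0 \\
&\text{Test statistic:} && t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} \sim t_{n-2} \text{ under } H_0 \\
&\text{95\% CI:} && \hat{\beta}_1 \pm t^{*}_{n-2} \times SE(\hat{\beta}_1), \quad t^{*}_{n-2} = \text{0.975 quantile of } t_{n-2}
\end{aligned}
```

Because the test and the interval use the same estimate, standard error, and t distribution with n - 2 degrees of freedom, a 95% interval that excludes 0 corresponds to a two-sided p-value below 0.05, and vice versa.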
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.
Remember – you must turn in a PDF file to the Gradescope page before the submission deadline for full credit.
To submit your assignment:
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
Click on your STA 210 course.
Click on the assignment, and you’ll be prompted to submit it.
Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
Select the first page of your .PDF submission to be associated with the “Workflow & formatting” section.
Component | Points |
---|---|
Ex 1 - 10 | 45 |
Workflow & formatting | 5 |
Grading notes: