By the end of the lab you will be able to…
The data for this lab comes from the FiveThirtyEight article The Ultimate Halloween Candy Power Ranking by Walt Hickey. To collect the data, Hickey and collaborators at FiveThirtyEight set up an experiment in which people voted on a series of randomly generated candy matchups (e.g., Reese's vs. Skittles). Click here to check out some of the matchups.
The data set contains the characteristics and win percentage for 85 candies in the experiment. The variables are:
Variable | Description |
---|---|
chocolate | Does it contain chocolate? |
fruity | Is it fruit flavored? |
caramel | Is there caramel in the candy? |
peanutyalmondy | Does it contain peanuts, peanut butter, or almonds? |
nougat | Does it contain nougat? |
crispedricewafer | Does it contain crisped rice, wafers, or a cookie component? |
hard | Is it a hard candy? |
bar | Is it a candy bar? |
pluribus | Is it one of many candies in a bag or box? |
sugarpercent | The percentile of sugar it falls under within the data set. Values 0 - 1. |
pricepercent | The unit price percentile compared to the rest of the set. Values 0 - 1. |
winpercent | The overall win percentage according to 269,000 matchups. Values 0 - 100. |
Use the code below to load the data from the candy_rankings
data frame in the fivethirtyeight R package.
candy <- fivethirtyeight::candy_rankings
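To confirm the data loaded as expected, you can take a quick look at the variables and their types (an optional check, not required for the lab):

```r
# preview the variables in the candy data frame
dplyr::glimpse(candy)
```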
The goal of this analysis is to use linear regression to determine what makes the best candy. We’ll define “best” as the candy that wins the highest percentage of matchups.
Before fitting our model, let’s take a look at the model used by author Walt Hickey in the FiveThirtyEight article. He fits a model using nine candy characteristics. The output can be found within the text of the article.
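Hickey's exact code is not shown in the article, but a comparable model could be fit along these lines, assuming the nine characteristics are the nine binary variables in the table above (a sketch, not the article's actual code):

```r
# sketch of a model using the nine binary candy characteristics
hickey_model <- lm(winpercent ~ chocolate + fruity + caramel + peanutyalmondy +
                     nougat + crispedricewafer + hard + bar + pluribus,
                   data = candy)
summary(hickey_model)
```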
Now it’s your turn to build a model. For the model selection, consider all relevant variables in the data set as potential predictors, regardless of whether they’re in the model in the FiveThirtyEight article.
Use backward model selection with AIC as the selection criterion to choose a candidate model. Add include = FALSE in the header of the code chunk containing the model selection code so the step-by-step output does not print in the knitted PDF.
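One way to carry this out is with step() from base R, starting from a full model that includes every predictor except competitorname (which uniquely identifies each candy). A minimal sketch with placeholder object names:

```r
# full model with all candidate predictors (competitorname excluded)
full_model <- lm(winpercent ~ . - competitorname, data = candy)

# backward selection; step() uses AIC by default
backward_model <- step(full_model, direction = "backward")
```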
Next, use forward model selection with BIC as the selection criterion to choose a candidate model. Add include = FALSE in the header of the code chunk containing the model selection code so the step-by-step output does not print in the knitted PDF.
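Forward selection starts from an intercept-only model and searches up to a full scope of predictors; setting k = log(n) in step() makes the criterion BIC rather than AIC. Again a sketch with placeholder names:

```r
# intercept-only starting model
int_only_model <- lm(winpercent ~ 1, data = candy)

# forward selection with BIC (k = log(n))
forward_model <- step(int_only_model,
                      scope = ~ chocolate + fruity + caramel + peanutyalmondy +
                        nougat + crispedricewafer + hard + bar + pluribus +
                        sugarpercent + pricepercent,
                      direction = "forward",
                      k = log(nrow(candy)))
```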
There are some variables selected by the model selection procedure in Exercise 2 that were not selected by the procedure in Exercise 3. Use a nested F-test to determine whether there is evidence that at least one of these additional variables is a useful predictor of win percentage. Use \(\alpha = 0.05\).
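Assuming the Exercise 3 model is nested in the Exercise 2 model, anova() carries out the nested F-test (a sketch using the placeholder names above):

```r
# nested F test: reduced model (Exercise 3) vs. full model (Exercise 2)
anova(forward_model, backward_model)
```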
Let’s use model summary statistics to choose the model that best fits the data - either the model selected in Exercise 2 or the model selected in Exercise 3. Briefly explain your choice, using an appropriate model summary statistic, \(R^2\) or Adjusted \(R^2\), to support your response.
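The summary statistics for each candidate model, including r.squared and adj.r.squared, can be computed with glance() from the broom package (a sketch using the placeholder names above):

```r
library(broom)

# model summary statistics for each candidate model
glance(backward_model)
glance(forward_model)
```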
Use the model chosen in the previous exercise:
Plot the relationship between sugar percentile and win percentage, with the points colored based on whether the candy contains crisped rice, wafers, or a cookie component. Include lines on the plot to more clearly show how the relationship between sugar percentile and win percentage differs based on whether the candy contains crisped rice, wafers, or a cookie component.
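One way to make such a plot is with ggplot2, using geom_smooth(method = "lm") to add a separate fitted line for each group (a sketch; adjust the labels as you see fit):

```r
library(ggplot2)

ggplot(candy, aes(x = sugarpercent, y = winpercent, color = crispedricewafer)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Sugar percentile",
       y = "Win percentage",
       color = "Crisped rice, wafer,\nor cookie component")
```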
Add the interaction between sugarpercent and crispedricewafer to the model selected in Exercise 5. Neatly display the updated model using 3 digits.
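A sketch of one way to do this, assuming the model chosen in Exercise 5 is stored as selected_model (a placeholder name); update() adds the interaction, and tidy() with kable() displays the coefficients rounded to three digits:

```r
library(broom)
library(knitr)

# add the sugarpercent x crispedricewafer interaction to the Exercise 5 model
model_int <- update(selected_model, . ~ . + sugarpercent * crispedricewafer)

# display the estimated coefficients rounded to 3 digits
kable(tidy(model_int), digits = 3)
```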
Is there evidence that the effect of sugar percentile differs based on whether the candy contains crisped rice, wafers, or a cookie component? Briefly explain, including the results used to make the determination.
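If crispedricewafer was already in the Exercise 5 model, the interaction adds a single coefficient, so its t-test in the output above (or, equivalently, a nested F-test comparing the models with and without the interaction) speaks to this question. A sketch using the placeholder names above:

```r
# nested F test: Exercise 5 model vs. the model with the interaction
anova(selected_model, model_int)
```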
Use the model to describe what generally makes a good candy, i.e. one with a high win percentage.
Component | Points |
---|---|
Ex 1 - 10 | 45 |
Workflow & formatting | 5 |
Grading notes:
There should only be one submission per team on Gradescope.