Introduction

In today’s lab you will analyze data from over 1,000 different coffees to explore the relationship between a coffee’s aroma and its overall quality. You will also begin working with your team and practicing a collaborative data analysis workflow.

Learning goals

By the end of the lab you will…

create plots and calculate associated statistics to assess model diagnostics.
practice collaborating with others using a single GitHub repo.

Meet your team!

Click here to see the team assignments for STA 210. This will be your team for labs and the final project.

Before you get started on the lab, your TA will walk you through the following:

✅ Icebreaker activity to get to know your teammates.

✅ Come up with a team name. You can’t use the same name as another team, so I encourage you to be creative! Your TA will get your team name by the end of lab.

✅ Fill out the team agreement. This will help you figure out a plan for communicating and working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].

Have one person from the team clone the repo and start a new RStudio project. This person will type the team’s responses as you discuss the sections of the agreement. No one else in the team should type at this point but should be contributing to the discussion.
Be sure to push the completed agreement to GitHub. Each team member can refer to the document in this repo or download the PDF of the agreement for future reference. You do not need to submit the agreement on Gradescope.

Getting started

A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the sta210-fa21 course organization on GitHub.
You should see a repo with the lab-03 prefix.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.

Workflow: Using git and GitHub as a team

Assign each person on your team a number 1 through 4. For teams of three, Team Member 1 can take on the role of Team Member 4.

The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.

Update YAML

Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format: Team Name: Member 1, Member 2, Member 3, Member 4. Knit, commit, and push the changes to GitHub.

Team Members 2, 3, 4: Click the Pull button in the Git pane to get the updated document. You should see the updated name in the .Rmd file.

Packages

The following packages are used in the lab.

library(tidyverse)
library(broom)
library(knitr)
library(ggfortify)

The Data

The dataset for this lab comes from the Coffee Quality Database and was obtained from the #TidyTuesday GitHub repo. It includes information about the origin, the producer, measures of various characteristics, and the quality measure for over 1,000 coffees.

This lab will focus on the following variables:

aroma: Aroma grade, 0 - 10 scale
total_cup_points: Measure of quality, 0 - 100 scale

You can find the definitions for all variables in the data set here. Click here for more details about how these measures are obtained.

coffee <- read_csv("data/coffee-ratings.csv")

Exercises

Note: Include axis labels and an informative title for all plots. Use the kable function to neatly print tables and regression output. Write all interpretations in the context of the data.

Do the following exercises in order, following each step carefully.

Only one person at a time should type in the .Rmd file and push updates.

If you are working on any portion of the lab virtually, the person working should share their screen and the others should follow along.

Type the team’s response to Exercises 1 - 2.

Visualize the relationship between aroma and the total cup points. What do you observe from the plot? Use the plot to describe the relationship between the two variables.
Fit the linear model and neatly display the results using 3 digits.
- Interpret the slope in the context of the data.
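The mechanics of Exercises 1 - 2 can be sketched on made-up data. This is only an illustration, not the lab solution: sim_coffee is a simulated data frame (an assumption, not the coffee ratings), and the object names aroma_plot, sim_fit, and sim_coefs are hypothetical. It shows a labeled scatterplot with ggplot2 and a coefficient table printed with tidy and kable using 3 digits.

```r
library(ggplot2)
library(broom)
library(knitr)

# Simulated stand-in data (NOT the coffee ratings); column names mirror the lab
set.seed(210)
sim_coffee <- data.frame(aroma = runif(200, min = 6, max = 9))
sim_coffee$total_cup_points <- 40 + 5.5 * sim_coffee$aroma + rnorm(200, sd = 1.5)

# Scatterplot with axis labels and an informative title
aroma_plot <- ggplot(sim_coffee, aes(x = aroma, y = total_cup_points)) +
  geom_point(alpha = 0.5) +
  labs(
    x = "Aroma grade (0 - 10)",
    y = "Total cup points (0 - 100)",
    title = "Total cup points vs. aroma grade (simulated data)"
  )
aroma_plot

# Fit the simple linear regression and print the coefficients with 3 digits
sim_fit <- lm(total_cup_points ~ aroma, data = sim_coffee)
sim_coefs <- tidy(sim_fit)
kable(sim_coefs, digits = 3)
```

In your own work, the same calls would be applied to the coffee data frame, and the interpretation should be written in terms of aroma grade and total cup points.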

🧶 ✅ ⬆️ Team member 1: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 - 2.

Team Member 2: It’s your turn! Type the team’s response to exercises 3 - 4.

Would the members of your group drink a coffee represented by the intercept? Why or why not? Discuss as a group and write the group’s consensus.
We will proceed assuming the model conditions hold, so let’s focus on the model diagnostics. We’ll start by examining if there are any points with high leverage in the data.
- What threshold will you use to determine if there are points with high leverage?
- Are there any observations with high leverage? If so, how many? Briefly explain, including any output, graphs, etc. you used to determine the response.
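One way to check for high leverage is sketched here on simulated data. The threshold 2(p + 1)/n, where p is the number of predictors and n is the number of observations, is a common rule of thumb and is an assumption on our part; your team may justify a different one. The data frame sim_coffee and the names leverage, leverage_threshold, and high_leverage are hypothetical. The sketch uses base R's hatvalues; broom's augment reports the same quantity in its .hat column.

```r
# Simulated stand-in data (NOT the coffee ratings), with one extreme aroma value
set.seed(210)
sim_coffee <- data.frame(aroma = c(runif(199, min = 6, max = 9), 0))
sim_coffee$total_cup_points <- 40 + 5.5 * sim_coffee$aroma + rnorm(200, sd = 1.5)

sim_fit <- lm(total_cup_points ~ aroma, data = sim_coffee)

# Leverage (diagonal of the hat matrix) for every observation
leverage <- hatvalues(sim_fit)

# Rule-of-thumb threshold (an assumption): 2 * (p + 1) / n
n <- nrow(sim_coffee)
p <- length(coef(sim_fit)) - 1
leverage_threshold <- 2 * (p + 1) / n

# Observations whose leverage exceeds the threshold
high_leverage <- which(leverage > leverage_threshold)
length(high_leverage)
sim_coffee[high_leverage, ]
```

Leverage depends only on the predictor values, so the flagged rows are those whose aroma is far from the average aroma, regardless of their total cup points.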

🧶 ✅ ⬆️ Team member 2: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 3 - 4.

Team Member 3: It’s your turn! Type the team’s response to exercise 5.

Next, let’s examine if there are any points with standardized residuals that have large magnitude. Are there any such points in the data? If so, how many? Briefly explain, including any output, graphs, etc. you used to determine the response.
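A minimal sketch of this check, again on simulated data rather than the coffee ratings. The cutoff of 3 in absolute value is an assumption (some analysts use 2), and sim_coffee, std_resid, and large_resid are hypothetical names. It uses base R's rstandard; broom's augment reports the same values in its .std.resid column.

```r
# Simulated stand-in data (NOT the coffee ratings)
set.seed(210)
sim_coffee <- data.frame(aroma = runif(200, min = 6, max = 9))
sim_coffee$total_cup_points <- 40 + 5.5 * sim_coffee$aroma + rnorm(200, sd = 1.5)

# Plant one unusually low score so there is something to flag
sim_coffee$total_cup_points[1] <- sim_coffee$total_cup_points[1] - 15

sim_fit <- lm(total_cup_points ~ aroma, data = sim_coffee)

# Standardized residuals: residuals scaled by their estimated standard deviation
std_resid <- rstandard(sim_fit)

# Flag observations beyond the cutoff (3 is an assumed rule of thumb)
large_resid <- which(abs(std_resid) > 3)
length(large_resid)

# Standardized residuals vs. fitted values, with the cutoff drawn as dashed lines
plot(
  fitted(sim_fit), std_resid,
  xlab = "Fitted total cup points",
  ylab = "Standardized residual",
  main = "Standardized residuals vs. fitted values (simulated data)"
)
abline(h = c(-3, 3), lty = 2)
```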

🧶 ✅ ⬆️ Team member 3: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercise 5.

Team Member 4: It’s your turn! Type the team’s response to exercise 6.

Lastly, let’s analyze Cook’s D to determine if there are influential points in the data.
- Based on Cook’s D, are there any influential points in our data? Briefly explain, including any output, graphs, etc. you used to determine the response.
- If there are influential points, briefly explain why they are outliers, i.e., not in the trend of the rest of the data.
- If there are influential points, remove those points from the data and refit the model. How do the model coefficients change, if at all?
- If there are influential points, would you recommend using the model fit with or without these points for inferential conclusions and predictions? Briefly explain why or why not. Additionally, briefly explain potential impacts your choice has on inferential conclusions and/or predictions.
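The Cook's D workflow above can be sketched as follows on simulated data. The cutoff of 1 is an assumption (0.5 is also commonly used), and sim_coffee, cooks_d, influential, and sim_fit_trimmed are hypothetical names. It uses base R's cooks.distance; broom's augment reports the same values in its .cooksd column.

```r
# Simulated stand-in data (NOT the coffee ratings), with one point placed
# far from the trend: aroma = 0 and total_cup_points = 0
set.seed(210)
sim_coffee <- data.frame(aroma = c(runif(199, min = 6, max = 9), 0))
sim_coffee$total_cup_points <- c(
  40 + 5.5 * sim_coffee$aroma[1:199] + rnorm(199, sd = 1.5),
  0
)

sim_fit <- lm(total_cup_points ~ aroma, data = sim_coffee)

# Cook's distance combines leverage and residual size for each observation
cooks_d <- cooks.distance(sim_fit)

# Flag influential points (a cutoff of 1 is an assumed rule of thumb)
influential <- which(cooks_d > 1)
sim_coffee[influential, ]

# Refit without the flagged points; logical indexing is safe even if none are flagged
sim_fit_trimmed <- lm(total_cup_points ~ aroma, data = sim_coffee[cooks_d <= 1, ])

# Compare coefficients from the two fits side by side
rbind(all_points = coef(sim_fit), without_influential = coef(sim_fit_trimmed))
```

The side-by-side coefficients show how much the flagged points pull the fitted line. Whether to keep or drop them is a judgment your team should explain in the context of the data.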

🧶 ✅ ⬆️ Team member 4: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!

Wrapping up

Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.

All other team members can click Pull to get the finalized document.

Submission

Select one team member to upload the team’s PDF submission to Gradescope.
Be sure to include every team member’s name in the Gradescope submission.
Associate the “Workflow & formatting” graded section with the first page of your PDF, and mark the page associated with each exercise. If any answer spans multiple pages, then mark all pages.

There should only be one submission per team on Gradescope.

Grading (50 pts)

Component	Points
Ex 1 - 6	42
Workflow & formatting	5
Complete team contract	3

Grading notes:

The “Workflow & formatting” grade assesses the reproducible workflow and teamwork. This includes having at least one meaningful commit by each team member and updating the name and date in the YAML.

Lab 03: Coffee ratings

due Mon, September 20 at 11:59p ET