In today’s lab you will analyze data from over 1,000 different coffees to explore the relationship between a coffee’s aroma and its overall quality. You will also begin working with your team and practicing a collaborative data analysis workflow.
By the end of the lab you will…
Click here to see the team assignments for STA 210. This will be your team for labs and the final project.
Before you get started on the lab, your TA will walk you through the following:
✅ Icebreaker activity to get to know your teammates.
✅ Come up with a team name. You can’t use the same name as another team, so I encourage you to be creative! Your TA will get your team name by the end of lab.
✅ Fill out the team agreement. This will help you figure out a plan for communication and for working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the sta210-fa21 course organization on GitHub.
You should see a repo with the lab-03 prefix.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
Assign each person on your team a number 1 through 4. For teams of three, Team Member 1 can take on the role of Team Member 4.
The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.
Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format: Team Name: Member 1, Member 2, Member 3, Member 4. Knit, commit, and push the changes to GitHub.
Team Members 2, 3, 4: Click the Pull button in the Git pane to get the updated document. You should see the updated name in the .Rmd file.
The following packages are used in the lab.
library(tidyverse)
library(broom)
library(knitr)
library(ggfortify)
The dataset for this lab comes from the Coffee Quality Database and was obtained from the #TidyTuesday GitHub repo. It includes information about the origin, producer, measures of various characteristics, and the quality measure for over 1000 coffees.
This lab will focus on the following variables:
aroma: Aroma grade, 0 - 10 scale
total_cup_points: Measure of quality, 0 - 100 scale
You can find the definitions for all variables in the data set here. Click here for more details about how these measures are obtained.
coffee <- read_csv("data/coffee-ratings.csv")
Note: Include axis labels and an informative title for all plots. Use the kable function to neatly print tables and regression output. Write all interpretations in the context of the data.
Do the following exercises in order, following each step carefully.
Only one person at a time should type in the .Rmd file and push updates.
If you are working on any portion of the lab virtually, the person working should share their screen and the others should follow along.
Type the team’s response to Exercises 1 - 2.
Visualize the relationship between aroma and the total cup points. What do you observe from the plot? Use the plot to describe the relationship between the two variables.
Fit the linear model and neatly display the results using 3 digits.
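As a rough sketch of the kind of code Exercises 1 and 2 call for, the example below makes a labeled scatterplot, fits a simple linear regression, and prints the coefficients with kable using 3 digits. The `toy` data frame, its columns `x` and `y`, and all of its values are made up for illustration; in the lab you would work with `coffee`, `aroma`, and `total_cup_points` instead.

```r
library(tidyverse)
library(broom)
library(knitr)

# Toy data with made-up values, for illustration only (NOT the lab data)
toy <- tibble(
  x = c(6.5, 7.0, 7.2, 7.5, 7.8, 8.0, 8.3),
  y = c(78.0, 80.5, 81.0, 82.5, 83.0, 84.5, 86.0)
)

# Scatterplot with axis labels and an informative title
toy_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  labs(
    x = "Predictor (x)",
    y = "Response (y)",
    title = "Response versus predictor (toy data)"
  )
toy_plot

# Fit the simple linear regression of y on x
toy_fit <- lm(y ~ x, data = toy)

# Print the coefficient table, rounded to 3 digits
tidy(toy_fit) %>%
  kable(digits = 3)
```

The `digits = 3` argument only changes how the table is printed; the fitted model object keeps full precision.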
🧶 ✅ ⬆️ Team member 1: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 - 2.
Team Member 2: It’s your turn! Type the team’s response to exercises 3 - 4.
Would the members of your group drink a coffee represented by the intercept? Why or why not? Discuss as a group and write the group’s consensus.
We will proceed assuming the model conditions hold, so let’s focus on the model diagnostics. We’ll start by examining if there are any points with high leverage in the data.
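One possible way to look for high leverage points is sketched below. It assumes the common rule-of-thumb threshold of 2(p + 1)/n, where p is the number of predictors and n is the number of observations; that threshold is an assumption here, so use whichever one your course notes specify. The `toy` data frame and its values are made up for illustration and are not the lab data.

```r
library(tidyverse)
library(broom)

# Toy data with made-up values; the last x value sits far from the rest
toy <- tibble(
  x = c(6.5, 7.0, 7.2, 7.5, 7.8, 8.0, 8.3, 12.0),
  y = c(78.0, 80.5, 81.0, 82.5, 83.0, 84.5, 86.0, 70.0)
)

toy_fit <- lm(y ~ x, data = toy)

# augment() adds observation-level statistics, including leverage (.hat)
toy_aug <- augment(toy_fit) %>%
  mutate(obs_num = row_number())

# Assumed rule-of-thumb threshold: 2 * (p + 1) / n
n_pred <- 1
n_obs <- nrow(toy_aug)
leverage_cutoff <- 2 * (n_pred + 1) / n_obs

# Observations whose leverage exceeds the threshold
high_leverage <- toy_aug %>%
  filter(.hat > leverage_cutoff)
high_leverage
```

Leverage depends only on the predictor values, so a flagged point is unusual in x; whether it actually changes the fitted line is a separate question.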
🧶 ✅ ⬆️ Team member 2: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 3 - 4.
Team Member 3: It’s your turn! Type the team’s response to exercise 5.
🧶 ✅ ⬆️ Team member 3: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercise 5.
Team Member 4: It’s your turn! Type the team’s response to exercise 6.
Lastly, let’s analyze Cook’s D to determine if there are influential points in the data.
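A minimal sketch of a Cook’s D check follows. It assumes a cutoff of 1 for flagging influential points, which is one common rule of thumb and not necessarily the one used in this course. The `toy` data frame and its values are made up for illustration and are not the lab data.

```r
library(tidyverse)
library(broom)

# Toy data with made-up values; the last point is far from the others
# in x and does not follow their trend
toy <- tibble(
  x = c(6.5, 7.0, 7.2, 7.5, 7.8, 8.0, 8.3, 12.0),
  y = c(78.0, 80.5, 81.0, 82.5, 83.0, 84.5, 86.0, 70.0)
)

toy_fit <- lm(y ~ x, data = toy)

# augment() includes Cook's distance for each observation (.cooksd)
toy_aug <- augment(toy_fit) %>%
  mutate(obs_num = row_number())

# Assumed cutoff for an influential point
cooks_cutoff <- 1

# Plot Cook's D by observation number with the cutoff as a reference line
cooks_plot <- ggplot(toy_aug, aes(x = obs_num, y = .cooksd)) +
  geom_point() +
  geom_hline(yintercept = cooks_cutoff, linetype = "dashed") +
  labs(
    x = "Observation number",
    y = "Cook's D",
    title = "Cook's D by observation (toy data)"
  )
cooks_plot

# Observations flagged as influential under the assumed cutoff
influential <- toy_aug %>%
  filter(.cooksd > cooks_cutoff)
influential
```

The `.cooksd` column matches what `cooks.distance(toy_fit)` returns, so either can be used.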
🧶 ✅ ⬆️ Team member 4: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!
Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.
All other team members can click Pull to get the finalized document.
There should only be one submission per team on Gradescope.
Component | Points |
---|---|
Ex 1 - 6 | 42 |
Workflow & formatting | 5 |
Complete team contract | 3 |
Grading notes: