Data Analysis Workout 04: Basic Statistics

kedeisha1 · April 18, 2023, 4:00pm

Level of difficulty:

Objective: This workout provides practice in summarizing your data using basic statistics.

Link to dataset: https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv

Challenge Questions

Which continent drinks more beer on average?
For each continent print the statistics for wine consumption.
Print the mean alcohol consumption per continent for every column.
Print the median alcohol consumption per continent for every column.

Simply post your code and a screenshot of your results.

Please format your Python code and blur it or place it in a hidden section.

This workout will be released on Tuesday, April 18, 2023, and the author’s solution will be posted on Sunday, April 23, 2023.

BrianJ · April 18, 2023, 10:35pm

@kedeisha1 ,

Pretty straightforward workout, but it provided a great opportunity to do a deep dive into Quarto, the new(ish) cross-platform Markdown tool for formatting R, Python, and Julia code and output. It’s fantastic!

Question 1

Question 2

Question 3

Question 4

Brian

TomiwaB · April 20, 2023, 3:12am

@kedeisha1

This is my first workout, and I am happy to say that it’s been a good learning experience. Please see my answers below.

Answer to Question 1

Ans_1

Answer to Question 2

Ans_2

Answer to Question 3

Ans_3

Answer to Question 4

Ans_4

Ondrej · April 20, 2023, 1:08pm

Hi,

This is my first time ever using R. I couldn’t figure out how to get quartiles, so I took inspiration from @BrianJ’s previous workout. Thank you, @BrianJ!

R Code

url <- "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv"
library(tidyverse)
df <- read.csv(url, na.strings = "")
head(df)

## 1. Which continent drinks more beer on average?
df %>% 
  group_by(continent) %>% 
  summarise(mean = mean(beer_servings))

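## 2. For each continent print the statistics for wine consumption.
# Widen the console so all summary columns print on one line
options(width = 200)
df %>% 
  group_by(continent) %>% 
  summarise(
    across(wine_servings, 
      list(
        mean = mean,
        std = sd,
        min = min,
        Q1 = ~quantile(., 0.25),
        Q2 = median,
        Q3 = ~quantile(., 0.75),
        max = max))
  )

## 3. Print the mean alcohol consumption per continent for every column.
# where(is.numeric) selects the same four numeric columns as positions 2:5
df %>% 
  group_by(continent) %>% 
  summarise(across(where(is.numeric), mean))

## 4. Print the median alcohol consumption per continent for every column.
df %>% 
  group_by(continent) %>% 
  summarise(across(where(is.numeric), median))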

Results

Ondrej · April 20, 2023, 1:12pm

@BrianJ This is looking awesome. Looking forward to trying it out in the next workout! Thanks for the tip.

JordanSchnurman · April 21, 2023, 2:16am

Data Analysis Challenge 4 Statistics.docx (254.4 KB)

BrianJ · April 21, 2023, 3:25am

@Ondrej ,

I’d been wanting to really dive into Quarto for a while, and I was on leave this week with some extra time, so I took the opportunity to do some learning. If you’re familiar with basic markdown concepts, it’s pretty straightforward.

Here’s a very good 30-minute video that goes through all the basics:

It’s a really well-designed program (already fully incorporated into RStudio, so you won’t need to download anything additional). I haven’t seen anyone who’s tried it who hasn’t raved about it - I’m definitely a big fan already…

Eager to hear what you think.

BTW - very impressive that you just picked R up for the first time and banged out this workout. Keep up the great work, and thanks so much for all your participation and support of the Workouts!

Brian

Ondrej · April 21, 2023, 6:56am

@BrianJ ,

Thanks, I will check out the video.

Yeah, I watched R for Power BI Users from @gjmount and used the https://dplyr.tidyverse.org/ documentation.

But if it weren’t for the workouts, I would probably never have tried working with R, so it was a great idea to start with them.

kedeisha1 · April 22, 2023, 2:27pm

Great work, @BrianJ, @Ondrej, @TomiwaB, @JordanSchnurman.

I’ll be learning R again next month, and might start creating my solutions in both Python and R.

kedeisha1 · April 24, 2023, 2:22pm

Solutions in Python

Which continent drinks more beer on average?


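import pandas as pd

# keep_default_na=False stops pandas from reading the continent code 'NA' (North America) as missing
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv'
drinks = pd.read_csv(url, keep_default_na=False)

drinks.groupby('continent').beer_servings.mean()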
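If you want the answer as a single value instead of reading it off the table, something like the sketch below might help. The `sample` DataFrame is made-up data (not the real drinks.csv values) that only reuses the column names, and `top_continent` is just an illustrative variable name.

```python
import pandas as pd

# Made-up sample rows (NOT the real drinks.csv values), same column names
sample = pd.DataFrame({
    "continent": ["EU", "EU", "AF", "AF", "AS"],
    "beer_servings": [200, 300, 50, 70, 30],
})

# Mean beer servings per continent
beer_means = sample.groupby("continent")["beer_servings"].mean()

# idxmax returns the index label (here, the continent) with the largest mean
top_continent = beer_means.idxmax()
print(top_continent, beer_means.max())
```

On the real data you would swap `sample` for the loaded `drinks` DataFrame.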

For each continent print the statistics for wine consumption.

drinks.groupby('continent').wine_servings.describe()
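For anyone who wants to see roughly what `describe()` is doing, or who only needs a few of the statistics, here is a minimal sketch that builds a similar table by hand. It is one possible approach, not necessarily how pandas computes it internally, and the `sample` data is made up for illustration (not the real drinks.csv values).

```python
import pandas as pd

# Made-up sample rows (NOT the real drinks.csv values), same column names
sample = pd.DataFrame({
    "continent": ["EU", "EU", "EU", "AF", "AF", "AF"],
    "wine_servings": [100, 200, 300, 0, 10, 20],
})

grouped = sample.groupby("continent")["wine_servings"]

# Basic statistics, one column per function name
basic = grouped.agg(["mean", "std", "min", "max"])

# Quartiles: quantile() with a list returns a (continent, quantile) index,
# so unstack() moves the quantile level into columns
quartiles = grouped.quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# Combine both tables on the shared continent index
wine_stats = basic.join(quartiles)
print(wine_stats)
```

This gives the same kind of summary as the `across()` version in the R solution above, just spelled out in pandas.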

Print the mean alcohol consumption per continent for every column.

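# numeric_only=True skips the text column (country); pandas 2.0+ raises an error without it
drinks.groupby('continent').mean(numeric_only=True)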

Print the median alcohol consumption per continent for every column.

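# numeric_only=True skips the text column (country); pandas 2.0+ raises an error without it
drinks.groupby('continent').median(numeric_only=True)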
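Two things can trip people up with this dataset, so here is a small sketch of both. First, the continent code for North America is the string "NA", which `pd.read_csv` treats as a missing value by default, so those rows silently drop out of any `groupby`. Second, the text column `country` cannot be averaged, so `numeric_only=True` is needed in recent pandas versions. The CSV text below is made up (hypothetical country names and numbers, not the real data) and only mimics the file's layout.

```python
import io
import pandas as pd

# Made-up CSV text (NOT the real drinks.csv values) with the same kind of columns
csv_text = (
    "country,beer_servings,wine_servings,continent\n"
    "Aland,100,10,NA\n"
    "Bland,200,30,NA\n"
    "Cland,50,90,EU\n"
)

# Default parsing: the string "NA" becomes a missing value (NaN)
default = pd.read_csv(io.StringIO(csv_text))
print(default["continent"].isna().sum())

# groupby drops NaN keys by default, so only EU is left
print(default.groupby("continent").size())

# keep_default_na=False keeps "NA" as an ordinary string
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# numeric_only=True leaves the text column (country) out of the aggregation
means = kept.groupby("continent").mean(numeric_only=True)
print(means)
```

This is the same issue that `na.strings = ""` works around in the R solution earlier in the thread.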

SamMcKay · April 28, 2023, 10:12am

Definitely want to try this one next. Love the look of Quarto @BrianJ

BrianJ · April 28, 2023, 3:51pm

Thanks, @Sam! One of the great things about the work that Posit is doing is they’re making all their tools now seamlessly cross-platform. Thus, this works equally well for R, Python and Julia.