Data Analysis Workout 04: Basic Statistics

Level of difficulty:

Objective: This workout provides practice in summarizing your data with basic statistics.

Link to dataset: https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv

Challenge Questions

  1. Which continent drinks more beer on average?
  2. For each continent print the statistics for wine consumption.
  3. Print the mean alcohol consumption per continent for every column.
  4. Print the median alcohol consumption per continent for every column.

Simply post your code and a screenshot of your results.

Please format your Python code and blur it or place it in a hidden section.

This workout will be released on Monday April 18, 2023, and the author’s solution will be posted on Sunday April 23, 2023.

1 Like

@kedeisha1 ,

Pretty straightforward workout, but it provided a great opportunity to do a deep dive into Quarto, the new(ish) cross-platform Markdown tool for formatting R, Python, and Julia code and output. It’s fantastic!

Question 1

Question 2

Question 3

Question 4

  • Brian
2 Likes

@kedeisha1

This is my first workout, and I am happy to say that it’s been a good learning experience. Please see my answers below.

Answer to Question 1

Ans_1

Answer to Question 2

Ans_2

Answer to Question 3

Ans_3

Answer to Question 4

Ans_4

3 Likes

Hi,

First time ever using R. I couldn’t figure out how to get quartiles, so I took inspiration from @BrianJ’s previous workout. So thank you, @BrianJ :wink:

R Code
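library(tidyverse)

url <- "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv"
# na.strings = "" keeps the continent code "NA" as text instead of turning it into a missing value
df <- read.csv(url, na.strings = "")
head(df)

## 1. Which continent drinks more beer on average?
df %>%
  group_by(continent) %>%
  summarise(mean = mean(beer_servings))

## 2. For each continent print the statistics for wine consumption.
options(width = 200)  # widen the console so all summary columns fit on one row
df %>%
  group_by(continent) %>%
  summarise(
    across(wine_servings,
      list(
        mean = mean,
        std = sd,
        min = min,
        Q1 = ~quantile(., 0.25),
        Q2 = median,
        Q3 = ~quantile(., 0.75),
        max = max))
  )

## 3. Print the mean alcohol consumption per continent for every column.
# Positions 2:5 are the four numeric columns (the grouping column is not counted)
df %>%
  group_by(continent) %>%
  summarise(across(2:5, mean))

## 4. Print the median alcohol consumption per continent for every column.
df %>%
  group_by(continent) %>%
  summarise(across(2:5, median))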
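
For anyone following along in Python, here is a rough pandas sketch of the same quartile summary. It is not part of the original post. The tiny `SAMPLE_CSV` table is made up for illustration (it only copies the column layout of drinks.csv), and the helper names `load_drinks`, `wine_summary`, `q1` and `q3` are hypothetical. The point it tries to show is the same one `na.strings = ""` handles in R: by default pandas also reads the continent code "NA" as a missing value, so the sketch assumes you want to switch that off with `keep_default_na=False`.

```python
import io

import pandas as pd

# Made-up sample with the same columns as drinks.csv (values are illustrative only).
SAMPLE_CSV = """country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Alpha,100,50,10,3.0,EU
Beta,200,100,30,6.0,EU
Gamma,240,150,80,8.0,NA
Delta,260,160,90,9.0,NA
"""


def load_drinks(source):
    """Read the CSV, treating only empty cells as missing so 'NA' stays a continent code."""
    return pd.read_csv(source, keep_default_na=False, na_values=[""])


def q1(series):
    """First quartile (25th percentile)."""
    return series.quantile(0.25)


def q3(series):
    """Third quartile (75th percentile)."""
    return series.quantile(0.75)


def wine_summary(df):
    """Per-continent wine statistics, with the same column names as the R summary."""
    return df.groupby("continent")["wine_servings"].agg(
        mean="mean",
        std="std",
        min="min",
        Q1=q1,
        Q2="median",
        Q3=q3,
        max="max",
    )


drinks = load_drinks(io.StringIO(SAMPLE_CSV))
summary = wine_summary(drinks)
print(summary)
```

With the real file you would pass the dataset URL to `load_drinks` instead of the in-memory sample. Reading it with a plain `pd.read_csv(url)` would turn every "NA" continent into NaN, and `groupby` would then silently drop those rows.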
Results


2 Likes

@BrianJ This is looking awesome. Looking forward to trying it out in the next workout! Thanks for the tip.

1 Like
2 Likes

@Ondrej ,

I’d been wanting to really dive into Quarto for a while, and I was on leave this week with some extra time, so I took the opportunity to do some learning. If you’re familiar with basic markdown concepts, it’s pretty straightforward.

Here’s a very good 30-minute video that goes through all the basics:

It’s a really well-designed program (already fully incorporated into RStudio, so you won’t need to download anything additional). I haven’t seen anyone who’s tried it who hasn’t raved about it - I’m definitely a big fan already…

Eager to hear what you think.

BTW - very impressive that you just picked R up for the first time and banged out this workout. Keep up the great work, and thanks so much for all your participation and support of the Workouts!

  • Brian
1 Like

@BrianJ ,

Thanks, I will check out the video.

Yeah, I’ve watched R For Power BI Users from @gjmount and used the dplyr documentation at https://dplyr.tidyverse.org/.

But if it wasn’t for the workouts, I would probably never have tried working with R. So it was a great idea to start with them.

1 Like

Great work @BrianJ , @Ondrej, @TomiwaB, @JordanSchnurman .

I’ll be learning R again next month, and might start creating my solutions in both Python and R.

2 Likes

Solutions in Python

  1. Which continent drinks more beer on average?

drinks.groupby('continent').beer_servings.mean()
  2. For each continent print the statistics for wine consumption.
drinks.groupby('continent').wine_servings.describe()
  3. Print the mean alcohol consumption per continent for every column.
# numeric_only=True skips the text column (country); pandas 2.0+ raises a TypeError without it
drinks.groupby('continent').mean(numeric_only=True)
  4. Print the median alcohol consumption per continent for every column.
drinks.groupby('continent').median(numeric_only=True)
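
The one-liners above assume a `drinks` DataFrame already exists. Here is a minimal self-contained sketch of the four answers. It is only an illustration: the small DataFrame is invented data in the same column layout as drinks.csv, and the variable names (`by_continent`, `top_continent`, and so on) are my own. With the real dataset you would presumably load it with something like `pd.read_csv(url, keep_default_na=False)`, so that the continent code "NA" is not read as a missing value.

```python
import pandas as pd

# Invented miniature of drinks.csv; the numbers are for illustration only.
drinks = pd.DataFrame({
    "country": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "beer_servings": [100, 200, 30, 50, 10],
    "spirit_servings": [50, 100, 20, 40, 0],
    "wine_servings": [10, 30, 0, 4, 2],
    "total_litres_of_pure_alcohol": [3.0, 6.0, 1.0, 2.0, 0.5],
    "continent": ["EU", "EU", "AS", "AS", "AF"],
})

by_continent = drinks.groupby("continent")

# 1. Average beer servings per continent, and the continent with the highest average.
beer_mean = by_continent["beer_servings"].mean()
top_continent = beer_mean.idxmax()

# 2. Descriptive statistics (count, mean, std, min, quartiles, max) for wine.
wine_stats = by_continent["wine_servings"].describe()

# 3 and 4. Mean and median of every numeric column; numeric_only=True leaves out
# the text column "country", which cannot be averaged.
means = by_continent.mean(numeric_only=True)
medians = by_continent.median(numeric_only=True)

print(top_continent)
print(wine_stats)
print(means)
print(medians)
```

Using `idxmax()` on the grouped means answers Question 1 directly instead of reading the largest value off the printed table.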
1 Like

Definitely want to try this one next. Love the look of Quarto @BrianJ

1 Like

Thanks, @Sam! One of the great things about the work that Posit is doing is they’re making all their tools now seamlessly cross-platform. Thus, this works equally well for R, Python and Julia.