Data Analysis Workout 01: Chipotle Sales Analysis

Difficulty Level:

Objective: Use the pandas library to perform basic data analysis over Chipotle sales data. This workout will provide you practice in introductory exploratory data analysis to understand your data.

Link to dataset: https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv

Read in .tsv file as .csv:

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep = '\t')

Challenge Questions

  1. Which was the most-ordered item?
  2. For the most-ordered item, how many items were ordered?
  3. What was the most ordered item in the choice_description column?
  4. How many items were ordered in total?
  5. Convert the item price into a float.
  6. How much was the revenue for the period in the dataset?
  7. How many orders were made in the period?
  8. What is the average revenue amount per order?
  9. How many different items are sold?

Submission

SImply post your code and a screenshot of your results.

Please format your Python code and blur it or place it in a hidden section.

This workout will be released on Monday March 27, 2023, and the author’s solution will be posted on Monday April 3, 2023.

1 Like
3 Likes

Here is my solution.

Thaks for the workout :slight_smile:

3 Likes

@Ondrej and @JordanSchnurman ,

Nicely done! Thanks for participating.

  • Brian
1 Like

Great job @Ondrej and @JordanSchnurman on the print statements. Good presentation of the answers

1 Like

Here are the solutions:

1. Which was the most-ordered item?

c = chipo.groupby('item_name')
c = c.sum()
c = c.sort_values(['quantity'], ascending=False)

2. For the most-ordered item, how many items were ordered?

c = chipo.groupby('item_name')
c = c.sum()
c = c.sort_values(['quantity'], ascending=False)

3. What was the most ordered item in the choice_description column?

c = chipo.groupby('item_name')
c = c.sum()
c = c.sort_values(['quantity'], ascending=False)

4. How many items were ordered in total?

c = chipo.groupby('choice_description').sum()
c = c.sort_values(['quantity'], ascending=False)

5. Convert the item price into a float.

dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)

6. How much was the revenue for the period in the dataset?

revenue = (chipo['quantity']* chipo['item_price']).sum()

print('Revenue was: $' + str(np.round(revenue,2)))

7. How many orders were made in the period?

orders = chipo.order_id.value_counts().count()

8.What is the average revenue amount per order?

chipo['revenue'] = chipo['quantity'] * chipo['item_price']
order_grouped = chipo.groupby(by=['order_id']).sum()
order_grouped.mean()['revenue']

9. How many different items are sold?

chipo.item_name.value_counts().count()

Here’s my R code solution to this data analysis workout. Didn’t have the time to make it pretty, but in the future plan to use these workouts to improve my skills on R Markdown and Quarto.

Click for R Code and Output

library(tidyverse)
> 
> chipotle <- read_delim("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv",
+                        delim = "\t", escape_double = FALSE,
+                        trim_ws = TRUE)
Rows: 4622 Columns: 5                                                                                                                            
Delimiter: "\t"
chr (3): item_name, choice_description, item_price
dbl (2): order_id, quantity

> 
> head(chipotle, 5)
# A tibble: 5 × 5
  order_id quantity item_name                             choice_description                                                         item_price
     <dbl>    <dbl> <chr>                                 <chr>                                                                      <chr>     
1        1        1 Chips and Fresh Tomato Salsa          NULL                                                                       $2.39     
2        1        1 Izze                                  [Clementine]                                                               $3.39     
3        1        1 Nantucket Nectar                      [Apple]                                                                    $3.39     
4        1        1 Chips and Tomatillo-Green Chili Salsa NULL                                                                       $2.39     
5        2        2 Chicken Bowl                          [Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]] $16.98    
> 
> #Q1 Which was the most-ordered item?
> #Q2 For the most-ordered item, how many items were ordered?
> 
> orders <- chipotle %>% group_by(item_name) %>% 
+   summarise(TotalQuantity = sum(quantity)) %>% 
+   arrange(desc(TotalQuantity))
>           
> head(orders, 2)
# A tibble: 2 × 2
  item_name       TotalQuantity
  <chr>                   <dbl>
1 Chicken Bowl              761
2 Chicken Burrito           591
> 
> 
> #Q3 What was the most ordered item in the choice_description column?
> 
> descriptions <- chipotle %>% 
+   filter( choice_description != "NULL" ) %>% 
+   group_by(choice_description) %>% 
+   summarise(TotalDescQuantity = sum(quantity)) %>% 
+   arrange(desc(TotalDescQuantity))
> 
> head(descriptions, 2)
# A tibble: 2 × 2
  choice_description TotalDescQuantity
  <chr>                          <dbl>
1 [Diet Coke]                      159
2 [Coke]                           143
> 
> #Q4 How many items were ordered in total?
> 
> TotalItemsOrdered <- sum(chipotle$quantity)
> TotalItemsOrdered
[1] 4972
> 
> #Q5 Convert the item price into a float.
> 
> chipotle$price <- gsub("\\$", "", chipotle$item_price) %>% 
+  as.numeric(.)
> 
> head(price, 5)
[1]  2.39  3.39  3.39  2.39 16.98
> 
> #Q6 How much was the revenue for the period in the dataset?
> 
> Revenue <- chipotle$quantity * chipotle$price 
> TotalRevenue <- sum(Revenue)
> TotalRevenue
[1] 39237.02
> 
> 
> #Q7 How many orders were made in the period?
> TotalOrders <- n_distinct(chipotle$order_id) 
> TotalOrders
[1] 1834
> 
> 
> #Q8 What is the average revenue amount per order?
> 
> TotalRevenue / TotalOrders
[1] 21.39423
> 
> #Q9 How many different items are sold?
> 
> n_distinct(chipotle$item_name)
[1] 50
>
1 Like

Great job @BrianJ . Good to see submissions in different tools

1 Like

Loved working through this. I personally can’t write the code but with AI co-pilots helping me out I can get there and do so much more that I would have dreamt of 3-4 months ago



What I just love as well is you ask AI what other insights would be interesting from this data and it gives you so much inspiration

——————————————-

Certainly! The Chipotle dataset provides various opportunities for advanced analytics and insights. Here are a few examples to explore:

  1. Customer Analysis:

    • Identify the top customers who placed the most orders or spent the most money.
    • Analyze customer preferences and patterns by examining their order history.
    • Segment customers based on their ordering behavior or demographics.
    • Determine customer retention and loyalty metrics.
  2. Menu Item Analysis:

    • Identify the most popular menu items by analyzing the quantity ordered or revenue generated.
    • Analyze item relationships (e.g., which items are commonly ordered together) using association rules or market basket analysis.
    • Perform sentiment analysis on customer reviews or comments related to specific menu items.
  3. Pricing Analysis:

    • Analyze pricing strategies and their impact on sales or customer behavior.
    • Investigate the relationship between item prices and customer satisfaction or order frequency.
    • Conduct price elasticity analysis to understand how price changes affect demand.
  4. Time-based Analysis:

    • Analyze sales patterns by time of day, day of the week, or month of the year.
    • Identify peak hours or busy periods for order preparation and resource allocation.
    • Detect any seasonality or trends in customer behavior.
  5. Geographic Analysis:

    • Explore the distribution of orders across different locations or regions.
    • Analyze regional preferences or variations in menu item popularity.
    • Perform spatial analysis to identify areas with high customer density or potential for new store locations.
  6. Recommendation System:

    • Build a recommendation system to suggest additional menu items based on a customer’s current order or historical preferences.
    • Utilize collaborative filtering or content-based filtering techniques to generate personalized recommendations.
  7. Forecasting and Inventory Optimization:

    • Forecast future demand for menu items to optimize inventory management and minimize waste.
    • Apply time series forecasting techniques to predict sales or order quantities.
    • Optimize supply chain logistics based on demand forecasts and lead times.

These are just a few examples to get you started. You can choose the areas that align with your interests or business objectives and explore them in more detail using data analysis and machine learning techniques.

Remember to leverage the power of Python libraries such as pandas, NumPy, matplotlib, scikit-learn, or statsmodels for data manipulation, visualization, statistical analysis, and machine learning.

Feel free to dive into any of these areas and let me know if you have any specific questions or need further guidance along the way!

I just kept going

I even placed this data into gpt4 and got an amazing outline of it for more context

2 Likes

Wow great work @SamMcKay! They have an AI Co pilot in Google Collab?

No they don’t yet, just used chatGPT and EDNA AI

Here goes my solution to this challenge:

1 Like

Well done, nice one for getting involved on this

1 Like

Just curious what is the actual answer to question #3
What was the most ordered item in the choice_description column?

I see answers all over the place on this one.

Hi @mbraun

the answers are posted by @kedeisha1 (posting message #5)

I hope this helps.
Keith

@Keith These are the python scripts to get the answer but as you can see the second and third are the same.

@mbraun@kedeisha1 would have to answer that one as she is the on that has the put up the solution.

thanks
Keith

@keith @kedeisha1 - The reason I ask some people took the Choice_Description column and then summed the individual occurrence. I thought it was dissect the column delimited and if the quantity was 2 then it was 2 of that item per order. Then summed it up if that makes sense.