Data Analysis Workout 02: Soccer Analysis

Level of difficulty:

Objective: This workout will provide you practice in sub setting and filtering your data for analysis.

Note: this workout is entirely platform/tool independent. Feel free to use any tool you would like to perform this workout - SQL, DAX, Power Query, Python, R, Excel, Lotus 1-2-3 (just seeing if you’re paying attention…). Whatever tool you want to hone your skills on is fine - we hope to have a range of different tools utilized in this workout every week.

Link to the dataset: https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/02_Filtering_%26_Sorting/Euro12/Euro_2012_stats_TEAM.csv

Challenge Questions

  1. How many teams participated in the Euro2012?
  2. What is the number of columns in the dataset?
  3. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline.
  4. Sort the teams by Red Cards, then to Yellow Cards.
  5. Calculate the mean Yellow Cards given per Team.
  6. Filter teams that scored more than 6 goals.
  7. Select the teams that start with the letter G.
  8. Select the first 7 columns.
  9. Select all columns except the last 3.
  10. Present only the Shooting Accuracy from England, Italy and Russia.

Submission

Simply post your code and a screenshot of your results.

Please format your code and blur it or place it in a hidden section.

This workout will be released on Tuesday April 4, 2023, and the author’s solution will be posted on Sunday April 9, 2023.

1 Like

Hi,
my try below:

Summary




Thank you @kedeisha1 for the challenges.

1 Like
1 Like

@kedeisha1 ,

Thanks - nice workout.

Decided to do this one in M code. Here’s my solution:

Click for M Code Solution

Q1.

Q2.
image

Q3.

Q4.

Q5.

Q6.

Q7.

Q8.

Q9.

Q10.

1 Like

Hi

My submission :slight_smile:

Thanks

1 Like

Here are my solutions in pandas code:

  1. How many teams participated in the Euro2012?
euro12.shape[0]
  1. What is the number of columns in the dataset?
euro12.shape[1]
  1. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline.
discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
  1. Sort the teams by Red Cards, then to Yellow Cards.
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending = False)
  1. Calculate the mean Yellow Cards given per Team.
discipline['Yellow Cards'].mean()
  1. Filter teams that scored more than 6 goals.
euro12[euro12.Goals > 6]
  1. Select the teams that start with the letter G.
euro12[euro12.Team.str.startswith('G')]
  1. Select the first 7 columns.
euro12.iloc[: , 0:7]
  1. Select all columns except the last 3.
euro12.iloc[: , :-3]
  1. Present only the Shooting Accuracy from England, Italy and Russia.
euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team','Shooting Accuracy']]
1 Like