Data Analysis Workout 08: Data Science Salaries

Level of Difficulty:

Objective: This workout provides practice in exploratory data analysis and relationships between variables.

Download the dataset here: https://buff.ly/3MtO0FC

Challenge Questions:

  1. Which role has the highest salary, employment-wise?
  2. Which employment types do employers prefer to hire?
  3. Which roles are entry-level employees generally hired for?
  4. Which countries pay the highest for which roles?
  5. What insights can you find regarding employee demographics?
  6. Which experience level has the highest hiring?
  7. Does company size affect the rate of hiring and pay scale?
  8. What is the year over year (YoY) salary growth at different levels?

Simply post your code and a screenshot of your results.

Please format your code and blur it or place it in a hidden section.

This workout will be released on Monday May 22, 2023, and the author’s solution will be posted on Sunday May 28, 2023.

I’m really getting the hang of using Google Colab in combination with ChatGPT.

I loaded the data in, then worked through each question.

I’m quite sold on the possibilities and the workflow of the notebook style of analysis. I really like it.


Great work here @sam.mckay

ChatGPT really cuts down the time it takes to do this kind of work.

import pandas as pd

# Load the dataset

data = pd.read_csv('ds_salaries.csv')
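The snippets in this post assume column headers such as 'Job Title' and 'Annual Salary', and the downloaded CSV may name them differently. Here is a minimal sketch of a check that lists which expected columns are missing. The helper name `missing_columns` and the one-row sample frame are made up for illustration; in practice you would pass in the frame returned by `pd.read_csv`.

```python
import pandas as pd

# Some of the column names the snippets in this post rely on.
EXPECTED_COLUMNS = [
    "Job Title", "Annual Salary", "Employment Type", "Experience Level",
    "Country", "Gender", "Ethnicity", "Company Size",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]

# Hypothetical stand-in for the loaded CSV; the values are made up.
sample = pd.DataFrame({"Job Title": ["Data Analyst"], "Annual Salary": [50000]})
print(missing_columns(sample))
```

If the list is not empty, rename the columns (for example with `DataFrame.rename`) or adjust the names in the code before running the questions.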

# Question 1: Which role has the highest salary employment-wise?

highest_salary_role = data.groupby('Job Title')['Annual Salary'].mean().idxmax()
print("Role with the highest salary employment-wise:", highest_salary_role)

# Question 2: Which employment types do employers prefer to hire?

employment_types = data['Employment Type'].value_counts()
print("Employment types preferred by employers:")
print(employment_types)

# Question 3: Which role are entry-level positions generally hired for?

entry_level_role = data[data['Experience Level'] == 'Entry Level']['Job Title'].value_counts().idxmax()
print("Role generally hired for entry-level positions:", entry_level_role)

# Question 4: Which countries pay the highest for which roles?

# idxmax on the two-level index returns the single (role, country) pair with the highest mean salary
highest_salary_countries = data.groupby(['Job Title', 'Country'])['Annual Salary'].mean().idxmax()
highest_salary_role_country = highest_salary_countries[0]
highest_salary_country = highest_salary_countries[1]
print("Role:", highest_salary_role_country)
print("Country paying the highest salary for this role:", highest_salary_country)
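Question 4 can also be read as asking for the top-paying country for every role, not just the single best pair. Here is a minimal sketch of that variant on a few made-up rows. The frame `toy` and its values are invented for illustration, and the column names follow the ones assumed elsewhere in this post.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Job Title": ["Data Analyst", "Data Analyst", "Data Engineer", "Data Engineer"],
    "Country": ["US", "GB", "US", "DE"],
    "Annual Salary": [90000, 60000, 110000, 120000],
})

# Mean salary for every (role, country) pair, as a flat table.
mean_pay = toy.groupby(["Job Title", "Country"])["Annual Salary"].mean().reset_index()

# For each role, find the row label of the country with the highest mean salary.
top_idx = mean_pay.groupby("Job Title")["Annual Salary"].idxmax()

# Keep only those rows: one top-paying country per role.
top_country_per_role = mean_pay.loc[top_idx].set_index("Job Title")
print(top_country_per_role)
```

On the real data the same three steps should work once `toy` is swapped for the loaded frame, provided the column names match.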

# Question 5: Insights regarding employee demographics

employee_demographics = data.groupby(['Gender', 'Ethnicity']).size().reset_index(name='Count')
print("Employee demographics:")
print(employee_demographics)

# Question 6: Which experience level has the highest hiring?

highest_hiring_experience_level = data['Experience Level'].value_counts().idxmax()
print("Experience level with the highest hiring:", highest_hiring_experience_level)

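# Question 7: Does company size affect the rate of hiring and pay scale?

# 'count' is the number of records with a salary and 'mean' is the average salary per company size
company_size_hiring_pay = data.groupby('Company Size')['Annual Salary'].agg(['count', 'mean'])
print("Number of hires and average salary by company size:")
print(company_size_hiring_pay)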

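# Question 8: Year over year (YoY) salary growth at different levels

# YoY growth needs a year dimension; 'Work Year' is an assumed column name, adjust it to match the CSV
yearly_mean_salary = data.groupby(['Experience Level', 'Work Year'])['Annual Salary'].mean()
# Percentage change from one year to the next, computed separately within each experience level
yearly_salary_growth = yearly_mean_salary.groupby(level='Experience Level').pct_change()
print("Year over year (YoY) salary growth at different levels:")
print(yearly_salary_growth)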
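For a YoY figure, the mean salary has to be taken per experience level and per year, with the percentage change then computed within each level. Here is a minimal sketch on made-up numbers that lays the result out as a level-by-year table. The 'Work Year' column name and all values in `toy` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Experience Level": ["Entry Level", "Entry Level", "Senior", "Senior"],
    "Work Year": [2022, 2023, 2022, 2023],
    "Annual Salary": [50000, 55000, 100000, 120000],
})

# Mean salary per (level, year); groupby sorts the years in ascending order.
yearly_mean = toy.groupby(["Experience Level", "Work Year"])["Annual Salary"].mean()

# Change relative to the previous year, computed within each experience level.
yoy = yearly_mean.groupby(level="Experience Level").pct_change()

# One row per level, one column per year; the first year has no prior year, so it is NaN.
yoy_table = yoy.unstack("Work Year")
print(yoy_table)
```

Multiplying `yoy_table` by 100 gives the growth in percent, which may be easier to read in a screenshot.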