Power Query Workout 15 - Navigating Data Transformation

Title: Power Query Essentials: Navigating Data Transformation

Description:

Power Query, available in both Excel and Power BI, offers a user-friendly interface for data extraction, transformation, and loading (ETL). Dive into the fundamental features of Power Query in this workout and enhance your data wrangling skills.

Scenario:

Imagine you’re working with a dataset that contains sales records. The data has inconsistencies, such as dates in different formats, missing values, and duplicated rows. How would you leverage Power Query to clean and transform this data for analysis?

Objectives:

By the end of this workout, you should be able to:

  1. Understand the core features and functionalities of Power Query.

  2. Transform data by filtering, sorting, and removing duplicates.

  3. Handle common data inconsistencies using Power Query.

Interactive Task:

Given your understanding of Power Query, answer the following:

  1. If you encounter a column named “SaleDate” with dates in different formats, how would you standardize them to a consistent format, e.g., “MM/DD/YYYY”?

    • Your Approach: ________________________
  2. How would you remove duplicate rows based solely on the “TransactionID” column?

    • Your Approach: ________________________
  3. If a column named “ProductType” has some missing values, how would you replace all missing values with the text “Unknown”?

    • Your Approach: ________________________

Questions:

  1. In Power Query, if you want to split a column into multiple columns based on a delimiter, such as a comma, which option would you use?

    • i) Divide Column

    • ii) Column Distribution

    • iii) Split Column

    • iv) Separate Column

  2. How can you aggregate data in Power Query, for instance, if you want to get the total sales for each product category?

    • i) Group By

    • ii) Summarize

    • iii) Aggregate

    • iv) Combine Rows

Duration: 20 minutes

Difficulty: Intermediate

Period :
This workout will be released on Wednesday, September 20, 2023, and will end on Thursday, October 05, 2023. But you can always come back to any of the workouts and solve them.

Hi @EnterpriseDNA ,

Please find my solution to this workout:

Questions:

  1. In Power Query, if you want to split a column into multiple columns based on a delimiter, such as a comma, which option would you use?
    Answer:
  • iii) Split Column
  1. How can you aggregate data in Power Query, for instance, if you want to get the total sales for each product category?
    Answer:
  • i) Group By

Interactive Tasks:

  1. If you encounter a column named “SaleDate” with dates in different formats, how would you standardize them to a consistent format, e.g., “MM/DD/YYYY”?
  • Your Approach: ________________________
    To standardize the dates in the “SaleDate” column to a consistent format, such as “MM/DD/YYYY”, you would use the following approach:
  1. Select the “SaleDate” column: Click on the “SaleDate” column to select it.

  2. Change the data type: On the Home tab, in the Transform group, click on “Data Type”. From the drop-down list, select “Date”. This will change the data type of the column to Date.

  3. Change the date format: On the Home tab, in the Transform group, click on “Locale”. From the drop-down list, select “English (United States)”. This will change the date format to “MM/DD/YYYY”.

Please note that this approach assumes that all dates in the “SaleDate” column are valid. If there are any invalid dates, you might need to clean them up before standardizing the date format.

  1. How would you remove duplicate rows based solely on the “TransactionID” column?
  • Your Approach: ________________________
    To remove duplicate rows based on the “TransactionID” column, you would use the following approach:
  1. Select the “TransactionID” column: Click on the “TransactionID” column to select it.

  2. Remove duplicates: On the Home tab, in the Reduce Rows group, click on “Remove Duplicates”. This will remove all duplicate rows based on the “TransactionID” column.

Please note that this approach will keep the first occurrence of a duplicate “TransactionID” and remove all subsequent occurrences. If you want to keep a different row in case of duplicates, you might need to sort your data first.

  1. If a column named “ProductType” has some missing values, how would you replace all missing values with the text “Unknown”?
  • Your Approach: ________________________
    To replace all missing values in the “ProductType” column with the text “Unknown”, you would use the following approach:
  1. Select the “ProductType” column: Click on the “ProductType” column to select it.

  2. Replace values: On the Home tab, in the Transform group, click on “Replace Values”. In the “Value To Find” field, leave it blank to indicate missing values. In the “Replace With” field, enter “Unknown”. Click OK.

This will replace all missing values in the “ProductType” column with the text “Unknown”.

Thanks for the workout.
Keith

Hi @EnterpriseDNA ,

Please find my solution to this workout.

Questions:

  1. In Power Query, if you want to split a column into multiple columns based on a delimiter, such as a comma, which option would you use?

Answer: iii) Split Column

  1. How can you aggregate data in Power Query, for instance, if you want to get the total sales for each product category?

Answer : i) Group By

  1. If you encounter a column named “SaleDate” with dates in different formats, how would you standardize them to a consistent format, e.g., “MM/DD/YYYY”?

Your Approach:

  • Select the SalesDate Column and change the data type to Date. And then change the date format to “MM/DD/YYY”
  1. How would you remove duplicate rows based solely on the “TransactionID” column?

Your Approach:

  • Right click on TransactionID column , click on Remove duplicates. It will remove duplicates rows in the column.
  1. If a column named “ProductType” has some missing values, how would you replace all missing values with the text “Unknown”?

Your Approach:

  • Right Click on Product type column, select replace values. in the value to find field leave it blank and in the replace with field enter “Unknown” click ok.