Beginner question on data transformation

Dear community,

I have questions concerning data transformation.

Attached is a raw xlsx-file containing public data from the German Federal Statistical Office on student numbers.

It contains data per university (rows) on number of students specified by nationality (three top columns read “Germans” “Foreigners” and “Total”) and sex (“male” “female” “total” for each top column). It starts with data for the winter semester 1998/99 and continues on for each winter semester.

Here is a image snippet to get a first impression of the wide table.

My goal is to prepare dashboards that visualize the data of specific subsets of universities and compare these subsets in development across time.

My question is:

  • is it good practice to initially transform this originally wide table in a long table?
  • would you do this transformation all from within Power Query Editor?
  • what would be your first steps to transform the data for best use?

Here is the link to the complete table:
student numbers over years

Thank you!

hey @aleph.niemeier

so, here are the answers to your questions

  1. Always and Always Transform your wide Fact-tables to a long table where ever possible. - and has to be done before you start your project

  2. Where ever possible try to source the data in your desired format.

  3. I have made the required transformations for you. - you can open the powerquery in the attached file to see teh required steps.

Have you had a chance to complete some PowerQuery courses from EDNA ? if not then please do. it will really help you.

I hope this helps.

Regards,
edna_sol_japjeet.pbix (659.2 KB)

Japjeet S Sodhi

2 Likes

Hi Japjeet,
wow, thanks! This is very helpful - it really motivates me to really dive into Power Query!
Could you take just another small look?
The original table contains time data. Starting with wintersemester 1998/99 (WS 1998/99) each university is repeatedly listed for following wintersemesters. Some unis don’t add until later semesters as they are founded more recently.
I’m racking my brain on how it’s going to work that the dimension “semester” doesn’t get lost, so that I can analyse changes in numbers over time, and how I should format “1998/99” in order to be useful.

Hey @aleph.niemeier

Again, there are severalways to that depending on if new data will be on different file or existing file etc…

for simplicity, in your scenario i can recommend that as you get more data, keep the existing data and just paste new rows at the end in the same excel file and refresh the report.

that should work for you.

Again once you do some reading / learning in Power query, most of these questions will be resolved.

hope this helps.

Regards
Japjeet

1 Like