Latest Enterprise DNA Initiatives

Problem of the Week #2 (PQ) Create a Dimension Table from a Text file

OK, here we go! Problem of the Week # 2 is live!

For background on this initiative, check out this post and this weeks launch video that Brian and I did on YouTube.

Background on This Week’s Problem

Power Query is a data transformation and data preparation engine. Using Power Query, you can perform the extract, transform, and load (ETL) processing of data.

Frequently, when building dimension tables you can take advantage of existing data on the internet. However, while often you can find exactly the data you need, it may not be optimally structured for usage in Power BI. This week’s problem is taken from a real-world case just last week, when for a mapping report the best data source was available in a text file with a somewhat unusual structure.

Power Query is built for reshaping this type of data into a structure well-suited for a star schema data model within Power BI.

Your Task

This week’s Problem of the Week challenges you to reshape data in the provided txt file into a clean and proper Dimension table that’s suitable for analysis.

Key areas to cover:

  • Turning data into a proper tabular format
  • Trimming, cleaning and properly formatting values

As always, this initiative is all about expanding your knowledge and skills by working through practical examples. The primary focus being the process, this is also why we encourage you to do a write up.

Important we ask that if you post anything directly related to a solution to please use the “Hide Details” or “Blur Spoiler” options within your post. This will prevent people who have not yet completed their solution from seeing things they might not want to see yet and could influence their solution.

To use these options, just select the text or image you want to hide, click on the gear icon on the top right at the top of your post and select either Hide Details or Blur Spoiler.

hide-blur

Want to stretch yourself further?
You can increase the difficulty level by examining and/or even modifying the M code in the Advanced Editor.

Note here’s how to extend your External Tools menu with a link to the M Reference Guide.

Note that this is a Power Query-only challenge, so no other tools or techniques are allowed even if that’s what you might choose to use outside of this challenge.

Eligibility for the Drawing

To encourage participation, we will be doing a drawing of five random participants each round to receive one of these cool Enterprise DNA vinyl laptop stickers:

problem solver

To be eligible for the drawing, just send an email to problemoftheweek@enterprisedna.co before 11:59pm ET, Tuesday, December 22 , 2020 indicating you’ve successfully completed the challenge, and providing your mailing address for us to send the sticker if you’re selected.

We hope you find this initiative fun, challenging and helpful in continuing to build your Power BI skills. All of the information you will need is contained in files within this post. If you have any questions or comments, just message @BrianJ or @Melissa in this forum thread.

Good luck, and enjoy!!!

Melissa

P.S. Your feedback is always welcomed, and if you have a real-world problem that you think might make a good future Problem of the Week, please email that to us at problemoftheweek@enterprisedna.co.

Thanks!
.

To get started:
Download the files below, in the Power Query Editor select “Manage Parameter”
Add the full file path and name to the list and finally set it as “Current Value”.
And have fun…

data.txt (133.7 KB)
eDNA Problem of the Week 2 – Dimension Table from Text PQ Dec 2020.pbix (2.8 MB)

4 Likes

Problem of the Week-ers,

For those of you searching for the YouTube video, note that it hasn’t posted yet but we expect that it will within the next few hours. As soon as it does, I will update Melissa’s post with the link and let you know that it is live.

Good luck on this one – we think it’s a good one in that to solve it you will need some techniques that every PQ user should have in their toolbox. If you’ve mastered those, this one will likely go quickly, and if you haven’t, working through this will definitely help you build those skills.

UPDATE: YouTube video intro to this problem is now live:

  • Brian
3 Likes

Hi Team,

Great POW. It definitely builds major PQ concepts.

I submitted my version of the solution but eagerly looking forward to learning new tricks from others :slight_smile:

3 Likes

Hi @AnshP,

Great to have you here and thanks for participating!

The primary focus in these challenges is the process. So could you to do a brief write up on how you worked through the challenge and what decisions you made during that process?

Please don’t forget to use the “Hide Details” or “Blur Spoiler” options (2nd image in Post #1) when providing a write up / solution details. This will prevent people who haven’t completed the challenge yet from seeing things they might not want to see.

Again kudo’s on successfully completing this POTW challenge in record time. :clap: :clap: :clap:
Hope to see you in the next one.

@Melissa and @BrianJ

Here is my solution

Summary

Problem of the Week #2 – Rajesh.pbix (2.9 MB)

4 Likes

Hi @Rajesh,

Another early submission and it’s a creative one, thanks for participating!!

So I’ll ask you the same, could you to do a brief write up on how you worked through the challenge and what decisions you made during that process?

Please don’t forget to use the “Hide Details” or “Blur Spoiler” options (like you did in your submission)

Kudo’s on successfully completing this POTW challenge. :clap: :clap: :clap:
Hope to see you in the next one.

Hi @Melissa and @BrainJ

Here is my solution…

4 Likes

Hi @MK3010,

Wonderful solution, nice combination of the UI and some M coding.
Thanks so much for joining in on the challenge!!

Feel free to already share a brief write up on how you worked through the challenge and what decisions you made during that process. But don’t forget to use the “Hide Details” or “Blur Spoiler” options (like you did in your submission) :+1:

Hope to see you in the next one.

Here’s a copy of the solution for Problem of the Week. The process I used is to break down each of the required changes step by step and building on your rubber duck suggestion @BrianJ I have my own tortoise to explain what I’m trying to do.
Once I work out what I want to do, I then look at the Power Query pages if I’m not sure of how to create that transformation. I also find that if I am able to verbalise exactly what I want to achieve it makes my google searches so much more effective.

solution|690x391

3 Likes

Hi there,

Cost me some time and it is not the most elegant solution, but I tried to build it without using a single line of M-code. And it worked! (Update: as Melissa points out, I indeed used an M for the custom column - will see if I can change this)
PS: I really like how the flags are used.

Thanks for this great problem.

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020 _ Rens.pbix (2.9 MB)

1 Like

Hi @Melissa,

The process I took was to use the pivot column functionality that required a custom index. Once that was done, it was a matter of cleaning the data, joining the flag column, etc. When I reviewed my code, I realized that many of these transformations were repetitive. Hence, I could have followed the good practice guidelines so my recorded steps do not increase unnecessarily.

Attaching the solution file here as well

Regards

Hi @BrianJ and @Melissa

Below is my explanation…

Summary

Initially it took me a while to write first 2 lines of the M code as I wanted to clean my data with minimum steps( didn’t followed what’s there initially in the sample file). People should read the blog by Chris Webb to deep dive about csv.document . Also I added ExtraValues.Ignore to help for unwanted data to be removed and we don’t get error.

= Csv.Document(File.Contents( FileLocation ),{"ColHeader", "Data"}, ":", ExtraValues.Ignore ,1252)

Now after loading the data with two columns ( ColHeader and Data ) then i cleaned each column with different set of character specific to the column. In ColHeader I trimmed comma(,) and spaces(" ") which helped me in next step as I had to filter few data. In Data column I trimmed all the possible combination i.e. spaces, brackets, comma.

= Table.TransformColumns(Source,{{"ColHeader", each Text.Trim(_ ,{","," "}), type text},{"Data", each Text.Trim( _ ,{" ",",","[","]"}), type text}})

After this few steps was to set a proper header so I filtered the required data, then proper text for ColHeader and also added custom column to replace values such as Alpha_2 …

From here I took the list of headers which I will be using later to club the set of row into table. Here I didn’t use fixed number to capture new headers in future. Added Index column to divide it with the count of Headers used M function Number.IntegerDivide. Next was grouping and I keep on applying rest of my process on the table itself without expanding the data( this is the approach I generally follow but need to see the performance too).

After all other transformation I loaded the data but I keep on track to check how much data I am loading and when i saw it showed me around 3MB of data was loaded and was surprised as the actual data was only in KB, so went back to optimize it and the first thing is adding Buffer at headers which is most common cause of loading data repeatedly. So added List.Buffer function

= List.Buffer(List.Distinct(#"Added Custom"[Header]))

And also added to next step as I was referring the previous table.
= Table.Buffer(#"Added Custom")

And loaded the data again, it worked faster. And then changed the data type for Flag to Image URL to get the country flag.
image

And added the matrix as per challenge requirement.

3 Likes

Hi @Sue,

Apologies for getting back to you so late.

Really appreciate that you’ve taken the time to share your method for solving this one with the community and that “rubber ducking” proved helpful in reaching the end result. :+1:

Thanks for participating and just want to add that I like how you worked through it :clap: :clap: :clap:
All the best

1 Like

Hi @AnshP,

Thanks for sharing!

You are right, grouping repetitive cleaning transformation steps would have significantly reduced the number of steps in your Query. However more importantly future maintenance becomes far more simple and depending on the type of transformation and volume it can also increase performance.

Hi @Rens,

Thanks for entering this challenge, it’s nice to see so many different variations. :+1:

:thinking: Well technically you did write M for the Custom Column logic…

But was that a goal you’d set for yourself?
Would you have done it differently otherwise?

1 Like

Hi @MK3010,

Thanks for providing so much detail around your solution and subsequent additional actions.

I know our community will benefit from your excellent write up. :clap: :clap: :clap:
Everyone can examine the details around your solution more closely and can revisit your post in future, when faced with something similar.

Much appreciated Mukesh!

1 Like

Ah yes. I overlooked that one. Damn! :slight_smile:

1 Like

Here’s my solution:

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020.pbix (2.8 MB)

The key steps for me were creating a grouping column (simple formula based on an index column), and then pivoting the data using the ‘Don’t Aggregate’ function in the Pivot column Advanced options . The grouping column keeps the groups of data together while pivoting. After that it’s just tidying up.

Nice problem - thanks for the challenge.

Chris

4 Likes

@Melissa,

Here’s how I did it:

Summary
  • Added an index column
  • Added a Modulo column off the previously built index column
  • Pivoted the data on the Modulo column
  • Applied Fill Up to all the rest of the columns
  • Removed all the unneeded rows and columns
  • The rest was just formatting cleanup

I’m sure the final cleanup step can be done more efficiently than the way I did it.

Fun problem – shows how PQ can process a task incredibly quickly that would be a enormously burdensome and time-consuming otherwise.

1 Like

Hi @chrish,

Welcome to the community and kudos for joining in the Problem of the Week challenge :+1:

Even though, to date, none of the solutions is the same, I’ve seen some transformations in your Solution that I haven’t seen before (at all) - absolutely love that. Just shows that there are so many ways to solve a problem in Power Query.

Great job :clap: :clap: :clap: