Problem of the Week #2 (PQ) Create a Dimension Table from a Text file

Hi @Melissa and @BrianJ

Here is my solution…

4 Likes

Hi @MK3010,

Wonderful solution, nice combination of the UI and some M coding.
Thanks so much for joining in on the challenge!!

Feel free to go ahead and share a brief write-up on how you worked through the challenge and what decisions you made during that process. But don’t forget to use the “Hide Details” or “Blur Spoiler” options (like you did in your submission) :+1:

Hope to see you in the next one.

Here’s a copy of my solution for Problem of the Week. The process I used was to break down each of the required changes step by step, and building on your rubber duck suggestion, @BrianJ, I have my own tortoise to explain what I’m trying to do.
Once I work out what I want to do, I look at the Power Query documentation pages if I’m not sure how to create that transformation. I also find that if I’m able to verbalise exactly what I want to achieve, it makes my Google searches so much more effective.

[solution screenshot]

3 Likes

Hi there,

It cost me some time and it is not the most elegant solution, but I tried to build it without using a single line of M code. And it worked! (Update: as Melissa points out, I did indeed use M for the custom column - will see if I can change this.)
PS: I really like how the flags are used.

Thanks for this great problem.

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020 _ Rens.pbix (2.9 MB)

1 Like

Hi @Melissa,

The process I took was to use the Pivot Column functionality, which required a custom index. Once that was done, it was a matter of cleaning the data, joining the flag column, etc. When I reviewed my code, I realized that many of these transformations were repetitive; had I followed the good-practice guidelines, my recorded steps wouldn’t have grown unnecessarily.
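For example, repetitive cleaning steps can be collapsed into a single Table.TransformColumns call. A minimal sketch - the column names here are hypothetical, not the ones from my actual file:

= Table.TransformColumns(#"Pivoted Column", {
      // trim and fix casing in one pass instead of two recorded steps
      {"Country", each Text.Proper(Text.Trim(_)), type text},
      // strip unwanted characters in the same call
      {"Code", each Text.Remove(_, {"[", "]", ","}), type text}})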

Attaching the solution file here as well

Regards

Hi @BrianJ and @Melissa

Below is my explanation…

Summary

Initially it took me a while to write the first two lines of the M code, as I wanted to clean my data with a minimum of steps (I didn’t follow what was there initially in the sample file). I’d recommend reading Chris Webb’s blog for a deep dive into Csv.Document. I also added ExtraValues.Ignore so that unwanted data is removed and we don’t get an error.

= Csv.Document(File.Contents(FileLocation), {"ColHeader", "Data"}, ":", ExtraValues.Ignore, 1252)

After loading the data with the two columns (ColHeader and Data), I cleaned each column with a different set of characters specific to that column. In ColHeader I trimmed commas (,) and spaces (" "), which helped me in the next step, as I had to filter out some data. In the Data column I trimmed all the possible combinations, i.e. spaces, brackets and commas.

= Table.TransformColumns(Source, {{"ColHeader", each Text.Trim(_, {",", " "}), type text}, {"Data", each Text.Trim(_, {" ", ",", "[", "]"}), type text}})

After this, a few steps were needed to set proper headers: I filtered the required data, fixed the text in ColHeader, and added a custom column to replace values such as Alpha_2 …

From here I took the list of headers, which I would use later to club each set of rows into a table. I didn’t use a fixed number here, so new headers will still be captured in future. I added an Index column and divided it by the count of headers using the M function Number.IntegerDivide. Next was grouping, and I kept applying the rest of my process on the nested tables themselves without expanding the data (this is the approach I generally follow, but I still need to check its performance).
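A minimal sketch of that grouping idea - the step and column names are illustrative, not the exact ones from my query:

Headers  = List.Distinct(#"Added Custom"[Header]),
AddIndex = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
// each record spans List.Count(Headers) rows, so integer division yields a record id
AddGroup = Table.AddColumn(AddIndex, "Group", each Number.IntegerDivide([Index], List.Count(Headers)), Int64.Type),
// keep each record's rows as a nested table and keep transforming it without expanding
Grouped  = Table.Group(AddGroup, {"Group"}, {{"Rows", each _, type table}})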

After all the other transformations I loaded the data, but I always keep track of how much data I’m loading, and when I saw that around 3 MB was being loaded I was surprised, as the actual data was only in the KB range. So I went back to optimize it; the first thing was adding a buffer on the headers list, since re-evaluating it over and over is a common cause of loading data repeatedly. So I added the List.Buffer function:

= List.Buffer(List.Distinct(#"Added Custom"[Header]))

I also added a buffer to the next step, as I was referencing the previous table:
= Table.Buffer(#"Added Custom")

I loaded the data again, and it worked faster. Then I changed the data category for Flag to Image URL to get the country flags.

And I added the matrix as per the challenge requirements.

3 Likes

Hi @Sue,

Apologies for getting back to you so late.

Really appreciate that you’ve taken the time to share your method for solving this one with the community and that “rubber ducking” proved helpful in reaching the end result. :+1:

Thanks for participating and just want to add that I like how you worked through it :clap: :clap: :clap:
All the best

1 Like

Hi @AnshP,

Thanks for sharing!

You are right, grouping repetitive cleaning transformation steps would have significantly reduced the number of steps in your query. More importantly, though, future maintenance becomes far simpler, and depending on the type of transformation and the data volume it can also improve performance.

Hi @Rens,

Thanks for entering this challenge, it’s nice to see so many different variations. :+1:

:thinking: Well technically you did write M for the Custom Column logic…

But was that a goal you’d set for yourself?
Would you have done it differently otherwise?

1 Like

Hi @MK3010,

Thanks for providing so much detail around your solution and subsequent additional actions.

I know our community will benefit from your excellent write up. :clap: :clap: :clap:
Everyone can examine the details around your solution more closely and can revisit your post in future, when faced with something similar.

Much appreciated Mukesh!

1 Like

Ah yes. I overlooked that one. Damn! :slight_smile:

1 Like

Here’s my solution:

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020.pbix (2.8 MB)

The key steps for me were creating a grouping column (a simple formula based on an index column) and then pivoting the data using the ‘Don’t Aggregate’ option under Pivot Column > Advanced options. The grouping column keeps the groups of data together while pivoting. After that it’s just tidying up.
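A minimal sketch of those key steps - the six-rows-per-record assumption and the column names are illustrative only:

AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
// integer-dividing the index by the record length produces the grouping column
AddGroup = Table.AddColumn(AddIndex, "Group", each Number.IntegerDivide([Index], 6), Int64.Type),
Removed  = Table.RemoveColumns(AddGroup, {"Index"}),
// Table.Pivot with no aggregation function is what the UI generates for 'Don't Aggregate'
Pivoted  = Table.Pivot(Removed, List.Distinct(Removed[ColHeader]), "ColHeader", "Data")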

Nice problem - thanks for the challenge.

Chris

4 Likes

@Melissa,

Here’s how I did it:

Summary
  • Added an index column
  • Added a Modulo column off the previously built index column
  • Pivoted the data on the Modulo column
  • Applied Fill Up to all the rest of the columns
  • Removed all the unneeded rows and columns
  • The rest was just formatting cleanup

I’m sure the final cleanup step can be done more efficiently than the way I did it.
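A minimal sketch of that modulo / pivot / fill-up pattern - the record length (6) and the column names are assumptions, not taken from the actual file:

AddIndex  = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
// Number.Mod tags each row with its position within a record
AddModulo = Table.AddColumn(AddIndex, "Modulo", each Text.From(Number.Mod([Index], 6)), type text),
Pivoted   = Table.Pivot(AddModulo, List.Distinct(AddModulo[Modulo]), "Modulo", "Value"),
// Fill Up pulls the staggered values onto the first row of each record
FilledUp  = Table.FillUp(Pivoted, {"1", "2", "3", "4", "5"}),
// then keep only the first row of each record and drop the helper columns
Kept      = Table.SelectRows(FilledUp, each Number.Mod([Index], 6) = 0)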

Fun problem – shows how PQ can quickly handle a task that would otherwise be enormously burdensome and time-consuming.

1 Like

Hi @chrish,

Welcome to the community and kudos for joining in the Problem of the Week challenge :+1:

Even though, to date, no two solutions have been the same, I’ve seen some transformations in your solution that I haven’t seen before (at all) - absolutely love that. It just shows that there are so many ways to solve a problem in Power Query.

Great job :clap: :clap: :clap:

Hello there,

So here goes my solution for the 2nd week.

Indeed an awesome challenge. Thanks for this @Melissa. :slight_smile:

Not much, but yeah, I used a little bit of M; however, the problem could also be solved entirely through the GUI.

Unlike @Sue and @BrianJ, I don’t have a rubber duck or tortoise to help me out in the thought process. However, I’ve got a rubber orange that keeps me motivated. :joy: :joy: :joy:

Really can’t wait to see how everyone else approached and solved the problem. :100: :+1:

3 Likes

@quantumudit,

Very nicely done, for the second week in a row. The only disadvantage I see to your approach is that your orange can’t play the bagpipes… :stuck_out_tongue:

– Brian
1 Like

:joy::joy: @BrianJ True, but it can listen to the “Analytic Mind” podcasts on Spotify. :wink:

2 Likes

Here goes my solution for this challenge. It was fun playing with the data.

Enterprise DNA Problem of the Week #2 – AlexBadiu.pbix (2.8 MB)

Summary

  • Started by keeping only the data needed for building the requested table
  • Cleansed the data. By looking at the structure of the https:// site addresses, I realized that only the last characters change; I used this finding to reconstruct the site address and clean undesired characters from the other rows at the same time (see the sketch after this list)
  • Did the cleansing needed for column 1: Trim, Capitalize Each Word, replaced undesired characters, renamed alpha_2
  • Built an index needed to pivot the data
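A minimal sketch of the site-address idea - the base URL and column name below are hypothetical placeholders, not the real ones:

// keep only the part after the last "/" and rebuild the address from a fixed base
Rebuilt = Table.TransformColumns(Cleaned, {{"Site", each "https://www.example.org/wiki/" & Text.AfterDelimiter(_, "/", {0, RelativePosition.FromEnd}), type text}})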

Best regards

4 Likes

And here is my entry for this challenge - I’m going to have to check out everyone else’s comments to see how it compares. Most of it was completed using the buttons in the UI, but there were a couple of small tricks I added.

One of my favorite tricks is this part:

Summary

To remove the extra characters in the value field, I created a step that was added to my code in the Advanced Editor, just after the let statement:
CharsToRemove = List.Transform({34, 91, 93}, each Character.FromNumber(_)),
And a few steps later, I used this list to remove the characters " [ ] from the text:
#"Cleaned Text" = Table.TransformColumns(#"Add Merged",{{"Column1", each Text.Remove(_, CharsToRemove), type text}, {"Column2", each Text.Remove(_, CharsToRemove), type text}, {"Column3", each Text.Remove(_, CharsToRemove), type text}}),
Here is where you can get the DEC codes to use for this type of trick: https://ascii.cl/htmlcodes.htm

Enterprise DNA Problem of the Week #2 – Heather Solution.pbix (2.9 MB)

4 Likes

Thank you @BrianJ, @quantumudit, @alexbadiu and @Heather for entering and sharing your thoughts :+1:

I’m loving this!

Honestly, it’s a real pleasure analyzing all the submissions. Seeing the different mixes of UI and M code being applied is just inspiring. It illustrates how many possibilities PQ offers to generate the same result.

6 Likes