Problem of the Week #2 (PQ) Create a Dimension Table from a Text file

quantumudit · December 18, 2020, 8:42pm

Hello there,

So here goes my solution for the 2nd week.

Week02_Dimension Table from Text_Power Query.pbix (2.8 MB)

Indeed an awesome challenge. Thanks for this @Melissa.

Not much but, yeah used a little bit of M, however, the problem could also be solved entirely through the GUI.

Unlike @Sue and @BrianJ, I don’t have a rubber duck or, tortoise to help me out in the thought process. However, I got a rubber orange that keeps me motivated.

Really can’t wait to how other guys approached and solve the problem.

BrianJ · December 18, 2020, 11:40pm

@quantumudit,

Very nicely done, for the second week in a row. The only disadvantage I see to your approach is that your orange can’t play the bagpipes…

Brian

quantumudit · December 19, 2020, 4:01am

@BrianJ True, but it can listen to the “Analytic Mind” podcasts on Spotify.

alexbadiu · December 19, 2020, 2:07pm

Here goes my solution for this challenge. It was fun playing with the data.

Enterprise DNA Problem of the Week #2 – AlexBadiu.pbix (2.8 MB)

Summary

Started by keeping only the data needed for building the requested table
I cleansed the data https:// By looking at the site structure I realized that only the last characters change. used this finding to reconstruct the site address and clean undesired characters from the other rows in the same time
Did the cleansing needed for columns 1: Trim, Capitalize word, replaced undesired characters, renamed alpha_2
Build an index needed to pivot the data

Best regards

Heather · December 19, 2020, 7:00pm

And here is my entry for this challenge - I’m going to have to check out everyone else’s comments to see how it compares, but I was able to use a couple of small tricks. Most of it was completed using the buttons in the UI, but there were a couple of small tricks I added.

One of my favorite tricks is this part:

Summary

To remove the extra characters in the value field, I created a step that was added to my code in the advanced editor, just after the let statement:
CharsToRemove = List.Transform({34, 91, 93}, each Character.FromNumber(_)),
And a few steps later, I used this to remove the characters of " [ ] from the text
#“Cleaned Text” = Table.TransformColumns(#“Add Merged”,{{“Column1”, Text.Clean, type text}, {“Column2”, Text.Clean, type text}, {“Column3”, Text.Clean, type text}}),
here is where you can go to get the DEC codes to use for this type of trick: https://ascii.cl/htmlcodes.htm

Enterprise DNA Problem of the Week #2 – Heather Solution.pbix (2.9 MB)

Melissa · December 20, 2020, 12:15pm

Thank you @BrianJ, @quantumudit, @alexbadiu and @Heather for entering and sharing your thoughts

I’m loving this!

Honestly it’s a real pleasure analyzing all submissions. Seeing the different mixes of UI and M code, being applied - just inspiring. It illustrates how many possibilities PQ offers to generate the same result.

Rajesh · December 20, 2020, 8:24pm

@Melissa

I did some changes to my previous one to increase the performance and also reduced number of steps from 16 to 12.

Summary

Problem of the Week #2 – Solution 2 - Rajesh.pbix (2.8 MB)

amira.bedhiafi.pro · December 20, 2020, 11:27pm

Howdy fellas !
Find attached my solution. Well it may be not that optimized but these are the main steps :

I transformed my data into a real table using Lines.FromBinary(File.Contents(…),QuoteStyle.CSV, null, 1252),

And guess what ? For transposing data :

I added an index column starting at 1
I divided that column by the # of rows of data per country and round up to the nearest integer.

Here is my work :
Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Amira Bedhiafi.pbix (2.9 MB)

logicalbi · December 21, 2020, 5:38pm

Hi,

Please find my solution. I dabbled a bit with M code as well as using the GUI.

Summary

In summary:

Filtered out rows not needed
Conditionally combined columns, if applicable
Conditional format to remove trailing commas (Text.EndsWith)
Used Index and Modulo to create a grouped index
Pivot by cleaned attribute column

Enterprise DNA Problem of the Week #2 – Chris Healey.pbix (2.8 MB)

jbressan · December 21, 2020, 5:52pm

This is my submission Problem of the Week #2:

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020.pbix (2.9 MB)

Melissa · December 22, 2020, 7:35am

Hi @Rajesh,

Great improvements.
I think this might be the most creative one…

.

Hi @amira.bedhiafi.pro,

Wonderful submission.
And great to see you found a solution after getting stuck initially.

.

Hi @logicalbi,

Welcome to the forum! Kudos for jumping straight into this challenge.
I hope you enjoyed it and to see more of your work in future POTW iterations .

.

Hi @jbressan,

You’ve sure been keeping busy. Entering Power BI Challenge 10- Supplier Insight and both POTW challenges this month. Just amazing. Thank you.

MudassirAli · December 23, 2020, 5:15am

Here is my solution for POTW Challenge # 2. Thank you @Melissa for coming up with such a great challenge. Now looking forward to your solution .
I just used the user interface for the solution.

Enterprise DNA Problem of the Week #2 – Mudassir Ali.pbix (2.8 MB)

Melissa · December 23, 2020, 7:09am

Hi everyone,

Wow this has been amazing and so much fun! Thank you eDNA community!
You guys are turning this new initiative into a huge success.

In the YouTube Solution video I discussed the general approach I have taken in solving this problem. All specific details on my submission you can find in this post.

I hope you enjoy reading . Get ready… here I go.

What to do when you get stuck?
Start “Rubberducking” - verbally articulating your general strategy before diving into coding.

Search within the Documentation. M is a functional language and the MS Docs can be a big help in finding functions, examining their syntax and review examples by copy and pasting them into a new blank Query.
I’ve got it bookmarked and second to the Forum it’s the site I visit most.

.

However if you rather have direct access via your External Tools menu, see the post below.

.
Personally I’ve learned a lot by just Googling… one of the major parts in this challenge revolved around Pivoting Stacked data. Now there are a couple of ways to do this of course but what if you’ve never seen this ‘pattern’ before? Knowing (or guessing ) that this format is known as Stacked can help you with some research.
.
See how many hits you get when you ask Google for anything on: “stacked data in power query”

.

Review my full PQ solution

On to my solution.
I tent to design my queries using UI as much as I can and when the query kind of does what it needs to do then I’ll go into the Advanced Editor examine the code and see if I can modify/change it.

In my view, this data offered a lot of opportunities to experiment with some actual M coding. You can find all of my notes on that in the Countries (full UI) with Comments query.

And when I was done, this was the end result.

.

Starting from line 4 where I invoked a custom Text Cleaning function (for anyone interested in that a YouTube video will become available some time next week). This is the M code for the fxCleanText function.

(myText as text) as text =>
  let
    ToRemove     = {""""} & Text.ToList(":[]}"),
    CleanText    = Text.Remove(myText, ToRemove),
    TrimText     = Text.TrimEnd(Text.Trim(CleanText), ","),
    FixLink      = Text.Replace(TrimText, "//", "://"),
    ReplaceValue = Text.Replace(FixLink, "_", " ")
  in
    ReplaceValue

.
Which can then be Invoked to Transform both columns in a single step.

.

Filtering Rows

.

Creating segments to Pivot the data (aka identifying each separate block of stacked data)
Nested 2 functions here and only returned the 3 Columns mentioned in the square brackets.

And the Pivot

.

Renamed the Columns

Table.RenameColumns( PivotCol1,
  let
    nNames = List.ReplaceMatchingItems(
      List.Transform( 
        Table.FirstN(UpdateIndex, 20)[Column1], Text.Proper),
        {
          {"Km2", "Area Sq Km"},
          {"Mi2", "Area Sq Mi"},
          {"Alpha 2", "ISO 2 Code"},
          {"Alpha 3", "ISO 3 Code"},
          {"Tld", "TLD"}
        }
      ),
    cNames = Table.ColumnNames(PivotCol1)
  in
    List.Zip( {cNames, nNames} )
)

.

Format and assign Type to the Columns

Table.TransformColumns(
  RenameColumns,
  {
    {"Country", Text.Proper, type text},
    {"Capital", Text.Proper, type text},
    {"Currency", Text.Proper, type text},
    {"Native Language", Text.Proper, type text},
    {"Famous For", Text.Proper, type text},
    {"Alcohol Prohibition", Text.Proper, type text},
    {"Drive Direction", Text.Proper, type text},
    {"Constitutional Form", Text.Proper, type text},
    {"Continent", Text.Upper, type text},
    {"ISO 2 Code", Text.Upper, type text},
    {"ISO 3 Code", Text.Upper, type text},
    {"Neighbors", Text.Upper, type text},
    {"Area Sq Km", Number.From, type number},
    {"Area Sq Mi", Number.From, type number},
    {"Is Landlocked", Logical.From, type logical}
  }
)

.

To do some more additional clean up on the “Famous for” column, created another function.

(myText as text) as text =>
  let
    ToList = Text.Split(myText, " "),
    Replacements = List.ReplaceMatchingItems( ToList,
        { {"And", "and"}, {"The", "then"}, {"Of", "of"}, {"To", "to"}, {"In", "in"} }
      ),
    CombineText = Text.Combine(List.Transform(Replacements, Text.From), " "),
    NewText = Text.Replace(CombineText, "'S", "'s")
  in
    NewText

.
and finally Invoked that - All done (in Power Query )

.

Nobody seems to have had any issues displaying the Flag in the Table, I remember being amazed the first time I saw that little trick.

.
Here’s my file: Problem of the Week #2 – Dimension Table from Text Solution (Melissa).pbix (2.9 MB)

If you have any questions in regards to my solution please reach out and @mention me.
.

Things I noticed or want to call out after reviewing your submissions.

For both @chrish and @logicalbi participating here was their first Forum contribution
On the number of query steps… @BrianJ required the most 45 in total, @Rajesh 's second entry the least with just 15 steps (note I’m incl. let and in clause) followed by @chrish with 17 and @alexbadiu with 19 steps.
I thought the change @chrish made to the Source step was pretty slick.

Both Hossein Satour and @amira.bedhiafi.pro also used a combination with Lines.FromBinary in the Source step - Can you guys comment on how/why you decided to do that?
Thanks!
Several methods were used for segmenting and setting up the Pivot step:

if-then-else constructs like: if [Column1] = “Country” then [Index] else null
Divisions like: Number.IntegerDivide([Index], 22) or [Index]/10 or each _ / 22
Modulo like: Number.Mod([Index], 20)

Not everyone chose to do a Pivot to get to the tabular format other methods used:

Nested table Transpose by @MK3010 and Hossein Satour
#table construct by @Rajesh

@Rajesh created an epic Text Cleaning function
and last but not least
@MK3010 did an excellent write up so I recommend reviewing that, if you haven’t already.

.
If you have more to share on your own entry I would enjoy reading AND appreciate that

As you know to encourage participation, we do a random drawing of five participants who will receive a cool Enterprise DNA vinyl laptop sticker. This round’s winners, in alphabetical order, are:

To claim your price please make sure to send your details to this email address: brian.julius@enterprisedna.co

That about sums it up for me… If you went exploring too and feel I missed any of the hidden gems, please feel free to leave your thoughts in a post below. Thanks!

Hope to see you soon in new POTW challenge!

All the best.
Melissa

Melissa · December 23, 2020, 7:24am

Hi All,

We also had a non-member submission by Hossein Satour this time.
Here’s the file if you want to go explore and that on your own.

POTW #2 – Solution by Hossein Satour.pbix (2.9 MB)

.

Hi @MudassirAli,

Great one! You didn’t have to wait long…
Hope you enjoy it

BrianJ · December 23, 2020, 8:47am

@Melissa,

Thanks -terrific job leading this week’s problem, including an excellent writeup and solution video.

With regard to this section of your analysis:

My rubberducking session for this problem went something like this:

Add index
Add modulo
Pivot with no aggregation on modulo
Fill up
Filter non-Country rows
Make an inefficient mess of the cleanup process

So, mission accomplished!

As soon as I started reviewing the other solutions coming in, I realized that creating the tabular form BEFORE doing the cleanup was not a wise decision on my part. Beyond that, learned a lot of innovative techniques for text cleaning from the solution submitted.

Huge thanks to all who participated. We hope you are finding these problems both educational and entertaining, and we absolutely welcome your feedback.

Because December has five Wednesdays in it this year, no new POTW next week, but we will be back on the first Wednesday of January (1/6) with a new DAX problem, led by Enterprise DNA expert @greg who has cooked up another good one for you.

Brian

P.S. if you were selected in this round of the drawing for the laptop stickers, please email me your mailing address at brian.julius@enterprisedna.co, and I will get your sticker mailed off to you right away.

Melissa · December 23, 2020, 9:37am

Hi @BrianJ,

Thanks for sharing your thoughts and problem break down

But I don’t think that’s the actual issue…

In my view grouping similar transformations as much as you can, in general will reduce the number of query steps. Looking at your solution for example there’s a lot of duplication in steps:

9x Replace Value (through just the UI this is the only one that can’t be reduced to a single step)
8x Rename Columns
5x Capitalize Each Word
4x Change Type
2x Remove Columns

Removing this type of duplication will have a significant effect. Try it.
I’d love to hear the result…

EnterpriseDNA · December 30, 2020, 1:18am

Hi everyone,

Here’s a submission from one non-member, Sayali.

Summary of Steps -

Data cleaning

Create Index

Group

Pivot functions
PowerBI Challenge#2_SD.pbix (2.9 MB)
Challenge 2 Screenshot1251×704 96.8 KB

Thanks

Melissa · January 3, 2021, 2:10pm

In the first YouTube Solution video I discussed the general approach I have taken in solving this problem. All specific details on my submission and a link to that first video - you can find in my earlier post.

The second video related to this challenge was released today, here’s a link to that. It covers the Custom Text Cleaning function with more detail. I hope you enjoy it.

sprmnt21 · April 27, 2021, 5:02pm

I wanted to try my hand at this nice problem too.
I have tried, as much as possible, to use the UI. for this reason the steps of the solution are more than strictly necessary.

the idea is developed according to the following points:

cleaning of the text;
aggregation of some subfields (Area; Iso; …)
group the data relating to each variable into lists
use the first column of this table as a list of the names of the searched table and the second column as a list of the columns of the searched table.

Enterprise DNA Problem of the Week #2 – Dimension Table from Text - Posted Dec 2020 sprmnt21.pbix (2.9 MB)

Melissa · April 27, 2021, 10:38pm

Hi @sprmnt21,

Like what you have done here.
Demonstrates another creative solution to a problem like this. Thank you so much for participating!
Hope to see you in future challenges.