Best Practice Question - No. 2

Hello,

As suggested by @BrianJ, I am presenting another question concerning best practices to use:

Say I have 3 columns; “symbol” which might have repetitive values of different length and structure, “description” with data as above; now I would need to replace text in “description” labelled “IncorrectString” with a text labelled “CorrectString”; column “segment” indicates which string is correct and which is not. Both “correct” and “incorrect” correspond to the identical string in column “symbol”.

How do I tackle that ?

Best

RafalData_example_2.xlsx (9.8 KB)

@Rafal_J,

Before I jump into generating a solution, let me make sure I have correctly understood the problem. What you currently have is basically the two tables below, correct? And what you want is to generate the description field in table 2, based on conditional logic related to segment (e.g., if segment = 7, correct string, else incorrect string), and then join/replace the description field in table 1 with the results of the description field in table 2, correct?

image

Thanks.

– Brian

PS – it might be helpful to have a larger sample dataset to make sure I’m not inferring patterns based on a small sample that don’t hold for the larger dataset.

1 Like

Thats exactly what I meant to solve;
In Table 1 - AccID is unique, therefore Account is unique;
In Table 2 - Symbol is repetitive, Description is either correct or incorrect depending on value in Segment

Best

Rafal

@Rafal_J,

See how this solution works for you. Below is the final table I created with all the account dimension information in it. I provide two different ways to create a unique Account Key. The first is a simple approach that concatenates Account Key with segment number to create a unique ID.

The second is a trick I really like that uses an invisible nonprinting character to create a unique ID. You can see that even though looking at the column Account Key Unique 2, there appear to be duplicates, when you look at the length column you’ll notice that each pair of apparent duplicates has a different length, and thus Power BI views them as unique. If you use this approach, you’ll need to apply it to your fact table as well in order that you can match the appropriate record in the dimension table to the correct record(s) in the fact table.

For the Description field, I used a basic if/then/else M code structure based on the value of Segment.

Here’s the full M code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQ31DcyMDJQ0lEyBOLExETdyEjdqCjd0NBQ3bAwXT8giPfxiff3jw8IiPfwAKkzQFKtFKuDYogRkYYYoRoSCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, AccID = _t, Account = _t, Value = _t, Side = _t, Description = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"AccID", Int64.Type}, {"Account", type text}, {"Value", Int64.Type}, {"Side", Int64.Type}, {"Description", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Acct Key", each Text.Start( [Account], 3 ) &"_"& Text.From( [AccID] )),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Description"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns", {"Acct Key"}, #"Table 2", {"Symbol"}, "Table 2", JoinKind.LeftOuter),
    #"Expanded Table 2" = Table.ExpandTableColumn(#"Merged Queries", "Table 2", {"Segment"}, {"Segment"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Table 2", "Description", each if [Segment] = 7 then "CorrectString" else "IncorrectString"),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Acct Key Unique1", each [Acct Key] &"_"& Text.From( [Segment] )),
    #"Grouped Rows" = Table.Group(#"Added Custom2", {"Acct Key"}, {{"Data", each _, type table [Date=nullable date, AccID=nullable number, Account=nullable text, Value=nullable number, Side=nullable number, Acct Key=text, Segment=nullable number, Description=text, Acct Key Unique1=text]}}),
    #"Added Custom3" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Data], "Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Date", "AccID", "Account", "Value", "Side", "Acct Key", "Segment", "Description", "Acct Key Unique1", "Index"}, {"Date", "AccID", "Account", "Value", "Side", "Acct Key", "Segment", "Description", "Acct Key Unique1", "Index"}),
    #"Added Custom4" = Table.AddColumn(#"Expanded Custom", "Acct Key Unique 2", each [Acct Key] & Text.Repeat( Character.FromNumber( 8204 ), [Index] - 1)),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom4",{{"Index", Int64.Type}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type1", "Acct Key Unique 2", "Acct Key Unique 2 - Copy"),
    #"Calculated Text Length" = Table.TransformColumns(#"Duplicated Column",{{"Acct Key Unique 2 - Copy", Text.Length, Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Calculated Text Length",{{"Acct Key Unique 2 - Copy", "Acct Key Unique 2 - Length"}})
in
    #"Renamed Columns"

I hope this gets you what you need. Feel free to give me a shout if you have any questions. Full solution file attached below.

Hi @Rafal_J, did the response provided by @BrianJ help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark the thread as solved. Thanks!

Thank for your kind answer.
Now I need a while to understand it and test it.
If there would be anything I cannot figure out myself I’ll post.

Best

Rafal

What I would need is replacing “IncorrectString” with “CorrectString” in description column.
I am thinking about some conditional expression like:

  1. for given index value,
  2. take description text if value=7 in segment,
  3. replace description if value=5 in segment,
  4. with 2.

Up to this point I think I got it.

Best

Rafal

@Rafal_J,

If I understand this correctly, it’s a straightforward change:

#"Added Custom1" = Table.AddColumn(#"Expanded Table 2", "Description", each if [Segment] = 7 then "CorrectString_1" else 2),

Here’s the result:

Please let me know if this gets you the needed results. If not, please provide a mockup of the results you’re looking for, and I’m sure we can modify the solution as needed.

Full revised solution file attached.

@BrianJ,

Please find the mockup attached.
Worksheet Input shows what I am having, worksheet Result shows the desired results. The steps I should be taking are:

  1. Add column taking the first char from “SynthAcc”, - (this I can do) - say “SynthAccAbbr”

1comment. If SynthAccAbbr = 5 then “Description” is wrong and needs to be replaced
1comment. If SynthAccAbbr = 7 then “Description” is correct and needs to stay

How to replace it - I would describe concrete example

In case of “Segment” value = “00119_097_01_03_07” I can find four rows matching - R(4,6,31,32),
For these “Description” value = “Getin Bank; utrzymanie; projekt MOPS2” in R(31,32) and “Koszty działalności podstawowej” in R(4,6).

Now, R(4,6) “Description” is wrong and should be equal to “Description” from R(31,32)

In worksheet Result, value of “Description” in rows 4,6,31,32 is corrected.

I hope it makes sense, appreciate your time spent on this and many thanks.

Best

Rafal

Data_example_3.xlsx (17.2 KB)

@Rafal_J,

Sorry for the delay, had one in the queue in front of this one that proved trickier to crack than expected.

OK, I think I’ve got a working solution for you on this one. I’ve attached the full solution file, but here are the basic steps:

  1. Created the SynthAccAbbr column by splitting column by character count (1)
  2. This is the key step - grouped rows by Segment and SynthAccAbbr using AllRows:

This creates the folllowing set of nested tables:

  1. Now I split this table into two sets of nested tables by duplicating the table, then filtering the first on SynthAbbr = 5 and the second on SynthAbbr = 7. Renamed the resulting tables Incorrect and Correct, respectively.

  2. In both tables, removed all but the AllRows column and expanded the tables.

  3. In the Incorrect table, removed the Description column and then merged it back on from the Correct table using Segment as the key, and then removed duplicates after the merge.

  4. Appended the tables back together and renamed the resulting table “Result”. I compared this to your “Result” spreadsheet in the mockup data and the corrected descriptions all seem to match up.

I hope this is helpful. Please give a shout if you have any questions.

P.S. The one thing I don’t like about this solution is the creation of a separate table that we’re left with at the end (even though that table isn’t loaded to the data model). The cool kids all do this via custom M in a way that doesn’t leave this remnant. For my own education, when I have time I’ll try to work out a revised solution using that approach and will share with you. However, in the meantime, I think the solution provided totally works, even if it gets docked a few style points… :grinning:

2 Likes

Hi @Rafal_J, did the response provided by @BrianJ help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark the thread as solved. Thanks!

Hello @BrianJ ,

Thanks for your reply - greatly appreciated.

I have experimented with your solution and it seems that it’s working. I have some similar scenarios to solve as well. I believe the thought process behind it is most helpful.

I have also tried a different approach, namely building a dimension table with correct naming. It did also work for me. I am not sure yet which one is most beneficial.

Best

Rafal

@Rafal_J,

Excellent – really glad to hear that was helpful to you. If you run into any problems with your other scenarios, just give a shout.

– Brian