Referencing facts to the correct value of a slowly changing dimension with from/to dates

This is just for sharing and discussion, the task is already solved.

Use case
You have a slowly changing dimension (SCD), e.g. sales teams with team members that change their team from time to time. Then you have facts that reference the SCD, e.g. a sales table with references to the sales team member that did the sale. Now you want to aggregate facts correctly by the SCD, e.g. total sales by sales team. Then you have to correctly consider who was when in which sales team during the period.

Data structure
The SCD table has a valid from and a valid to date in each row. The fact table has a date column. Most dates in the fact table do not occur in the SCD table because they are in between change dates in the SCD table. So they cannot just be related.

Solution

  1. Add index column to SCD table
  2. Create user defined function in query editor that filters SCD table down to the correct validity period for each date from the fact table and returns the corresponding index value.
  3. Use this function to add index reference column to fact table.
  4. In data model, link the SCD index column with the facts index reference column.
    Now aggregation works correctly. The basic pattern is shown in the attached file.

ReferenceFactsFromDateToDate.pbix (94.2 KB)

Feedback on alternative solutions and performance aspects is highly appreciated.

3 Likes

@Martin,

:+1:. Really clear explanation and excellent illustration in the PBIX. Thanks very much for taking the time to put this together and post it.

  • Brian
1 Like

Hi @Martin,

Thanks for sharing this with the community! Really like your approach :ok_hand:

As for optimization I can only think of adding a Table.Buffer step because I don’t see a way around calling Table.SelectRows… To demonstrate I’ve added a SCD key column, like so:

let
    Quelle = Excel.Workbook(File.Contents(FileLocation), null, true),
    Facts_Table = Quelle{[Item="tFact",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Facts_Table,{{"Facts", type date}, {"value", Int64.Type}, {"DimensionIndex", Int64.Type}}),
    TableBuffer = Table.Buffer(Dimension),
    #"SCD key" = Table.AddColumn(#"Changed Type", "SCD key", each List.First(Table.SelectRows(Table.SelectRows(TableBuffer, (f)=> f[from] <= [Facts] or f[from] = null ), (t) => t[to] >= [Facts] or  t[to] = null )[Index]), Int64.Type )
in
    #"SCD key"

Let me know your findings.

Here are my files, note that if you update the FileLocation Parameter the queries will be restored.
eDNA - ReferenceFactsFromDateToDate.pbix (103.2 KB) SCD data.xlsx (12.2 KB)

1 Like

Thank you @Melissa for the idea! I’m still learning. That’s why we are all here :slight_smile:

1 Like

Meanwhile I learnt what is the fast solution:
You do a left outer join (plain vanilla merge in Power Query) of the surrogate key, valid from and valid to columns from the SCD table on the fact table. Then you add a custom column that marks the matching validity period by comparing the fact date with the valid from and valid to colums. Finally the fact table is filtered down to only the matching rows. Then the merged vaild from and valid to columns and the custom column for filtering can be removed again.

1 Like

To be honest, I do not fully understand what you mean.
Maybe you could add a sketch of the tables with connection lines between them.

I’m learning every day, but the starting point is a bit to far away for me.

Hi @Mjuds,
I’ve attached a template solution for you so you can see how to create and match the surrogate keys. Power BI SCD Type 2 Implementation.zip (653.3 KB)
The essential change is that you need a new relationship that takes - in case of this example - employee ID and date into account. Since Power BI, in contrast to other BI tools, supports neither multiple-column-relationships nor range-match-relationships you need to create and match these surrogate keys in PowerQuery so you have a single column in each table (dimension and facts) to create the relationship.
The data model looks like this, but I think it’s easier to understand when you look at the Power BI file: You don’t match SalesPersonID with BusinessEntityID (the ID of the employee) anymore but instead with the surrogate key that refers to the employee with the correct attribute set for that date of the fact.


For more information about the different ways how to deal with slowly changing dimensions, here and here is a good reading.
image
BR and good luck!
Martin

4 Likes