Referencing facts to the correct value of a slowly changing dimension with from/to dates

This is just for sharing and discussion, the task is already solved.

Use case
You have a slowly changing dimension (SCD), e.g. sales teams with team members that change their team from time to time. Then you have facts that reference the SCD, e.g. a sales table with references to the sales team member that did the sale. Now you want to aggregate facts correctly by the SCD, e.g. total sales by sales team. Then you have to correctly consider who was when in which sales team during the period.

Data structure
The SCD table has a valid from and a valid to date in each row. The fact table has a date column. Most dates in the fact table do not occur in the SCD table because they are in between change dates in the SCD table. So they cannot just be related.

Solution

  1. Add index column to SCD table
  2. Create user defined function in query editor that filters SCD table down to the correct validity period for each date from the fact table and returns the corresponding index value.
  3. Use this function to add index reference column to fact table.
  4. In data model, link the SCD index column with the facts index reference column.
    Now aggregation works correctly. The basic pattern is shown in the attached file.

ReferenceFactsFromDateToDate.pbix (94.2 KB)

Feedback on alternative solutions and performance aspects is highly appreciated.

3 Likes

@Martin,

:+1:. Really clear explanation and excellent illustration in the PBIX. Thanks very much for taking the time to put this together and post it.

  • Brian
1 Like

Hi @Martin,

Thanks for sharing this with the community! Really like your approach :ok_hand:

As for optimization I can only think of adding a Table.Buffer step because I don’t see a way around calling Table.SelectRows… To demonstrate I’ve added a SCD key column, like so:

let
    Quelle = Excel.Workbook(File.Contents(FileLocation), null, true),
    Facts_Table = Quelle{[Item="tFact",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Facts_Table,{{"Facts", type date}, {"value", Int64.Type}, {"DimensionIndex", Int64.Type}}),
    TableBuffer = Table.Buffer(Dimension),
    #"SCD key" = Table.AddColumn(#"Changed Type", "SCD key", each List.First(Table.SelectRows(Table.SelectRows(TableBuffer, (f)=> f[from] <= [Facts] or f[from] = null ), (t) => t[to] >= [Facts] or  t[to] = null )[Index]), Int64.Type )
in
    #"SCD key"

Let me know your findings.

Here are my files, note that if you update the FileLocation Parameter the queries will be restored.
eDNA - ReferenceFactsFromDateToDate.pbix (103.2 KB) SCD data.xlsx (12.2 KB)

1 Like

Thank you @Melissa for the idea! I’m still learning. That’s why we are all here :slight_smile:

1 Like