Hi @adesinamk,
Thank you for looking into this issue. My first approach was pretty close to your solution, including creating an index column in the dimension table, which, by the way, breaks query folding, so I would not recommend it for large datasets loaded from SQL Server (see the sketch below).
The issue with my solution is that it makes loading the data much slower than a solution that folds to SQL Server.
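Just so we are talking about the same thing, this is a minimal sketch of what I mean (server, database, and table names are made up for illustration):

```
let
    Source = Sql.Database("localhost", "MyDwh"),
    DimPerson = Source{[Schema = "dbo", Item = "DimPerson"]}[Data],  // still folds
    // This is the step that breaks query folding: from here on, the work
    // runs in the Power Query engine instead of on SQL Server.
    AddedIndex = Table.AddIndexColumn(DimPerson, "Surrogate Key", 1, 1, Int64.Type)
in
    AddedIndex
```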
Since I’m dealing with slowly changing dimensions that have Valid From and Valid To date columns, your solution is not applicable to my problem. It suggests merging columns to generate a column with unique values in both tables that can then be used for a table merge (sketched below the examples). My dimension records look, for example, like this:
| … | Person ID | Valid From | Valid To |
| --- | --- | --- | --- |
| … | 12345 | 2018-10-12 | 2020-05-18 |
And my fact records look like this:
| … | Fact Date | Person ID |
| --- | --- | --- |
| … | 2019-03-12 | 12345 |
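Reduced to a sketch with made-up sample rows, the suggested key would look like this (the same key would be built in the dimension, only from Valid From instead of Fact Date, before a Table.NestedJoin):

```
let
    // Made-up sample fact rows; "Facts" stands in for my real fact query.
    Facts = #table(
        type table [Fact Date = date, Person ID = Int64.Type],
        {{#date(2019, 3, 12), 12345}}
    ),
    WithKey = Table.AddColumn(
        Facts,
        "Merge Key",
        each Number.ToText([Person ID]) & "|" & Date.ToText([Fact Date], [Format = "yyyy-MM-dd"])
    )
in
    WithKey
```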
99% of the facts do not occur on the same date on which, by chance, a dimension value also changes, so I cannot merge the surrogate keys into the fact table based on a concatenated column created from Fact Date and Person ID. So which columns should I use?

That’s why I ended up not using a Power Query table merge to add the surrogate key to the fact table; instead I use a lookup function that picks the dimension row whose validity range contains the fact date, and that’s where loading starts getting slow. Is there a way to speed up loading? Either the Power Query engine is inefficient in general or my solution is inefficient by its specific design, but SQL Server does the same job in a fraction of the time, and not because it’s a faster machine: I run SQL Server and Power Query on the same machine in my development environment.
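For concreteness, here is roughly the shape of my lookup; DimPersonQuery and FactQuery stand in for my real queries, and column names match the examples above (not my actual code):

```
let
    // Buffer the dimension so it is held in memory rather than
    // re-evaluated for every fact row.
    DimPerson = Table.Buffer(DimPersonQuery),

    // Return the surrogate key of the dimension row whose validity range
    // contains the fact date (null if there is no unique match).
    LookupKey = (personId as number, factDate as date) as nullable number =>
        let
            Matches = Table.SelectRows(
                DimPerson,
                each [Person ID] = personId
                    and [Valid From] <= factDate
                    and [Valid To] >= factDate
            )
        in
            if Table.RowCount(Matches) = 1 then Matches{0}[Surrogate Key] else null,

    // The filter runs once per fact row; this is where the load time goes.
    WithSurrogateKey = Table.AddColumn(
        FactQuery,
        "Surrogate Key",
        each LookupKey([Person ID], [Fact Date]),
        Int64.Type
    )
in
    WithSurrogateKey
```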
BR
Martin