When doing ETL using Power Query, is it best practice to replace all nulls and blanks/empty strings with a word such as “Unspecified”? I assume this would be done in a cleaning query that is referenced by both the downstream fact table and dimension tables.
Or should only blanks be replaced?
And is it better that they be replaced with nulls or with a word such as “Unspecified”?
In my dataflows, replacing all blanks and nulls results in thousands of “Unspecified” values repeated throughout the dataset, most of which eventually get removed anyway when the dimension tables get de-duplicated before adding the index column. Is that no problem due to the efficiency of the Vertipaq engine?
Note about the blanks in my data:
One of my primary data sources contains help desk tickets that I get from a data lake (SQL database to which I don’t have write access). Some ticket fields aren’t populated until later in the ticket’s lifecycle, like its Status and which group and technician it’ll be Assigned to. Other fields are sometimes never populated such as the Category, Configuration or other fields that are irrelevant to certain types of tickets or issues.
So it’s not a matter of getting data from a spreadsheet that has nulls or blanks because of merged cells or values that could be filled down.
Do you know of any downsides or cautions related to following that practice?
I hoped to find this answer in the Definitive Guide to Power Query but couldn’t find an answer there.
Julie