Extra column for very rare data?

Mjuds · July 12, 2021, 8:20am

Hello.

I’m in the very early phase with a new customer, so I don’t have any model yet.

What I found out so far is, that my customer nearly always sends his invoices to his customer’s address as it is mentioned in the ‘Customer Table’.

But in some rare cases he needs to send it to another address.
These rare cases are dependent on the project.

So my idea was (and currently is) to add a column to the ‘Project Table’, indicating the ‘invoice to address’.
But this column will be empty in 99% of the cases, because the ‘invoice to address’ is the same as in the ‘Customer Table’.
I find it a waste of table space to always add a value which is not needed.

Is there a smart way around this.

Greetings
Martin

DavieJoe · July 12, 2021, 8:33am

Hi @Mjuds

realise you don’t have a model yet so this would be difficult to work out in terms of guess work. In my head you would need some way of identifying when the invoice goes to an alternate address, so in Power Query you could build in some logic that says if certain conditions exist then populate with Alternate Address if not use Invoice Address. This way you don’t need to have an additional column but your invoice column populates with wherever the invoice needs to go…

This make sense?

David

BrianJ · July 12, 2021, 9:07am

@Mjuds ,

I wouldn’t worry much about adding a column that is 99% blank. In terms of space consumed in Power BI, what primarily matters is not the number of rows nor the number of columns, but the cardinality of those columns (i.e., the number of unique values in that column). High cardinality columns take up a lot of space, whereas the Vertipaq compression engine in Power BI does a great job of minimizing the space taken by low cardinality columns (i.e. ones where 99% of the values are duplicates – in this case blanks).

It sounds like for those 1%, this information will be absolutely necessary so including a second address column will be a good strategy, with low cost in terms of additional space required.

Note: in Power BI, cardinality can refer to two totally different concepts. When talking about tables, cardinality addresses whether the relationship between the tables is 1:1, 1:M, M:M. When talking about columns it refers to the number of unique values.

I hope this is helpful.

– Brian

Mjuds · July 12, 2021, 2:49pm

Hello Brian,

that’s exactly the answer I was hoping for.
Good to know how the cardinality affects the needed space.

Thanks a lot, also to David for his answer.
Martin

BrianJ · July 12, 2021, 3:04pm

@Mjuds ,

Great - glad to hear that got you what you needed.

Just as an FYI re: PBI storage efficiency - this is a pretty heavy read, but the TL;DR conclusion is pretty amazing - they fit 200 million rows in an 18MB PBIX file.

https://towardsdatascience.com/power-bi-how-to-fit-200-million-rows-in-less-than-1gb-5eee310064b7

Brian

DavieJoe · July 12, 2021, 3:11pm

No worries Martin, @BrianJ’s wonderfully succinct answer helped me too. I simply thought of “how to would solve your problem” when another column wasn’t a problem at all.

Mjuds · July 13, 2021, 7:48pm

Hi Brian,

I’m always interested to learn something new, although it might be hard stuff.

But in this case it’s somehow no stuff, because the link leads to the following message:

Just to inform you and the other interested people.
BR Martin

DavieJoe · July 13, 2021, 7:54pm

BrianJ · July 13, 2021, 9:27pm

@Mjuds ,

Cool trick I learned from a private investigator friend – when things get “deleted” off the Internet, you can frequently still read them using the Wayback Machine:

Brian