Data Model - Split fact table based on cardinality and frequency of use?

I’m designing a data model for service desk analytics. Some columns that describe the service issue (e.g., description) and its resolution (e.g., close notes, comments, user-facing notes) are extremely high cardinality and contain a lot of text.

Especially since those columns are only used in 5% of visuals, I’m wondering if it might make sense and be workable to split the fact table. A second fact table would contain only the keys for those high-cardinality, infrequently-used columns. The primary fact table would contain the keys and degenerate dimensions that are used often and are generally low cardinality.

I’ve never seen this pattern mentioned in all my study of data modeling, so I suspect there’s a flaw I’m not thinking of.

Could experienced data modelers provide feedback on this idea?

Hi @JulieCarignan - Please find below the response generated by the Data Mentor platform. Check if this is helpful.

Thanks
Ankit J

Data Model Design for Service Desk Analytics

Problem Analysis

You are designing a data model for service desk analytics. There’s a specific challenge regarding high-cardinality text columns related to service issues and their resolution, which are used infrequently in visuals. Your idea is to split the fact table into two: one for frequently-used data and another for high-cardinality, infrequently-used data.

Proposed Solution

Splitting the fact table can be a practical approach, particularly for optimizing performance. However, several factors must be considered to avoid potential pitfalls and to ensure the model remains both efficient and maintainable.

Solution Analysis

  1. Primary Fact Table: Contains keys and commonly-used, low-cardinality columns (e.g., service issue ID, date, priority).
  2. Secondary Fact Table: Contains keys and high-cardinality text columns (e.g., description, close notes, comments).

Considerations

  1. Performance:
  • Optimization: Splitting the table reduces the load on the frequently-used data, improving query performance for most visuals.
  • Storage: Segregating the large text fields keeps them from inflating the size of the primary fact table.
  2. Complexity:
  • Joins: Accessing the detailed text data requires an additional join.
  • Query Complexity: Queries must be designed carefully to ensure accurate and efficient data retrieval.
  3. Best Practices:
  • Keep the keys consistent between both fact tables to facilitate seamless joins.
  • Maintain clear documentation to aid in understanding the data model structure.
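To make the pattern concrete, here is a minimal DAX sketch. The names FactMain, FactDetail, TicketKey, and CloseNotes are hypothetical, and it assumes the two fact tables are related one-to-one on the ticket key. Most visuals would read only FactMain; a measure like this reaches into FactDetail only when a single ticket is in context:

  -- Returns the close notes only when exactly one ticket is selected,
  -- so the wide text column is queried by just the visuals that need it.
  Selected Close Notes =
  IF (
      HASONEVALUE ( FactMain[TicketKey] ),
      SELECTEDVALUE ( FactDetail[CloseNotes] )
  )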

Thank you, Ankit.

I’d also like to hear from actual data modelers who may have real-world experience to bring to bear on the feasibility of the proposed idea.

Hi @JulieCarignan,

Sorry, don’t have a straight answer for you, just want to share my 2 cents…

In general, high-cardinality columns impact model size and performance, so it is always worth considering whether cardinality can be reduced. Strategies to accomplish that focus on splitting, grouping, or categorizing data. For example, if you have a column with a large number of distinct values, such as a datetime column, you can split it into separate date and time columns to lower cardinality. Another common example is data that can be binned, grouped, or categorized to bring similar values together into a smaller set of unified labels or identifiers. These actions can help simplify the data and improve query performance.
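For instance, a minimal DAX sketch of the datetime split (the Fact table and CreatedDateTime column are hypothetical, and in practice this transformation is often done upstream in Power Query before load):

  -- A datetime is stored as a decimal serial number: the integer part
  -- is the date and the fractional part is the time of day.
  Created Date =
  INT ( Fact[CreatedDateTime] )    -- format as Date

  Created Time =
  Fact[CreatedDateTime] - INT ( Fact[CreatedDateTime] )    -- format as Time

Each new column has far fewer distinct values than the original datetime, which is what lowers the cardinality.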

However, from your description I understand that this is not applicable to your case. Although it is always worth discussing with stakeholders what level of detail is actually required…

Furthermore, it is important to realize that tables in DAX are always expanded. If you filter a table using CALCULATE, you also filter all of the related tables, following the many-to-one relationships.
Depending on how you plan to build the model, this will have to be considered.
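As a hedged illustration (all table and column names are hypothetical): suppose the detail table FactDetail holds the text columns and relates many-to-one to the primary fact table FactMain. Because CALCULATE filters the expanded table, using FactDetail as a filter argument also filters FactMain and, through it, the dimensions on the one-side of FactMain's relationships:

  -- Counts tickets that have close notes. Filtering FactDetail here also
  -- filters FactMain (and its related dimensions) via table expansion.
  Tickets With Notes =
  CALCULATE (
      COUNTROWS ( FactMain ),
      FILTER ( FactDetail, NOT ISBLANK ( FactDetail[CloseNotes] ) )
  )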

Conceptually, I like the idea, but I have concerns.
I hope this is helpful.

Adding my comments here - what you are describing resembles a snowflake schema, and as I’m sure you’ve seen in your various studies, that’s not considered the most ideal design for Power BI.

However, that doesn’t mean we can’t use a snowflake model. I have been forced into this model in more than one dataset, and it does occasionally present challenges. For example, I’m currently cleaning up a model created by a former Power BI designer, and the refresh on that report is horrible - he broke everything into sub-tables, and it’s absurd.

  • Everything @Melissa said above - her advice is invaluable when it comes to data modeling, and I’m not kidding! (I know I’ll be looking into the SQLBI link)
  • If you are developing this with other report modelers or designers in mind, they ALL need to understand how and when to use the extra table.
  • Hide the connecting elements of the sub-table (the ID you will be joining on) to prevent accidentally bringing the wrong info into your visuals.

One thing to consider is developing a primary report that doesn’t need the additional table, then creating a second report that uses the first as its dataset and adds the detail table into the mix.
