My first item is Sales and Products
Sales has 6,000,000 Rows
Product has 265,000
There are 1,244 rows missing from the Product table that are referenced in the Sales table.
Assuming that I can't fix this in the data source, is there a best practice for handling it?
I could merge on a left join from Sales to Product and, where there is no match, use a default key of -99?
As I am having issues with refresh times, any added complexity needs to be efficient.
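Something like this is what I have in mind in Power Query; the table and column names (Sales, Product, ProductKey) are just placeholders for my real ones:

    let
        Source = Sales,
        // Left outer join keeps every Sales row, even where no matching Product row exists
        Merged = Table.NestedJoin(Source, {"ProductKey"}, Product, {"ProductKey"}, "ProductMatch", JoinKind.LeftOuter),
        // The expanded key is null wherever there was no match
        Expanded = Table.ExpandTableColumn(Merged, "ProductMatch", {"ProductKey"}, {"MatchedKey"}),
        // Substitute the dummy key -99 for the missing references
        Fixed = Table.ReplaceValue(Expanded, null, -99, Replacer.ReplaceValue, {"MatchedKey"}),
        // Swap the corrected key back in place of the original one
        Cleaned = Table.RemoveColumns(Fixed, {"ProductKey"}),
        Result = Table.RenameColumns(Cleaned, {{"MatchedKey", "ProductKey"}})
    in
        Result

A matching -99 row would presumably also need to be added to the Product table so those sales roll up to a proper "Unknown" member rather than a blank.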
@AntrikshSharma
Yes, I agree, but surely if I add a whole load of lookups and corrections to my queries it will slow down the refresh. If, in the query for the 6 million rows, I check whether the key to Product exists in the Product table and replace it with a dummy where it doesn't, will it have a big impact or a small impact?
I suspect a considerable impact, and possibly a very considerable one if done badly.
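For reference, the kind of step I have in mind looks roughly like this (again with placeholder names):

    let
        // Buffer the dimension keys once so the Product query is not re-evaluated per row
        ProductKeys = List.Buffer(Product[ProductKey]),
        Source = Sales,
        // Replace any key that has no match in Product with the dummy key -99
        Fixed = Table.TransformColumns(
            Source,
            {{"ProductKey", each if List.Contains(ProductKeys, _) then _ else -99}}
        )
    in
        Fixed

As far as I know a step like this breaks query folding, so the whole fact table has to pass through the mashup engine, which is why I am worried about the refresh time.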
When connecting to a data source using DirectQuery, you can use the Assume referential integrity selection to enable running more efficient queries against your data source. This feature has a few requirements of the underlying data, and it is only available when using DirectQuery.
Setting Assume referential integrity enables queries on the data source to use INNER JOIN statements rather than OUTER JOIN, which improves query efficiency.
Please read the following article, which may be useful:
@ells As far as I know, only a few kinds of functions invoke a complex behaviour here; one of them is ALLNOBLANKROW, and I will have to test other functions to see if the behaviour changes. But to handle these kinds of issues, the engine automatically adds a blank row to the one side of the relationship.
Can you share the metrics of your model using DAX Studio? If the problem is in the refresh, then either you have complex Power Query transformations or there are columns in your model that are huge and causing the slowness.
I think I have managed to resolve the large number of missing keys. The main issue was a join where there were 2+ million rows with a blank reference for the key to the dimension. The rest appears to be timing issues, but I will check that later tonight when I can look at the refreshed dataset.
There are still about four joins with missing references (around 100 missing keys), but that is so minor that it is acceptable.
Hi @ells, did the response provided by @AntrikshSharma and @IlgarZarbaliyev help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the answer that solved your query as the solution. Thanks!
I think there may be no answer to the question. I still think that if I resolve the referential integrity issues it will add refresh time that I cannot afford.
Due to the size I need query performance to be as good as possible.
I have no control over the data quality in the data warehouse.
I have looked at changing the foreign keys in the fact tables that don't exist in the dimension table. I started to merge the fact and dimension tables to identify the missing keys, but unfortunately the refresh times then go to over 3 hours, which is unacceptable.
Either there is a better way than a left-join merge that changes the key where there is no match, or I will leave it as is.
As it is I think it reflects the poor data quality of the underlying systems.
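For reference, the lightest version of that check I can think of compares only the distinct keys instead of merging the full fact table, roughly like this (placeholder names, as a diagnostic query only):

    let
        // Distinct keys actually used by the fact table
        FactKeys = List.Distinct(Sales[ProductKey]),
        // Buffered dimension keys
        DimKeys = List.Buffer(Product[ProductKey]),
        // Keys referenced by Sales but missing from Product
        MissingKeys = List.Select(FactKeys, each not List.Contains(DimKeys, _))
    in
        MissingKeys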
I fully agree @ells, it reflects poor data quality. Is there a DWH in between? If so, I hope the customer will forward this to the responsible colleague. If not, and you are connecting directly to the source, it might be interesting (but a big project) to look into a DWH option.
3 hours seems very long for a refresh. For a daily report it might be acceptable if you run it nightly, but if you need to update throughout the day it is a red flag.
I assume you are on Power BI Pro? I do not have experience with Power BI Premium, but you might investigate that option.
Furthermore, there is another option, though it will take some work. You could consider using Azure Analysis Services as a layer between your Power BI report and your database, which might help speed up the process. Azure Analysis Services then becomes the place where you build your model; in AAS you can add more compute power so that the refresh goes faster. Afterwards, you connect to that model from Power BI…
Hi @ells, did the response provided by @rens help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the answer that solved your query as the solution. Thanks!
@Rens
This is on premium.
With regards to the responsible people, I am not sure what to say. It is my experience that since people decided to move to an agile method of development, the quality has dropped. I am not saying Agile is bad, just saying that sometimes bad stuff gets through the cracks. My aim is to keep the cracks as small and manageable as possible.
This is not the worst place I have seen for data quality.
Thanks
E
There are mainly three places where you can make changes with respect to Power BI, i.e. the source, Power Query, and Power Pivot (DAX).
PS: For me, the best option is to get it fixed at the source, especially if you have an ETL setup available.
There will be some disadvantages with all of the above, and you can choose what suits you best. As all the options have already been covered, I suggest you close this topic.
@ankit
Thanks. This has been a bit of a nightmare. No ETL, and no chance to improve the quality of the DWH data.
In the Query Editor I added a merge on each of the three fact tables to the dimensions that had issues. The really good part of this was that it totally made a mess of one fact table. It took me a long time to realise that the data warehouse was adding blank spaces at the end of one of my keys.
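In case anyone hits the same thing, the fix boiled down to a trim step on the key column before the merge, something like this (placeholder names, assuming the key is text):

    let
        Source = Sales,
        // Strip the leading/trailing whitespace the warehouse adds to the key
        Trimmed = Table.TransformColumns(Source, {{"ProductKey", Text.Trim, type text}})
    in
        Trimmed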
I still have issues between one fact table and one dimension table, but as the users are still arguing over what they want, I won't fix it until they decide.
Many thanks for all the help. It took a long time to make and apply the changes, but it is now in a happy place.
Many Thanks
E
Hi @ells, did the response provided by @ankit help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the answer that solved your query as the solution. Thanks!
@AntrikshSharma
I have marked your response as the solution. While it took almost a day to go through the model and correct the RI issues, there has been no downside in refresh time or performance.
The only downsides are that when I make a change to a query the time to apply is longer, and that if I change the query for one dimension and apply the change, it refreshes the three fact tables it is linked to.
Many thanks for the help on the journey to resolve this.
E