I have a massive dataflow containing Sales data.
When I bring the data into the dataset, I connect to the dataflow.
Then I filter the sales by year.
However, it looks like it brings everything in and then filters, so the Query Editor is not being efficient. Is there a particular way I need to do this, or an advanced setting in the service or Power BI Desktop?
This is starting to cause a few issues.
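As a minimal sketch of the kind of dataset query being described (the workspace/dataflow IDs, entity name and column name below are placeholders, not the real ones):

let
    // PowerBI.Dataflows is the connector Power BI Desktop generates when
    // connecting to a dataflow; the navigation values are placeholders.
    Source    = PowerBI.Dataflows(null),
    Workspace = Source{[workspaceId = "<workspace-guid>"]}[Data],
    Dataflow  = Workspace{[dataflowId = "<dataflow-guid>"]}[Data],
    Sales     = Dataflow{[entity = "Sales"]}[Data],
    // Against a dataflow source this filter is normally evaluated locally,
    // so the whole entity is downloaded before the rows are removed, which
    // matches the behaviour described above.
    FilteredByYear = Table.SelectRows(Sales, each [SalesYear] = 2020)
in
    FilteredByYear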
My suggestion in this case would be to try building another dataflow from the existing dataflow and filter for the required data there; that way the query will be folded at the second dataflow, and you can then create the dataset from the newly created dataflow. A rough sketch of the pattern follows below.
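A minimal sketch of that pattern, assuming a computed entity in the second dataflow that references the original entity (entity and column names are illustrative only):

let
    // "SalesGlobal" stands for the linked entity brought in from the
    // existing dataflow; the filter column is a placeholder.
    Source   = SalesGlobal,
    // On Premium, with the enhanced compute engine enabled, this filter
    // can be folded inside the second dataflow, so the downstream dataset
    // only ever sees the reduced entity.
    Filtered = Table.SelectRows(Source, each [SalesYear] >= 2019)
in
    Filtered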
@MK3010
Thanks, but we have experienced issues with linked entities within the Power BI service. We traced some data issues back and found discrepancies in the data. Additionally, the development chain is long enough at the moment with dataflow -> dataset -> report, and as we have no source control or deployment strategy, I am trying to keep code items to a minimum.
@Melissa
I will read through the enhanced compute engine link. I noticed Mathew has had a lot of dataflow articles recently, but I must have missed this one.
Thanks
E
Hi @ells, did the response provided by @Melissa and @MK3010 help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the thread as solved. Thanks!
I am trying to build an enterprise solution, so there will be dataflows for sales. I will have a dataflow that contains an entity for each sales year:
DFSales_0 (current year) (Sourced from SQL)
DFSales_1 (previous year) (Sourced from SQL)
…
DFSales_4 (Sourced from SQL)
One dataset will take in two years' data (current and previous)
One dataset will take in the lot
One dataset will take in three years' data
At the moment each dataset brings in just what is required (see the sketch below). This works fine.
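As a rough sketch, the "two years" dataset described above might simply append the two entities it needs, so the other years are never imported (the navigation values are placeholders):

let
    Source       = PowerBI.Dataflows(null),
    Workspace    = Source{[workspaceId = "<workspace-guid>"]}[Data],
    Dataflow     = Workspace{[dataflowId = "<dataflow-guid>"]}[Data],
    CurrentYear  = Dataflow{[entity = "DFSales_0"]}[Data],
    PreviousYear = Dataflow{[entity = "DFSales_1"]}[Data],
    // Only the two required entities are touched; DFSales_2..DFSales_4
    // are never imported by this dataset.
    TwoYears     = Table.Combine({CurrentYear, PreviousYear})
in
    TwoYears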
However, some of the dataflows we take data from are global dataflows, and we need to extract just one region.
If we create our own dataflow that links to the global dataflow, our datasets will not have to import as much data and then filter it. The downside is that we have an extra dataflow that then needs to be scheduled to refresh. Is this the only way, and is this really a good design?
I am not totally convinced there is a right way forward here; either my understanding is lacking or there is something missing.
@ells, my experience is that if you can limit what you are bringing into your dataset, it’s going to be easier to manage.
I have created multiple separate dataflows, each with a minimum of 5 tables. Yes, this means I had to juggle scheduling the refresh on all of them against my datasets; however, I only had to figure this out once, so I felt it was worth the trouble.
This has allowed me to have a couple of dataflows that don't refresh regularly, because the tables in those flows do not need a daily refresh. It has also kept my dataflows small, which has lowered the risk of refresh failure (something that was happening when I first tried to run everything in a single dataflow).
Now, I also have to say that I am only refreshing my data once overnight, so I can space out the dataflow refreshes easily with plenty of time between the various dataflows.
@Heather
OK, I will proceed with this. It's a bit different for me, as I work in a rather fragile environment. Everything is changing quickly, so it's difficult to plan in advance.
Thanks
E
Hi @ells, did the response provided by @Heather help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the thread as solved. Thanks!
@MK3010
Thanks. I am just sitting watching my dataset apply changes when I use the Query Editor, and I can wait an hour or more before it applies all the changes. Anything to reduce dataset size and complexity is good, as the users are already trying to throw more stuff in by the bucketload.
Thanks
E
I can only see multiple dataflows helping you reduce your data size, unless you reduce it at the source (which is out of scope in your scenario). A few questions:
Have you set up incremental refresh so that you don't have to refresh all of the data every time? (See the sketch at the end of this post.)
Have you checked your query performance? The best way is to use Power BI Desktop and try to optimize it. If possible, share the M code with some sample data. You can read the post below, which I wrote yesterday, that might help you …
The best option is to split the dataflow so that each source sits in its own dataflow, which will improve the refresh time for each dataflow.
It also depends on the capacity you have been allocated in your organization, so some solutions will work for you and some will not; the only way to know is to experiment in a dev and test environment.
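For reference, a minimal sketch of the incremental refresh pattern mentioned above, assuming RangeStart/RangeEnd DateTime parameters and an illustrative SQL source and date column (all names are placeholders):

let
    // Server, database, table and column names are placeholders.
    Source = Sql.Database("sql-server-name", "SalesDB"),
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // RangeStart and RangeEnd must exist as DateTime parameters; filtering
    // on them lets the service refresh only the changed partitions rather
    // than the entire history.
    Incremental = Table.SelectRows(
        Sales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Incremental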
Hi @ells, did the response provided by @MK3010 help you solve your query? If not, how far did you get and what further help do you need? If yes, kindly mark the thread as solved. Thanks!
Hi @ells, we’ve noticed that no response has been received from you since the 18th of December. We just want to check if you still need further help with this post. In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved. If you have a follow-up question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then unticking the checkbox. Thanks!
Hi @Ells, a response on this post has been tagged as “Solution”. If you have a follow-up question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then unticking the checkbox. Also, we’ve recently launched the Enterprise DNA Forum User Experience Survey; please feel free to answer it and give your insights on how we can further improve the Support forum. Thanks!