I am weighing data flows against shared datasets for a project. I wonder whether I am missing some obvious advantages of data flows and would like your opinion.
I have an existing DW with fact and dimension tables already in place. Some minor transformations are still needed in Power Query (PQ) to get the data into the right shape. I will probably need to create several models for analysis, and some dimensions will be shared between several models.
My initial idea was to create a number of shared datasets, maintained in separate PBIX files. These would only contain the data and the data model, no reports.
Reports would be created by connecting to the shared datasets and published to other workspaces. This would give a good separation between ETL and modelling on one side and the visualizations on the other.
Now I am considering using data flows instead, or as well. I really like the separation between data/ETL and visualization, but I find some things confusing.
A data flow only contains the data and the ETL in the form of Power Query Online (PQOL). I will still need a dataset to model that data and to create measures. If I want to separate the model from the visualization, I will still need to have the data model in a separate PBIX. The only difference is that instead of reading directly from the data source (DW) and doing the transformations in PBI Desktop, I connect to the data flow, and the transformations are managed there. I do not want to have the data model in the same PBIX as the reports, as this makes maintenance a pain.
The benefits of using data flows, as I see them, are:
- No need for a direct connection to the data source from Power BI Desktop, as this is handled by the data flows instead.
- Each data flow can be refreshed individually, rather than refreshing the entire dataset.
- Using data flows allows me to connect to additional data sources in PBI Desktop, whereas if I connect to a PBI dataset I am limited to whatever is in that dataset.
- Common data sources like calendar tables can be defined ONCE, with no need to copy M queries between PBIX files.
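To illustrate the last point, this is roughly the kind of calendar query I would want to define once in a data flow and reuse from every model. It is only a minimal sketch: the date range, the step names and the column names are placeholders I made up for illustration, not something taken from my actual DW.

```powerquery
let
    // Assumed date range; adjust to the period covered by the fact tables
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2025, 12, 31),
    // Number of days in the range, both ends included
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    // One list item per day, stepping one day at a time
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    // Hypothetical attribute columns; a real calendar would likely have more
    WithYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month", each Date.Month([Date]), Int64.Type),
    WithMonthName = Table.AddColumn(WithMonth, "Month Name", each Date.MonthName([Date]), type text)
in
    WithMonthName
```

With shared datasets only, a query like this would have to be pasted into every PBIX that needs a calendar, and any change repeated in each copy.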
What data flows do not solve, as far as I can tell:
- I will still have the problem of separating the visuals from the dataset.
- Every model that needs the same measures requires them to be duplicated.
What am I missing with data flows?