There is something I have learned about dataflows that I would like to share with you and the rest of the eDNA community.
I ran into a problem with extracting and transforming data. It involved several tables on which I performed various transformations, such as merges.
Every time, applying and closing the query steps went wrong: it literally took hours (4+), ended with vague error messages, and failed. The tables identified as the cause of the crash kept changing.
Then a new colleague showed me how you can use dataflows to solve these kinds of problems (in some cases).
You can use separate dataflows that each have their own task. In one, you only make the connection to the data source, so its core purpose is to extract the data. In another dataflow, you perform the transformations. Say you have to merge two large tables, for example Orders and Cases.
First you create a dataflow that only connects to the Orders table. One exception: if you need data from complex columns, you must expand them in the dataflow, otherwise the online Power Query Editor removes them automatically. Other than that, no transformations. Then you close and refresh the dataflow.
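As a rough sketch, the query behind such a staging dataflow could look like the M code below. The server, database, table, and column names are purely illustrative, and whether you need the record-column expansion depends on your source:

```m
// Staging dataflow: extract the Orders table, no transformations.
// Server/database/column names here are hypothetical examples.
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Dataflows drop complex (record/table) columns when you save,
    // so expand the ones you actually need at this stage:
    Expanded = Table.ExpandRecordColumn(
        Orders, "Customer", {"CustomerID", "CustomerName"})
in
    Expanded
```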
Then you create a second dataflow that only connects to the Cases table, just like you did for the Orders table. Again, close and refresh the dataflow.
Then you create a third dataflow in which you connect to the first two dataflows and perform the merge.
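The third dataflow's query might look roughly like this. The navigation steps for reaching the staging dataflows are generated for you when you pick the Dataflows connector in the online editor, so treat the workspace, dataflow, entity, and key-column names below as placeholders:

```m
// Transformation dataflow: reference the two staging dataflows and merge.
// All names (workspace, dataflows, entities, join keys) are illustrative.
let
    Orders = PowerPlatform.Dataflows(null){[Id = "Workspaces"]}[Data]
        {[workspaceName = "Sales"]}[Data]
        {[dataflowName = "Staging Orders"]}[Data]
        {[entity = "Orders"]}[Data],
    Cases = PowerPlatform.Dataflows(null){[Id = "Workspaces"]}[Data]
        {[workspaceName = "Sales"]}[Data]
        {[dataflowName = "Staging Cases"]}[Data]
        {[entity = "Cases"]}[Data],
    // The merge itself: a left outer join on a shared key column.
    Merged = Table.NestedJoin(
        Orders, {"OrderID"}, Cases, {"OrderID"}, "Cases", JoinKind.LeftOuter),
    ExpandedCases = Table.ExpandTableColumn(
        Merged, "Cases", {"CaseID", "Status"})
in
    ExpandedCases
```

Because the heavy extraction work already happened in the staging dataflows, this query only has to join data that is already sitting in the service.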
All three dataflows finished refreshing within a few minutes. After the refresh in the Power BI Service, they become available in Power BI Desktop, and importing them is a matter of minutes. Instead of performing all the steps at once, you spread the steps over different dataflows.
In this situation dataflows were a BIG help!
By the way… several dataflows can easily live in one workspace, so it is a good idea to agree on certain naming conventions. We like to use the following naming conventions:
Hope it helps you the way it helped me.