Optimizing Power Query Calculations for Faster Refresh

Hi all,

I have the following power query created in a dataflow. When I attempt to refresh, it takes almost 30 minutes. I was wondering how can I optimize the query. I really don’t want to do the calculations in desktop but want to keep it in dataflow. The query is attached as well as I have outlined the steps below. I believe there is a better way I can do this through Grouping.

I did create a sanitized file that I have attached in excel if anyone wants to help with this question.

In a nutshell, here is the gut of the calculation in sequential steps. You can see this in the attached query.

  1. Merge with another table to get the RFP Date
  2. get the difference between RFP Date and Horiginal Date
    3… get the difference between RFP Date and H01 Date
  3. determine if the difference between RFP date and Horiginal date is >=0 and <=730 with a “Y” or “N”. This column is called horiginal influence time.
  4. determine if the difference between RFP date and H01 date is >=0 and <=730 with a “Y” or “N”. This column is called h01original influence time.
  5. If[horiginal influence time] =“N” and [h01 influence time] = “N” then “Not Influenced” else if[horiginal influence time] = “Y” and [h01 influence time] = “N” then “Horiginal” else if [horiginal influence time] = “N” and [h01 influence time] = “Y” then “H01” else if [horiginal influence time] >[h01 influence time] then “Horiginal” else “H01”). This column is named Influence date logic
  6. filter Influence date logic for <> “Not Influenced”

power query.txt (4.7 KB)

Test File.xlsx (57.4 KB)

I think the real question, can the merge/calculation be done upstream (within the system/datawarehouse) before you retrieve the information using the dataflow.

thanks
Keith

Great question Keith but not as of this moment. We are working on integrating Snowflake to house our own data within our IT infrastructure but is not on the roadmap until probably next year quite honestly. We need a Datawarehouse.

Currently I am retrieving data from an HubSpot API vendor who won’t manipulate the tables or queries because it is standard for all their customers.

So this all I can do for now until we migrate to our own Datawarehouse to retrieve HubSpot data.

Hope I am making sense.

Yes, It makes perfect sense.

I can’t remember if there is something within a courses how to optimized your tables/queries.

I don’t know how many rows/columns you have. Maybe trying doing the power query in excel first to see if that speeds it up. then that file is combined into power bi.

Maybe using SQL?

Its just a thought.

good luck there
I hope it works outs

thanks

Keith

Thanks Keith.

The data comes from a SQL table not excel. I just added a sanitized version of a snippet of the data to see if maybe i should use grouping instead. But I will need help with that logic.

1 Like

@Keith good points thanks for providing support :+1:

Hi @ysherriff,

I found some redundancy in your code and was able to combine some steps, don’t expect major improvements but see how this goes. Please note that I rewrote this query by hand, therefore I didn’t have any visual indicators. Should you get errors read the message carefully and try to resolve it, else report back with the error message and relevant fields and code in view.

  Source = Sql.Database("*******.database.windows.net", "S********9"),
  Navigation = Source{[Schema = "dbo", Item = "Contact"]}[Data],
  SelCols = Table.SelectColumns(Navigation, {"associatedcompanyid", "email", "firstname", "h01_campaign_interaction_date__c", "h01_most_recent_martech_campaign__c", "horiginal_campaign_interaction_date__c", "horiginal_most_recent_martech_campaign__c", "hs_analytics_source_data_1", "hs_lifecyclestage_marketingqualifiedlead_date", "lastname", "position__c", "suppress_contact"}),
  SelRows = Table.SelectRows(SelCols, each ([suppress_contact] = null) and ([hs_analytics_source_data_1] <> "SALESFORCE")),
  ChType = Table.TransformColumnTypes(SelRows, {{"hs_lifecyclestage_marketingqualifiedlead_date", type date}, {"h01_campaign_interaction_date__c", type date}, {"horiginal_campaign_interaction_date__c", type date}}),
  ReplValues = Table.ReplaceValue(ChType, null, #date(1974, 3, 9), Replacer.ReplaceValue, {"h01_campaign_interaction_date__c", "horiginal_campaign_interaction_date__c"}),
  RetrieveRFPDate = Table.NestedJoin(ReplValues, {"email"}, #"RFP Date with Email - Transformation", {"Email"}, "RFP Date with Email", JoinKind.LeftOuter),
  ExpandRFPEmail1 = Table.ExpandTableColumn(RetrieveRFPDate, "RFP Date with Email", {"bc_number__c", "global_or_potential", "RFP Received Date"}, {"bc_number__c", "global_or_potential", "RFP Received Date"}),
  FilterRFPNoNull = Table.SelectRows(ExpandRFPEmail1, each ([RFP Received Date] <> null)),
  AddCalculations = Table.AddColumn(FilterRFPNoNull, "new fields", each 
    [
        rfp date minus horiginal = [RFP Received Date] - [horiginal_campaign_interaction_date__c],
        rfp date minus h01 date = [RFP Received Date] - [h01_campaign_interaction_date__c],
        horiginal influence time = if #"rfp date minus horiginal" >=0 and #"rfp date minus horiginal" <= 730 then "Y" else "N",
        h01 influence time = if #"rfp date minus h01 date" >=0 and #"rfp date minus h01 date" <= 730 then "Y" else "N",
        Influence date Logic = if #"horiginal influence time" ="N" and #"h01 influence time" = "N" then "Not Influenced" else if #"horiginal influence time" = "Y" and #"h01 influence time" = "N" then "Horiginal" else if #"horiginal influence time" = "N" and #"h01 influence time" = "Y" then "H01" else if #"horiginal influence time" > #"h01 influence time" then "Horiginal" else "H01"
    ]
  ),
  ExpandCalculatioins = Table.ExpandRecordColumn( AddCalculations, "new fields", {"rfp date minus horiginal", "rfp date minus h01 date", "horiginal influence time", "h01 influence time", "Influence date Logic" }),
  FilterNotInfluenced = Table.SelectRows( ExpandCalculatioins, each ([Influence date Logic] <> "Not Influenced")),
  InfluenceCampaign = Table.AddColumn(FilterNotInfluenced, "Influence Campaign", each if[Influence date Logic] = "Horiginal" then[horiginal_most_recent_martech_campaign__c] else [h01_most_recent_martech_campaign__c]), ),
  InfluenceTime = Table.AddColumn( InfluenceCampaign, "Influence Time", each if [Influence date Logic] = "Horiginal" then [rfp date minus horiginal] else [rfp date minus h01 date] ),
  SetTypes = Table.TransformColumnTypes( InfluenceCampaign, {{"rfp date minus horiginal", Int64.Type}, {"rfp date minus h01 date", Int64.Type}, {"horiginal influence time", type text}, {"h01 influence time", type text}, {"Influence date Logic", type text}, {"Influence Campaign", type text}, {"Influence Time", Int64.Type}}),
  #"retrieve campaign id from campaign name" = Table.SplitColumn(SetTypes, "Influence Campaign", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv), {"Influence Campaign.1", "Influence Campaign.2"}),

I hope this is helpful

Hello @ysherriff

Did the responses above help solve your query?

If not, can you let us know where you’re stuck and what additional assistance you need?

If it did, please mark the answer as the SOLUTION by clicking the three dots beside Reply and then tick the check box beside SOLUTION

Thank you

Thanks Melissa. Let me try this code this weekend and get back to you.

Hi Melissa,

First off, beautiful code. I do get an error after the Add Calculation step. When I try to expand the record, i get the following error. Thanks in advance. I think somewhere in the calculation, we need to change the influence time to Whole Number.

Hi @ysherriff,

Thanks for supplying these details, try this:

    [
        rfp date minus horiginal = Number.From( [RFP Received Date] - [horiginal_campaign_interaction_date__c] ),
        rfp date minus h01 date = Number.From( [RFP Received Date] - [h01_campaign_interaction_date__c] ),
        horiginal influence time = if #"rfp date minus horiginal" >=0 and #"rfp date minus horiginal" <= 730 then "Y" else "N",
        h01 influence time = if #"rfp date minus h01 date" >=0 and #"rfp date minus h01 date" <= 730 then "Y" else "N",
        Influence date Logic = if #"horiginal influence time" ="N" and #"h01 influence time" = "N" then "Not Influenced" else if #"horiginal influence time" = "Y" and #"h01 influence time" = "N" then "Horiginal" else if #"horiginal influence time" = "N" and #"h01 influence time" = "Y" then "H01" else if #"horiginal influence time" > #"h01 influence time" then "Horiginal" else "H01"
    ]
1 Like

Thank you Melissa. Works perfectly.

What’s the impact on performance @ysherriff ?
Would love to see your findings on that…

Melissa,

It is superfast when not using work VPN When I run the query through work VPN, it reduced it from 30 minutes to 15 minutes. When I run the query on my desktop without VPN, it takes less than a minute to run.

I believe the lag time has to do with the utilizing power bi service online through my companies VPN.

Much appreciated. I understand the syntax too.

1 Like

Another update, it reduces some more from 15 minutes to 12 minutes. So it seems utilizing the Record function works very well Melissa.

1 Like