Improve query performance with Table.Buffer?

I have a nasty data cleaning problem that I have finally solved, but the refresh performance is horrible. I think part of the bottleneck could be removed by using both Table.Buffer and List.Buffer, but I need some hand-holding.

Some background: at our hospital, for reasons you don’t care about, some staff contact is recorded in an Excel workbook with one week’s activity per sheet and 3 columns per staff member per day (the “real” workbook has 208 sheets, i.e. 4 years of data):

IPLoadMASKED.pbix (332.6 KB)
IP_Load_2023_Dummy.xlsx (638.0 KB)

Because I have to pull out each individual column from each individual sheet, and then ascribe the values to the individual staff resource for that day, I have written 2 queries per therapist per day (14 queries for the whole week), which are finally appended together into the main query (“fInPateints” in the .pbix). Each of these queries also loops through all 208 sheets in the source Excel file, and since it’s doing this from a OneDrive file, there’s an extra network bottleneck.
Finally, my question: I think I should be able to improve performance by buffering the source table rather than fetching it each time, and similarly by buffering the small lists that I use to parse the “full” query, which grabs the first 43 columns from every non-hidden sheet in the workbook. In my head it’s something like the sketch below.
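Something like this, I imagine (step and parameter names are made up; SourceFileUrl stands in for the real OneDrive path):

let
    // "SourceFileUrl" is a made-up parameter holding the OneDrive path
    Source = Excel.Workbook(Web.Contents(SourceFileUrl), null, true),
    // keep only the visible worksheets
    VisibleSheets = Table.SelectRows(Source, each [Kind] = "Sheet" and [Hidden] = false),
    // hold the result in memory so later steps don't go back to OneDrive
    BufferedSource = Table.Buffer(VisibleSheets)
in
    BufferedSource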
Obviously I’ll need to explain more, unless someone is truly fluent in reading my poorly written M code, so I’ll leave it at that for now.
Final point: each of the queries in the attached .pbix is pointing to the OneDrive file (which I can’t share); the attached masked Excel file has the same format, but I’ve cut it down to 4 sheets for the sake of simplicity.
As proof of concept though, if someone were able to implement Table.Buffer and List.Buffer in one of the queries, I think I could sort the remaining 13 out myself.
Thanks in advance for any help,
Rod

Hi @rodwhiteley,

See if this improves performance for you. Here’s what I’ve done.

Changes made to all the list queries; to illustrate, for _Therapists:

  • in yellow: the key match lookup (drill down into the table)
  • in green: the field selection (drill down into the column),
    then wrapped List.Buffer around it (a sketch follows below).
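As a sketch of that pattern (the table and column names here are placeholders, not the actual ones in your file):

let
    Source = Excel.Workbook(File.Contents(WorkbookPath), null, true),
    // key match lookup: drill down into the named table (placeholder names)
    TherapistTable = Source{[Item = "Therapists", Kind = "Table"]}[Data],
    // field selection: drill down into the column to get a plain list
    TherapistNames = TherapistTable[Name],
    // List.Buffer keeps the list in memory so repeated row lookups don't re-evaluate it
    BufferedList = List.Buffer(TherapistNames)
in
    BufferedList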

Changes made to the Base query (see _Base) involved reordering, combining, and reducing repetitive steps; here are the highlights:

  • Followed your SelRows with a SelColumns step
  • After the AddedIndex, created a new step with custom logic, AddFields

Instead of the repeated:

Table.AddColumn(
  AddedIndex, 
  "D1T1", 
  each if (List.Contains(Therapists, [Data.Column2]) = true) then [Data.Column2] else null
)

I’m creating a Record that holds all of these new fields. To illustrate, the logic for the like-named field is as follows:
D1T1 = List.Select(_Therapists, (x)=> x = [Data.Column2] ){0}?

The if-then-else is replaced by optional item access ({0}?), which retrieves the first match from _Therapists and returns null when there is none.
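Putting it together, the AddFields step looks something like this sketch (only two of the fourteen fields shown, and the Data.ColumnN pairings are illustrative, not the actual ones); I’ve used an explicit (row) => parameter instead of each so the row record stays accessible inside the new record:

AddedFields = Table.AddColumn(
    AddedIndex,
    "temp",
    (row) => [
        D1T1 = List.Select(_Therapists, (x) => x = row[Data.Column2]){0}?,
        D1T2 = List.Select(_Therapists, (x) => x = row[Data.Column5]){0}?
        // ... one field per therapist per day
    ]
)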

Added a dynamic expand of the temp record:

Table.ExpandRecordColumn( 
  AddedFields, 
  "temp", 
  Record.FieldNames( AddedFields[temp]{0} )
)

Added a dynamic fill down for all therapist fields:

Table.FillDown( ExpandTemp, 
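  // fill down only the therapist fields (second-to-last character is "T", e.g. D1T1)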
  List.Select(
    Record.FieldNames( AddedFields[temp]{0}) , 
    each Text.Start(Text.End(_, 2), 1) = "T"
  )
)

And a dynamic removal of the old columns:

Table.RemoveColumns(
  FillDownTherapists, 
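  // select every "Data." column, then skip the first so only the rest are removed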
  List.Skip(
    List.Select(
      Table.ColumnNames(FillDownTherapists), 
      each Text.StartsWith(_, "Data.")
    ), 1
  )
)

Here are some resources for you:

And here’s your sample file.
IPLoadMASKED.pbix (345.1 KB)

I hope this is helpful.


More than helpful, it’s simply wizardry!
I am going to need to spend a lot more time to truly understand this, but you have improved the performance of this query by approximately 100X.
These are certainly techniques I need to get more acquainted/comfortable with, so thanks for also pointing out the resources.
Thanks again,
Rod
