Should multiple uses of a table connect only once via ODBC

I routinely see the row counter indicating millions of rows being read during a refresh, even though the final result loaded to the data model may be fewer than 1,000 records. I’m wondering if I’m causing a heavier load when I’m actually trying to lighten it.

As an example, say I need to reference data from Table A in many queries in the same report. Is it better to make the ODBC connection to Table A only once, select all of the columns that may be needed by the various queries, filter where possible to include only the minimum number of rows, and then use that query as the source for the others, rather than repeating the same process in each query individually?

In an attempt to explain it better, which is better below? Or is there a better answer I don’t mention?

Table A is used by queries 1, 2, 3, 4, and 5. Should I navigate via ODBC to Table A only once and then reference that connection in the 5 queries, or should each of the 5 queries navigate directly to Table A in my Oracle database?

We do not have a data warehouse but make connections to multiple Oracle tables via ODBC. This report references more than 30 separate Oracle tables, and each table may be used by anywhere from 1 to 10 queries, which may in turn be merged in various combinations to create the data model used for the report. Even after trying to reduce it to the minimum needed, I wind up with a total of 24 fact or dimension tables used by one report.

I’m basically trying to follow the mindset of referencing data tables as needed rather than duplicating them.


Yes, referencing is your friend. Ideally, if possible, you should only pull the raw data in once from the data source. Once inside PQ, then this query can be set to “disable load” and used as a reference as many times as necessary.
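As a rough sketch of that pattern (the DSN, schema, table, and column names below are made up for illustration, and the exact navigation steps Power Query generates for your Oracle ODBC driver may differ), the staging query pulls from the source once, keeps only what is needed, and is set to disable load:

```
// Staging query "TableA_Staging" — set "Enable load" off in the Queries pane
let
    Source   = Odbc.DataSource("dsn=MyOracleDSN", [HierarchicalNavigation = true]),
    Schema   = Source{[Name = "MYSCHEMA", Kind = "Schema"]}[Data],
    TableA   = Schema{[Name = "TABLE_A", Kind = "Table"]}[Data],
    // Keep only the columns any downstream query will need
    Columns  = Table.SelectColumns(TableA, {"ID", "CATEGORY", "AMOUNT"}),
    // Filter as early as possible so the filter can fold back to Oracle
    Filtered = Table.SelectRows(Columns, each [CATEGORY] = "ACTIVE")
in
    Filtered
```

Each of the 5 downstream queries then simply starts with `Source = TableA_Staging` instead of its own `Odbc.DataSource` call. Keeping the column selection and row filters in the staging query also gives query folding the best chance of pushing that work back to Oracle.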

Hi @pduffer,

Agree with @Greg referencing is your friend. :wink:
Maybe this article can help you optimize the merges.

Thanks @Greg , @Melissa!

I do eliminate all unnecessary columns, set to disable load, and then reference when needed. So at least I’m getting that part right!

I will look into sorting my queries before merging so I can use Table.Join with JoinAlgorithm.SortMerge rather than the default NestedJoin. Maybe that will help with performance.
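For anyone following along, a minimal sketch of that merge (table and key names are hypothetical, and it assumes two staging queries like those discussed above):

```
let
    // SortMerge requires BOTH inputs already sorted ascending on the join key;
    // if they are not, the join will silently return wrong results.
    SortedA = Table.Sort(TableA_Staging, {{"ID", Order.Ascending}}),
    SortedB = Table.Sort(TableB_Staging, {{"ID", Order.Ascending}}),
    Merged  = Table.Join(
        SortedA, "ID",
        // Prefix the right-side columns to avoid column-name collisions
        Table.PrefixColumns(SortedB, "B"), "B.ID",
        JoinKind.Inner,
        JoinAlgorithm.SortMerge
    )
in
    Merged
```

The trade-off is that the sorts themselves cost time and may prevent folding, so it tends to pay off mainly when the data arrives pre-sorted or the merge is the dominant cost; the article series mentioned in this thread covers when it helps.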

So watching the row counter go up to 10 million rows when refreshing a query that returns only 3,000 records isn’t necessarily an indicator of a problem?


Hi @pduffer,

At this time I believe there are 5 articles in the series: Optimising The Performance Of Power Query Merges, so you might want to examine the others as well…

Furthermore, you might also want to look into Query Diagnostics to identify and focus further optimization efforts.

I hope this is helpful.


Thanks. I’ve previewed and bookmarked the pages and plan to review them and associated pages. I appreciate the links to helpful information.

Hi @pduffer, please don’t forget if your question has been answered within the forum it is important to mark your thread as ‘solved’.