Performance comparison between DirectQuery & DataImport

Hi there,
Currently I am helping my client to convert a DataImport PBIX to use DirectQuery instead. This is to avoid data storing in cloud.

We have timing on Data Import report at this point. – Which is performing faster - expected.
Now we are having a hard time to gauge performance on DirectQuery in general.
(this is a general comparison. Do know that performance is a lengthy and tricky topic. Let’s assume that we are using the same measure/visual for timing comparison. Queries/DAX are tuned to its optimal on both DataImport & DirectQuery)

Can I say that DirectQuery is usually 2-3 times faster than DataImport ? I am looking for some MSFT posting on this. Did come across this whitepaper published back in 2016. → page 56.

General conclusion made in the whitepaper still correct?

Anyone has done any research that can shed some lights?

Thanks!

Hello @akiko_lim,

Thank you for posting the query onto the Forum.

I guess from what I’ve seen and observed during a course of time is that DataImport is faster as compared to the DirectQuery.

I’ve also provided below some of the links of an articles and videos which may help you to choose between these options.

Hoping you find this useful and can help you to achieve the best analysis.

Thanks & Warm Regards,
Harsh

[Power BI Get Data: Import vs. DirectQuery vs. Live] -
(https://www.youtube.com/watch?v=y4S-k34qmnw)

[Article on Power BI Connections: Import] -
(https://powerbi.tips/2017/11/power-bi-connections-import/)

[Article on Power BI Connections: DirectQuery] -
(https://powerbi.tips/2017/12/power-bi-connections-direct-query/)

Hi @akiko_lim,

In DataImport perspective everything in local is still going to be in memory and its will be in PowerBI Services or in the Microsoft data center, so it is going to be really really fast in performance,

Although, this whitepaper on DirectQuery was redesigned for dramatically to improved speed and performance, however, it is also now more complex to understand and implement considering some instances on Power BI data sources
Reason why DirectQuery is often used is because, if you are not using PowerBI Premium you have 1GB Dataset limit (shared capacity) but if you are using premium it could go up to 10GB as your limit. So if your dataset is going to be larger than 10GB what do you do, you can look at some of the other options or take advantage of some of the premium features like aggregation tables and do a mix of like import and direct query where your aggregation tables are the ones that are actually in the data model, so that limits the amount of data you are going to be use.

Let me Highlight some Difference that makes DataImport relatively Performed Faster than DirectQuery

DataImport/Cached dataset;

  1. Copy of data from data source and into PBIX
  2. All data sources available
  3. Full PowerBI experience:
  • Full Power Query
  • Full Dax
  • Can combine Sources
  1. Publishe means it gets copied to the service data and reside in the Microsoft Cloud
  2. Limited to 1GB dataset compressed on disk from Non-Premium up to 10GB for Premium

Why do you need this?

  • Self Service is available
  • Take advantages of full functionality
  • Data model can fit in PowerBI
  • Finally, Tends to be the fastest option.

DirectQuery;

  1. Limited data sources
  2. Schema resides in PBIX
  3. Data stays at data sources
  4. Interacting with reports or refreshing will send queries to data sources
  5. May require gateway
  6. Limited Power Query and DAX functionality
  7. Performance problems tend to creep up

Why do you need this?

  • Data is updated frequently
  • Data can’t fit in an AS Tabular model ( PowerBI or Analysis)

So, considering all those limitation and constrain of DirectQuery makes the DataImport always the first priority.
Also, try and get what could be the best need to aids your client services and choose the best for them from your findings.

Moving forward, if you need some clarification, I will be please to respond and dont hesitate to ask.
If you find this post very useful kindly make it the solution for others.

DAVID

2 Likes

Thank you both for sharing info. Very much appreciated.
We have mixed requirements here of both worlds (DataImport & DirectQuery) .

a) a lot of Dax computation in PBIX , data -in-memory = DataImport makes it a superior choice.
b) user experience is a major requirement - report needs to be fast in rendering between filtering & clicking.
c) Source data update frequently - 15 mins
d) Security restriction where data needs to reside on-prem.

Currently the security restriction is driving us to go with DirectQuery direction and we are struggling to make it performs. Do understand DirectQuery is the slowest among the 3 (Data Import, Live, DirectQeury)…

But how slow in general ?

Based on whitepaper, looks like DirectQuery with good performance is tricky to implement. Power BI Developer needs to be database savvy as well in this case, or you need a DBA for this.
Conclusion, to achieve performance, you pretty much need to ‘materialize’ calculated data in SQL or database. If this is the case, loosing the essence of business intelligent ?

Hello @akiko_lim,

I would like to highlight some of the facts as far as speed and performance is concerned -

  1. Upon Importing the Data, the Power BI ensures the queries are loaded quickly and all changes to the visual are reflected immediately whereas while using the DirectQuery option, queries are sent to the underlying data source to retrieve the necessary data. The time taken to refresh the visual will depend upon the performance of the underlying data source.

  2. DirectQuery can prove to be a very useful/viable option only if your data is changing frequently and a near real - time reporting is required and can also be helpful in applying data sovereignty restrictions.

  3. If refreshing each visual takes just a few seconds then the user experience would be reasonable but if the slowness of the source causes individual visuals to take longer than tens of seconds, the experience becomes extremely poor and queries may even time out.

  4. When you use multiple data sources, it’s important to understand how data is moved back and forth between the underlying data sources, and the security implications it brings.

  5. Similarly, there are limitations in the data transformations that can be applied within Query Editor. With imported data, a sophisticated set of transformations can easily be applied to clean and reshape the data before using it to create visuals whereas those transformations are more limited in the DirectQuery.

  6. When in DirectQuery mode, it’s not possible to use the family of DAX PATH() functions that generally handle Parent-Child structures and also it’s not possible to use the Clustering capability, to automatically find groups.

  7. A time-out of four minutes is applied to individual queries in the Power BI service. Queries taking longer than that shall fail.

  8. If the data in the underlying source is changing, there’s no guarantee that each visual shows the data at the exact same point of time.To guarantee this consistency would require the overhead of refreshing all visuals whenever any visual refreshed, in tandem with the use of costly features like Snapshot Isolation in the underlying data source.

  9. While it’s possible to switch a model from DirectQuery mode to use import mode, all the necessary data must be imported. It’s also not possible to switch back, primarily because of the set of features not supported in DirectQuery mode.

  10. And lastly it’s not good or recommended to ‘materialize’ calculated data in SQL or database since will have an adverse impact on the overall performance of the report and that would not be the great user experience. Therefore, @sam.mckay has always recommended to rely most of the times on measures in order to achieve diverse analysis.

Here’s also the link provided for the reference exclusively relating to the DirectQuery in Power BI.

Hoping this be helpful to you in achieving the desired analysis.

Please feel free to write back in case I’ve missed out on anything.

Thanks & Warm Regards,
Harsh

[About using DirectQuery in Power BI] -
(https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-about)

2 Likes

Thanks Harsh for your insights & the link is helpful.

one more question. Is CALCULATETABLE() supported in DirectQuery ? - I use this my Measure - DAX formula.

Based on this blog post :

I am confused over the link you shared - which touches on “Calculated table” = CALCULATETABLE() function and above blog post I read.

What’s your take ?

Thanks again Harsh. you have been very helpful and kind.

Hello @akiko_lim,

When we use Power BI to connect to a data source such as SQL Server in DirectQuery mode, we’ll find that by default we cannot use the complete range of DAX functions inside our calculations.

Whenever we get an error message like “Function 'XXXXX" is not supported in DirectQuery mode” we know that we’re trying to use a DAX function that isn’t allowed by default in DirectQuery mode.

Why does this limitation exist?

  1. If the Query Editor query is overly complex, an error occurs. To remedy the error, either delete the problematic step in Query Editor , or import the data instead of using DirectQuery.

  2. Time intelligence capabilities are unavailable in DirectQuery. For example, special treatment of date columns (such as year, quarter, month, or day) isn’t supported in DirectQuery mode.

  3. Limitations are placed on DAX expressions allowed in measures to ensure that queries sent to the underlying data source have acceptable performance.

All this limitations exists just to improve the performance reasons. In Power BI, every time we interact with a report (by opening it, clicking on a slicer etc) a series of DAX queries is sent back to the Data Model to get the data needed by the report. In DirectQuery mode, all of these DAX queries – including any calculations – are translated into queries that are sent back to the data source. If that data source is SQL Server, then Power BI generates SQL queries. Not everything that we can do in a DAX calculation can be translated back into efficient SQL, so to prevent us from accidentally building calculations that will perform badly in DirectQuery mode Power BI tries to prevent us from using any DAX functions that it thinks it can’t generate efficient SQL for.

DAX functions in DirectQuery mode

All DAX functions are supported for DirectQuery models. However, not all functions are supported for all formula types, and not all functions have been optimized for DirectQuery models. At the most basic level, DAX functions can be put into two camps: Optimized and Non-optimized. Let’s first take a closer look at optimized functions.

Optimized for DirectQuery

These are functions that primarily return scalar or aggregate results. These functions are further divided into those that are supported in all types of formulas: measures, queries, calculated columns, row level security, and those that are supported in measure and query formulas only. These include:

  • Supported in all DAX formulas -

ABS
ACOS
ACOT
AND
ASIN
ATAN
BLANK
CEILING
CONCATENATE
CONTAINSSTRING
COS
COT
CURRENCY
DATE
DATEDIFF
DATEVALUE
DAY
DEGREES
DIVIDE
EDATE
EOMONTH
EXACT
EXP
FALSE
FIND
HOUR
IF
INT
ISBLANK
ISO.CEILING
KEEPFILTERS
LEFT
LEN
LN
LOG
LOG10
LOWER
MAX
MID
MIN
MINUTE
MOD
MONTH
MROUND
NOT
NOW
OR
PI
POWER
QUOTIENT
RADIANS
RAND
RELATED
REPT
RIGHT
ROUND
ROUNDDOWN
ROUNDUP
SEARCH
SECOND
SIGN
SIN
SQRT
SQRTPI
SUBSTITUTE
SWITCH
TAN
TIME
TIMEVALUE
TODAY
TRIM
TRUE
TRUNC
UNICODE
UPPER
USERNAME
USERELATIONSHIP
VALUE
WEEKDAY
WEEKNUM
YEAR

  • Supported in measure and query formulas only

ALL
ALLEXCEPT
ALLNOBLANKROW
ALLSELECTED
AVERAGE
AVERAGEA
AVERAGEX
CALCULATE
CALCULATETABLE
COUNT
COUNTA
COUNTAX
COUNTROWS
COUNTX
DISTINCT
DISTINCTCOUNT
FILTER
FILTERS
HASONEFILTER
HASONEVALUE
ISCROSSFILTERED
ISFILTERED
MAXA
MAXX
MIN
MINA
MINX
RELATEDTABLE
STDEV.P
STDEV.S
STDEVX.P
STDEVX.S
SUM
SUMX
VALUES
VAR.P
VAR.S
VARX.P
VARX.S

This list keeps on changing from time-to-time with each new release.

Some functions have not been optimized to work with DirectQuery. These functions are not supported in calculated column and row-level security formulas at all. However, these functions are supported in measure and query formulas, albeit with uncertain performance.

The reasons a particular function might not be optimized for DirectQuery is because the underlying relational engine cannot perform calculations equivalent to those performed by the VertiPaq engine, or the formula cannot be converted to an equivalent SQL expression. In other cases, the performance of the converted expression and the resulting calculations may be unacceptable.

And lastly if you visit the Page 10, 20, 21 and 22 from the PDF which you’ve provided in that you’ll find an article related to the Calculated Tables and Columns stating as follows -

Calculated tables — DirectQuery does not support calculated tables, mainly because there is no place to store them.

Hoping you find this useful and can help you to achieve the desired analysis.

Please feel free to write back in case I’ve missed out on anything.

Thanks & Warm Regards,
Harsh

2 Likes

Hello @akiko_lim,

Checking the Power Query Timings can be done using Monkeytools add-in from www.excelguru.ca.

There are some detailed explanation in the documentation here
This is a small sample

There is also a thread on the forum for it here : Thread: Monkey Tools

Hope this helps.

1 Like

Thanks Harsh. Got it. :slight_smile:

Thanks @cristian.angyal, definitely will check it out! nice sharing.

You’re Welcome @akiko_lim,

Glad to help you out and all the best for the project.

Thanks & Warm Regards,
Harsh