Large Data Set Memory Issues

Hello All,

I am new to the forums and have been on the videos for a few weeks, but I am big-time stuck on a multi-tab dashboard I built. It was nearly done when, all of a sudden, I got a memory error (three weeks ago), and since then I have been entirely unable to correct or reproduce the file.

Data background: I have been forced to upload a large number of Excel files through the Query Editor because our IT department is in love with Sisense (even though we are nowhere remotely close to building a defined model for its ML function to be viable) and does not want to tie our databases to PBI. Thankfully the person who’s been pushing Sisi is leaving in a month, so I have an opportunity to present PBI.

My initial PBI file was 816 MB, fed by several gigs of Excel files where the raw, line-item data for three years consisted of nearly 14M records. There were a couple extra million records on ancillary order data tables. The dashboard was nearly done, with eight tabs full of graphics and tables, and then it just errored out and crashed my computer. When I attempted to open it again, it crashed my computer. When I open it on my work notebook, I get memory errors. If I try to condense the file to just ONE tab, I still get the exact same responses on both machines.
This file was a test file, so I knew I had a lot of repeated columns and calculated columns that I could remove to save memory.

So I rebuilt the file, cut the uploaded data in half with only a fraction of the calculated columns and … I still get the exact same response. I took my blank, rebuilt, data-only file, tried to add ONE line chart, and my computer crashed. On my notebook, I get memory errors.

I have also uninstalled and reinstalled PBI. I have been attempting to fix this for almost a month; it’s a darn good thing I’m not a drinker!

I.Do.Not.Understand.What.Happened. It.Worked.Fine.Before.
Help me ObiWan DNA, you’re my only hope.

-Kondukt

Oh, and my home PC is beast-ish, which confuses me as to why it crashes.

Hi Kondukt,

It could be your computer not having enough memory to handle all the information. How much RAM do you have in your computer?
Is your computer a 64-bit machine or a 32-bit one? What Excel version are you using? I know that if you aren’t using a 64-bit setup, you will have trouble handling large data sets.

Is it possible to have your IT area provide only the information you need?

When you pull the information into the Power Query Editor, are you removing columns you don’t need? I would recommend that you remove calculated columns/repeated columns from the data.

Did you test the file on another computer at work?

Sometimes when a file becomes corrupt, the only solution is to start again.

This might be a blessing in that you can cut out data you don’t need. Don’t keep calculations in the raw data.

I don’t know what else to suggest.

Good Luck
Keith


Try using Task Manager to close all the applications that are running, then try running Power BI to see if it will work.

@Kondukt

That’s still a pretty big PBIX file, considering the compression should make the file fairly small. This leads me to believe that this is, at its root, a data modelling issue. Have you ever used DAX Studio?

After you install the above and connect to your PBIX file, go to the Advanced tab and then View Metrics.

It will produce a new tab called "VertiPaq Analyzer Metrics" with a table summarizing every table and column in the model.

This provides a lot of vital information about your data model. Could you do that and put up a screenshot of the output?

A screenshot of your data model diagram would be good as well. This will be the first step in troubleshooting: if we can get the data model to be more “DAX friendly” (i.e. long, narrow tables with as few unique values as possible), the better. Once we are set on the data model we can then look into the DAX of the actual measures. If this won’t refresh on your desktop, or is very slow, there is more than a good chance that it would not refresh on the PBI Service with a Pro license (not Premium), since there is a 2 gig limit on what the service can process at a time.
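
As a side note, if View Metrics ever gives you trouble, a rough way to spot high-cardinality columns is to run a plain DAX query from DAX Studio’s query window. This is just a sketch and assumes your build of Power BI Desktop exposes the COLUMNSTATISTICS() function (newer builds do); if yours doesn’t, stick with the View Metrics route.

// Run from DAX Studio against the open PBIX model.
// Returns one row per column with its cardinality, among other stats;
// sort the cardinality column in the results grid to find the worst offenders.
EVALUATE
    COLUMNSTATISTICS()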


This may not be specific to your issue, but using the inbuilt Auto Date/Time for Time Intelligence can enlarge the file.

If any of your measures etc. are using Auto Date/Time, you will need to amend them to use a custom-built Dates table.
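
As a rough sketch (the table name, columns, and date range below are placeholders, not anything from your model), a minimal custom Dates table in DAX could look something like this:

Dates =
// A minimal custom date table -- one row per day.
// The hard-coded range is a placeholder; widen it to cover your
// earliest and latest order dates, or use CALENDARAUTO() instead.
ADDCOLUMNS(
    CALENDAR( DATE( 2017, 1, 1 ), DATE( 2020, 12, 31 ) ),
    "Year", YEAR( [Date] ),
    "Month Number", MONTH( [Date] ),
    "Month Name", FORMAT( [Date], "mmm" )
)

Once it’s in the model, mark it as a date table, relate Dates[Date] to the date column on your fact table, and point your time-intelligence measures at it.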

I have 24 GB of RAM on the home PC and 32 GB on the work notebook; both machines are 64-bit with matching applications. My boss and I are about to ask for query access into PBI from Snowflake and the other data warehouse that I currently access through SSMS.

And yeah, the rebuild was a blessing, as I did a massive clean-up. Appreciate the look, good sir.


I do have DAX Studio \o/, but I’ve only used it to extract a large data table I created in PBI (a whole other story), so I am unfamiliar with the vast majority of it. Although, after having connected my rebuilt file and navigating to the Advanced tab, my only options are:

  • Export Data
  • SQL Profiler
  • Excel

I did notice there are a number of “LocalDateTable_xxxxxx” tables in the DAX Studio navigation pane.

In the meantime, here is an image of the data model.

I also read about the 10-gig limit on the Pro service and thought, hm, maybe that’s it, but I didn’t want to pester IT since they’re Sisi(nse) fans.

I do have an auto date/time table that I thought was a best practice in PBI, and I do have a fair number of time-intelligence measures for this report, along with 12 (!) "LocalDateTable" tables showing in DAX Studio, even though I only see the one date table in PBI.

LocalDateTables are the hidden tables Power BI uses behind the scenes for Auto Date/Time intelligence.
Also, if you’re going down the custom time-intelligence route, you’ll need to mark your calendar/dates table as a date table.


Make sure you have the latest version of DAX Studio. It doesn’t update automatically. The latest version is 2.11.0.

@marcster_uk is correct. You will want to use a dedicated date table and not the built-in auto-date feature in PBI. There are many reasons to do this, but the main one is performance. Be sure to disable it by going to File > Options and settings > Options > Global > Data Load and unchecking “Auto date/time for new files”.

The 2 gig limit I was referring to is the limit your queries (i.e. DAX) can use at one time. When you have a very complex measure or an incorrect data model (or both), it can result in a lot of data being queried by DAX. And since your desktop errors out, I will assume it will error out on the service as well. The 10 gig is the total data you can save to the service.

Taking a look at your data model:

  1. Be careful of using bi-directional filters. They can produce unexpected results and there will be a performance penalty (see the CROSSFILTER sketch just after this list for the usual alternative).
  2. There are a lot of columns in some of the tables, especially the “key_Orders_Ancill…” one. Since it’s on the 1 side of a 1:M to “Tbl_Orders_Line…”, I will assume that the Key_Orders table is a dimension table used to filter the many side. The main issue here is that the more columns you have in your dimension table, the more cardinality (unique values) you have. The higher the cardinality, the slower the DAX and the larger the file size, since VertiPaq cannot compress as efficiently as you would like.
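
On the bi-directional point, the usual alternative is to keep the relationship single-direction and only turn on both-direction filtering inside the specific measures that need it, using CROSSFILTER. A minimal sketch (the table, column, and measure names here are placeholders, not the ones in your model, and it assumes an existing FactOrders[OrderID] to DimOrders[OrderID] relationship):

Ancillary Qty (both directions) =
// Enables both-direction filtering only for this one calculation,
// instead of setting the relationship itself to "Both".
// FactOrders / DimOrders / [OrderID] / [Quantity] are placeholder names.
CALCULATE(
    SUM( FactOrders[Quantity] ),
    CROSSFILTER( FactOrders[OrderID], DimOrders[OrderID], Both )
)

That way everything else in the model keeps the cheaper single-direction 1:M behaviour, and you only pay the bi-directional cost where you actually need it.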

Steps I would take:

  • Try to get the “View Metrics” to work from DAX Studio
  • See if that tells me where I should be looking
  • Try to remove the bi-directional filters unless I really need them (chances are you do not)
  • Try to remove columns I do not need. Or try to combine many columns into 1 column

I am noticing that the dates table is on top most of the time, which I wouldn’t expect given that the LineItem table is substantially larger. So that pointed me to a calculated column I created, which a couple dozen measures rely on. That calculated column was an attempt to correct for bad date data (a fraction of the issues in our data, which is why we’re nowhere near an ML environment, coughsisensecough). It’s a bit of a long one and probably not too optimal:

D.BilledADJ = 
// Adjusted billed-days count: corrects for bad BILLFROM/BILLTO dates
// (blanks, years before 1900 or in the future, or BILLTO before BILLFROM).
var DD = DATEDIFF( tbl_Orders_DatesDays[BILLFROM_DATE] , tbl_Orders_DatesDays[BILLTO_DATE] , DAY )
var YrTo = YEAR( tbl_Orders_DatesDays[BILLTO_DATE] )
var YrFrom = YEAR( tbl_Orders_DatesDays[BILLFROM_DATE] )
var YrToday = YEAR( TODAY() ) + 2
var YrRet = YEAR( tbl_Orders_DatesDays[RETURN_DATE] )
var YrCreated = YEAR( tbl_Orders_DatesDays[CREATED_DATE] )
var YrShp = YEAR( tbl_Orders_DatesDays[SHIP_DATE] )
// Cleaned-up start date: fall back to the ship/created dates when the
// billing dates are missing or clearly wrong.
var BeginDT =
    DATE( 
        SWITCH( TRUE() ,
            YrFrom < 1900 , YrShp ,
            YrFrom > YrToday , YrToday ,
            ISBLANK( YrFrom ) , YrCreated ,
            YrFrom ) ,
        SWITCH( TRUE() ,
            ISBLANK( tbl_Orders_DatesDays[BILLFROM_DATE] ) , MONTH( tbl_Orders_DatesDays[SHIP_DATE] ) ,
            tbl_Orders_DatesDays[BILLTO_DATE] < tbl_Orders_DatesDays[BILLFROM_DATE] , MONTH( tbl_Orders_DatesDays[BILLFROM_DATE] ) ,
            MONTH( tbl_Orders_DatesDays[BILLFROM_DATE] ) ) ,
        SWITCH( TRUE() ,
            ISBLANK( tbl_Orders_DatesDays[BILLFROM_DATE] ) , DAY( tbl_Orders_DatesDays[SHIP_DATE] ) ,
            tbl_Orders_DatesDays[BILLTO_DATE] < tbl_Orders_DatesDays[BILLFROM_DATE] , DAY( tbl_Orders_DatesDays[BILLFROM_DATE] ) ,
            DAY( tbl_Orders_DatesDays[BILLFROM_DATE] ) )
        )
// Cleaned-up end date: same idea, falling back to the return/created dates.
var EndDT =
    DATE( 
        SWITCH( TRUE() ,
            YrTo > YrToday , YrToday ,
            ISBLANK( YrTo ) , YrCreated ,
            YrTo ) ,
        SWITCH( TRUE() ,
            ISBLANK( tbl_Orders_DatesDays[BILLTO_DATE] ) , MONTH( tbl_Orders_DatesDays[RETURN_DATE] ) ,
            tbl_Orders_DatesDays[BILLTO_DATE] < tbl_Orders_DatesDays[BILLFROM_DATE] , MONTH( tbl_Orders_DatesDays[BILLFROM_DATE] ) ,
            MONTH( tbl_Orders_DatesDays[BILLTO_DATE] ) ) ,
        SWITCH( TRUE() ,
            ISBLANK( tbl_Orders_DatesDays[BILLTO_DATE] ) , DAY( tbl_Orders_DatesDays[RETURN_DATE] ) ,
            tbl_Orders_DatesDays[BILLTO_DATE] < tbl_Orders_DatesDays[BILLFROM_DATE] , DAY( tbl_Orders_DatesDays[BILLFROM_DATE] ) ,
            DAY( tbl_Orders_DatesDays[BILLTO_DATE] ) )
        )
var DDadj = DATEDIFF( BeginDT , EndDT , DAY )

RETURN
    // Floor the result at 1 whenever any of the day counts come out
    // zero or negative.
    SWITCH( TRUE() ,
        DD <= 0 || tbl_Orders_DatesDays[D.B] <= 0 || DDadj <= 0 , 1 ,
        DDadj )

Might this be the problem?

Let’s see what happens when you turn off the auto date/time. Those bi-directional filters you have set, do they need to be that way, or would a 1:M work?

Also, you would want to find a way to make those “key_xxx” tables less wide (fewer columns). VertiPaq, and subsequently DAX, work best on long, narrow tables vs. shorter, wider tables.

Your calculated column shouldn’t be what’s making the PBIX file error out on memory, since calculated columns are evaluated only at refresh time, not while a DAX query is running.

I think fixing the data model will help, but it will probably still take rewriting some of the DAX in a more efficient manner as well; we need to get the data model set before anything else.


Copy all, will work on implementing all of those and get back to you, @Nick_M.

Also, thank you very much for the ancillary help. I am self-taught and have obviously bounced around to whatever works. I am taking a paced 160-hour BI Architect course that goes in depth on the structural side you’re pointing me toward.

To be continued . . .

Quick responses to some action items you mentioned:

  • The “key_Orders_AncillInfo” table is used as a lookup for the ancillary order info, but it also has a unique record for each order that I presumed could be used as my Order ID key.
  • When I unchecked the auto date/time box and attempted to open the original PBI file, it still produced memory errors on the visuals.
  • All of the bi-directional relationships (cross-filtering set to Both) are ‘One-to-One’ relationships that aren’t allowing a change to Single.

You are welcome. Sometimes it is just a matter of playing with things to figure out the best process and training yourself.
Good Luck

Well, I spent some time and figured out how to set up a local DB, add the files to it, and then DirectQuery into PBI. 12 KB file.