How much is too much data

Hi everyone,

I am back on to ask a fairly fundamental question. I am working with a Dell Latitude 5411 (Core i7, 16GB RAM). Up until now I have been building a lot of individual reports, for example collating data on one subject matter and building out a specific report. These reports have all run really well with no performance issues.

I have one report that scrapes data from a 500MB Excel file while the .pbix itself is 35MB. This report is slow to refresh but otherwise works very well, but the data for the fact table is only coming from one source.

In the last few weeks I started to build out a financial markets report. At first it was working perfectly when I loaded in c. 30 securities from a Bloomberg download, and it was great. I then went and added 60 individual bonds and started to build cross-relationships between bonds and equities via my lookup tables.

Literally out of nowhere it has become impossible to refresh any data or even make new data source connections. Power BI just hangs on loading.
Financial Market Report (1).pbix (5.4 MB)

It has been hanging like this for over an hour, so now I appear to be stuck with a report that is too much for my CPU to handle. However, the source Excel files are only c. 15MB combined, so the actual raw data is not that large. I probably only have 50 or so measures as well, so that is not excessive either. That said, there is quite a lot of data transformation; nothing overly complicated, but it is spread across c. 100 Excel tabs. All of a sudden it is blowing up my CPU and memory.

This is obviously quite upsetting, as I spent so long setting up the data in the first place (when I was loading the initial data there was no issue, and while appending 60 queries was slow, it was not excessive) and now my Power BI report is effectively shutting down my computer.

After that long-winded overview, I am just trying to find out whether there is a way to know when you are overloading Power BI for a given level of computing power. How can one tell that a report holds so much that it will start to test the capabilities of the computer itself? Or, alternatively, have I just done something fundamentally wrong in my data load?

1 Like

Hi @Ronan,

Your file is difficult to process so I’ll start with some general recommendations.

  • Disable Auto date/time
  • Disable Allow data preview to download in the background
  • Test the effect of disabling/enabling parallel loading of tables
    (screenshot: the corresponding Data Load options)

Restructure your queries as much as possible:

  • Remove unwanted/unneeded Columns and Rows first
  • Avoid redundancy in your code (repeating the same transformation multiple times; see the sketch below)
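
For example, instead of repeating the same steps in 60 separate queries, you can define the transformation once as a custom function and invoke it for each table. A minimal sketch (the file path and the steps inside the function are placeholders, not your actual transformation):

    let
        // shared per-sheet cleanup, defined once (placeholder steps; substitute your own)
        CleanSheet = (raw as table) as table =>
            let
                Trimmed = Table.Skip(raw, 7),
                Promoted = Table.PromoteHeaders(Trimmed, [PromoteAllScalars = true])
            in
                Promoted,
        // load the workbook (placeholder path) and keep only the sheets
        Source = Excel.Workbook(File.Contents("C:\Data\Bonds.xlsx"), null, true),
        Sheets = Table.SelectRows(Source, each [Kind] = "Sheet"),
        // apply the shared function to every sheet, then append all results
        Cleaned = Table.AddColumn(Sheets, "Clean", each CleanSheet([Data])),
        Combined = Table.Combine(Cleaned[Clean])
    in
        Combined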

Something else to consider:

  • Instead of loading XLSX files, try loading CSVs instead, perhaps from a folder (see the sketch below this list).
  • Use multiple dataflows instead of doing all this in a single PBIX. That way you only have to connect and bring in a ‘ready for use’ table, and all the staging queries are offloaded and processed in Power Query Online.
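
To illustrate the from-folder pattern, a minimal sketch (the folder path and the expanded column names are placeholders; adjust them to your exports):

    let
        // hypothetical folder holding the CSV exports
        Source = Folder.Files("C:\Data\BloombergExports"),
        // keep only CSV files
        CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
        // parse each file's binary content into a table (UTF-8, comma delimited)
        Parsed = Table.AddColumn(CsvFiles, "Data", each Csv.Document([Content], [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv])),
        // keep the file name as an identifier next to the parsed data
        Slimmed = Table.SelectColumns(Parsed, {"Name", "Data"}),
        // expand the per-file tables into one combined table (column names depend on your files)
        Combined = Table.ExpandTableColumn(Slimmed, "Data", {"Column1", "Column2", "Column3"})
    in
        Combined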

I hope this is helpful.

3 Likes

Melissa, once again thank you so much!

You are such a legend for helping out on here. I do want to clarify a few points with you.

a) Avoid redundancy in my code (repeating same transformation multiple times)
Could you kindly explain this a little further? This is a huge bugbear of mine and one of the most time-consuming parts. I have 60 tabs in Excel for individual bonds, and when I load them into Power BI they come in as different tables. They all need to be transformed, and the transformation is exactly the same for each table, but I end up doing it 60+ times. I then append all 60 together into one query so I have all my data in one table.

Based on your comment, that is not the best way to do this. Any links or guidance as to a better way to manage all these sheets? One of the biggest hurdles in starting a new report is having to do that transformation for hours again. Any hacks or automation would be most welcome.

b) Use multiple dataflows
Now this sounds like exactly what I need to be doing, and it is what I was hoping to do. Is there any good course or structured learning on this within EDNA? I have been methodically working through the structured learning but have yet to come across it.

Ideally, how I want to use Power BI going forward is to have several PBIX files that I can call on in one report. That would be perfect and would avoid my issue of loading more and more data into one report until it shuts down.

Can you provide one small mock up example in XLSX?

  • So an Excel file with some dummy data in your specific format that I can examine (a couple of sheets will be more than enough)
  • BUT also supply a separate file that contains an example of the required layout before the append.

Just to recap: you have a single XLSX file with all bonds (one table per sheet per bond) and want to combine all the data in that file into a single table.

Dataflows are just Power Query Online, available for both Pro and Premium in any workspace except “My workspace” in the online service. No separate course required (although there are slight differences in the UI because new features are introduced there first).
You can reuse a dataflow over and over again in your PBI models by connecting to it.

1 Like

In addition to @Melissa’s perfect suggestions:

If you are the only one managing everything and you don’t mind a little bit of management, then my suggestion would be to:

  1. Do the initial append in VBA and create a separate tab/workbook; nicely written VBA code can do this even faster than PQ.

  2. Or in Python.

Thanks, but that is not something I have great skills in, to be honest, hence why I like to do it in BI, as I understand the transformation steps.

Financial Market Data.xlsx (9.0 MB)
Lessor Bonds Spot Data.xlsx (6.8 MB)
These are the files; they are direct links to Bloomberg. What I generally do (very open to suggestions) is append all of these, with an identifier, into one table in BI so that I can just use that one table for all my DAX.

@Ronan If you are interested in VBA or Python I can write a script for you, but I need a summary of the exact steps.

1 Like

Thank you, this is a very kind offer. The bond Excel file is in the reply I sent to Melissa. You will see that from the second tab onwards I have individual bond historical prices.

The steps I do in Power Query in BI are:
a) Copy the security ticker in cell C3
b) Remove rows: remove the top 7 rows
c) Transform: make the top row the headers
d) Insert column: add a Column From Examples, call it Identifier, and fill it with the security ticker copied from cell C3
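
In M terms, those steps amount to a per-sheet function along these lines (a sketch based purely on the description above; the cell and row positions are assumptions):

    (raw as table) as table =>
    let
        // a) the security ticker sits in cell C3, i.e. row 3, column 3 of the raw sheet
        Ticker = raw{2}[Column3],
        // b) remove the top 7 metadata rows
        Trimmed = Table.Skip(raw, 7),
        // c) promote the first remaining row to column headers
        Promoted = Table.PromoteHeaders(Trimmed, [PromoteAllScalars = true]),
        // d) add an Identifier column carrying the ticker
        WithId = Table.AddColumn(Promoted, "Identifier", each Ticker, type text)
    in
        WithId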

Hi @Ronan,

This script gets you all of the Bonds data in one single table. Just sub in your FileLocation

If you need this in multiple models, you can copy the query (CTRL+C) and paste it into a Dataflow (CTRL+V). You might also need a gateway to connect to your file, but once that is set up you have this table ready for use in the Service (advantages: this code only needs to be executed once, and you have just one query to maintain instead of a multitude of queries across several separate PBIX files).

let
    // one row per object in the workbook (sheets, tables, named ranges)
    Source = Excel.Workbook(File.Contents(FileLocation), null, true),
    // read each sheet's ticker from the cell next to the "Security" label in the top 7 rows
    AddSecurityTicker = Table.AddColumn(Source, "Security ticker", each try Table.SelectRows(Table.FirstN([Data], 7), each ([Column1] = "Security")){0}[Column2] otherwise null),
    // keep only sheets where a ticker was found
    SelectRows = Table.SelectRows(AddSecurityTicker, each ([Security ticker] <> null)),
    // per sheet: remove the top 8 rows, promote the next row to headers, then drop
    // leftover "Column..." columns and fully empty rows; keep ticker + cleaned table
    TransformTable = Table.AddColumn(SelectRows, "NewTable", each let myT = Table.PromoteHeaders(Table.RemoveFirstN([Data], 8)) in Table.SelectRows(Table.RemoveColumns(myT, List.Select(Table.ColumnNames(myT), each Text.StartsWith(_, "Column"))), each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))))[[Security ticker], [NewTable]],
    // expand all cleaned per-sheet tables into one combined table
    CombineAllTables = Table.ExpandTableColumn(TransformTable, "NewTable", {"Date", "PX_LAST", "YLD_YTM_MID"}, {"Date", "PX_LAST", "YLD_YTM_MID"})
in
    CombineAllTables

with this result:

(image: the resulting combined table with Security ticker, Date, PX_LAST and YLD_YTM_MID columns)

I haven’t validated the results; I will leave that to you.
Let me know if you have any issues.

I hope this is helpful.

Thank you Melissa,

Excuse my novice nature, but where do I write this code? At what phase of the data loading do I do this?

My standard approach has been to go into Power Query > Get Data > Excel > Link to file > select relevant tabs > import

Hi @Ronan,

Sorry if that wasn’t clear.
Just create a new blank query, open the Advanced Editor and copy the full script in.
Next, inside the Source step, replace FileLocation with either your hard-coded file path or a file path parameter: “C:\Users\Name\BlaBlaBla.xlsx”
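
For example, the hard-coded variant would look like this (keeping the placeholder path from above; substitute your own):

    let
        // hard-code the workbook path here, or create a text parameter named
        // FileLocation (Manage Parameters > New Parameter) and remove this line
        FileLocation = "C:\Users\Name\BlaBlaBla.xlsx",
        Source = Excel.Workbook(File.Contents(FileLocation), null, true)
    in
        Source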

I hope this is helpful.

More than helpful!!!
Thank you so much

Side question: would you say I should use this query function as my go-to when bringing in a lot of Excel data?

@Ronan as always the correct but unsatisfying answer is “it depends…”
This method works well with equally formatted data AND a limited number of transformations.

But let’s be honest, there are so many ways to design a solution… And you will only learn and discover new possibilities by “doing”, so if you can make changes to this code, excellent! But if you find another way to solve the puzzle, I would say: even better! :+1:

Thank you so much. This worked perfectly; it’s like the report is brand new, with no data load strain.

Is there anywhere you would advise to learn that coding? I have a few more data sets that are very similar, in that they are the exact same format with lots of tabs, so I have been doing a lot of repetitive transformation of the same data.

I have tried to follow your code to see if I could replicate it with a few tweaks, but it is proving a little challenging.

1 Like

Hi @Ronan,

There is course content in the portal; I don’t know if you have already worked through that?

The series should also be available on YouTube

You know what, I’ll add it to my “to do” list and create a video.
Could help others as well…

1 Like

Thanks so much - I am only 40% through Mastering DAX Calculations at present, having done the beginner guides and a few sample pieces. I must look at these as soon as I can.

I am actually shocked at how dramatic the performance difference is between the M query function and the Excel import, transform, and append approach. It is not even on the same planet in terms of speed.

Glad I could help!
Power Query at its best… LOL


1 Like