Power Query performance when parsing JSON records

I’m hoping to get some advice about speeding up my query.

Data source info:
• JSON format
• .log files (text format)
• PBI Gateway mashup log files (available when additional logging is enabled)

The files themselves are not big: my sample data has 9 files at around 1.5–2.5 MB each, but the load time is quite long, around 20 minutes.

The slow part happens when the query runs the Table.ExpandRecordColumn step. The progress monitor at the bottom of Power Query shows tens or hundreds of MB being loaded per file.

For example, with one 1,191 KB file in the directory, loading shows > 68 MB.

When the JSON is parsed, it creates a Record for each line of JSON.
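
For example (a minimal sketch with a made-up log line; the real entries have many more fields):

// Minimal sketch (hypothetical log line): each line in the file is a
// standalone JSON object, so Json.Document turns it into a Record
let
    SampleLine = "{""Start"": ""2024-06-14T00:00:01"", ""Action"": ""Engine/IO""}",
    Parsed = Json.Document(SampleLine)
    // Parsed = [Start = "2024-06-14T00:00:01", Action = "Engine/IO"]
in
    Parsed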

I’m assuming that, as each record is expanded, it’s held in memory before being loaded into the table.

However, I’m confused because the final PBIX file is not large, less than 1 MB. For this one sample file, the .pbix is only 119 KB.

There is quite a lot of data: just that one file creates 70 columns and 1,568 rows, and the amount of data being loaded can’t be reduced.

Does anyone know why the query is using so much memory when loading, even though the source file is not big and the final .pbix is also not big?

Is there any way we can speed it up?

PBIX file & sample data file attached. The data in the sample file doesn’t take long to load, but as soon as the files become larger and more numerous, the load takes much longer:
Tim GW Logs analyser post.pbix (208.8 KB)

The M code:


// FileLocation
"C:\Users\me\Documents\Data Local\My Logs" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]
 
// FuncProcessMashupLogs
let
            // Load files from the specified folder
            Source = Folder.Files(FileLocation),
 
            // Buffer the source table to improve performance in subsequent steps
            BufferSource = Table.Buffer(Source),
 
            // Filter to include only Mashup log files and exclude irrelevant files
            FilterMashupLogsOnly = Table.SelectRows(BufferSource, each Text.StartsWith([Name], "Mashup") and not Text.Contains([Name], "Container") and not Text.Contains([Name], "Evaluation")),
           
            // Remove all columns except for the Content column
            RemoveOtherColumns = Table.SelectColumns(FilterMashupLogsOnly, {"Content"}),
 
            // Add a custom column converting binary content to text lines
            AddCustom = Table.AddColumn(RemoveOtherColumns, "Custom", each Table.FromColumns({Lines.FromBinary([Content], null, null, 1252)})),
           
            // Buffer the table after adding the custom column
            BufferAddedCustom = Table.Buffer(AddCustom),
 
            // Keep only the newly added custom table
            KeepOnlyTables = Table.SelectColumns(BufferAddedCustom, {"Custom"}),
 
            // Expand the custom table
            ExpandCustom = Table.ExpandTableColumn(KeepOnlyTables, "Custom", {"Column1"}),
 
            // Parse the JSON in the expanded column
            ParseJSON = Table.TransformColumns(ExpandCustom, {"Column1", Json.Document}),
           
            // Remove rows with errors in the parsed JSON column
            RemoveErrors = Table.RemoveRowsWithErrors(ParseJSON, {"Column1"}),
           
            // Drill down to the records in the parsed JSON column
            DrillDownRecord = RemoveErrors[Column1],
           
            // Get a list of all field names in the JSON records
            FieldNames = List.Union(List.Transform(DrillDownRecord, each Record.FieldNames(_))),
 
            // Expand the JSON records to columns
            ExpandRecord = Table.ExpandRecordColumn(Table.FromList(DrillDownRecord, Splitter.SplitByNothing()), "Column1", FieldNames, FieldNames),
 
            // Buffer the expanded table (delayed) before the list-to-text transforms below
            TableBuffer = Table.Buffer( ExpandRecord, [ BufferMode = BufferMode.Delayed ] ),
 
            // Transform AccessedResources only if it's a list
            ResourceValues = Table.TransformColumns(
                TableBuffer,
                {"AccessedResources", each if _ is list then Text.Combine(List.Transform(_, Text.From), " ; ") else _, type text}
            ),
 
            // Transform PartitionInputs only if it's a list
            PartitionValue = Table.TransformColumns(
                ResourceValues,
                {"PartitionInputs", each if _ is list then Text.Combine(List.Transform(_, Text.From), " ; ") else _, type text}
            ),
 
            // Convert all column types to text
            AllTypesText =
                Table.TransformColumnTypes(
                    PartitionValue,
                    List.Transform(
                        Table.ColumnNames(PartitionValue),
                        each {_, type text}
                    )
                ),
 
            // Add a DateTimeKey column: keep only the digit characters of [Start],
            // then insert a decimal point after the date part (yyyymmdd.hhmmss...)
            DateTimeKeyColumn =
                Table.AddColumn(
                    AllTypesText,
                    "DateTimeKey",
                    each
                        let
                            Key = Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Start]), each if Value.Is(Value.FromText(_), type number) then _ else null))),
                            Column = Text.Start(Key, 8) & "." & Text.End(Key, Text.Length(Key) - 8)
                        in
                            Column
                ),

            ChangeTypes = Table.TransformColumnTypes(DateTimeKeyColumn,{{"Start", type datetime}, {"entryCount", Int64.Type}, {"totalSize", Int64.Type}, {"hits", Int64.Type}, {"requests", Int64.Type}, {"hitrate", type number} , {"DateTimeKey", type number} })
        in
            ChangeTypes
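
For reference, the FieldNames step above derives the column list by unioning the field names of every parsed record, since individual log entries can have different shapes. A toy illustration with made-up records:

// Toy illustration (made-up records): List.Union merges the field names of
// records with different shapes into one combined column list
let
    Records = {[a = 1, b = 2], [b = 3, c = 4]},
    Names = List.Union(List.Transform(Records, Record.FieldNames))
in
    Names  // {"a", "b", "c"}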

Hi @Timmay,

While waiting for the community to review and respond to your issue, take advantage of Data Mentor. It offers a wealth of tools and resources that could provide immediate solutions and enhance your report-building efficiency.

Cheers,

Enterprise DNA Support Team

Hi @Timmay,

I’m not seeing a data file, only a PBIX.
Could you provide that JSON sample? Thanks.


Mashup20240614.000000001.log (2.4 MB)
Mashup20240616.000000000.log (154.9 KB)

Sure thing Melissa, two sample files attached, one larger and one smaller. Thanks for taking a look!

Thanks @Timmay

Give these changes a go and report back.
Step 1: create this function query and name it Transform.

(logFiles as binary) as table =>
let
    // Split the binary log file into lines of text (Windows-1252 encoding)
    BinaryToTable = Table.FromColumns({Lines.FromBinary(logFiles, null, null, 1252)}), 
    // Parse each line as JSON
    ParseJSON = Table.TransformColumns( BinaryToTable, {"Column1", Json.Document}), 
    // Drop lines that failed to parse
    RemoveErrors = Table.RemoveRowsWithErrors( ParseJSON, {"Column1"}), 
    // Union the field names of all records into one buffered column list
    FieldNames = List.Buffer(
        List.Union(
            List.Transform(RemoveErrors[Column1], each Record.FieldNames(_))
        )
    ), 
    // Expand the JSON records to columns
    ExpandRecord = Table.ExpandRecordColumn(
        RemoveErrors, 
        "Column1", 
        FieldNames
    ), 
    // Turn the list-valued fields into "a ; b" text; Text.From is the default
    // transformation for all other columns, MissingField.UseNull tolerates absent fields
    ResourceValues = Table.TransformColumns( ExpandRecord, 
        {
            {
                "AccessedResources", 
                each if _ is list then Text.Combine(List.Transform(_, Text.From), " ; ") else _, 
                type text
            }, 
            {
                "PartitionInputs", 
                each if _ is list then Text.Combine(List.Transform(_, Text.From), " ; ") else _, 
                type text
            }
        }, 
        Text.From, MissingField.UseNull
    ), 
    // Add a DateTimeKey: the digits of [Start] with a decimal point after yyyymmdd
    DateTimeKeyColumn = Table.AddColumn( ResourceValues, 
        "DateTimeKey", each 
        let
            Key = Text.Combine(
                List.RemoveNulls(
                    List.Transform( Text.ToList([Start]), 
                        each if Value.Is(Value.FromText(_), type number) then _ else null
                    )
                )
            ), 
            Column = Text.Start(Key, 8) & "." & Text.End(Key, Text.Length(Key) - 8)
        in
            Column
    ), 
    // Restore proper types on the known date/numeric columns
    ChangeTypes = Table.TransformColumnTypes( DateTimeKeyColumn, 
        {
            {"Start", type datetime}, 
            {"entryCount", Int64.Type}, 
            {"totalSize", Int64.Type}, 
            {"hits", Int64.Type}, 
            {"requests", Int64.Type}, 
            {"hitrate", type number}, 
            {"DateTimeKey", type number}
        }
    )
in
    ChangeTypes
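
A note on the ResourceValues step: Table.TransformColumns accepts an optional default transformation (here Text.From, applied to every column not explicitly listed, which replaces your AllTypesText step) and an optional missing-field behaviour (MissingField.UseNull, so files that lack either column don’t raise an error). A tiny example with made-up data:

// Tiny example (made-up data): Text.From is the default transformation applied
// to every column not listed, and MissingField.UseNull prevents an error if a
// listed column is absent
let
    Source = Table.FromRecords({[A = {1, 2}, B = 3]}),
    Result = Table.TransformColumns(
        Source,
        {{"A", each Text.Combine(List.Transform(_, Text.From), " ; "), type text}},
        Text.From,
        MissingField.UseNull
    )
in
    Result  // A = "1 ; 2", B = "3"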

Step 2: your FuncProcessMashupLogs query logic can now be updated to:

let
    // Load files from the specified folder
    Source = Folder.Files(FileLocation),
    // Filter to include only Mashup log files and exclude irrelevant files
    FilterMashupLogsOnly = Table.SelectRows(Source, each Text.StartsWith([Name], "Mashup") and not Text.Contains([Name], "Container") and not Text.Contains([Name], "Evaluation"))[[Content]],
    // Apply the Transform function to each file's binary content and append the results
    invokeTransform = Table.Combine( Table.TransformColumns( FilterMashupLogsOnly, {"Content", Transform})[Content] )
in
    invokeTransform
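
In case the last two steps look unfamiliar: the [[Content]] suffix is shorthand for Table.SelectColumns, and the closing [Content] extracts that column as a list of the per-file tables, which Table.Combine then appends. A toy example with hypothetical data:

// Toy example (hypothetical data): [[Content]] keeps just that column, and
// Table.Combine appends the per-file tables extracted as a list
let
    Files = Table.FromRecords({
        [Name = "a.log", Content = Table.FromRecords({[x = 1]})],
        [Name = "b.log", Content = Table.FromRecords({[x = 2]})]
    }),
    OnlyContent = Files[[Content]],                  // same as Table.SelectColumns(Files, {"Content"})
    Combined = Table.Combine(OnlyContent[Content])   // one table with rows x = 1 and x = 2
in
    Combined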

I hope this is helpful.


Amazing Melissa, that’s infinitely faster than my code.

First test: ran it on around 9 files of about 2.5 MB each; processing time went from 10 minutes down to a few seconds.

Second (the ultimate) test: a really big set of log files, 31 files, > 46 MB each, total 1.34 GB. Previously this would run for hours and then crash my PC if I let it run; this time it completed in about 20–30 minutes. Amazing.

I will be studying the code, along with your book, which I purchased just the other day!

Thank you so much again,
Tim

Awesome, glad to hear that worked well for you.
Enjoy the book 😉. When you have a chance, I look forward to your feedback.

All the best.