Hi @sam.mckay,
I have about 500 Parquet files in a SharePoint site that I need to retrieve.
I’m used to handling CSV files using functions like:
= Table.AddColumn(#"-- staging CSV content", "Custom", each Table.PromoteHeaders(Csv.Document([Content])))
With Parquet I should use the Parquet.Document function, but when I pass it the Content column of the table I get this error: Parameter.Error: Parquet.Document cannot be used with streamed binary values.
It seems Parquet.Document requires a real file and not a binary stream.
How can I overcome this?
Unfortunately I cannot provide a model, since I cannot remove references to company information.
Thanks for your help
Roberto
Hi @Roberto,
Since we don’t have a dummy data set either, could you please check whether this video helps: https://www.youtube.com/watch?v=5hCznl9tOsk
Also, please try the formula below:
= Table.AddColumn(#"Removed Other Columns", "Custom", each Parquet.Document([Content], [Compression=null, LegacyColumnNameEncoding=false, MaxDepth=null]))
Hi @rajender1984,
thanks for your reply.
I had seen the video, but it uses the Combine Files approach, and I need to implement incremental refresh as I already did with CSV. Overall the dataset has more than 2 billion rows, and I update it daily by adding the new data for the day.
I cannot replicate the SharePoint site, but I took a file from Kaggle and split it into two Parquet files.
I’ve also created a simple pbix, but it uses a folder as the source, and in that case Parquet.Document works just fine.
When I replicate the same setup with SharePoint as the source, the issue pops up again.
Thanks for your help
Roberto
collecting parquet files in sharepoint.pbix (47.7 KB)
penguins_lter1.parquet (9.9 KB)
penguins_lter2.parquet (9.9 KB)
Hi @rajender1984,
After sleeping on it and some googling, I found an article on Chris Webb’s blog (linked below) with the solution: the binary has to be wrapped in Binary.Buffer before it is passed to Parquet.Document:
Parquet.Document(Binary.Buffer([Content]))
The caveat is that Binary.Buffer loads each file fully into memory, so with big Parquet files the ETL can fail because of memory issues.
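For reference, a minimal sketch of how the whole query might look with this fix in place. The site URL and the step names are placeholders I made up for illustration, not the real model, and the extension filter is just one way to narrow the file list:

```powerquery
let
    // Assumption: hypothetical site URL; replace with the actual SharePoint site
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/ExampleSite"),
    // Keep only the Parquet files from the file listing
    ParquetOnly = Table.SelectRows(Source, each Text.Lower([Extension]) = ".parquet"),
    // Keep the file name and the binary content
    KeepContent = Table.SelectColumns(ParquetOnly, {"Name", "Content"}),
    // Buffer each streamed binary in memory, then parse it as Parquet
    AddParsed = Table.AddColumn(
        KeepContent,
        "Custom",
        each Parquet.Document(Binary.Buffer([Content]))
    )
in
    AddParsed
```

Since the buffering happens row by row in the added column, filtering the file list down before that step should limit how many files get loaded into memory in a single refresh.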
Thanks
Roberto
(https://blog.crossjoin.co.uk/2021/03/07/parquet-files-in-power-bi-power-query-and-the-streamed-binary-values-error/)