Transforming Data from matrix to columns - ugly


#1

I’m aggregating a bunch of information collected routinely across a bunch of playing fields. The guys collecting this data have an Excel sheet which gets the data entered for each field, for each day, and to reduce mistakes the Excel sheet is set up exactly the same as the paper sheet, and I am not able to change any of this.
The upshot is that my source Excel files that I’ll be using are arranged as follows:


and I’ll need to take each of these individual cells (25 in each row) and make these columns, so each column will be named something like:
(Can only upload 1 image)
Column 1’s name then is a combination of Variable name one (Height of cut (warm season grass)) & Location (North goal) and Test Number (1), Column 2 is (Height of cut (warm season grass)) & Location (North goal) and Test Number (2) etc
This will be done for 27 separate variables (each having 25 observations - 5 locations X 5 tests, each of which needs an individual column) ie 675 columns.
If I was to use a macro of course I could reshape the data by copying and pasting, but this would mean having to run the macro and re-save the data every time a new sheet comes in, and over the next year or two, this will ramp up to about 150 fields with data coming in most days, so that’s untenable.
What I’d really like to be able to do is Import Files from folder, and have the query extract all these data so that each sheet will end up yielding just one row of data, albeit with 675 columns that then becomes aggregated for analysis and visualisation.
I guess one way would be to right click on individual cells, and drill down and save as a new query, and then append each of these (675) queries later, but that seems clumsy.
Before I go much further, I’d love to hear some alternates as I can’t be the first person to solve this problem I’m hoping.
Final wrinkle is that with each sheet, at the top there are 8 cells that also need to be added as columns - Field location, Field name, Irrigation, etc


#2

Hi,

Would you have an example excell file? Would help to understand the data a bit and see how you could import it?
What about the final visualisation you would like to see? Would you have some rough example how that would look like?

Thanks!

R,

Koen


#3

Thanks for taking a look at this for me.

Here’s a link to a dummy file that is fairly close to the source:
https://1drv.ms/x/s!AgytcDfT4DRFhMp4FNEPaQjSdM7PCw
The sheet ‘Data Entry sheet’ is the one I’ll be pulling the data from.
I created the the sheet ‘Partial Desired Output’ which has the desired output, but I just completed the first 110 columns.
The first 10 are pulling the ‘demographic’ information about the test, and then the next 100 are the first 4 variables, each of which has 25 measurements.
Hopefully this is clear


#4

HI Rod,

I had a quick look at your sheet and its not a very easy format I have to say :slight_smile: so I can understand the struggle. I managed to get some data out but its by using multiple queries etc so I am not sure how to merge all these steps into one function so you can do it on each file in the folder.

Is there a way to add an extra sheet to these measurements which has a table like
you describe in the backend by using excell formulas? So mapping the quite dense format into
the column format you wish, but using excell? Or is this historical data you would like to process?

I feel a bit of scripting would be needed to fetch all the data out of these sheets into a nice formatted
table structure which you then can add to power bi.

Not sure though as I am also curious how you would solve this most neatly using the transform steps in the power query editor.

Some things which could help is this link:

https://exceleratorbi.com.au/combine-excel-workbooks-power-query-method-1/

But with your quite difficult to access data I am not sure how to do this.


#5

Thinking further about this, can’t you just add formulas to the sheet so that it links to a second sheet which then has the data nicely formatted?

In that way you can use power bi with the folder feature to read always the second sheet in the excell sheet in stead of the one the user is filling in?


#6

Thanks kvbe - I am separately asking them if I can simply add an extra hidden sheet which parses all these data into a simple wide table with 1 header row & 1 data row, which will then make things simple, but this has got me thinking that other people must come across this issue as well, and it would be good to know what the best practice is to solve this sort of problem.
Maybe it turns out that Power Query is not best practice when your data is set up like this and can’t be changed…


#7

Really like this solution actually.

I try myself to stay away from too many complex transformations in pq if I can.

Mainly because it can sometimes be quite difficult to audit what’s gone wrong if things fall over.


#8

Yep really do think this is likely a better workaround.

Bit of work on the front end here, will prevent some overly complex transformations.

PQ is amazing though, and can be used in many many situations but maybe not perfect solution for this original data structure.