Aggregation Types (easy topic)

Hello community,

I am currently in an argument and looking for some support from my like-minded friends. A ‘customer’ and I disagree on how certain metrics should be aggregated. The customer has developed a habit of aggregating their data using method #2 below, whereas I predominantly aggregate using method #1 and usually take the P50 (median). Can someone point me to any documentation or publication that shows method #2 is incorrect (or that disproves my method #1)?

Thanks in advance and I apologize if this is the incorrect board.

2021-09-02_screenshot

@JoeRobert,

I will go out on a limb and suggest no one will provide such documentation since all 3 metrics listed are completely valid statistical values. It 100% depends on what particular statistical measurement is desired.

I would also suggest that the Avg in Method 1 is the most suitable for a “per well value across all wells” perspective, allowing you to see which individual wells are performing above or below that average, while the Method 2 average is suitable for calculating a single value that is useful for projections, year-over-year comparisons by group of wells, etc.

The customer has to define which variant meets their needs. It follows that the only method which is incorrect is the one which does not support the customer’s needs.

Hope this helps the debate :)

John C. Pratt


Thanks for the reply!

I am starting to think that which method is more accurate depends on the question you are trying to answer.

Regarding your comment on the average value for method 1, see the screenshot below for reference. I have modified well 4 to be an outlier by unrealistically reducing its time duration. As a result, the average value for the grouping is now erroneously higher than the majority of the data points. I believe the average value does not properly represent the dataset, and therefore the median (P50) should be showcased instead.
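To make the outlier effect concrete, here is a minimal sketch in Python. The well depths and durations are made-up numbers (not the values from the screenshot), and it assumes method 1 means "compute depth / time for each well, then aggregate those per-well rates".

```python
from statistics import mean, median

# Hypothetical wells as (depth in ft, duration in days).
# Well 4 is given an unrealistically short duration to act as the outlier.
wells = [(10000, 20), (12000, 24), (9000, 18), (11000, 2), (10000, 25)]

# Assumed method 1: normalize at the well level first (ft per day).
rates = [depth / days for depth, days in wells]

mean_rate = mean(rates)      # pulled upward by the single outlier
median_rate = median(rates)  # P50, unaffected by how extreme the outlier is

# Count how many wells sit below the group average.
below_mean = sum(rate < mean_rate for rate in rates)

print(f"per-well rates: {rates}")
print(f"mean: {mean_rate:.1f}  median (P50): {median_rate:.1f}")
print(f"{below_mean} of {len(rates)} wells are below the mean")
```

With these made-up numbers, four of the five wells fall below the mean, while the median stays at the typical per-well rate.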

Also, regarding your 2nd comment on method 2, I went ahead and created a method 3 that averages both the depth and time variables before normalizing. The final aggregate ends up being the same as the method 2 value. I believe it would be inappropriate to show this value as a YoY comparison because the two averaged variables are essentially independent of one another. If the objective is to show cycle time ‘speed’ for a group of wells, why not normalize the metric at the well level instead of smoothing out the variables before making the aggregation?
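The reason method 3 lands on the same number as method 2 can be checked with a short Python sketch. This assumes method 2 is "total depth divided by total time" and method 3 is "average depth divided by average time" (my reading of the methods, since the screenshot values are not reproduced here); the well data is hypothetical.

```python
import math
from statistics import mean

# Hypothetical wells as (depth in ft, duration in days); well 4 is an outlier.
wells = [(10000, 20), (12000, 24), (9000, 18), (11000, 2), (10000, 25)]
depths = [depth for depth, _ in wells]
days = [d for _, d in wells]

# Assumed method 1: average of per-well rates.
method_1 = mean(depth / d for depth, d in wells)

# Assumed method 2: ratio of totals.
method_2 = sum(depths) / sum(days)

# Method 3: ratio of averages. The 1/n in numerator and denominator cancels,
# so this is algebraically the same as method 2.
method_3 = mean(depths) / mean(days)

# Method 2 is also the time-weighted average of the per-well rates.
time_weighted = sum(d * (depth / d) for depth, d in wells) / sum(days)

print(f"method 1: {method_1:.1f}")
print(f"method 2: {method_2:.1f}")
print(f"method 3: {method_3:.1f}")
print(f"time-weighted mean of per-well rates: {time_weighted:.1f}")
```

So method 1 weights every well equally, while methods 2 and 3 weight each well's rate by its duration. That weighting is why a short-duration outlier moves method 1 far more than it moves method 2.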