Get Median Value by Patient (In correct result)

Hi, my Median() by patient - dax logic gives me incorrect result.

Requirement:
I need to get Median Time by Test
I need to get median time by patient. If multiple tests are found for a given patient - for selected period, use the latest “collect” or “receive date” (based on date type selection) of the record for the median computation.

Working
Median Time by Test - Correct result

DAX:-
Median Time by test =
var med = MEDIAN(fact_detail[tat_time])
RETURN SWITCH (TRUE()
, [selected date type] = “Collect”, med — default active relationship
, [selected date type] = “Receive” , CALCULATE (med
, USERELATIONSHIP(dim_date[dim_date_id], fact_detail[rec_dim_date_id] ))
, med)

Problems
1- my Median Time by patient result is incorrect.
2- Dax query challenge, do not know how to force relationship (USERELATIONSHIP function) on getting max(collect) or max(receive) per patient based on Date Type slicer (see pic below).

DAX:
Median Time by patient =
VAR vTable = SUMMARIZE(fact_detail, fact_detail[PatientIdentifier]
, “coll_dt” , MAX(fact_detail[coll_dt] )
)
VAR vFinal = NATURALINNERJOIN(vTable, fact_detail)
RETURN MEDIANX(vFinal, fact_detail[tat_time] )

Data model that I had set up:-

  1. active relationship -dim_date[dim_date_id] --> fact_detail[coll_dim_date_id] = collect
  2. non-active relationship -dim_date[dim_date_id] --> fact_detail[rec_dim_date_id] = receive

Could someone please guide me a little.
Here is the raw data file. Raw data.xlsx (298.9 KB)
Here is the PBIX file. Median Time.pbix (234.3 KB)

Hi,

You may find this link helpfull, explaining how to deal with between dates situations.

Paul
image

@akiko_lim,

I’ve been working on this one for a while, and I can’t figure out why 39 is the correct value for the median by patient for colldate only = 1/1/19?

When I compute this, I get a value of 204. Here’s the DAX measure I’m using to calculate the median by patient for 1/1/19:

vMed Test 1/1/19 = 

VAR vTableTest = 
CALCULATETABLE(
    ADDCOLUMNS(
        SUMMARIZE(
            fact_detail,
            fact_detail[PatientIdentifier]
         ),
        "TotTatTime", [Total TatTime]
    ),
     FILTER(
         fact_detail,
         fact_detail[coll_date_only] = DATE( 2019, 1, 1 )
     )
)

RETURN
MEDIAN( 'vTable Test'[TotTatTime] )

There are three major differences between yours and my approach:

  1. I reworked the data model to build the relationship links based on the dates themselves not the ID number
  2. I am taking all observations of Tat Time and summing them together by patient before calculating the median
  3. you are including only a patient’s Tat Time associated with the MAX(colldt). This accounts for why our median results are so different.

I’ve attached my PBIX file. This does not represent a full solution by any means, since I want to resolve the above questions before proceeding. I am confident that my data modeling changes are appropriate, given the time intelligence nature of this problem. However, while my approach in 2) above make sense analytically to me, it may reflect a misunderstanding of how the patient/test process works.

Please take a look and let me know whether you think I’m on the right or wrong track here, and then we can figure out how to proceed.

image

I hope this is helpful.

Thanks Paul & Brian. Appreciate you both reaching out to share your tot on this.

Brian, to comment on your Dax Query:-
Instead of sum(TotTatTime), the Median needs to be retrieved at the row of the last day = max(coll_dt) of each patient.

E.g. I have the following dataset
Patient A > Tat_time = 32 min > coll_dt = 2019-01-01 4:00 am
Patient A > tat_time = 50 min > coll_dt = 2019-01-01 7:15 pm
Patient B > Tat_time = 43 min > coll_dt = 2019-01-01 12:05 pm

To compute the Median Time by Patient, I need to grab the following dataset as part of the computation.

Patient A > tat_time = 50 min > coll_dt = 2019-01-01 7:15 pm
Patient B > Tat_time = 43 min > coll_dt = 2019-01-01 12:05 pm

Where Patient A-- I need to grab record with coll_dt = 2019-01-01 7:15 pm since it’s the last record available.

Hence, my median time by patient = ( 50 + 43 ) /2 = 46.5

Does it make sense ?

Appreciate your kind effort as always.

@akiko_lim,

Thanks for the explanation and clarification. Now that I understand your operational process better, I will rework my measure and be back in touch soon.

  • Brian

@akiko_lim,

I’m continuing to make progress on this and think I might be pretty close to a solution, except that I don’t know how to handle this relatively common situation in your dataset where patient ID and MAX coll_dt by patient don’t uniquely define a Tat Time. In the case below, what does your business process indicate regarding whether I should use the MIN (10) or the MAX (120) value in calculating the MEDIAN Tat Time?

Thanks.

– Brian

image

Hi Brian, In this case, please use Max(120). Thanks !

@akiko_lim,

I’m still working on this one, but close to a solution. Should have it done later today.

  • Brian

@akiko_lim,

This one put up a good fight, but I think I finally got it via the following three measures and one calculated column:

The first measure calculates the max rec-dt by patient ID:

   MaxRecDatebyPat = 

CALCULATE(
    MAX( fact_detail[rec_dt] ),
    ALLEXCEPT( 
        Patients,
        Patients[PatientIdentifier]
    )
)

The second measure similarly calculates the max tat time by patient ID

MaxTatbyPat = 

CALCULATE(
    MAX( fact_detail[tat_time] ),
    ALLEXCEPT( 
        Patients,
        Patients[PatientIdentifier]
    )
)

Now, based on the business rules you explained above, to be included in the median calculation a record has to meet both of the conditions defined via the above two measures. I did this check as a calculated column, although it can (and probably should) be done as a measure. I used a calculated column, however, because due to the complexity of the analysis it made it much easier for me to debug being able to see it directly as part of the fact table.

Screener Both Max RecDt = 
CALCULATE(
    IF(
        SELECTEDVALUE(fact_detail[rec_dt] ) = [MaxRecDatebyPat] &&
        SELECTEDVALUE( fact_detail[tat_time] )  = [MaxTatbyPat],
        1,
        BLANK()
    ),
    ALLEXCEPT(
        fact_detail,
        fact_detail[PatientIdentifier],
        fact_detail[rec_dt]
    )
)

And now here’s the final measure, that filters the fact table based on the measure above and then calculates the median tat time of the filtered table, returning the correct value of 39.

MedX Test = 

VAR vTableFilt =
        FILTER(
            fact_detail,
            fact_detail[Screener Both Max RecDt] = 1
        )
      
RETURN
    MEDIANX(
        vTableFilt,
       [MaxTatbyPat]
    )

I hope this is helpful. Full solution file attached below.

1 Like

Thanks Brian. This is a great help. Let me do some data checking on my end. Will update! You are awesome!

Good morning Brian,
With your help (following your way of thinking), I was able to figure out the correct median Time by Patient (with complex logic where max(rec_dt) & max(tat_time) by patient ).

I made a minor fix to your logic to grab Max(rec_dt) per patient.

I created a new measure (where it gets the max(rec_dt) by Patient.
I created a new KPI Card to show the calc from the new measure - suffix with xxxAL

Note: Median TAT - 39 provided in my 1st post was based off Coll_dt. Since your formula is based off Rec_dt, the correct answer is 37. (based on my sql db) I need to do these 2 diff computations in my report anyway - based off coll_dt & rec_dt. So, this works.

MaxRecDateByAL =
VAR PatientID = min(fact_detail[PatientIdentifier])
VAR maxDay = MAXX(FILTER(ALLSELECTED(fact_detail), fact_detail[PatientIdentifier] = PatientID ) , fact_detail[rec_dt] )
RETURN maxDay

Here is the updated pbix file with my formulas. eDNA Forum - Median Test Solution.pbix (332.4 KB)

Thanks again ! :slight_smile:

@akiko_lim,

Great – I appreciate the update, and am glad to hear you got this working exactly as desired.

You also just solved a mystery for me - as I was working on this, I wondered how you were validating the correct answers, since the multiple screening criteria are too complex to apply in Excel without major effort. Thus, I was creating small testing subsets of the raw data where I could apply the screening criteria manually to test whether my measures were doing what I thought they were doing. Knowing now that you were validating independently on a sql db answers that question.

I really enjoyed working with you on this - some very interesting issues here.

Thanks.

  • Brian