Dynamically defining Customers as "New", "Repeaters" and "Reactivated"

Hey,
I would like to ask for some help on a issue of dynamically defining customers.

For the company I work for, I need to segment the customers by the following definition:

New customer - A customer (customerID) with one purchase. No previous purchase before.
Repeater - A customer (customerID) with a current purchase and a previous one observed within 3 years (1095 days) prior to the current purchase.
Reactivated - A customer (customerID) with a current purchase and a previous one observed 3+ years (more than 1095 days) prior to the current purchase

The real model is much complicated, however I simplified it a lot.

This is what I need to achieve:

I created the rank as calculated column:

However I am lost here in subtracting different dates based on Earlier context.

Customer_segmentation_tyankata.pbix (118.8 KB)

Regards,
Tyankata

1 Like

Hello @tyankata,

Thank You for posting your query onto the Forum.

Would you be open to achieving the same results via Power Query method? In this scenario, by using Power Query based approach, it’s easy to achieve the same results without writing any complicated/convoluted DAX calculated columns and it’s much more efficient as well. You can paste the below provided M Code in the Advanced Editor -

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hY+9CsMwDIRfxXjOIMmWZO9dmz5AyNAGjyVQSqFvX/8kgdZDEYIDfdzppskignqNzg4WvQpALMqZy/Jcb+lhCDDaefgGFQQDZwVsTmlJ90YS9iTnCVWZ8/VdINogRBWUmhuVay6BGdfXbncEI0neQjIAsBaS/5CBchfpyCPd5avn6ikBuDxLuL8IGyQcnEprTI65t8OODEQxamv8Ezx/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Customer_ID = _t, Booking_ID = _t, Purchase_Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer_ID", Int64.Type}, {"Booking_ID", Int64.Type}, {"Purchase_Date", type date}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Customer_ID"}, {{"FullTable", each _, type table [Customer_ID=nullable number, Booking_ID=nullable number, Purchase_Date=nullable date]}}),
    #"Added Index" = Table.TransformColumns(#"Grouped Rows", {{"FullTable", each Table.AddIndexColumn(_, "Index")}}),
    #"Added Previous Purchase Date" = Table.AddColumn(#"Added Index", "Previous_Purchase_Date", each 
    let
        AllDataTable = [FullTable],
        PrevRowValue = 
            Table.AddColumn( 
                AllDataTable, "Previous_Purchase_Date",
                each try AllDataTable [Purchase_Date] {[Index] - 1}
                otherwise [Purchase_Date] )
    in
        PrevRowValue),
    #"Expanded Records" = Table.Combine(#"Added Previous Purchase Date"[Previous_Purchase_Date], {"Customer_ID", "Booking_ID", "Purchase_Date", "Previous_Purchase_Date"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Records",{{"Customer_ID", Int64.Type}, {"Booking_ID", Int64.Type}, {"Purchase_Date", type date}, {"Previous_Purchase_Date", type date}}),
    #"Added Days Difference" = Table.AddColumn(#"Changed Type1", "No_of_Days_Since_Last_Purchase", each Duration.Days([Purchase_Date] - [Previous_Purchase_Date]), Int64.Type),
    #"Added Customer Status" = Table.AddColumn(#"Added Days Difference", "Customer_Status", each if [No_of_Days_Since_Last_Purchase] = 0 then "New" else if [No_of_Days_Since_Last_Purchase] <= 1095 then "Repeater" else if [No_of_Days_Since_Last_Purchase] > 1095 then "Reactivated" else null),
    #"Removed Other Columns" = Table.TransformColumnTypes(Table.SelectColumns(#"Added Customer Status",{"Customer_ID", "Booking_ID", "Purchase_Date", "Customer_Status"}), {"Customer_Status", type text})
in
    #"Removed Other Columns"

I’m also attaching the working of the PBIX file alongwith the link of a video which is referenced in solving this approach.

Hoping you find this useful and meets your requirements that you’ve been looking for.

Important Note: For “Customer_ID = 11074793”, where “Purchase_Date = 05-12-2021”, the difference between current and previous purchase is “784 days” so it should fall under “Repeater” status since it’s less than 1095 days based on the scenario that you had provided. But in the screenshot of the expected output, you’re showcasing under “Reactivated” status which is incorrect. Please check once.

Thanks and Warm Regards,
Harsh

Customer Segmentation - Harsh.pbix (132.0 KB)

3 Likes

Thank you, @Harsh !
I am going to check it tomorrow and let you know.

Best,
Tyankata

1 Like

Hello @tyankata,

In case, you still require a DAX based solution then here’re the formula’s. In the form of both i.e., “Calculated Columns” and “Measures” -

A). Calculated Columns -

1). Based on No. of Days logic i.e., 1095 Days -

Customer Status (by Days) - DAX = 
VAR _Current_Customer_ID = 
Data[Customer_ID]

VAR _Current_Purchase_Date = 
Data[Purchase_Date]

VAR _Days_Difference = 
IF( 
    ISBLANK( CALCULATE( MAX( Data[Purchase_Date] ) ,
        FILTER( Data ,
            Data[Customer_ID] = _Current_Customer_ID &&
            Data[Purchase_Date] < _Current_Purchase_Date ) ) ) ,
    BLANK() ,
    Data[Purchase_Date] - 
    CALCULATE( MAX( Data[Purchase_Date] ) ,
        FILTER( Data , 
            Data[Customer_ID] = _Current_Customer_ID &&
            Data[Purchase_Date] < _Current_Purchase_Date ) ) )

VAR _Results =
SWITCH( TRUE() , 
    _Days_Difference = BLANK() , "New" , 
    _Days_Difference <= 1095 , "Repeater" , 
    _Days_Difference > 1095 , "Reactivated" )

RETURN
_Results

2). Based on No. of Years logic i.e., 3 Years -

Customer Status (by Years) - DAX = 
VAR _Current_Customer_ID = 
Data[Customer_ID]

VAR _Current_Purchase_Date = 
Data[Purchase_Date]

VAR _Next_Purchase_Date = 
CALCULATE( MAX( Data[Purchase_Date] ) ,
    FILTER( Data ,
        Data[Customer_ID] = _Current_Customer_ID &&
        Data[Purchase_Date] < _Current_Purchase_Date ) )

VAR _Year_Difference = 
DATEDIFF( Data[Purchase_Date] , _Next_Purchase_Date , YEAR )

VAR _Results =
SWITCH( TRUE() , 
    _Year_Difference = BLANK() , "New" , 
    _Year_Difference >= -2 , "Repeater" ,
    _Year_Difference < -2 , "Reactivated" )

RETURN
_Results

B). Measures -

1). Based on No. of Days logic i.e., 1095 Days -

Customer Status (by Days) - DAX Measure = 
VAR _Current_Customer_ID = 
MIN( Data[Customer_ID] )

VAR _Current_Purchase_Date = 
MIN( Data[Purchase_Date] )

VAR _Days_Difference = 
IF( 
    ISBLANK( CALCULATE( MAX( Data[Purchase_Date] ) ,
        FILTER( ALL( Data ) ,
            Data[Customer_ID] = _Current_Customer_ID &&
            Data[Purchase_Date] < _Current_Purchase_Date ) ) ) ,
    BLANK() ,
    CALCULATE( MIN( Data[Purchase_Date] ) ) - 
    CALCULATE( MAX( Data[Purchase_Date] ) ,
        FILTER( ALL( Data ) , 
            Data[Customer_ID] = _Current_Customer_ID &&
            Data[Purchase_Date] < _Current_Purchase_Date ) ) )

VAR _Results =
IF( ISINSCOPE( Data[Customer_ID] ) , 
    SWITCH( TRUE() , 
        _Days_Difference = BLANK() , "New" , 
        _Days_Difference <= 1095 , "Repeater" , 
        _Days_Difference > 1095 , "Reactivated" ) )

RETURN
_Results
Customer Status (by Years) - DAX Measure = 
VAR _Current_Customer_ID = 
MIN( Data[Customer_ID] )

VAR _Current_Purchase_Date = 
MIN( Data[Purchase_Date] )

VAR _Next_Purchase_Date = 
CALCULATE( MAX( Data[Purchase_Date] ) ,
    FILTER( ALL( Data ) ,
        Data[Customer_ID] = _Current_Customer_ID &&
        Data[Purchase_Date] < _Current_Purchase_Date ) )

VAR _Year_Difference = 
DATEDIFF( MIN( Data[Purchase_Date] ) , _Next_Purchase_Date , YEAR )

VAR _Results =
IF( ISINSCOPE( Data[Customer_ID] ) , 
    SWITCH( TRUE() , 
        _Year_Difference = BLANK() , "New" , 
        _Year_Difference >= -2 , "Repeater" ,
        _Year_Difference < -2 , "Reactivated" ) )

RETURN
_Results

Below is the screenshot of the final results provided -

I’m also attaching the working of the PBIX file for the reference purposes.

Thanks and Warm Regards,
Harsh

Customer Segmentation - Harsh v2.pbix (126.3 KB)

5 Likes

Hello @Harsh,
Thank you for your prompt and detailed answer.

All of the solutions are now tested in the test file and work pretty fine.

However, in the real model I had to use the calculated column solution.
The model is quite big, and Group in Power Query doesn’t perform quite well.
I need to use this values ( New, Repeater and Reactivated) as a slicer, rows and columns in a table and that’s why this solution.

Best regards,
Tyankata