Power BI Challenge 6 - Insurance Complaints

Thanks @alexbadiu. Yes, I have invested a lot of time in completing the report. In Power BI, every additional visual takes a lot of time, and the worst part is when you have just finished a visual, a new idea kicks in, and now you are changing it again. This challenge has also taught me that good relationships are so important :slight_smile:

1 Like

Thanks @BrianJ for your comments and for defining the elements of my report. I had to work very hard for this challenge. First, I read The Definitive Guide to DAX last week and tried to absorb the concepts. Then this week I started working on the challenge. It does not matter how much you read; the concepts only become clear once they are practically implemented, and these challenges give the outlet to do so.
I could feel the confidence while writing the DAX code, but there is still a very long way to go. Some crazy ideas I had to abandon because I couldn't make them work with my current DAX skills.
Learning is a never-ending process though.

3 Likes

Thanks @Melissa. It's because of you: you raised the bar so high last time that it became difficult to come even close to your work. I will definitely give a detailed write-up of the report; however, I can't share the link since the report is published within my tenant. There was one thing I wanted to add to the report that I really struggled with, but I will go ahead and complete it and will also ask for help on the forum.

1 Like

Hello Everyone,
This is one of the best challenges yet from @haroonali1000 and thanks for challenging us in different ways every few weeks. I am learning a lot.

For this report I created one navigation page and five pages containing insights:

  1. Complaints Summary
  2. Claim Statuses
  3. Customer Demographics and Satisfaction
  4. Regional Analysis
  5. Broker Insights

Feedback and criticism are welcome.

Here is a Link to the Published report

P.S. - I improved the home and navigation pages after positive feedback from @BrianJ. Not perfect, but I hope it looks better.

Thank You,
Nebiyu

12 Likes

Great job @Neba. I love the part in the Claims Status Analysis where you have shown the status days in a matrix table. This is the part I wasn't able to complete, and I am still scratching my head to find the solution. Can you tell us how you did it?

Thank you for your feedback @MudassirAli. Of course I will share the DAX formula. To be honest, I struggled to figure that out too, but I found a YouTube video on Curbal that explains how to handle similar situations. Shout-out to @Greg as well; he posted similar DAX formulas on the forum.

Here is the link to the video:

My understanding is that the Status History table holds a status ID (1-9) column and the date of each status update, grouped by complaint ID. So I created a calculated column that, for each complaint ID group, captures the day difference between the previous row's date and the current row's date and stamps it in the column.

The logic I used is:

Diff. Current v Prev.s Dates = 
-- Earliest complaint date for this complaint ID (the starting date of the claim)
VAR Mindate =
    CALCULATE(
        MIN( Complains[Complain Date] ),
        ALLEXCEPT( Complains, Complains[ID] )
    )
-- Most recent status date before the current row, within the same complaint ID
VAR Demoteddate =
    CALCULATE(
        MAX( 'Status History'[Status Date] ),
        FILTER(
            ALLEXCEPT( 'Status History', 'Status History'[Complaint ID] ),
            'Status History'[Status Date] < EARLIER( 'Status History'[Status Date] )
        )
    )
RETURN
    -- First status row: measure from the complaint date; later rows: from the previous status date
    IF(
        'Status History'[Status Date] = Mindate,
        DATEDIFF( Mindate, 'Status History'[Status Date], DAY ),
        DATEDIFF( Demoteddate, 'Status History'[Status Date], DAY )
    )

I used the complaint date from the Complains fact table as the starting date of the claim.

Then I used that calculated column in the following measure, which concatenates the day count with " Days".

Status Progression by Day = 
-- Total day differences for the statuses in context, activated through the
-- inactive relationship between Status History and Statuses
VAR TotalDays =
    CALCULATE(
        SUMX( 'Status History', 'Status History'[Diff. Current v Prev.s Dates] ),
        USERELATIONSHIP( 'Status History'[Complaint Status ID], Statuses[ID] )
    )
RETURN
    IF( TotalDays = BLANK(), BLANK(), CONCATENATE( TotalDays, " Days" ) )

Hope that was clear.

Regards,
Nebiyu

3 Likes

@Neba Wow, what an explanation. I was so frustrated and unhappy with my report because I wasn't able to generate insights on claim statuses; I couldn't find the solution for this. I am going to give this a try and will let you know.

Thanks a lot again :+1:

@MudassirAli No problem. You created an impressive report. I really like it.

1 Like

@Neba The formula wasn't the issue; it was the relationship. I have now created a bridging table between Statuses and Status History to solve the issue. It was such an easy fix, but it didn't click for me before. :thinking:

Challenge Participants,

Well… I had grand plans for my entry on this excellent challenge. My strategy was to use the challenge to explore some of the AI and machine learning capabilities of Power BI that I hadn’t previously delved deeply into. Here’s how I was planning to structure my entry:

  1. Fraud detection algorithms built in DAX and R to detect irregularities in the data based on Benford’s Law (based on a suggestion from @Paul)
  2. Analysis and visualization of status changes and when they happened using Synoptic Panel to transform a static process flow diagram into a dynamically linked visual that would depict status changes clearly and graphically
  3. Complaints broken down by dimensions using Microsoft Research's SandDance visualization/ML interface.
  4. Worst offending brokers analyzed using Power BI Key Influencers visual

Here's how it shook out: #1 I think was a big success. #2 was a partial fail; I learned a lot and ultimately got it to work, but it ended up being a confusing visual that didn't provide significant value or insight. In the right situation, I do think it is a good technique, and you'll probably end up seeing me use it down the road in another challenge. One you will not see me use ever again is #3, SandDance. That section was an epic fail; I hated the interface and the look of the resulting visuals. #4 I think worked pretty well in terms of making what would be quite a complex analysis easily accomplished and accessible even to relatively new users. Given the failures on #2 and #3, I don't have a complete entry to present, but I thought some may find value in the explanation of #1 and #4, so here goes:

FRAUD DETECTION ALGORITHMS

Given that this challenge is about preparation for an audit, I thought focusing first on the reliability and validity of the data was a good starting point. Forensic accounting frequently uses Benford’s Law to detect irregularities in data that may be attributable to tampering/fraud. Benford’s Law states that:

“In many naturally occurring collections of numbers, the leading digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time.” (Wikipedia)
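
Formally, the expected frequency of leading digit d under Benford's Law is P(d) = log10(1 + 1/d) for d = 1, ..., 9, which works out to 30.1% for d = 1, 17.6% for d = 2, and so on down to 4.6% for d = 9.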

Benford's Law has been shown to apply to series as varied as transaction-level accounting data, census data, home addresses, areas of lakes, stock prices, death rates, home prices, distances between planets, and a wide array of other types of data series.

For this challenge, I used some virtual-table-heavy DAX measures to calculate the frequency distribution of the first digits of three fields: the daily number of complaints, the expected reimbursement amounts, and the populations of the clients' cities associated with each complaint. What I found is summarized in the visuals below, which compare the observed distribution of the first digits of each chosen field against the expected distribution under Benford's Law.
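
The DAX virtual tables did that work in the report itself; purely as an illustrative sketch, the same observed-versus-expected comparison takes only a few lines of R (the values below are toy data, not the challenge data):

library(BenfordTests)

x   <- c(345, 1290, 47, 2210, 880, 156, 30, 1170)           # toy series for illustration
obs <- table(factor(signifd(x), levels = 1:9)) / length(x)  # observed first-digit shares
exp <- log10(1 + 1 / (1:9))                                 # Benford's expected shares
round(rbind(observed = as.numeric(obs), expected = exp), 3)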

I then used the attached R script to test whether the observed distributions statistically conformed to the expected Benford distribution. (Note: this cannot reasonably be done within Power BI, because generating the P values needed to evaluate statistical significance requires simulating 10,000 trials of the test; R can do this in just a few lines of code, almost instantaneously.) In the case of complaints and expected reimbursements, the distributions did conform, while the distribution associated with city population did not. The first two results strongly imply that those data are real, not fabricated. The third strongly indicates that the city population data either is fraudulent or has been randomly assigned to clients in a way that does not represent their actual locations and the other attributes of the regional table. I ground-truthed these findings with @haroonali1000, and he confirmed that my findings for all three were indeed correct.
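
For those who don't want to open the attachment, the core of such a test is very compact. A minimal sketch follows; the package choice and the file/column names here are assumptions for illustration, and the attached script remains the authoritative version:

library(BenfordTests)

complaints <- read.csv("complaints.csv")        # hypothetical export of the fact table
amounts    <- complaints$ExpectedReimbursement  # hypothetical column name

# Pearson chi-square test of the first significant digits against the
# Benford distribution; the p-value is simulated over 10,000 trials
chisq.benftest(amounts, digits = 1, pvalmethod = "simulate", pvalsims = 10000)

# A large p-value means conformity to Benford's Law cannot be rejected;
# a small one flags the series as suspect.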

This is a powerful technique that can be used in many different contexts to evaluate data quality.

IDENTIFYING WORST-PERFORMING BROKERS USING KEY INFLUENCERS VISUAL

There are a lot of different ways one could define "worst performing" per the requirements of the brief. I thought the most direct way was to identify which brokers had the lowest levels of client satisfaction. To do this in a statistically rigorous way, one would typically need to build a multiple regression model. However, standard regression models won't work when the outcome variable is categorical, in this case binary (satisfied versus not satisfied). For that analysis you would typically need to employ a logistic regression model in R or a similar statistical package. Microsoft, however, fairly recently introduced the Key Influencers visual, which easily does the heavy lifting of building the logistic regression model for you.

Simply by selecting the Key Influencers visual and dropping the right fields into the appropriate wells, you get a fully interactive logistic regression visual that identifies the worst performing brokers:

By switching the outcome variable in the dropdown, you can also quickly change the visual to show the best performing brokers.
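
For comparison, here is roughly the model the visual builds under the hood, if you were to fit it by hand in R (a sketch only; the data frame and column names are illustrative assumptions, not anything exported from the visual):

complaints <- read.csv("complaints.csv")   # hypothetical export, one row per complaint

# Model the probability of an unsatisfied outcome as a function of broker;
# brokers with large positive coefficients are the worst performers
model <- glm(Unsatisfied ~ Broker, family = binomial, data = complaints)
summary(model)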

Anyway, I'm disappointed that I don't have a full entry to submit, but I hope you found some value in what I learned through the experimentation above. As I always do for my entries, here's documentation of the tools I used to create these analyses:

Hats off to the participants so far – all terrific entries, and I can’t wait to see what the rest of you have done with this challenge.

  • Brian

PS – for those interested I’ve attached a copy of the R script that tests the distribution conformity.
Data Challenge 6.R (1.4 KB)

PPS - For those interested in learning more about the history and application of Benford's Law, check out the Netflix series Connected, episode #4, entitled "Digits". It's a fascinating exploration of how Benford's Law applies to finance and accounting, music, social media, photography, astronomy, voting patterns, etc.

14 Likes

@MudassirAli Oh, I see. For my part, I created an inactive relationship between the Statuses and Status History tables and used the USERELATIONSHIP function. I am glad it worked out, though. Looking forward to your matrix or whichever visual you use.

1 Like

@BrianJ I would just say that your skill set is truly unique, and the above analysis has motivated me even more to pursue machine learning / data analytics expertise.

This is just unique. Hats Off!

1 Like

@Neba,

Terrific entry! Attractive design and color, well-organized according to the requirements of the brief, wide range of techniques utilized and a very nice job addressing the Status History data, which IMO was the most difficult aspect of this challenge. I think the heat map visual you used to depict the duration of the individual process steps was particularly effective.

The progress in your entries from Challenge #4 to Challenge #6, from both a design and an analytics perspective, is quite remarkable and should be an inspiration to everyone looking to learn and improve through these challenges.

One very minor suggestion – on your front page, I think the look could be improved with the use of some graphics/icons to represent each section instead of just the titles.

Well done!

  • Brian
1 Like

Hey @BrianJ, great explanation of your thought process. How is R to learn from scratch? You are inspiring me to learn R or Python next, after understanding DAX well, of course.
Do I need a math or statistics background to understand it well?
Every time I open your R scripts in RStudio, they look intimidating.

Thanks,
Nebiyu

1 Like

Thank you @BrianJ for your feedback; it means a lot. I am learning tremendously from the entries by you and the other masters of PBI.
Yes, you are right: the home page was an easy task compared to the others, and since it is the cover of the report, it could have been better.

I will improve it.

Regards,
Nebiyu

2 Likes

@Neba,

R is incredibly simple to learn. Literally, you can learn how to do the types of things I've been doing in my challenge entries in a couple of days. The R scripts may look intimidating, but they're actually pretty simple: typically, loading the libraries needed for whatever specialized analysis is the focus (in this case, two Benford analysis packages), some simple data loading and handling routines, assignment of variables, and then the statistical and plotting commands. The handling routines are pretty constant, and the statistical commands you can just Google and fill in the blanks; a typical skeleton is sketched below. The challenges and forum work have put me behind on my videos, but I have one on beginning R for Power BI users that is almost done, and I'll try to finish it up this weekend; it shows how easy R actually is.
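
A minimal sketch of that structure (the package and the file/column names are placeholders for illustration, not the actual challenge script):

library(benford.analysis)                 # load the specialized analysis package

data    <- read.csv("complaints.csv")     # simple data loading and handling
amounts <- data$ExpectedReimbursement     # assign the variable of interest

bfd <- benford(amounts, number.of.digits = 1)  # run the statistical analysis
plot(bfd)                                      # plot observed vs expected digit frequencies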

That being said, the key is having a good statistical foundation, since R makes it equally easy to run very good and very bad analyses. There are some great free training resources online to build this necessary background, and if you're interested, I'd be glad to put together a list similar to the one @Melissa did for M resources.

  • Brian
3 Likes

@BrianJ,

Thank you for your reply. Good to know it's not that hard, and I am glad that there is an R guru among us. Right now, though, I still get stuck on some of the easy DAX calculations, and I need to be able to use DAX comfortably before I move to another language. I am sure there are other members who have mastered DAX and are ready for R, so whenever you're ready to post it, please let us know.

Thanks,
Nebiyu

@Neba,

Far from an R Guru. Just someone with a good statistical background, who’s also good at Googling…:laughing:

6 Likes

@BrianJ
:laughing: Still, you know how, if there are no doctors or internet in a small town, the dude who goes to the city, Googles the symptoms, and gives a diagnosis is considered a doctor in that town? So it's close enough :laughing:

Thank God for Google.

2 Likes

Thanks to @BrianJ, here is the link to the report:

1 Like