Looking for inspiration / ideas

I am playing with a single table data file, and looking for some inspiration for new ways to look at the data.

play error file.pbix (60.3 KB)

I have made a fairly good start (I think!) but am looking for some ideas on how I might slice and dice this dataset.

I am thinking top 5 error IDs over time … and by Customer, and by Data Entry Operator. There are only a few dimensions, but lots of data (customers, employees, errors), so I am struggling with what might be useful other than a TOP N.

But with a reasonably sized dataset like this, some segmenting (groups) might be useful.
Any feedback to inspire me would be appreciated.

It’s quite a fun dataset.

Thanks in advance
John

@jgriffit,

This does look interesting, but I’m wondering if you can provide us a little more context as to what we are looking at. What type of industry/products are these? What role do the employee and/or the customer play in the generation of errors? What types of errors are we talking about, and which errors have you singled out as critical in your IN statement?

I recognize some of this information might be sensitive, but whatever additional context you could provide would be helpful in formulating ideas within an analysis plan.

It looks like you’re off to a strong start here, and I really like your approach of using the forum to generate additional ideas that you might have overlooked (we all have blind spots, and sometimes the closer you are to the data, the harder it is to see alternative approaches/perspectives). I haven’t seen people use the forum in this particular way before, but I think it’s pretty cool.

  • Brian

Hi @BrianJ,
Thanks for your response.

I have renamed the columns to make the scenario a little more useful. As you noted, I have made the data anonymous and generic, so I had to think of a scenario that would make sense to an analyst.

The scenario goes like this:

We have data entry operators that enter records. Each record is assessed by the database for invalid data, and if there are any data entry errors, an error ID is produced that identifies the specific error. For example, missing post code may be error 55. Missing surname may be error 88. So there could be many errors for a single entry.

Each data entry operator can enter data for one or more customers.

Each customer can have one or more data entry operators.

Each month a report is exported that shows the total entries for each customer and the error types made by each data entry operator.

The 5 error IDs I chose were just the top ones by volume. I was thinking that if a specific set of error IDs (say 5 of them) were “more important” than others, we could segment them out and track them over time. I have no specific IDs at the moment that are considered ‘more important’ than others, but it is the type of analysis I was thinking of, so I thought of using the IN operator to do that, tracking them as a group total and individually within that group.
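Something along these lines is what I had in mind for the IN grouping. It is only a rough sketch: the 'Errors' table name, the column names, and the IDs beyond the 55/88 examples above are placeholders for whatever is actually in the pbix.

```
-- Assumed base measure over a single fact table called 'Errors'
Total Errors = COUNTROWS ( 'Errors' )

-- Restrict the base measure to a hand-picked "important" group of error IDs
-- (55 and 88 come from the scenario above; the rest are made up)
Priority Errors =
CALCULATE (
    [Total Errors],
    'Errors'[Error ID] IN { 55, 88, 101, 203, 307 }
)

-- Share of all errors that fall into the priority group
Priority Error % =
DIVIDE ( [Priority Errors], [Total Errors] )
```

Putting [Priority Errors] and [Priority Error %] against a date column would give the tracking over time, and keeping Error ID in the legend would still show the individual IDs within the group.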

So I suppose I am looking for some ideas around aligning the DAX functions with the analysis. I get a bit stressed about which path to take at the beginning of a dataset. I will plug away at different approaches, using SUMMARIZE, TOPN, RANKX, custom groups like the ones you use for top-profit customers, and whatever else pops into mind, but any ideas that come from the forum would be well received.
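For the top N piece, I was picturing a RANKX pattern something like this (again just a sketch; the names are placeholders and it assumes the [Total Errors] base measure above):

```
-- Rank each error ID by volume within the current filter context
Error ID Rank =
RANKX (
    ALL ( 'Errors'[Error ID] ),
    [Total Errors],
    ,
    DESC
)

-- Show a value only for the top 5 error IDs (blank elsewhere),
-- so a visual with Error ID on the axis keeps just the top group
Top 5 Error Count =
IF ( [Error ID Rank] <= 5, [Total Errors] )
```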

So much for a ‘simple dataset’. The options sometimes flummox me, but I suppose it is just a matter of starting small each time, then building it out, and not getting caught up too much in the holistic picture right at the start.

Updated pbix here: play error file.pbix (60.3 KB)


@jgriffit,

Thanks very much for the additional background; that’s really helpful. I’ll chew on this a bit and get back to you tomorrow with some additional ideas, and hopefully others will do the same.

One thing that pops to mind immediately is Sam’s “cross-selling matrix” analysis, which could be used to identify which errors are most frequently made or identified in concert with each other.
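Roughly, that pattern could look something like the sketch below. It assumes a disconnected 'Error Comparison' table (a copy of the distinct error IDs) on the matrix columns, the original Error ID on the rows, and an Entry ID column that identifies each data entry record, so all of those names are placeholders for whatever your model actually uses:

```
-- Entries that contain both the error on the matrix rows and the
-- error selected from the disconnected 'Error Comparison' table
Errors Co-occurring =
VAR EntriesWithRowError =
    VALUES ( 'Errors'[Entry ID] )
VAR EntriesWithComparisonError =
    CALCULATETABLE (
        VALUES ( 'Errors'[Entry ID] ),
        ALL ( 'Errors'[Error ID] ),
        TREATAS ( VALUES ( 'Error Comparison'[Error ID] ), 'Errors'[Error ID] )
    )
RETURN
    COUNTROWS ( INTERSECT ( EntriesWithRowError, EntriesWithComparisonError ) )
```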

I would also be interested to see how consistent the different operators are in terms of the types of errors they generate. That could point to either a source of bias or a potential for specialization if significant differences across operators are revealed.
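A quick way to eyeball that would be a percent-of-operator-total measure in a matrix with Operator on the rows and Error ID on the columns. Again, the table and column names here are just placeholders, and it assumes a base [Total Errors] measure like the one you sketched above:

```
-- Each error type's share of the current operator's total errors;
-- differences in the mix across operators show up as differences
-- in the percentage profile of each row
Operator Error Mix % =
DIVIDE (
    [Total Errors],
    CALCULATE ( [Total Errors], ALL ( 'Errors'[Error ID] ) )
)
```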

You’re right - this is a rich and interesting dataset. I’ll keep percolating on it…

  • Brian

@haroonali1000, @sam.mckay,

@jgriffit’s post brings up a really interesting idea: could members submit actual datasets and scenarios as candidates for future Data Challenges? It seems like a win-win. The member gets the benefit of dozens of top analysts working on their data, you get a real-world problem with actual data (as opposed to having to generate it randomly), and we all get to find out how things turned out down the road, after the best ideas generated by the Challenge perhaps get implemented.

What do you think?

  • Brian

Hi @BrianJ, great idea, and something Sam and I are looking to implement. I will create a post under the Challenge category so that members can submit their requests.

Kind Regards,
Haroon

@haroonali1000,

That sounds great - thanks!

This really will be an incredible opportunity for members. I did some back-of-the-envelope calculations, and having your dataset analyzed in a challenge is roughly the equivalent of getting $60,000-$80,000 in consulting hours for free.

Pretty nice return on investment for a $500 membership…

  • Brian