Data accuracy statistic

pedroccamara · November 11, 2020, 8:18pm

Hey guys,
Not sure if this is the right place to ask this, or if there’s a place here for it at all. I hope so.
Here’s the “problem”: I’ve created 7 measures, each of which counts the records that have errors in certain columns (let’s say there are also 7 columns).
My question: to get the % of errors in my data, should I divide the total of all my measures by the COUNTROWS of my table? I’m not sure. Maybe the sum of all my measures divided by (COUNTROWS times 7 columns)? And what if some of the measures above check the same columns? Do I subtract those from the total number of columns? What is the fair approach?
Guys, I’m sorry if this is not the right place for it, but I really don’t know where else to ask.
Can you help me?
Thanks a lot
Pedro

Greg · November 11, 2020, 10:43pm

Hi @pedroccamara. I guess it depends on what you’re looking for … the total number of errors, or the total number of records that have errors in them. If the former, then use the sum of all “wrong” measures / (number of records * 7), as you said; if the latter, then create a new measure that flags a record as soon as an error is detected, and divide that by the number of records. It’s your decision / requirement.
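The two options above can be sketched as follows. This is a minimal Python illustration of the arithmetic only, not the DAX measures themselves; it assumes each record is represented by a hypothetical list of per-column error flags (`True` = that value failed a check), and the sample data is made up.

```python
from typing import List

# Hypothetical representation: one list of 7 booleans per record,
# True meaning the value in that column failed some error check.
Flags = List[List[bool]]

def cell_error_rate(flags: Flags) -> float:
    """Option 1: total number of errors / (number of records * number of columns)."""
    n_records = len(flags)
    n_columns = len(flags[0]) if flags else 0
    total_cells = n_records * n_columns
    total_errors = sum(sum(row) for row in flags)  # True counts as 1
    return total_errors / total_cells if total_cells else 0.0

def record_error_rate(flags: Flags) -> float:
    """Option 2: records with at least one error / number of records."""
    n_records = len(flags)
    flagged_records = sum(1 for row in flags if any(row))
    return flagged_records / n_records if n_records else 0.0

# Made-up example: 4 records, 7 columns.
flags = [
    [True, False, False, False, False, False, False],  # 1 error
    [False] * 7,                                        # clean
    [True, True, False, False, False, False, False],   # 2 errors
    [False] * 7,                                        # clean
]
print(f"cell error rate:   {cell_error_rate(flags):.1%}")    # 3/28 -> 10.7%
print(f"record error rate: {record_error_rate(flags):.1%}")  # 2/4  -> 50.0%
```

The same data gives quite different numbers under the two definitions, which is why the choice has to come from what the report is meant to answer.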
Greg

EnterpriseDNA · November 12, 2020, 3:58am

Hi @pedroccamara, we aim to consistently improve the topics being posted on the forum to help you get a strong solution faster. While waiting for a response, here are some tips so you can get the most out of the forum and other Enterprise DNA resources.

Use the forum search to discover if your query has been asked before by another member.
When posting a topic with a formula, make sure it is formatted as preformatted text </>.
Use the proper category that best describes your topic.
Provide as much context to a question as possible.
Include a demo pbix file, images of the entire scenario you are dealing with, a screenshot of the data model, details of how you want to visualize a result, and any other supporting links and details.

I also suggest that you check the forum guideline https://forum.enterprisedna.co/t/how-to-use-the-enterprise-dna-support-forum/3951. Not adhering to it may sometimes cause delays in getting an answer.

pedroccamara · November 12, 2020, 11:12am

Hey @Greg
Thank you for your ideas; I believe you also gave me another one, which I think would be more accurate. If I have 7 columns in my table, and I have a measure for each kind of error, the error rate (i.e. 100% minus the accuracy) will be the number of errors divided by (total rows times error columns). Would you agree with this? In my mind it makes a lot of sense. But that is just in my mind…
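One detail with dividing by (rows × columns) relates to the earlier question about measures that check the same columns: if two measures can flag the same cell, simply summing their counts double-counts it, and the rate can even exceed 100%. A minimal Python sketch, assuming hypothetical checks that each return the set of (record, column) cells they flag, counts each cell at most once by taking the union. The check names, columns and rows are all made up for illustration, and this is one reasonable convention rather than the only fair one.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]
Cell = Tuple[int, str]  # (record index, column name)
Check = Callable[[List[Record]], Set[Cell]]

def missing_values(columns: List[str]) -> Check:
    """Hypothetical check: flag blank (None or empty string) values."""
    def run(rows: List[Record]) -> Set[Cell]:
        return {(i, c) for i, row in enumerate(rows) for c in columns
                if row.get(c) in (None, "")}
    return run

def not_positive_number(columns: List[str]) -> Check:
    """Hypothetical check: flag values that are not numbers greater than zero."""
    def run(rows: List[Record]) -> Set[Cell]:
        return {(i, c) for i, row in enumerate(rows) for c in columns
                if not (isinstance(row.get(c), (int, float)) and row.get(c) > 0)}
    return run

def naive_error_rate(rows: List[Record], columns: List[str], checks: List[Check]) -> float:
    """Sum of every check's count: a cell flagged by two checks counts twice."""
    total_cells = len(rows) * len(columns)
    return sum(len(check(rows)) for check in checks) / total_cells if total_cells else 0.0

def cell_error_rate(rows: List[Record], columns: List[str], checks: List[Check]) -> float:
    """Union of flagged cells: each (record, column) cell counts at most once."""
    flagged: Set[Cell] = set()
    for check in checks:
        flagged |= check(rows)
    total_cells = len(rows) * len(columns)
    return len(flagged) / total_cells if total_cells else 0.0

columns = ["Customer", "Amount", "Date"]
rows = [
    {"Customer": "A", "Amount": 10, "Date": "2020-11-01"},
    {"Customer": "", "Amount": None, "Date": "2020-11-02"},  # Amount is caught by both checks
    {"Customer": "C", "Amount": -5, "Date": None},
]
checks = [missing_values(columns), not_positive_number(["Amount"])]

print(f"naive:        {naive_error_rate(rows, columns, checks):.1%}")  # 5/9 -> 55.6%
print(f"deduplicated: {cell_error_rate(rows, columns, checks):.1%}")   # 4/9 -> 44.4%
```

With the union, the numerator can never exceed the number of cells, so the denominator (rows × columns) stays the same no matter how many measures overlap.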

Greg · November 12, 2020, 12:52pm

Hi @pedroccamara

So the standard consultant’s answer … it depends.

Taking your info as a starting point, let’s say you have 7 columns and 10 records, for a total of 70 possible “fields”. If records 1-5 have errors in all 7 columns and records 6-10 have errors in only 1 column, then the error count would be 40 (5 * 7 + 5 * 1), and the error “rate” would be (rounded) 57% (40/70).
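The worked example above can be checked with a few lines of Python (arithmetic only; the variable names are illustrative). It also shows how far apart the two definitions can be: every one of the 10 records has at least one error, so the “records with errors” rate for the same data is 100%.

```python
# Worked example: 10 records x 7 columns.
# Records 1-5 have errors in all 7 columns; records 6-10 in exactly 1 column.
N_RECORDS = 10
N_COLUMNS = 7

errors_per_record = [7] * 5 + [1] * 5

total_fields = N_RECORDS * N_COLUMNS    # 70 possible "fields"
total_errors = sum(errors_per_record)   # 5*7 + 5*1 = 40
cell_rate = total_errors / total_fields # 40/70

records_with_errors = sum(1 for e in errors_per_record if e > 0)  # 10
record_rate = records_with_errors / N_RECORDS                     # 10/10

print(f"errors: {total_errors} of {total_fields} fields -> {cell_rate:.0%}")    # 57%
print(f"records with errors: {records_with_errors} of {N_RECORDS} -> {record_rate:.0%}")  # 100%
```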

Again, it depends on whether the total number of errors or the number of records with errors is of more value to you.

Hope it helps.

Greg

pedroccamara · November 12, 2020, 1:34pm

Hey @Greg
I totally agree with you. Awesome! Very good!
Thank you so much!