Data types text and numeric

Hi there,

I have data that is of the ordinal flavor where some fields have rows with numbers 1 - 5 representing user satisfaction levels as well as the text N/A for not applicable.

Is it better to treat these columns (fields) as text or number data types, and likewise for columns that have nominal numbers 1 and 0 representing yes or no? My instinct says they’re ordinal/nominal therefore text.

Which brings me to the next question of whether it is useful (in Power BI) to check for correlations between these types of ordinal variables (fields) or not? If it is useful, then perhaps N/A could be changed to a numeric value (maybe 0 or -1)?

I am considering using the Correlation Plot visual in Power BI. It requires numeric values. I welcome all suggestions but would like to also know the best practice.

Thank you!

@powerbideveloper,

This type of data is known as a Likert scale. I would treat these as numeric data, since it is valid to run correlations and certain other statistical tests on this data. When I’ve done this in the past, I’ve coded N/A as -9999, just so that if I accidentally run any descriptive statistics on the data without first filtering out the N/As, it will give me an obviously crazy result as a reminder.

Below is an excellent article discussing the use of parametric and nonparametric statistical tests on Likert scale data for the non-statistician:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3886444/

I hope this is helpful to you.

  • Brian
1 Like

@BrianJ makes an exellent point.

In addition if you change the data type of a column containing both numeric values and text values into “number” all text values will return errors, so you’ll need to replace the text values before you do so.

“Yes / No” or “True / False” is of type logical. When you apply a mathematical operation on this data type True will result in 1 and False in 0. If you perform the same operation on “1” and “0” of type text, automatic type conversion will also be applied.

If you want to store 0 and 1 in type text because you don’t want the values to be summarized you can prevent this by changing the summarization setting in Data view for the column into “Don’t summarize”.

And finally from a cardinality, file size or performance perspective it really doesn’t matter there are only 2 distinct values to store in the column whether that’s Y/N, TRUE/FALSE or 1/0 …

1 Like