Complicated data

BrianJ · November 24, 2019, 12:22am

Great question. I’d love to hear the techniques that others on the forum use but here are a few ideas to start the discussion:

Turn on column distribution, column quality and:column profile in Power Query. Provides great information on the empty values, errors, potential outliers, etc.

Running VALUES or DISTINCT functions on columns can help identify inconsistent data entry problems
Lots of techniques for identifying outliers, some of which may be legitimate values while others may be data entry errors

If you use R, people have written a lot of scripts to detect different types of dirty data. With R installed, you can now run these scripts as steps in PQ:

Eager to hear what techniques and tools others are using.

-Brian