Find multiple text entries in a column

Hi all,
I don’t have a model, but the question is simple:

I have a table with a text column;
The text is not inserted properly which causes me to have to identify if there is any text pattern that could be useful to me, for example:

Imagine that the column is referring to an address, but the address is incorrectly filled in and there are cases with the city name included in the address.
I want to identify if the text has any cities, based on a list of cities.

Example:
13th Street. 47 W 13th St-NewYork/100sa
102 S OXFORD AVE APT 208LOSANGELES

I want to identify that on the first line there is ‘New York’.
and on the second line there is ‘Los Angeles’.

I initially thought of using Splitter.SplitTextByAnyDelimiter to inject the list of cities as a splitter, but what it will do is, if it finds the city, it will remove it from the text string.

Any insight will be greatly appreciated.
Thanks

Hi @JoaoMonteiro. I’ve done this using DAX before, but it sounds like you want it in M. Is either OK, or do you need it in M? If DAX, I’ll check my archive and draft something later this morning.
Greg

Hi Greg,
Thank you for your quick response.
I need to solve this problem in Power Query, but if you want to share with me your solution in DAX, I would be very grateful. After all it is always a way to learn.

Thanks
JM

Hi Joao,

for cases like New York and Los Angeles, the best way would be to duplicate these entries and search for them also without the whitespace.

Here is how you find if an element from a list of filter criteria is included in your column with addresses:

 =Table.SelectRows(prevStep, each List.Count(Splitter.SplitTextByAnyDelimiter(FilterCriteria)([AddressColumn]))>1)

It’s case sensitive, so if you need case insensitive search, you need to convert both lists (with text and search terms) to lower case.

Regards,
Matthias

Hi @JoaoMonteiro.

Here’s a quick-and-dirty DAX solution that may be of interest. I’ve used a slightly-modified version of the eDNA Practice Dataset as a demo.

The solution uses the CONTAINSSTRING DAX function

Match 1 = 
VAR _SearchTerm = 
    CALCULATE( MAX( Keywords[Keyword] ), Keywords[ID] = 1 )

VAR _KeywordTable = 
    FILTER( Customers, CONTAINSSTRING( Customers[Customer], _SearchTerm ) )

VAR _Result = 
    IF( COUNTROWS( _KeywordTable ) >= 1, 1, BLANK() )

RETURN
_Result

Hope this helps.
Greg

eDNA Forum - Find Multiple Text Entries in a Column.pbix (393.4 KB)

Hi Matthias,
I don’t just want to know if they exist, i.e. count() > 1.
If it exists, I want to get the value.
I believe I have to combine List.Count(SplitterTextByAnyDelimiter() ) with List.Selected, but I don’t know how to do it.

Regards
JM

Hi Greg,

Thank you so much for your help.
In the meantime I found the solution in Power Query, which I will share for future cases.

Regards
JM

In the meantime I’ve found one solution.

Starting Point:

  • I have an initial list of several cities;
  • I want to check in a column of a table, row by row, if any of the cities in the list are in the row.

The line where I want to do the check is not correctly formatted and situations like the ones I expose can happen:
20 Cooper Square/NewYork , NY 10003, USA
20 Cooper Square =0as/New YorkASD NY 10003, USA
20 Cooper Square–New York### , NY 10003, USA
201 CHEAHA XINGOXFORD AL 36203-9048 USA
201 CHEAHA XING &&OXFORD=AL== 36203-9048 USA

The solution I’ve found

Text.Combine(
List.Select(
valid_IC, – this is the list with all the cites
(x) =>
Text.Contains([Adress], x, Comparer.OrdinalIgnoreCase) — [Address] is the scanned column by
content of ‘valid_IC’ list
),",") — this is used to concatenate if more than one condition (City) is found.

Regards,
JM