Cleaning non-structured text in PowerQuery

Hi,
I want to cleanup an Address column in which the addrss is mixed up with the VAT. I dont expect a code solution but rather if there is some way to do it with the PQ through some function. I have tried to “teach” through “New column from example” but it doesn´t seem to understand. The step 1 (and already a good win would be to move the VAT number) thanks
david

What about using Split Column by Delimiter, and use “NIF:” as the delimiter?
This will give you two columns named “Address” with a .1 or .2 after the name. The .2 column will be just your NIF number (the characters NIF: will not appear in the column), and the .1 column will be just your address.

NOTE: this solution assumes that NOTHING appears after the NIF number

1 Like

Thanks but NIF is not always at the end, sometimes is at the beginning without even saying “NIF” (see last case on the image as an example), although is a good solucion for most.
The only rule is that it is a ONE LETTER PLUS 8 NUMBERS at the beginning or at the end.
I can edit manually the outliers to make your solution work but now I am curious if there is some way to do it :slight_smile:
Thanks!

Ah, sorry about that, I assumed that the final option in the screenshot just had that line cut off from view.
That does make for a tricky issue, so to be clear, we actually are checking for any of the following possibilities:

  1. Items with “NIF:” before them, appearing at the end of the data
  2. Items appearing at the beginning of the data, possibly with “NIF:”, possibly not
  3. In either case, it will be one letter and eight numbers - total of 9 characters

Any other circumstances we need to watch for?
I feel that item #3 is the key here, if we can somehow search for that occurrence (hopefully you would not have the same thing happen in a valid address line also)

Exactly, sounds the best to me. And alost impossible that an address contains that.
In row 4 you can see that it is ES+1LETTER+8NUMBERS … ES can be ignored completely

See if this post can help you work towards a solution.

Should you need further assistance please provide a small sample file. Thanks.
I hope this is helpful.

1 Like

@davidsingular This one was intriguing, so I did some research and found the following
blog post.

And from that, was able to come up with the attached that should be able to work with your data (except for column names of course)
Locate TextPattern in Power Querry.pbix (27.6 KB)

3 Likes

Was a tricky one I guess! Thanks!!

Yes David - this is very tricky one, there is no single function in PQ to fix this. as mentioned by @Melissa - you have to use various techiques to get the desired result.

Glad I could help - and it gave me a new ‘pattern’ to add to my bag of tricks. I’m sure I"ll come up with a need for it somewhere down the line in my own reporting. :smile: