RegEx Workout 04 - SUBSTITUTION

Note: if this is your first challenge, please see Regex #1, where regex101.com is explained.

In this exercise, we will learn to use the substitution tool to restore order in the text.

Task 1
Imagine that you receive a list of invoices with a date column. Most of the lines have the correct format, e.g. 2023-May-01, but there are exceptions in the form May/01/2024.

Just like below:

2023-May-01
May/01/2024
Jun/03/2021

The solution to this workout will be posted on 11.06.2023.

To solve this problem, you will need the Substitution panel.


Task 2

In the second example, the word order of the sentences is mixed up. Use the Substitution panel to restore the order seen in the first line.

Microsoft’s headquarters since 1980 is Redmont
1980 Microsoft’s headquarters since is Redmont
1999 Google’s headquarters since is Alabama


Hello @KrzysztofNowak ,

I used regex101 and Python to answer the question as the syntax isn’t exactly the same.

Task 1
In regex101
Regex pattern

(\w+)/(\d{2})/(\d{4})

Substitute pattern

$3-$1-$2


In Python

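# Task 1
import re

# Sample data: the first line is already in the target format
text = """
2023-May-01
May/01/2024
Jun/03/2021
"""

# Groups: 1 = month name, 2 = day, 3 = year
pattern = r"(\w+)/(\d{2})/(\d{4})"

# Python refers to groups as \1, \2, \3 where regex101 uses $1, $2, $3
new_text = re.sub(pattern, r"\3-\1-\2", text)

print(new_text)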

Task 2
In regex101
Regex pattern

(\d{4} )(\w+’s\sheadquarters )(since )(is [A-Z][a-z]+)

Substitute pattern

$2$3$1$4


In Python

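# Task 2
import re

# Sample data: the first line already has the target word order
text = """
Microsoft’s headquarters since 1980 is Redmont
1980 Microsoft’s headquarters since is Redmont
1999 Google’s headquarters since is Alabama
"""

# Groups: 1 = year, 2 = "<Company>’s headquarters ", 3 = "since ", 4 = "is <City>"
pattern = r"(\d{4}\s)(\w+’s\sheadquarters\s)(since\s)(is [A-Z][a-z]+)"

# Move the year (group 1) between "since " and "is <City>"
new_text = re.sub(pattern, r"\2\3\1\4", text)

print(new_text)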

Thanks

Task 1

(\w{3}).(\d+).(\d{4})

Task 2

(\d{4} )(.* )(is.*)
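
For anyone who wants to try these two shorter patterns outside regex101, here is a minimal Python sketch. The post gives only the regex patterns, so the replacement strings below (\3-\1-\2 and \2\1\3) are my assumption about the intended substitution, and the variable names are illustrative.

```python
import re

dates = "2023-May-01\nMay/01/2024\nJun/03/2021"
sentences = (
    "Microsoft’s headquarters since 1980 is Redmont\n"
    "1980 Microsoft’s headquarters since is Redmont\n"
    "1999 Google’s headquarters since is Alabama"
)

# Patterns as posted above; the replacement strings are assumed, not from the post.
# Task 1: groups are month, day, year; "." stands in for either separator.
task1 = re.sub(r"(\w{3}).(\d+).(\d{4})", r"\3-\1-\2", dates)

# Task 2: groups are year, everything up to the space before "is", and the rest.
task2 = re.sub(r"(\d{4} )(.* )(is.*)", r"\2\1\3", sentences)

print(task1)
print(task2)
```

Because "." does not match a newline by default, neither pattern touches the lines that are already in the correct form.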


Hi @KrzysztofNowak,

Here is my solution to this workout:

Task 1:

Task 2:

Thanks for the workout.
Keith


@KrzysztofNowak ,

Another great one - learned some fun new tricks in this one.

Q1 Solution

Q2 Solution


Hello @zwhite, thank you for participating. It takes a lot of effort to assemble solutions in two tools, thank you. The results are great; I like that you kept direct (literal) matching to a minimum.

Hello @borydobon, very good solutions with a minimal number of characters. Glad that you spotted the option to define your own separators in the Substitution panel.


Thank you @BrianJ, very good solutions. Like the others, you found that in the Substitution panel you can not only switch the position of groups but also add extra characters.

Thank you for participating. Great solutions.

You all discovered what the Substitution panel can do. Before I publish my solution, I would also like to point out that groups can be named, which can sometimes make a solution more readable. Take a look at the following example:
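
A minimal sketch of named groups in Python, applied to the Task 1 dates. The group names month, day, and year are my own choice for illustration and are not taken from the thread. Python writes a named group as (?P<name>...) and refers to it in the replacement as \g<name>; regex101 may spell this differently depending on the flavor selected.

```python
import re

dates = "2023-May-01\nMay/01/2024\nJun/03/2021"

# (?P<name>...) names a group; \g<name> refers to it in the replacement.
# The names month/day/year are illustrative, not from the original post.
pattern = r"(?P<month>[A-Za-z]{3})/(?P<day>\d{2})/(?P<year>\d{4})"

fixed = re.sub(pattern, r"\g<year>-\g<month>-\g<day>", dates)

print(fixed)
```

The replacement now reads as year-month-day without having to count parentheses, which is the readability gain mentioned above.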

My solutions

Task 1.

I decided to refer to the character classes that identify word characters (\w), digits (\d), and punctuation ([[:punct:]]).
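
The same idea can be sketched in Python, with one caveat: Python's re module does not support POSIX classes such as [[:punct:]], so the sketch below builds an ASCII punctuation class from string.punctuation instead. This is my own approximation under that assumption, not the exact pattern from the solution.

```python
import re
import string

dates = "2023-May-01\nMay/01/2024\nJun/03/2021"

# Python's re has no [[:punct:]]; build an equivalent ASCII class instead.
punct = "[" + re.escape(string.punctuation) + "]"

# Three word characters, a punctuation mark, two digits,
# another punctuation mark, then four digits.
pattern = r"(\w{3})" + punct + r"(\d{2})" + punct + r"(\d{4})"

fixed = re.sub(pattern, r"\3-\1-\2", dates)

print(fixed)
```

Matching any punctuation mark rather than a literal slash means the pattern would also pick up dates written as May.01.2024 or May-01-2024.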

Task 2