Note: if this is your first challenge, please see Regex #1, where regex101.com is explained.
In this exercise, we will learn to use the substitution tool to restore order in the text.
Task 1
Imagine that you receive a list of invoices with a date column. Most of the lines have the correct format, e.g. 2023-May-01, but there are exceptions in the form May/01/2024. The goal is to bring these exceptions into the correct format.
Sample data:
2023-May-01
May/01/2024
Jun/03/2021
The solution to this post will be published on 11.06.2023.
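A minimal Python sketch of one possible Task 1 substitution, not necessarily the author's solution. It assumes every exception follows the Mon/DD/YYYY pattern, with a three-letter month abbreviation and a two-digit day. The three capture groups are reordered in the replacement string, and the "-" separators are added there as well.

```python
import re

# Sample lines from the task: one already correct, two exceptions.
dates = """2023-May-01
May/01/2024
Jun/03/2021"""

# Capture month abbreviation, day and year from the Mon/DD/YYYY exceptions.
# Assumption: months are always written as three letters, days as two digits.
pattern = r"\b([A-Z][a-z]{2})/(\d{2})/(\d{4})\b"

# Reorder the groups into YYYY-Mon-DD and insert our own "-" separators.
# Lines already in the correct format contain no "/" and are left unchanged.
fixed = re.sub(pattern, r"\3-\1-\2", dates)
print(fixed)
```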
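import re
# Sample lines; in the last two the year is in the wrong position
string = """
Microsoft’s headquarters since 1980 is Redmont
1980 Microsoft’s headquarters since is Redmont
1999 Google’s headquarters since is Alabama
"""
# Groups: 1 = year, 2 = "<Company>’s headquarters ", 3 = "since ", 4 = "is <City>"
pattern = r"(\d{4}\s)(\w+’s\sheadquarters\s)(since\s)(is [A-Z][a-z]+)"
# Reorder the groups so the year follows "since"; the already-correct line does not match
new_text = re.sub(pattern, r"\2\3\1\4", string)
print(new_text)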
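The headquarters example above can also be written with named groups, which can make a substitution easier to read. In Python's re module a group is named with (?P<name>...) and referenced in the replacement string as \g<name>. The group names year, company, since and location are hypothetical choices for illustration only.

```python
import re

text = """
Microsoft’s headquarters since 1980 is Redmont
1980 Microsoft’s headquarters since is Redmont
1999 Google’s headquarters since is Alabama
"""

# Same pattern as the numbered version, but every group has a name.
pattern = (
    r"(?P<year>\d{4}\s)"                    # e.g. "1980 "
    r"(?P<company>\w+’s\sheadquarters\s)"   # e.g. "Microsoft’s headquarters "
    r"(?P<since>since\s)"                   # "since "
    r"(?P<location>is [A-Z][a-z]+)"         # e.g. "is Redmont"
)

# \g<name> refers to a named group in the replacement string.
new_text = re.sub(pattern, r"\g<company>\g<since>\g<year>\g<location>", text)
print(new_text)
```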
Hello @zwhite, thank you for participating. It is a lot of effort to assemble solutions in two tools, thank you. The results are great; I like the fact that you kept direct (literal) matching to a minimum.
Hello @borydobon, very good solutions, with a minimal number of characters. I am glad that you identified the option to define your own separators in the substitution panel.
Thank you @BrianJ, very good solutions. Like the others, you found that in the substitution panel you can not only switch the positions of groups but also add extra characters.
You have all discovered what the substitution panel can do. Before I publish my solution, I would also like to draw your attention to the fact that groups can be named, which can sometimes make a solution more readable. Take a look at the following example: