RegEx Workout 002 - Character classes and alternatives

Welcome to regex challenge number 2.

Level of difficulty:
image

As before, we will use the regex101.com/ editor. You will find its description at the beginning of the first workout.

Task 1.

Write an expression that covers the text “EnterpriseDNA” and “enterpriseDNA” in full but leaves insignificant characters for the other examples.

Expected answer:

Source:

EnterpriseDNA
enterpriseDNA
Enterprisedna
.EnterpriseDNA
EnterpriseDNa
EnterpriseDNA@

Task 2.

Below is a summary of interviews for a software company. Mark the entire line that refers to a potential DAX programmer who handled the recruitment task in less than 4h.

Expected answer.

Optional hard version: Adjust the expression so that the Match information section shows only the last name.

Expected answer.

Source:
DAX instructor candidate took 3h to complete the task:John K
SQL instructor candidate took 3h to complete the task:Amy B
DAX instructor candidate took 4h to complete the task:Paul K
R instructor candidate took 5h to complete the task:Bob W

Task 3.

Mark in full the sentences that mention R, SQL, or Python.

Expected answer:

Source:
Python is a language that an aspiring analyst should consider learning
Cobol is a language that an aspiring analyst should consider learning
R is a language that an aspiring analyst should consider learning
SQL is a language that an aspiring analyst should consider learning
C# is a language that an aspiring analyst should consider learning

Simply post your code and a screenshot of your results.

Please blur code or place it in a hidden section.

The answer to this workout will be published on May 28, 2023.

@KrzysztofNowak,

This is how I did it. I’m curious about building more efficient or “tighter” RegEx’s. But these work:

Task 1

/(?:E|e)nterpriseDNA/g

Task 2-a

/DAX.*[0123]h.*$/gm

Task 2-b

/DAX.*[0123]h.*:(.*)$/gm

Task 3

/^.*(?:R|SQL|Python).*$/gm

Thanks for putting this workout together.

Task 1

[En]nterpriseDNA?

Task 2

DAX.* [123]h.:(\N)

Task 3

^[R|SQL|Python]\N*

Exercise realised in Python

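Task 1

import re

# Task 1: source text from the challenge
string_task1 = """
EnterpriseDNA
enterpriseDNA
Enterprisedna
.EnterpriseDNA
EnterpriseDNa
EnterpriseDNA@
"""

# Literal pattern; letter case is ignored through the re.IGNORECASE flag below.
# Note: with IGNORECASE this finds a match in all six source lines.
regExp_task1 = "enterprisedna"

matches_task1 = re.findall(regExp_task1, string_task1, re.IGNORECASE)

print(matches_task1)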

Task 2

import re

# Task 2: source text from the challenge
string_task2 = """
DAX instructor candidate took 3h to complete the task:John K
SQL instructor candidate took 3h to complete the task:Amy B
DAX instructor candidate took 4h to complete the task:Paul K
R instructor candidate took 5h to complete the task:Bob W
"""

# A digit from 0 to 3 followed by "h"; group 1 captures the text after the colon.
# Note: the pattern does not require the line to mention DAX.
regExp_task2 = r".*[0-3]h.*:(.*)"

matches_task2 = re.findall(regExp_task2, string_task2, re.MULTILINE)

print("Candidates : ", ', '.join(matches_task2))

Task 3

import re

# Task 3: source text from the challenge
string_task3 = """
Python is a language that an aspiring analyst should consider learning
Cobol is a language that an aspiring analyst should consider learning
R is a language that an aspiring analyst should consider learning
SQL is a language that an aspiring analyst should consider learning
C# is a language that an aspiring analyst should consider learning
"""

# Whole lines that mention Python, SQL, or R as separate words (\b marks a word boundary).
# re.IGNORECASE below also accepts lowercase spellings.
regExp_task3 = r".*\bPython\b.*|.*\bSQL\b.*|.*\bR\b.*"

matches_task3 = re.findall(regExp_task3, string_task3, re.MULTILINE | re.IGNORECASE)

print("All matches : ", ', '.join(matches_task3))

Thanks for the challenge !

@KrzysztofNowak ,

Thanks! Another great workout – I feel like I’m learning a ton every week in these…

Looking forward to #3!

- Brian

Q1:

Q2a:

Q2b:

Q3:

Thanks, all, for participating. Below you will find my solutions.

Task 1 Solution

First version.

In this solution, we start with a character class that consists of a lowercase or uppercase letter E ([Ee]). It is followed by any character (.) with a quantifier that lets it occur an unlimited number of times. The expression ends with the requirement that the text ends with an uppercase letter ([[:upper:]]).
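The pattern itself was posted as an image, so here is only a minimal sketch of what the first version might look like when translated to Python. It assumes the pattern is `[Ee].*[A-Z]`: Python's `re` module has no POSIX classes, so `[A-Z]` stands in for `[[:upper:]]`, and `re.fullmatch` stands in for the "ends with" requirement. The names `SOURCE`, `FIRST_VERSION`, and `fully_matched` are made up for this illustration.

```python
import re

# Source lines from Task 1.
SOURCE = [
    "EnterpriseDNA",
    "enterpriseDNA",
    "Enterprisedna",
    ".EnterpriseDNA",
    "EnterpriseDNa",
    "EnterpriseDNA@",
]

# Assumed reconstruction: E or e, then any characters, then an uppercase letter.
# [A-Z] replaces [[:upper:]], which Python's re module does not support.
FIRST_VERSION = re.compile(r"[Ee].*[A-Z]")


def fully_matched(pattern, lines):
    """Return the lines that the pattern covers from the first to the last character."""
    return [line for line in lines if pattern.fullmatch(line)]


print(fully_matched(FIRST_VERSION, SOURCE))
```

Under these assumptions only the two target lines are covered in full; the other four either start or end with a character the pattern does not allow in that position.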

In the second solution, we again use the same character class, but require it to be followed at least once (+) by a lowercase letter ([[:lower:]]) and then at least once (+) by an uppercase letter.
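A similar hedged sketch for the second version, again in Python. It assumes the pattern is `[Ee][a-z]+[A-Z]+`, with `[a-z]` and `[A-Z]` standing in for `[[:lower:]]` and `[[:upper:]]` because Python's `re` module does not support POSIX classes. `SOURCE` and `SECOND_VERSION` are illustrative names, and `re.fullmatch` is used here to check whether a line is covered in full.

```python
import re

# Source lines from Task 1.
SOURCE = [
    "EnterpriseDNA",
    "enterpriseDNA",
    "Enterprisedna",
    ".EnterpriseDNA",
    "EnterpriseDNa",
    "EnterpriseDNA@",
]

# Assumed reconstruction: E or e, one or more lowercase letters,
# then one or more uppercase letters.
SECOND_VERSION = re.compile(r"[Ee][a-z]+[A-Z]+")

# Keep only the lines that the pattern covers completely.
covered = [line for line in SOURCE if SECOND_VERSION.fullmatch(line)]
print(covered)
```

Compared with the `.*` version, this one spells out which kind of character is allowed at each position, so it does not depend on backtracking from the end of the line.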

Task 2 solution:

In this expression, we start by defining the anchor (^), which tells us what the expression must start with. It can be followed repeatedly by any character, a digit from 1 to 3, letters of either case (at least once), and again any character.
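As the original pattern is not preserved in the text, the following Python sketch only approximates the description. It assumes the anchor is followed by the literal text DAX and that "letters" can be written as `[A-Za-z]+`; `SOURCE` and `TASK2` are names chosen for this example.

```python
import re

# Source lines from Task 2.
SOURCE = """\
DAX instructor candidate took 3h to complete the task:John K
SQL instructor candidate took 3h to complete the task:Amy B
DAX instructor candidate took 4h to complete the task:Paul K
R instructor candidate took 5h to complete the task:Bob W
"""

# Assumed reconstruction: line starts with DAX, then any characters,
# a digit from 1 to 3, at least one letter, and any characters again.
# re.MULTILINE makes ^ match at the start of every line.
TASK2 = re.compile(r"^DAX.*[1-3][A-Za-z]+.*", re.MULTILINE)

matched_lines = TASK2.findall(SOURCE)
print(matched_lines)
```

With the four source lines, only the first DAX candidate (3h) is marked; the SQL line fails at the anchor and the second DAX line has no digit from 1 to 3.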

Extra version:

In this expression, we start by defining an anchor (^), which tells us what the expression must start with. It can be followed repeatedly by any character, a digit from 1 to 3, letters of either case (at least once), and again by any character. We end the expression with information about one occurrence of a special character ([[:punct:]]{1}) and close the whole thing in parentheses to specify this first group. The second group is at least one occurrence of a letter, a space, and a letter again.
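The two-group idea can be sketched in Python as follows. This is an approximation under stated assumptions: the anchor is followed by DAX, a literal colon stands in for `[[:punct:]]{1}` (Python's `re` module has no POSIX classes), and the second group is written as `[A-Za-z]+ [A-Za-z]+`. `SOURCE`, `TASK2_EXTRA`, and `names` are illustrative names.

```python
import re

# Source lines from Task 2.
SOURCE = """\
DAX instructor candidate took 3h to complete the task:John K
SQL instructor candidate took 3h to complete the task:Amy B
DAX instructor candidate took 4h to complete the task:Paul K
R instructor candidate took 5h to complete the task:Bob W
"""

# Group 1: everything from the start of the line up to and including the colon.
# Group 2: letters, a space, and letters again (the candidate's name).
TASK2_EXTRA = re.compile(
    r"(^DAX.*[1-3][A-Za-z]+.*:)([A-Za-z]+ [A-Za-z]+)",
    re.MULTILINE,
)

# Read only the second group of every match.
names = [match.group(2) for match in TASK2_EXTRA.finditer(SOURCE)]
print(names)
```

Splitting the expression into two groups keeps the full-line match intact while letting the name be read separately, which is what the Match information panel on regex101.com displays per group.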

Task 3 solution

In this expression, we define an alternative. The expression consists of the letter R and any characters or the text “SQL” and any characters or the text “Python” and any characters.
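A minimal Python sketch of such an alternative, assuming the pattern is `R.*|SQL.*|Python.*` (the original was posted as an image, so this is a reconstruction from the description). `SOURCE`, `TASK3`, and `matched_sentences` are names made up for the example.

```python
import re

# Source lines from Task 3.
SOURCE = """\
Python is a language that an aspiring analyst should consider learning
Cobol is a language that an aspiring analyst should consider learning
R is a language that an aspiring analyst should consider learning
SQL is a language that an aspiring analyst should consider learning
C# is a language that an aspiring analyst should consider learning
"""

# Three alternatives separated by |; each one is a keyword followed by
# any characters up to the end of the line (. does not match a newline).
TASK3 = re.compile(r"R.*|SQL.*|Python.*")

matched_sentences = TASK3.findall(SOURCE)
print(matched_sentences)
```

Because the pattern has no anchor or word boundary, an uppercase R anywhere in a line would also start a match; with these five sentences that does not happen, so the three expected lines are marked in full.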

Hello @HufferD, I like all your solutions. The first one could rely less on a literal match, but it works for this challenge, of course.

Hello @borydobon, all good. In the second solution you could also use the range [1-3]. This is more convenient when the range is longer, for example 1-39.
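To illustrate the point about ranges, here is a small Python sketch. It checks that `[123]` and `[1-3]` accept the same digits, and it shows one possible way to accept the numbers 1 to 39: a character class works on single characters, so `[1-39]` means the characters 1, 2, 3, and 9, and a multi-digit range needs an alternative instead. The pattern `ONE_TO_39` and the variable names are assumptions made for this example, not something posted in the thread.

```python
import re

# [123] lists the digits one by one; [1-3] expresses the same set as a range.
explicit = re.compile(r"[123]h")
ranged = re.compile(r"[1-3]h")

samples = ["0h", "1h", "2h", "3h", "4h"]
same_result = all(
    bool(explicit.fullmatch(s)) == bool(ranged.fullmatch(s)) for s in samples
)
print(same_result)

# A character class matches ONE character: [1-39] is the set {1, 2, 3, 9}.
single_char_class = re.compile(r"[1-39]")
print(bool(single_char_class.fullmatch("9")), bool(single_char_class.fullmatch("39")))

# One way to accept the numbers 1 to 39: a single digit 1-9,
# or a tens digit 1-3 followed by any digit.
ONE_TO_39 = re.compile(r"(?:[1-9]|[1-3][0-9])h")
accepted = [n for n in range(0, 45) if ONE_TO_39.fullmatch(f"{n}h")]
print(accepted[0], accepted[-1], len(accepted))
```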

Hello @BrianJ , Very good solutions. In the last one you probably identified more potential pitfalls than I did. The simpler solution works well for this task, but if, for example, the names of the languages were repeated at the end, it would be better to have additional tags like yours.

Great workout! Thank you Krzysztof

Task1

[e,E]nterpriseDN[A]?

Task2

^DAX.+[1-3].+
^DAX.+[1-3].+:(.+)

Task3

(?:R|SQL|Python).+

Hello @alex-badiu, very well optimized solutions, thank you.

Hi @zwhite, thank you for your submission. Could you please check whether the solution for task 2 takes only DAX into consideration? On regex101.com I also see SQL marked. But the groups are captured well.

Hi @KrzysztofNowak ,

Something new that I’m trying to learn 🙂
Q1:

Q2:a

Q2:b

Q3:

Thanks
Keith