Regular expressions (also known as regexes, regex patterns, or REs) are a small, specialized programming language embedded in Python. They are made available through the "re" module. With regular expressions, we specify rules for the set of strings we want to match, such as English sentences or email addresses.
Regular expression patterns in Python 3 are compiled into a series of bytecodes, which are then executed by a matching engine written in C. We will begin this beginner's guide to regular expressions in Python with the basic operations. While you are learning, a Python regex tester can help you check your patterns.
The re module in Python provides regular expression operations. Regular expressions use the backslash character (\) to indicate special forms, or to allow special characters to be used without invoking their special meaning. This conflicts with Python's own use of the backslash for the same purpose in string literals.
To match a literal backslash, you would have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must in turn be written as \\ inside an ordinary Python string literal.
Python's raw string notation avoids this: in a string literal prefixed with 'r', backslashes are not handled in any special way, so r"\n" is a two-character string containing a backslash and the letter n. Most regular expression operations are available both as module-level functions and as methods on compiled pattern objects.
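The backslash rule above can be checked directly. The following is a minimal sketch; the variable names and the sample path are assumptions made for illustration.

```python
import re

# An ordinary string literal needs four backslashes to describe a regex
# that matches one literal backslash.
plain_pattern = "\\\\"
# The raw string form of the same regex needs only two.
raw_pattern = r"\\"

# Hypothetical sample text; the actual string contains a single backslash.
path = "C:\\temp"

same_pattern = plain_pattern == raw_pattern
found_plain = re.search(plain_pattern, path) is not None
found_raw = re.search(raw_pattern, path) is not None
print(same_pattern, found_plain, found_raw)
```

Both spellings produce the same two-character pattern; the raw string is simply easier to read and harder to get wrong.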
The functions in the regular expression (re) module let us test whether a particular string matches a given regular expression. Regular expressions can also be concatenated to form new regular expressions.
This can be explained with an example:
Suppose ‘P’ and ‘Q’ are both regular expressions; concatenating them gives the regular expression ‘PQ’. If a string ‘a’ matches ‘P’ and a string ‘b’ matches ‘Q’, then the string ‘ab’ matches ‘PQ’.
Regular expressions contain both special and ordinary characters. The simplest regular expressions are ordinary characters such as ‘A’ or ‘a’, which match themselves. Since ordinary characters can be concatenated as described above, the pattern word matches the string ‘word’.
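The concatenation rule can be tried out in a few lines. This is only a sketch: the two patterns standing in for P and Q are assumptions chosen for illustration.

```python
import re

p = "[0-9]+"   # hypothetical pattern P: one or more digits
q = "[a-z]+"   # hypothetical pattern Q: one or more lowercase letters
pq = p + q     # the concatenated pattern PQ

# fullmatch requires the whole string to match the pattern.
a_matches_p = re.fullmatch(p, "123") is not None
b_matches_q = re.fullmatch(q, "abc") is not None
ab_matches_pq = re.fullmatch(pq, "123abc") is not None

print(a_matches_p, b_matches_q, ab_matches_pq)
```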
Some characters are special, for example ‘(‘ or ‘|’. They either stand for classes of ordinary characters or affect how the regular expressions around them are interpreted. Some of the special characters are ‘.’, ‘^’, ‘$’, ‘*’, ‘+’, ‘?’ etc.
re.match() checks whether zero or more characters at the beginning of the string match the regular expression. If they do, it returns a match object; if they do not, it returns None.
Even in MULTILINE mode, re.match() matches only at the beginning of the string, not at the beginning of each line.
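The MULTILINE behavior is easy to confirm. The sketch below uses a made-up two-line string; the names are assumptions for illustration.

```python
import re

text = "first line\nsecond line"

# re.match only tries the very start of the string, even with re.MULTILINE.
m_match = re.match(r"second", text, re.MULTILINE)

# re.search with ^ and re.MULTILINE can match at the start of any line.
m_search = re.search(r"^second", text, re.MULTILINE)

print(m_match)           # None
print(m_search.group())  # second
```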
The re.search function scans through the string and returns a match object for the first location where the regular expression produces a match. It returns None if no position in the string matches the pattern.
The difference between ‘re.match’ and ‘re.search’ lies in how much of the string they examine.
‘re.match’ checks for a match only at the beginning of the string. If the start of the string fits the pattern, it returns a match object.
In contrast, ‘re.search’ scans through the whole string, looking for the first position where the pattern matches.
Here it will become clearer with this example:
import re
a = "123abc"
t = re.match("[a-z]+",a)
y = re.search("[a-z]+",a)
print (t)
print (y)
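In this code, we have assigned the string ‘123abc’ to the variable ‘a’. We then call both re.match and re.search with the same pattern, [a-z]+, which looks for one or more lowercase letters.
Now, here is the output of the code (the second line is shown in the form printed by Python 3.7 and later):
None
<re.Match object; span=(3, 6), match='abc'>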
We used re.match first; it looks for letters only at the beginning of the string. Since our string “123abc” starts with digits, there is no match at that position, and the result is ‘None’.
re.search, on the other hand, looks for letters anywhere in the string “123abc”. It finds them starting at the fourth character (index 3), so it returns a match object describing the match.
Because re.match only tries the beginning of the string, it can give up sooner than re.search, which may have to scan the whole string. Use re.match when you only care about a match at the start.
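Python allows us to replace part of a string with another string. For fixed substrings, this is done with the replace() method of string objects. Note that replace() is a string method, not part of the re module; pattern-based replacement is done with re.sub().
Here is the syntax of the replace() method in Python 3:
str.replace(old, new[, count])
These are the parameters: old is the substring to search for, new is the substring to put in its place, and count is the optional maximum number of replacements (called maxreplace in the older Python 2 string module function string.replace(s, old, new[, maxreplace])).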
Now, we will understand this with the help of an example:
our_str = 'Spider man'
new_str = our_str.replace('Spider', 'Bat')
print(new_str)
new_str = our_str.replace('man', 'Sense')
print(new_str)
In this example, we have the string ‘Spider man’. First we replace ‘Spider’ with ‘Bat’, and then, starting again from the original string, we replace ‘man’ with ‘Sense’. Since we did not pass the optional argument that limits the number of replacements, every occurrence is replaced; it does not default to 1. Now we will see the output:
Bat man
Spider Sense
Now, as we required, the replacements have taken place.
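For replacements driven by a pattern rather than a fixed substring, the re module offers re.sub. The following is a minimal sketch under assumed inputs; the sample string is made up for illustration.

```python
import re

our_str = "Spider man, spider web"

# Replace every "spider", ignoring case, with "Bat".
all_replaced = re.sub(r"spider", "Bat", our_str, flags=re.IGNORECASE)

# count=1 limits the substitution to the first match only.
first_only = re.sub(r"spider", "Bat", our_str, count=1, flags=re.IGNORECASE)

print(all_replaced)  # Bat man, Bat web
print(first_only)    # Bat man, spider web
```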
Other commonly used operations in Python are as follows.
The two functions ‘re.search’ and ‘re.match’ have already been discussed.
1. Split:
We can also break a string into smaller strings in Python. This is done with the split() method. You can pass a separator such as a comma to mark where the string should be cut; if you do not specify a separator, runs of whitespace are used as the breaks by default.
#part1
x = 'wind,water,fire'
k = x.split(",")
print (k)
#part2
a,b,c = x.split(",")
print (a,b,c)
In this code, we have assigned the string ‘wind,water,fire’ to the variable ‘x’. Using the split method, we split it into its three words at the commas. In the second part, we assign the three separated strings to the three variables ‘a’, ‘b’ and ‘c’.
Now, we will see the output of the above program.
['wind', 'water', 'fire']
wind water fire
The first line shows the list of substrings produced by the split, and the second line shows the same substrings after they have been unpacked into separate variables.
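When the separator is not a single fixed string, the re module's re.split can split on a pattern instead. This is a small sketch; the input string and the choice of separators are assumptions for illustration.

```python
import re

x = "wind, water;fire  earth"

# Split on runs of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", x)

print(parts)  # ['wind', 'water', 'fire', 'earth']
```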
2. Findall:
re.findall finds all the non-overlapping occurrences of a pattern in a string. Unlike re.search and re.match, ‘findall’ does not return a match object; it returns a list.
import re
text = """
1. Star Wars
2. Star Trek
3. Futurama
"""
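# \d+ matches the number, \. the literal period, (.*) the rest of the line
S = re.findall(r'^(\d+)\.(.*)$', text, re.MULTILINE)
print (S)
Because the pattern contains two groups in parentheses, we get a list of tuples, one tuple per matching line, such as ('1', ' Star Wars').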
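The shape of the list returned by findall depends on how many groups the pattern has. The sketch below illustrates the three cases on an assumed sample text.

```python
import re

text = "1. Star Wars\n2. Star Trek\n3. Futurama"

# No groups: a list of the matched strings.
numbers = re.findall(r"\d+", text)

# One group: a list of that group's contents.
titles = re.findall(r"^\d+\.\s*(.*)$", text, re.MULTILINE)

# Two groups: a list of tuples, one element per group.
pairs = re.findall(r"^(\d+)\.\s*(.*)$", text, re.MULTILINE)

print(numbers)
print(titles)
print(pairs)
```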
3. Compile:
The compile() function compiles a pattern into a pattern object. The resulting object has methods for pattern matching and string substitution, and it can be reused without recompiling the pattern. Here is a Python re.compile example.
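import re
# Matches any character that is NOT a letter, whitespace, or a period
name_check = re.compile(r"[^A-Za-z\s.]")
name = input("Please, enter your name: ")
while name_check.search(name):
    print("Please enter your name correctly!")
    name = input("Please, enter your name: ")  # ask again, otherwise the loop never ends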
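The same compiled pattern can be reused without prompting the user, which makes it easier to test. The following sketch wraps it in a helper; the function name is_valid_name and the sample names are assumptions, not part of the original example.

```python
import re

# Compiled once, reused for every call.
name_check = re.compile(r"[^A-Za-z\s.]")

def is_valid_name(name):
    """Return True when name contains only letters, whitespace, and periods."""
    return name_check.search(name) is None

results = {n: is_valid_name(n) for n in ["J. R. Tolkien", "R2-D2"]}
print(results)
```

Note that this check only rejects forbidden characters; an empty string would still pass, so a real validator would likely need an extra condition.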
Match objects support a number of methods, including group(), start(), end(), and span(). A Python regex cheat sheet can help you remember them. For a match object m, span(group) returns the tuple
(m.start(group), m.end(group)).
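These match object methods can be seen together in a short sketch. The pattern and the sample string are assumptions chosen for illustration.

```python
import re

m = re.search(r"(\d+)-(\d+)", "range: 10-25")

whole = m.group()        # the entire match
first = m.group(1)       # contents of the first group
second = m.group(2)      # contents of the second group
whole_span = m.span()    # (start, end) of the entire match
second_span = m.span(2)  # same as (m.start(2), m.end(2))

print(whole, first, second, whole_span, second_span)
```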
Regular expressions may include optional modifiers that control various aspects of matching. These modifiers are passed as optional flags. Multiple modifiers can be combined using the bitwise OR operator (|). Here are some modifiers and their descriptions.
Modifier | Description |
re.I | Performs case-insensitive matching. |
re.L | Interprets words according to the current locale. This affects the alphabetic group (\w and \W) as well as word boundary behavior (\b and \B). In Python 3 it is only allowed with bytes patterns. |
re.M | Makes ‘$’ match the end of any line (not just the end of the string) and ‘^’ match the start of any line (not just the start of the string). |
re.S | Makes the dot (period) match any character, including a newline. |
re.U | Interprets letters according to the Unicode character set. This is the default for string patterns in Python 3. |
re.X | Ignores whitespace, except inside ‘[]’ or when escaped by a backslash, and treats an unescaped ‘#’ as a comment marker. |
Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. A control character can be matched literally by escaping it with a backslash. The following table lists some patterns and their descriptions in Python.
Pattern | Description |
^ | Matches the beginning of the line. |
$ | Matches the end of the line. |
[...] | Matches any single character that is in the brackets. |
[^...] | Matches any single character that is not in the brackets. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
(?#...) | Comment. |
\w | Matches word characters. |
\W | Matches non-word characters. |
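The effect of the most common modifiers can be seen in a short sketch. The sample strings and variable names below are assumptions for illustration.

```python
import re

text = "Python\npython rocks"

# re.I ignores case and re.M lets ^ match at every line start;
# the two flags are combined with the bitwise OR operator.
hits = re.findall(r"^python", text, re.I | re.M)

# Without re.S the dot does not match the newline between the two lines.
dot_default = re.search(r"Python.python", text)
dot_all = re.search(r"Python.python", text, re.S)

# re.X allows whitespace and comments inside the pattern.
verbose = re.compile(r"""
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.X)
vm = verbose.search("call 555-1234")

print(hits)
print(dot_default, dot_all is not None)
print(vm.groups())
```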
Literal characters match themselves. For example, the pattern python matches the literal string “python”.
Character classes describe sets of characters, any one of which may match at a given position. Some of the special character classes are described below.
Special Character class | Description |
. | Match any character except new line |
\d | Match any digit [0-9] |
\D | Match anything except a digit [^0-9] |
\s | Match any whitespace character [ \t\r\n\f] |
\S | Match any non-whitespace character [^ \t\r\n\f] |
\w | Match any single word character |
\W | Match any non-word character |
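The character classes in the table can be tried on a small sample. This is only a sketch; the sample string is an assumption for illustration.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d", sample)           # individual digits
words = re.findall(r"\w+", sample)           # runs of word characters
whitespace = re.findall(r"\s", sample)       # individual whitespace characters
non_space_runs = re.findall(r"\S+", sample)  # runs of non-whitespace

print(digits)
print(words)
print(whitespace)
print(non_space_runs)
```

Note that the underscore counts as a word character, so ‘floor_3’ is matched as a single word.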
Here are the repetition cases, which are used when we have to handle repetition in strings.
Repetition Cases | Description |
run? | Match either “ru” or “run”. Here ‘n’ is optional. |
run* | Match ‘ru’ followed by zero or more n’s. |
run+ | Match ‘ru’ followed by 1 or more n’s. |
\d{4} | Match exactly 4 digits. |
\d{4,} | Match 4 or more digits. |
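Each row of the repetition table can be verified with re.fullmatch, which requires the whole string to match. The helper name full and the candidate strings are assumptions made for this sketch.

```python
import re

def full(pattern, candidates):
    """Return, for each candidate, whether the whole string matches."""
    return [re.fullmatch(pattern, s) is not None for s in candidates]

words = ["ru", "run", "runnn"]
optional = full(r"run?", words)      # n is optional
zero_or_more = full(r"run*", words)  # any number of n's
one_or_more = full(r"run+", words)   # at least one n

numbers = ["123", "1234", "12345"]
exactly_four = full(r"\d{4}", numbers)
four_or_more = full(r"\d{4,}", numbers)

print(optional, zero_or_more, one_or_more)
print(exactly_four, four_or_more)
```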
There are two kinds of repetition available in Python: greedy and nongreedy.
Greedy repetition tries to match as many repetitions as possible. Here is an example with its output.
Code:
import re
p = 'runnn'
greedy_re = 'n+'
mymatch = re.search(greedy_re, p)
t = mymatch.group()
print (t)
Output:
nnn
As you can see, the greedy pattern matched all three repetitions of ‘n’ in ‘runnn’, which is the result we expected.
Nongreedy repetition, written by adding ‘?’ after the quantifier, matches as few repetitions as possible. That means it is satisfied with the first repetition it encounters.
import re
p = 'runnn'
non_greedy_re = 'n+?'
mymatch = re.search(non_greedy_re, p)
t = mymatch.group()
print (t)
Output:
n
As expected, the nongreedy repetition stopped after the first ‘n’ it encountered.
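The difference between the two kinds of repetition matters most when a pattern could stretch across several candidates. The sketch below uses a made-up tagged string as an assumed input.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Nongreedy: .+? stops at the first closing bracket, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)
```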
Anchors specify the position in the string at which a match must occur.
A few of them are listed below:
Anchors | Description |
^play | Matches “play” at the beginning of the string, or of any line in MULTILINE mode. |
play$ | Matches “play” at the end of the string, or of any line in MULTILINE mode. |
\Aplay | Matches “play” only at the start of the string. |
play\Z | Matches “play” only at the end of the string. |
\bRun\b | Matches “Run” at word boundaries, that is, as a whole word. |
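The anchors can be compared side by side in a short sketch. The sample strings are assumptions for illustration.

```python
import re

multi_text = "stop\nplay"

# ^ matches at the start of any line in MULTILINE mode...
caret_multiline = re.search(r"^play", multi_text, re.M)
# ...but \A only ever matches at the start of the whole string.
a_multiline = re.search(r"\Aplay", multi_text, re.M)

# $ and \Z both match at the very end of this string.
dollar_end = re.search(r"play$", multi_text)
z_end = re.search(r"play\Z", multi_text)

# \b restricts the match to whole words.
boundary = re.findall(r"\bRun\b", "Run, Runner, Run!")

print(caret_multiline is not None, a_multiline)
print(dollar_end.span(), z_end.span())
print(boundary)
```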
Some special syntax uses parentheses and has a special meaning.
Example | Description |
E(?#Comment) | Matches “E”; the rest is a comment. |
(?i)Example | Case-insensitive matching for the whole pattern. In Python, an inline flag such as (?i) must appear at the start of the pattern. |
E(?i:xample) | Case-insensitive only while matching “xample” (Python 3.6 and later). |
Exampl(?:e|er) | Groups only, without creating a \1 backreference. |
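The special parenthesized forms can be exercised as follows. This is a sketch that assumes Python 3.6 or later (needed for the scoped flag form); the test strings are made up for illustration.

```python
import re

# (?#...) is ignored by the matcher.
comment = re.fullmatch(r"E(?#this text is ignored)xample", "Example")

# (?i) at the start makes the whole pattern case-insensitive.
whole = re.fullmatch(r"(?i)Example", "EXAMPLE")

# (?i:...) limits case-insensitivity to the enclosed part.
scoped_ok = re.fullmatch(r"E(?i:xample)", "EXAMPLE")
scoped_no = re.fullmatch(r"E(?i:xample)", "eXAMPLE")

# (?:...) groups without capturing, so no backreference is created.
non_capturing = re.fullmatch(r"Exampl(?:e|er)", "Exampler")

print(comment is not None, whole is not None)
print(scoped_ok is not None, scoped_no)
print(non_capturing.groups())
```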
Anjaneyulu Naini is working as a Content contributor for Mindmajix. He has a great understanding of today’s technology and statistical analysis environment, which includes key aspects such as analysis of variance and software. He is well aware of various technologies such as Python, Artificial Intelligence, Oracle, Business Intelligence, Altrex, etc.