Regular Expressions (RegEx) in Python
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It is mainly used for matching, searching, and manipulating text.
With RegEx, you can:
- Validate input
- Search text patterns
- Extract specific data
- Replace parts of strings
RegEx Module in Python
Python provides a built-in module called re to work with regular expressions.
Importing the re Module:
import re
Once imported, you can use RegEx functions to search and manipulate strings.
Using RegEx in Python
Example: Check Pattern at Start and End of a String
The following example checks whether a sentence starts with "Hello" and ends with "World".
If a match is found, a Match object is returned; otherwise, None is returned.
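A minimal sketch of such a check, assuming the pattern ^Hello.*World$ (the sample sentences are illustrative). Here ^ anchors the match at the start of the string, $ anchors it at the end, and .* allows any text in between.

```python
import re

# ^ anchors at the start, $ at the end, .* allows any characters in between
pattern = r"^Hello.*World$"

sentence = "Hello, welcome to the World"
match = re.search(pattern, sentence)

if match:
    print("Match found:", match.group())
else:
    print("No match")

# A sentence that does not start with "Hello" produces None
no_match = re.search(pattern, "Goodbye, cruel World")
print(no_match)  # None
```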
RegEx Functions
The re module provides several useful functions:
| Function | Description |
|---|---|
| findall() | Returns all non-overlapping matches as a list |
| search() | Returns the first match as a Match object, or None if there is no match |
| split() | Splits a string at each match of a pattern |
| sub() | Replaces matches of a pattern with new text |
Metacharacters
Metacharacters have special meanings in RegEx patterns.
| Symbol | Meaning | Example |
|---|---|---|
| [] | Set of characters | [a-z] |
| \ | Starts a special sequence or escapes a metacharacter | \d |
| . | Any character except a newline | c.t |
| ^ | Starts with | ^Hi |
| $ | Ends with | end$ |
| * | Zero or more repetitions | go* |
| + | One or more repetitions | go+ |
| ? | Zero or one repetition | go? |
| {} | Specific number of repetitions | \d{3} |
| \| | Either or | a\|b |
| () | Grouping | (abc) |
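The sketch below applies most of the metacharacters from the table to short, illustrative strings. It also uses \b (a word boundary, covered under special sequences) to keep the repetition examples confined to whole words.

```python
import re

# . matches any single character except a newline
dot_hits = re.findall(r"c.t", "cat cot cut c\nt")          # ['cat', 'cot', 'cut']

# ^ and $ anchor the pattern to the start and end of the string
starts_hi = re.search(r"^Hi", "Hi there") is not None      # True
ends_end = re.search(r"end$", "the end") is not None       # True

# *, + and ? control how many times the preceding "o" may repeat
words = "g go goo gooo"
star = re.findall(r"\bgo*\b", words)                        # ['g', 'go', 'goo', 'gooo']
plus = re.findall(r"\bgo+\b", words)                        # ['go', 'goo', 'gooo']
optional = re.findall(r"\bgo?\b", words)                    # ['g', 'go']

# {3} requires exactly three repetitions; \b excludes longer digit runs
three_digits = re.findall(r"\b\d{3}\b", "12 345 6789")      # ['345']

# | matches either alternative
either = re.findall(r"cat|dog", "cat, dog, bird")           # ['cat', 'dog']

# () groups "ab" so that + repeats the whole group
grouped = re.fullmatch(r"(ab)+", "ababab") is not None      # True

print(dot_hits, starts_hi, ends_end)
print(star, plus, optional, three_digits, either, grouped)
```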
Flags in Regular Expressions
Flags modify how RegEx patterns behave.
| Flag | Short | Description |
|---|---|---|
| re.IGNORECASE | re.I | Case-insensitive matching |
| re.MULTILINE | re.M | ^ and $ match at the start and end of each line |
| re.DOTALL | re.S | . also matches a newline |
| re.ASCII | re.A | \w, \b, \d, \s and their negations match ASCII characters only |
| re.VERBOSE | re.X | Allows whitespace and comments inside a pattern for readability |
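A short sketch of how flags change the result of the same pattern. The sample text and the phone-style pattern are illustrative; the flags are passed as the optional flags argument of re.findall(), re.search() and re.compile(), and several flags can be combined with |.

```python
import re

text = "Apple pie\napple tart\nBanana bread"

# Matching is case-sensitive by default
default_hits = re.findall(r"apple", text)                  # ['apple']

# re.IGNORECASE (re.I): ignore letter case
ignore_case = re.findall(r"apple", text, re.I)             # ['Apple', 'apple']

# re.MULTILINE (re.M): ^ matches at the start of every line
line_starts = re.findall(r"^\w+", text, re.M)              # ['Apple', 'apple', 'Banana']

# Flags can be combined with the | operator
combined = re.findall(r"^apple", text, re.I | re.M)        # ['Apple', 'apple']

# re.DOTALL (re.S): . also matches the newline character
without_dotall = re.search(r"pie.apple", text)             # None
with_dotall = re.search(r"pie.apple", text, re.S)          # matches 'pie\napple'

# re.VERBOSE (re.X): whitespace is ignored and # starts a comment
phone = re.compile(r"""
    \d{3}   # three digits
    -       # a hyphen
    \d{4}   # four digits
""", re.X)
is_phone = phone.fullmatch("555-0100") is not None         # True

print(default_hits, ignore_case, line_starts, combined)
print(without_dotall, with_dotall is not None, is_phone)
```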
Special Sequences
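Special sequences start with a backslash \ and have specific meanings. In Python code, write patterns that contain them as raw strings (for example, r"\d+") so that each backslash reaches the re module unchanged.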
| Sequence | Description | Example |
|---|---|---|
| \A | Start of string | \AHello |
| \b | Word boundary | \bcat |
| \B | Not a word boundary | \Bcat |
| \d | Decimal digit, such as 0–9 | \d+ |
| \D | Non-digit | \D |
| \s | Whitespace | \s |
| \S | Non-whitespace | \S |
| \w | Word character (letter, digit, or underscore) | \w+ |
| \W | Non-word character | \W |
| \Z | End of string | end\Z |
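A minimal sketch exercising several of these sequences on illustrative inputs. The word list and the order text are made up for the example.

```python
import re

# \b and \B: is "cat" at the start of a word, or inside one?
words = ["cat", "concat", "catalog", "scatter"]
word_start = [w for w in words if re.search(r"\bcat", w)]    # ['cat', 'catalog']
inside_word = [w for w in words if re.search(r"\Bcat", w)]   # ['concat', 'scatter']

text = "Order 66 shipped on 2024-05-01"

numbers = re.findall(r"\d+", text)      # ['66', '2024', '05', '01']
tokens = re.findall(r"\w+", text)       # ['Order', '66', 'shipped', 'on', '2024', '05', '01']
pieces = re.split(r"\s", text)          # ['Order', '66', 'shipped', 'on', '2024-05-01']
digits_only = re.sub(r"\D", "", text)   # '6620240501' (every non-digit removed)

# \A and \Z anchor to the very start and very end of the string
starts_with_order = re.search(r"\AOrder", text) is not None   # True
ends_with_01 = re.search(r"01\Z", text) is not None           # True

print(word_start, inside_word)
print(numbers, tokens, pieces, digits_only, starts_with_order, ends_with_01)
```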
Sets in RegEx
Sets are defined using square brackets []. A set matches any single character listed inside the brackets.
| Set | Description |
|---|---|
| [xyz] | Matches x, y, or z |
| [a-z] | Any lowercase letter from a to z |
| [^abc] | Any character except a, b, or c |
| [0-9] | Any digit from 0 to 9 |
| [A-Za-z] | Any uppercase or lowercase letter |
| [+] | A literal + (inside a set, + has no special meaning) |
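The sketch below runs each set from the table against illustrative strings, showing that a set matches one character at a time unless it is followed by a quantifier such as +.

```python
import re

text = "Room 42B, call +1 555 0100"

xyz = re.findall(r"[xyz]", "xylophone zoo")        # ['x', 'y', 'z']
lower_runs = re.findall(r"[a-z]+", text)           # ['oom', 'call'] ("R" and "B" are uppercase)
letter_runs = re.findall(r"[A-Za-z]+", text)       # ['Room', 'B', 'call']
digits = re.findall(r"[0-9]", text)                # one list item per digit
plus_signs = re.findall(r"[+]", text)              # ['+'] (+ is literal inside a set)
vowels_only = re.sub(r"[^aeiou]", "", "regular")   # 'eua' (every non-vowel removed)

print(xyz, lower_runs, letter_runs, digits, plus_signs, vowels_only)
```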
The findall() Function
Returns all non-overlapping matches as a list, in the order they are found.
Example: Find All Occurrences
If no matches are found, an empty list is returned.
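A minimal sketch with an illustrative sentence: the first call collects every "ai", the second collects whole words ending in "ain", and the last shows the empty list returned when nothing matches.

```python
import re

text = "The rain in Spain stays mainly in the plain"

matches = re.findall(r"ai", text)
print(matches)        # ['ai', 'ai', 'ai', 'ai']

# \w* extends each match backwards to the start of the word; \b requires the word to end after "ain"
ain_words = re.findall(r"\w*ain\b", text)
print(ain_words)      # ['rain', 'Spain', 'plain']

no_matches = re.findall(r"Portugal", text)
print(no_matches)     # []
```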
The search() Function
Scans the string and returns a Match object for the first match found.
Example: Search for First Digit
If no match exists, None is returned.
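A minimal sketch of searching for the first digit in an illustrative string. Because search() can return None, the result is checked before calling methods on it.

```python
import re

text = "Room 42B is on floor 3"

match = re.search(r"\d", text)
if match:
    print("First digit:", match.group(), "at index", match.start())  # First digit: 4 at index 5

missing = re.search(r"\d", "no digits here")
print(missing)  # None
```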
The split() Function
Splits a string at each match of the pattern and returns the pieces as a list.
Example: Split by Comma
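A minimal sketch on illustrative input. The second pattern, \s*,\s*, also absorbs any spaces around each comma, which a plain comma split would leave attached to the pieces.

```python
import re

simple = re.split(r",", "red,green,blue")
print(simple)   # ['red', 'green', 'blue']

# Also swallow any spaces around each comma
messy = "apple, banana,cherry ,  date"
fruits = re.split(r"\s*,\s*", messy)
print(fruits)   # ['apple', 'banana', 'cherry', 'date']
```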
Split with Limit
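The number of splits can be capped with the maxsplit argument of re.split(); the remainder of the string is kept as the last item. A short sketch with an illustrative string (maxsplit is passed by keyword, which newer Python versions recommend):

```python
import re

text = "one two three four"

one_split = re.split(r"\s", text, maxsplit=1)
print(one_split)   # ['one', 'two three four']

two_splits = re.split(r"\s", text, maxsplit=2)
print(two_splits)  # ['one', 'two', 'three four']
```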
The sub() Function
Replaces each match of the pattern with new text and returns the resulting string.
Example: Replace Digits
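A minimal sketch on an illustrative string: \d replaces each digit individually, while \d+ replaces each whole run of digits with a single replacement.

```python
import re

text = "Call 555 0100 now"

masked = re.sub(r"\d", "#", text)
print(masked)        # Call ### #### now

placeholders = re.sub(r"\d+", "<num>", text)
print(placeholders)  # Call <num> <num> now
```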
Limit Replacements
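The count argument of re.sub() limits how many matches are replaced, starting from the left; the default of 0 replaces all of them. A sketch using the same illustrative string:

```python
import re

text = "Call 555 0100 now"

first_three = re.sub(r"\d", "#", text, count=3)
print(first_three)   # Call ### 0100 now
```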
Match Object & Methods
A Match object contains details about the match.
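A short sketch of commonly used Match methods and attributes: group(), groups(), start(), end(), span() and string. The email-like pattern is a simplified illustration, not a real email validator.

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "Contact: alice@example.com today")

if m:
    print(m.group())           # alice@example.com (the whole match)
    print(m.group(1))          # alice (first captured group)
    print(m.groups())          # ('alice', 'example') (all captured groups)
    print(m.start(), m.end())  # 9 26 (start and end index of the match)
    print(m.span())            # (9, 26)
    print(m.string)            # the original string that was searched
```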
