Regular Expressions (RegEx) in Python
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern.
It is mainly used for matching, searching, and manipulating text.
With RegEx, you can:
• Validate input
• Search text patterns
• Extract specific data
• Replace parts of strings
RegEx Module in Python
Python provides a built-in module called re for working with regular expressions.
Importing the re Module
```python
import re
```
Once imported, you can use RegEx functions to search and manipulate strings.
Using RegEx in Python
Example: Check Pattern at Start and End of a String
The following example checks whether a sentence starts with "Hello" and ends with "World".
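A minimal sketch of this check follows. The sample sentence in `txt` is illustrative. `^Hello` anchors "Hello" to the start of the string, `World$` anchors "World" to the end, and `re.search()` tries the pattern against the sentence:

```python
import re

txt = "Hello, welcome to the World"  # illustrative sample sentence

# ^Hello anchors "Hello" to the start, .* allows any characters in between,
# and World$ anchors "World" to the end of the string
x = re.search(r"^Hello.*World$", txt)

if x:
    print("Match found:", x.group())  # Match found: Hello, welcome to the World
else:
    print("No match")
```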
If the pattern is found, re.search() returns a Match object; otherwise, it returns None.
RegEx Functions
The re module provides several useful functions:
| Function | Description |
|---|---|
| `findall()` | Returns a list of all matches |
| `search()` | Returns a Match object for the first match, or `None` |
| `split()` | Returns a list of substrings, split at each match |
| `sub()` | Replaces matches with new text and returns the resulting string |
Metacharacters
Metacharacters have special meanings in RegEx patterns.
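| Symbol | Meaning | Example |
|---|---|---|
| `[]` | A set of characters | `[a-z]` |
| `\` | Starts a special sequence (also escapes special characters) | `\d` |
| `.` | Any character except a newline | `c.t` |
| `^` | Starts with | `^Hi` |
| `$` | Ends with | `end$` |
| `*` | Zero or more occurrences | `go*` |
| `+` | One or more occurrences | `go+` |
| `?` | Zero or one occurrence | `go?` |
| `{}` | Specified number of occurrences | `\d{3}` |
| `\|` | Either or | `a\|b` |
| `()` | Capture and group | `(abc)` |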
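The short sketch below tries most of these metacharacters on small, made-up strings. The sample words and texts are illustrative. It also uses `re.fullmatch()`, which is not covered in this article. That function succeeds only when the entire string matches, so the effect of `*`, `+` and `?` is easy to see:

```python
import re

words = ["g", "go", "goo", "gone"]  # illustrative sample words

# *, + and ?: re.fullmatch() succeeds only when the whole word fits the pattern
quantified = {p: [w for w in words if re.fullmatch(p, w)] for p in ["go*", "go+", "go?"]}
print(quantified)
# {'go*': ['g', 'go', 'goo'], 'go+': ['go', 'goo'], 'go?': ['g', 'go']}

any_char = re.findall(r"c.t", "cat cot c-t ct")  # . needs exactly one character
print(any_char)      # ['cat', 'cot', 'c-t']

three_digits = re.findall(r"\d{3}", "Area codes: 212, 415, 9")
print(three_digits)  # ['212', '415']

either = re.findall(r"cat|dog", "cat, dog, bird")
print(either)        # ['cat', 'dog']

# () groups "ab" so that + repeats the whole group
grouped = re.search(r"(ab)+", "xxababyy").group()
print(grouped)       # abab

# ^ and $ anchor the pattern to the start and end of the string
anchored = (bool(re.search(r"^Hi", "Hi there")), bool(re.search(r"end$", "the end")))
print(anchored)      # (True, True)
```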
Flags in Regular Expressions
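Flags modify how RegEx patterns behave. Pass a flag as the flags argument of a function such as `re.search()`, and combine several flags with `|`, for example `re.I | re.M`.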
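| Flag | Short form | Description |
|---|---|---|
| `re.IGNORECASE` | `re.I` | Case-insensitive matching |
| `re.MULTILINE` | `re.M` | `^` and `$` match at the start and end of each line, not only of the whole string |
| `re.DOTALL` | `re.S` | `.` also matches a newline |
| `re.ASCII` | `re.A` | `\w`, `\b`, `\d`, and `\s` match ASCII characters only |
| `re.VERBOSE` | `re.X` | Whitespace and `#` comments in the pattern are ignored, for readability |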
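The sketch below shows each flag on a small example. The three-line string `text`, the phone-number pattern, and the word "café" are all illustrative:

```python
import re

text = "Apple pie\nbanana split\nCherry tart"  # illustrative three-line string

# re.IGNORECASE (re.I): "apple" also matches "Apple"
ignore_case = re.findall(r"apple", text, re.IGNORECASE)
print(ignore_case)   # ['Apple']

# Without re.MULTILINE, ^ matches only at the start of the whole string
first_word = re.findall(r"^\w+", text)
print(first_word)    # ['Apple']
# With re.MULTILINE (re.M), ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(line_starts)   # ['Apple', 'banana', 'Cherry']

# re.DOTALL (re.S): . also matches the newline between "pie" and "banana"
no_dotall = re.search(r"pie.banana", text)
with_dotall = re.search(r"pie.banana", text, re.DOTALL)
print(no_dotall, with_dotall is not None)  # None True

# re.ASCII (re.A): \w no longer matches non-ASCII letters such as "é"
unicode_word = re.findall(r"\w+", "café")
ascii_word = re.findall(r"\w+", "café", re.ASCII)
print(unicode_word, ascii_word)  # ['café'] ['caf']

# Flags can be combined with |
combined = re.findall(r"^[a-c]\w*", text, re.I | re.M)
print(combined)      # ['Apple', 'banana', 'Cherry']

# re.VERBOSE (re.X): whitespace is ignored and # starts a comment
phone = r"""
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
"""
numbers = re.findall(phone, "Call 555-1234 or 555-9876", re.VERBOSE)
print(numbers)       # ['555-1234', '555-9876']
```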
Special Sequences
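Special sequences start with a backslash (`\`) and have specific meanings. Write patterns as raw strings, such as `r"\bcat"`, so that Python passes the backslashes to the re module unchanged. In a normal string, `"\b"` is a backspace character.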
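| Sequence | Description | Example |
|---|---|---|
| `\A` | Matches only at the start of the string | `\AHello` |
| `\b` | Word boundary (start or end of a word) | `\bcat` |
| `\B` | Not at a word boundary | `\Bcat` |
| `\d` | Decimal digit, such as 0–9 | `\d+` |
| `\D` | Non-digit character | `\D` |
| `\s` | Whitespace character | `\s` |
| `\S` | Non-whitespace character | `\S` |
| `\w` | Word character (letter, digit, or underscore) | `\w+` |
| `\W` | Non-word character | `\W` |
| `\Z` | Matches only at the end of the string | `end\Z` |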
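Here is a small sketch of the special sequences. All the sample strings are illustrative, and it borrows `re.sub()` and `re.split()`, which are described later:

```python
import re

text = "The cat scattered 42 catalogs in 2024."  # illustrative sample

word_start = re.findall(r"\bcat\w*", text)   # "cat" at the start of a word
print(word_start)    # ['cat', 'catalogs']

inside_word = re.findall(r"\Bcat", text)     # "cat" not at a word boundary
print(inside_word)   # ['cat']  (the one inside "scattered")

numbers = re.findall(r"\d+", text)           # runs of digits
print(numbers)       # ['42', '2024']

digits_only = re.sub(r"\D", "", "+1 (555) 010-9999")  # remove every non-digit
print(digits_only)   # 15550109999

tokens = re.split(r"\s+", "one  two\tthree")  # split on runs of whitespace
print(tokens)        # ['one', 'two', 'three']

words = re.findall(r"\w+", "hi, you_2!")     # letters, digits, underscore
print(words)         # ['hi', 'you_2']

starts_with_the = re.search(r"\AThe", text) is not None
ends_with_year = re.search(r"2024\.\Z", text) is not None
print(starts_with_the, ends_with_year)  # True True
```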
Sets in RegEx
A set is a group of characters inside square brackets `[]`. It matches any single character from that group.
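| Set | Description |
|---|---|
| `[xyz]` | Matches x, y, or z |
| `[a-z]` | Any lowercase letter from a to z |
| `[^abc]` | Any character except a, b, or c |
| `[0-9]` | Any digit from 0 to 9 |
| `[A-Za-z]` | Any uppercase or lowercase letter |
| `[+]` | A literal `+` (most metacharacters lose their special meaning inside a set) |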
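The sketch below applies each kind of set with `re.findall()`. The sample strings are illustrative:

```python
import re

sample = "R2-D2 and C-3PO"  # illustrative sample

in_set = re.findall(r"[xyz]", "xylophone zoo")        # any of x, y, z
print(in_set)      # ['x', 'y', 'z']

not_lower = re.findall(r"[^a-z ]", "Hello World 42")  # anything except a-z and space
print(not_lower)   # ['H', 'W', '4', '2']

digits = re.findall(r"[0-9]+", sample)
print(digits)      # ['2', '2', '3']

letters = re.findall(r"[A-Za-z]+", sample)
print(letters)     # ['R', 'D', 'and', 'C', 'PO']

plus_signs = re.findall(r"[+]", "1+2+3")              # + is literal inside a set
print(plus_signs)  # ['+', '+']
```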
The findall() Function
Returns a list of all non-overlapping matches in the string.
Example: Find All Occurrences
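A minimal sketch of this example, using an illustrative sentence in `txt`, finds every occurrence of "ai" and then searches for a word that is absent:

```python
import re

txt = "The rain in Spain stays mainly in the plain"  # illustrative sample

found = re.findall("ai", txt)
print(found)    # ['ai', 'ai', 'ai', 'ai']

missing = re.findall("Portugal", txt)
print(missing)  # []
```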
If no matches are found, an empty list is returned.
The search() Function
Scans the string and returns a Match object for the first match it finds.
Example: Search for First Digit
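A minimal sketch follows, using an illustrative flight string. `\d` finds the first digit. `group()` gives the matched text and `start()` gives its index:

```python
import re

txt = "Flight AB123 departs at 9:45"  # illustrative sample

m = re.search(r"\d", txt)  # first digit anywhere in the string
print(m.group())  # 1
print(m.start())  # 9

no_match = re.search(r"\d", "no digits here")
print(no_match)   # None
```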
If no match exists, None is returned.
The split() Function
Splits a string at each match and returns the pieces as a list.
Example: Split by Comma
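A minimal sketch follows. The input string is illustrative and has uneven spacing on purpose. The pattern `\s*,\s*` splits at each comma and also absorbs the whitespace around it:

```python
import re

txt = "apple, banana,cherry ,  date"  # illustrative sample with uneven spacing

# Split at each comma, absorbing any whitespace around it
parts = re.split(r"\s*,\s*", txt)
print(parts)  # ['apple', 'banana', 'cherry', 'date']
```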
Split with Limit
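The `maxsplit` argument of `re.split()` caps how many splits are made. Everything after the last split stays in the final element. A small sketch with an illustrative string:

```python
import re

txt = "apple,banana,cherry,date"  # illustrative sample

one_split = re.split(",", txt, maxsplit=1)
print(one_split)   # ['apple', 'banana,cherry,date']

two_splits = re.split(",", txt, maxsplit=2)
print(two_splits)  # ['apple', 'banana', 'cherry,date']
```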
The sub() Function
Replaces every match with new text and returns the resulting string.
Example: Replace Digits
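A minimal sketch follows, with an illustrative sample string and `#` as the replacement text. `\d` replaces each digit separately. `\d+` replaces each whole run of digits:

```python
import re

txt = "Order 12 arrived on day 3"  # illustrative sample

each_digit = re.sub(r"\d", "#", txt)
print(each_digit)   # Order ## arrived on day #

each_number = re.sub(r"\d+", "#", txt)
print(each_number)  # Order # arrived on day #
```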
Limit Replacements
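The `count` argument of `re.sub()` limits how many matches are replaced, counting from the left. A small sketch with an illustrative string:

```python
import re

txt = "Room 101, floor 2"  # illustrative sample

limited = re.sub(r"\d", "#", txt, count=2)  # only the first two digits
print(limited)  # Room ##1, floor 2
```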
Match Object & Methods
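A Match object contains details about the search and its result. span() returns the start and end positions of the match, the string attribute holds the string that was searched, and group() returns the matched text.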
Example: Get Match Information
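A sketch that produces the output shown here. The pattern `r"\bE\w+"`, a word starting with a capital "E", is an assumption. Any pattern that matches "East" in this sentence gives the same result:

```python
import re

txt = "Sun rises in the East"

# Assumed pattern: a word that starts with a capital "E"
x = re.search(r"\bE\w+", txt)

print(x.span())   # (start, end) positions of the match
print(x.string)   # the string that was searched
print(x.group())  # the part of the string that matched
```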
Output:
```
(17, 21)
Sun rises in the East
East
```
