Regular Expressions
Regular Expressions (RegEx) in Python are a powerful way to search, match, and manipulate text patterns. They are useful for tasks like validating email addresses, finding patterns in text, or replacing certain parts of a string.
Python provides the re module to work with regular expressions.
What is a Regular Expression?
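A regular expression is a sequence of characters that defines a search pattern. For example:
- "cat" matches the literal characters cat wherever they appear, including inside a longer word such as concatenate.
- "ca." matches ca followed by any single character other than a newline, such as cat, car, or cap.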
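As a small sketch of these two patterns, the snippet below checks a few sample words with re.search(), which is described later in this section. The word list is made up for illustration.

```python
import re

# Illustrative sample words (chosen for this sketch).
words = ["cat", "concatenate", "car", "ca", "dog"]

# "cat" matches the literal characters c-a-t anywhere in the string.
literal_hits = [w for w in words if re.search("cat", w)]

# "ca." matches "ca" followed by any one character (except a newline).
dot_hits = [w for w in words if re.search("ca.", w)]

print(literal_hits)  # ['cat', 'concatenate']
print(dot_hits)      # ['cat', 'concatenate', 'car']
```

Note that "ca" on its own is not matched by "ca." because the dot requires one more character to be present.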
Importing the re Module
To use regular expressions in Python, you must first import the re module.
Example:
import re
Common Functions in re
1. re.search()
Scans the text for the pattern and returns a match object for the first occurrence, or None if the pattern is not found.
Example:
import re

text = "Python is fun"
pattern = "Python"

match = re.search(pattern, text)
if match:
    print("Pattern found!")
else:
    print("Pattern not found.")
2. re.match()
Checks whether the pattern matches at the beginning of the string. Unlike re.search(), it does not look for the pattern further into the text.
Example:
import re

text = "Python is fun"
pattern = "Python"

match = re.match(pattern, text)
if match:
    print("Pattern found at the beginning!")
else:
    print("Pattern not found at the beginning.")
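The difference between re.match() and re.search() is easiest to see when the pattern does not sit at the start of the text. The following is a minimal sketch with an illustrative sentence; group() and span() are standard methods of the match object.

```python
import re

text = "I think Python is fun"

# re.match only tries the pattern at position 0, so it fails here.
at_start = re.match("Python", text)

# re.search scans the whole string and finds the first occurrence.
anywhere = re.search("Python", text)

print(at_start)          # None
print(anywhere.group())  # Python
print(anywhere.span())   # (8, 14)
```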
3. re.findall()
Returns a list of all non-overlapping occurrences of a pattern in the text.
Example:
import re

text = "I love Python. Python is powerful."
pattern = "Python"

matches = re.findall(pattern, text)
print(matches)  # Output: ['Python', 'Python']
4. re.sub()
Replaces every occurrence of a pattern with a new string and returns the result. The original string is not modified.
Example:
import re

text = "I love Python. Python is powerful."
pattern = "Python"

result = re.sub(pattern, "Java", text)
print(result)  # Output: I love Java. Java is powerful.
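re.sub() is not limited to literal text. As a hedged sketch, the example below replaces runs of digits using \d+ (one or more digits; both symbols appear in the tables in this section) and uses the optional count argument to limit the number of replacements. The sample sentence is illustrative.

```python
import re

text = "Order 66 shipped on day 7."

# Replace every run of digits with "#".
masked = re.sub(r"\d+", "#", text)

# The optional count argument limits how many replacements are made.
first_only = re.sub(r"\d+", "#", text, count=1)

print(masked)      # Order # shipped on day #.
print(first_only)  # Order # shipped on day 7.
```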
Special Characters in Regular Expressions
Character | Description | Example |
---|---|---|
. | Matches any single character except a newline | "c.t" → Matches cat, cot |
^ | Matches the start of a string | "^Hello" → Matches Hello at the beginning |
$ | Matches the end of a string | "world$" → Matches world at the end |
* | Matches 0 or more repetitions | "ca*t" → Matches ct, cat, caaat |
+ | Matches 1 or more repetitions | "ca+t" → Matches cat, caaat but not ct |
? | Matches 0 or 1 occurrence | "ca?t" → Matches cat or ct |
{} | Matches a specific number of repetitions | "ca{2}t" → Matches caat |
[] | Matches any character inside brackets | "[aeiou]" → Matches vowels |
\| | Logical OR (alternation) | "Hello\|Hi" → Matches Hello or Hi |
\ | Escapes a special character or starts a special sequence | "\\." → Matches a literal dot; "\\d" → Matches a digit |
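The rows above can be checked directly. This is a minimal sketch: full_matches is a hypothetical helper written for this example, and it relies on re.fullmatch(), a standard re function that succeeds only when the whole string matches the pattern.

```python
import re

candidates = ["ct", "cat", "caat", "caaat"]

def full_matches(pattern):
    """Hypothetical helper: candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"ca*t")          # 0 or more "a"
plus = full_matches(r"ca+t")          # 1 or more "a"
optional = full_matches(r"ca?t")      # 0 or 1 "a"
exactly_two = full_matches(r"ca{2}t") # exactly 2 "a"

print(star)         # ['ct', 'cat', 'caat', 'caaat']
print(plus)         # ['cat', 'caat', 'caaat']
print(optional)     # ['ct', 'cat']
print(exactly_two)  # ['caat']

# Anchors and alternation.
starts = bool(re.search(r"^Hello", "Hello world"))
ends = bool(re.search(r"world$", "Hello world"))
either = re.findall(r"Hello|Hi", "Hi there, Hello again")

print(starts, ends)  # True True
print(either)        # ['Hi', 'Hello']
```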
Predefined Character Classes
Class | Description | Example |
---|---|---|
\d | Matches any digit (0-9) | "\\d" → Matches 1, 2 |
\D | Matches any non-digit | "\\D" → Matches A, ! |
\w | Matches any alphanumeric character or underscore | "\\w" → Matches a, 1, _ |
\W | Matches any character that is not alphanumeric or an underscore | "\\W" → Matches @, a space |
\s | Matches any whitespace character | "\\s" → Matches a space, \n |
\S | Matches any non-whitespace | "\\S" → Matches a, 1 |
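To see the classes in action, the sketch below runs re.findall() over one illustrative sentence. The patterns are written as raw strings (r"...") so that each backslash reaches the regex engine without being doubled.

```python
import re

text = "Room_42 opens at 9 am!"

digits = re.findall(r"\d", text)       # single digits
numbers = re.findall(r"\d+", text)     # runs of digits
words = re.findall(r"\w+", text)       # runs of word characters
spaces = re.findall(r"\s", text)       # whitespace characters
non_words = re.findall(r"\W", text)    # everything \w does not match

print(digits)     # ['4', '2', '9']
print(numbers)    # ['42', '9']
print(words)      # ['Room_42', 'opens', 'at', '9', 'am']
print(spaces)     # [' ', ' ', ' ', ' ']
print(non_words)  # [' ', ' ', ' ', ' ', '!']
```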
Combining Patterns
You can combine patterns to create more complex regular expressions.
Example:
import re

text = "My phone number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"  # Matches a phone number format

match = re.search(pattern, text)
if match:
    print("Phone number found:", match.group())
    # Output: Phone number found: 123-456-7890
Flags in Regular Expressions
Flags modify the behavior of a regex pattern.
Flag | Description |
---|---|
re.IGNORECASE (re.I) | Makes the pattern case-insensitive |
re.MULTILINE (re.M) | Makes ^ and $ match at the start and end of each line, not only of the whole string |
re.DOTALL (re.S) | Makes . match newline characters |
Example:
import re

text = "Hello\nWorld"
pattern = r"Hello.World"

match = re.search(pattern, text, re.DOTALL)
if match:
    print("Match found!")
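The other two flags can be sketched the same way. The two-line sample text is illustrative. The last call combines flags with the | operator, which the re module supports.

```python
import re

text = "first line\nSecond line"

# Without a flag, "second" does not match "Second".
strict = re.search(r"second", text)
relaxed = re.search(r"second", text, re.IGNORECASE)

# Without re.MULTILINE, ^ only matches at the start of the whole string.
single = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, re.MULTILINE)

# Flags can be combined with the | operator.
combined = re.findall(r"^s\w+", text, re.I | re.M)

print(strict)           # None
print(relaxed.group())  # Second
print(single)           # ['first']
print(multi)            # ['first', 'Second']
print(combined)         # ['Second']
```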