Regular Expressions

Regular expressions (regex) are a powerful way to search, match, and manipulate text using patterns. They are useful for tasks like validating email addresses, finding patterns in text, or replacing parts of a string.

Python provides the re module to work with regular expressions.

What is a Regular Expression?

A regular expression is a sequence of characters that defines a search pattern. For example:

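  • "cat" matches the literal characters c, a, t in that order, wherever they appear (including inside a longer word such as concatenate).
  • "ca." matches ca followed by any single character except a newline, such as cat, car, or cab.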
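
The two patterns above can be tried out with a short sketch. It uses re.findall from Python's re module; the sample strings and variable names are made up for illustration.

```python
import re

# "cat" matches the literal characters anywhere, even inside "concatenate".
literal_hits = re.findall("cat", "cat concatenate dog")
print(literal_hits)  # ['cat', 'cat']

# "ca." needs one more character after "ca", so the trailing "ca" is skipped.
dot_hits = re.findall("ca.", "cat car cab ca")
print(dot_hits)  # ['cat', 'car', 'cab']
```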

Importing the re Module

To use regular expressions in Python, you must first import the re module.

Example:

import re

Common Functions in re

1. re.search()

Scans the text for a pattern and returns a match object for the first occurrence, or None if the pattern is not found.

Example:

import re

text = "Python is fun"
pattern = "Python"

match = re.search(pattern, text)
if match:
    print("Pattern found!")
else:
    print("Pattern not found.")
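
The match object returned by re.search() carries more than a yes/no answer. As a small sketch (the variable names are illustrative), group() gives the matched text and span() gives its position in the string:

```python
import re

text = "Python is fun"

match = re.search("is", text)
if match is not None:
    print(match.group())  # matched text: is
    print(match.span())   # (start, end) positions: (7, 9)

# A pattern that does not occur anywhere yields None.
missing = re.search("Java", text)
print(missing)  # None
```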

2. re.match()

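Checks whether the pattern matches at the beginning of the string. Unlike re.search(), it does not look for the pattern later in the text, and it returns None if the string does not start with a match.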

Example:

import re

text = "Python is fun"
pattern = "Python"

match = re.match(pattern, text)
if match:
    print("Pattern found at the beginning!")
else:
    print("Pattern not found at the beginning.")
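
To make the difference from re.search() concrete, here is a brief sketch on a string where the pattern is not at the start. It also shows re.fullmatch(), a related function in the re module that requires the entire string to match; the sample text is made up.

```python
import re

text = "I love Python"

at_start = re.match("Python", text)       # None: text does not start with "Python"
anywhere = re.search("Python", text)      # match object: "Python" begins at index 7
whole = re.fullmatch("Python", "Python")  # match object: the whole string matches
partial = re.fullmatch("Python", text)    # None: extra text surrounds the match

print(at_start)          # None
print(anywhere.start())  # 7
```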

3. re.findall()

Finds all non-overlapping occurrences of a pattern in the text and returns them as a list.

Example:

import re

text = "I love Python. Python is powerful."
pattern = "Python"

matches = re.findall(pattern, text)
print(matches)  # Output: ['Python', 'Python']
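
One detail worth knowing: when the pattern contains capturing groups (parentheses), re.findall() returns the captured parts instead of the full match. A minimal sketch with made-up sample data:

```python
import re

text = "Alice: 30, Bob: 25"

# No groups: each result is the full matched text.
ages = re.findall(r"\d+", text)
print(ages)  # ['30', '25']

# Two groups: each result is a tuple of the captured parts.
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)  # [('Alice', '30'), ('Bob', '25')]
```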

4. re.sub()

Replaces every non-overlapping occurrence of a pattern with a new string and returns the result. The original string is left unchanged.

Example:

import re

text = "I love Python. Python is powerful."
pattern = "Python"

result = re.sub(pattern, "Java", text)
print(result)  # Output: I love Java. Java is powerful.
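
re.sub() also accepts an optional count argument that limits the number of replacements, and the replacement string can refer back to captured groups with \1, \2, and so on. A short sketch (the "10-20" input is just an illustrative value):

```python
import re

text = "I love Python. Python is powerful."

# count=1 replaces only the first occurrence.
first_only = re.sub("Python", "Java", text, count=1)
print(first_only)  # I love Java. Python is powerful.

# \1 and \2 in the replacement stand for the first and second captured group.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")
print(swapped)  # 20-10
```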

Special Characters in Regular Expressions

Character   Description                                           Example
.           Matches any single character except a newline         "c.t" → Matches cat, cot
^           Matches the start of a string                         "^Hello" → Matches Hello at the beginning
$           Matches the end of a string                           "world$" → Matches world at the end
*           Matches 0 or more repetitions of the preceding item   "ca*t" → Matches ct, cat, caaat
+           Matches 1 or more repetitions of the preceding item   "ca+t" → Matches cat, caaat but not ct
?           Matches 0 or 1 occurrence of the preceding item       "ca?t" → Matches cat or ct
{}          Matches a specific number of repetitions              "ca{2}t" → Matches caat
[]          Matches any one character inside the brackets         "[aeiou]" → Matches a vowel
|           Logical OR between alternatives                       "Hello|Hi" → Matches Hello or Hi
\           Escapes a special character or starts a class         "\\." → Matches a literal dot; "\\d" → Matches a digit
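
The quantifier rows in the table can be checked with a small sketch. The helper function matching() and the word list are hypothetical names introduced here; re.fullmatch() is used so that a word counts only if the whole word fits the pattern.

```python
import re

words = ["ct", "cat", "caat", "caaat"]

def matching(pattern):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

star = matching(r"ca*t")         # zero or more a's
plus = matching(r"ca+t")         # one or more a's
optional = matching(r"ca?t")     # zero or one a
exactly_two = matching(r"ca{2}t")  # exactly two a's

print(star)         # ['ct', 'cat', 'caat', 'caaat']
print(plus)         # ['cat', 'caat', 'caaat']
print(optional)     # ['ct', 'cat']
print(exactly_two)  # ['caat']

# Alternation picks whichever alternative matches at each position.
greetings = re.findall(r"Hello|Hi", "Hi there, Hello world")
print(greetings)    # ['Hi', 'Hello']

# A backslash turns the special character . into a literal dot.
dots = re.findall(r"\.", "a.b.c")
print(dots)         # ['.', '.']
```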

Predefined Character Classes

Class   Description                                                  Example
\d      Matches any digit (0-9)                                      "\\d" → Matches 1, 2
\D      Matches any non-digit                                        "\\D" → Matches A, !
\w      Matches any alphanumeric character or underscore             "\\w" → Matches a, 1, _
\W      Matches any character that is not alphanumeric or underscore "\\W" → Matches @
\s      Matches any whitespace character                             "\\s" → Matches a space, \n
\S      Matches any non-whitespace character                         "\\S" → Matches a, 1
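
As a quick illustration of these classes, the sketch below runs several of them over one made-up string. It writes the patterns as raw strings (r"..."), so the backslash does not need to be doubled: r"\d" and "\\d" are the same pattern.

```python
import re

text = "Room 42, floor_3!"

digits = re.findall(r"\d", text)     # individual digits
tokens = re.findall(r"\w+", text)    # runs of letters, digits, underscores
non_word = re.findall(r"\W", text)   # everything \w does not match
spaces = re.findall(r"\s", text)     # whitespace characters

print(digits)    # ['4', '2', '3']
print(tokens)    # ['Room', '42', 'floor_3']
print(non_word)  # [' ', ',', ' ', '!']
print(spaces)    # [' ', ' ']

# A raw string and a doubled backslash spell the same pattern.
print(r"\d" == "\\d")  # True
```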

Combining Patterns

You can combine patterns to create more complex regular expressions.

Example:

import re

text = "My phone number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"  # Matches a phone number format

match = re.search(pattern, text)
if match:
    print("Phone number found:", match.group())  # Output: 123-456-7890
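
Combined patterns become easier to work with when their parts are captured separately. The sketch below extends the phone number pattern with named groups; the group names area, prefix, and line are hypothetical labels chosen for this example, not part of any standard.

```python
import re

text = "My phone number is 123-456-7890."

# (?P<name>...) captures a part of the match under a name.
pattern = r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})"

match = re.search(pattern, text)
if match:
    print(match.group("area"))  # 123
    print(match.groupdict())    # {'area': '123', 'prefix': '456', 'line': '7890'}
```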

Flags in Regular Expressions

Flags modify the behavior of a regex pattern.

Flag                   Description
re.IGNORECASE (re.I)   Makes the pattern case-insensitive
re.MULTILINE (re.M)    Makes ^ and $ match at the start and end of each line, not only of the whole string
re.DOTALL (re.S)       Makes . match any character, including newlines

Example:

import re

text = "Hello\nWorld"
pattern = r"Hello.World"

match = re.search(pattern, text, re.DOTALL)
if match:
    print("Match found!")
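
The other two flags can be sketched the same way. The sample strings are made up; flags are combined with the | operator.

```python
import re

# IGNORECASE: letter case is ignored when matching.
any_case = re.findall(r"hello", "Hello hello HELLO", re.IGNORECASE)
print(any_case)  # ['Hello', 'hello', 'HELLO']

text = "hello\nWorld"

# Without MULTILINE, ^ matches only at the very start of the string.
first_line_only = re.findall(r"^\w+", text)
print(first_line_only)  # ['hello']

# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^\w+", text, re.MULTILINE)
print(every_line)  # ['hello', 'World']

# Flags can be combined with |.
combined = re.findall(r"^world$", text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['World']
```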