In Python, regular expressions (regex) are used to search, match, and manipulate text based on patterns. They are powerful tools for text processing, validation, and parsing.
Python's standard library provides the re module for working with regular expressions.
1️⃣ Importing the re Module
```python
import re
```
2️⃣ Basic Functions
re.search()
Scans the whole string and returns a match object for the first location where the pattern matches, or None if there is no match.
```python
import re

text = "My phone number is 9876543210"
# \d{10} means exactly ten digits in a row
match = re.search(r"\d{10}", text)
if match:
    print("Phone number found:", match.group())
```
Output
```
Phone number found: 9876543210
```
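Here is a short sketch of what the match object offers beyond group(). It uses an illustrative string that contains two phone numbers. search() returns only the first match, span() gives that match's position, and a failed search returns None. That is why the if match: check above matters.

```python
import re

text = "Call 9876543210 or 9123456789"

# search() stops at the first match; the second number is ignored
match = re.search(r"\d{10}", text)
print(match.group())  # 9876543210
print(match.span())   # (5, 15): start index and end index (exclusive)

# When nothing matches, search() returns None instead of raising an error
missing = re.search(r"\d{10}", "no digits here")
print(missing)        # None
```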
re.match()
Checks for a match only at the beginning of the string. It returns None if the pattern appears later but not at the start.
```python
import re

text = "Python is fun"
match = re.match(r"Python", text)
if match:
    print("Match found:", match.group())
```
Output
```
Match found: Python
```
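The following small comparison makes the difference between match() and search() concrete. The input string is illustrative and does not start with the pattern. The example also shows re.fullmatch() (available since Python 3.4), which succeeds only when the entire string fits the pattern.

```python
import re

text = "I think Python is fun"

# match() only looks at the start of the string, so it finds nothing here
print(re.match(r"Python", text))           # None

# search() scans the whole string and finds the word
print(re.search(r"Python", text).group())  # Python

# fullmatch() requires the whole string to match the pattern
print(re.fullmatch(r"\d{10}", "9876543210") is not None)   # True
print(re.fullmatch(r"\d{10}", "98765432100") is not None)  # False (11 digits)
```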
re.findall()
Finds all non-overlapping matches of a pattern and returns them as a list of strings.
```python
import re

text = "My numbers are 123 and 456"
# \d+ matches one or more consecutive digits
numbers = re.findall(r"\d+", text)
print(numbers)
```
Output
```
['123', '456']
```
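findall() returns plain strings. When you also need positions, re.finditer() yields match objects instead. If the pattern contains capturing groups, findall() returns the groups rather than the whole match. The sample strings below are illustrative.

```python
import re

text = "My numbers are 123 and 456"

# finditer() yields match objects, so each match's position is available too
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('123', 15), ('456', 23)]

# With capturing groups, findall() returns tuples of the groups
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")
print(pairs)      # [('x', '1'), ('y', '22')]
```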
re.sub()
Replaces every match of the pattern with new text and returns the resulting string.
```python
import re

text = "I like Java"
new_text = re.sub(r"Java", "Python", text)
print(new_text)
```
Output
```
I like Python
```
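re.sub() also accepts a count argument. It can also reuse captured groups in the replacement through \1, \2, and so on. The inputs below, including the date string, are only example values.

```python
import re

# count limits how many matches are replaced (0, the default, means all)
masked = re.sub(r"\d", "#", "a1b2c3", count=2)
print(masked)     # a#b#c3

# \1, \2, \3 in the replacement refer to the pattern's capturing groups
date = "2024-01-31"
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(reordered)  # 31/01/2024
```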
3️⃣ Common Regex Patterns
| Pattern | Description | Example |
|---|---|---|
| `\d` | Digit (0-9, plus other Unicode digits) | `\d{3}` matches 3 digits |
| `\D` | Non-digit | `\D+` matches runs of letters, spaces, or symbols |
| `\w` | Word character (letter, digit, or underscore) | `\w+` matches words |
| `\W` | Non-word character | `\W+` matches spaces and punctuation |
| `\s` | Whitespace | `\s+` matches spaces, tabs, and newlines |
| `\S` | Non-whitespace | `\S+` matches non-space characters |
| `.` | Any character except newline | `a.b` matches `a-b` or `aab` |
| `^` | Start of string | `^Hello` |
| `$` | End of string | `world$` |
| `*` | 0 or more | `a*` |
| `+` | 1 or more | `a+` |
| `?` | 0 or 1 | `a?` |
| `{n}` | Exactly n repetitions | `\d{4}` matches 4 digits |
| `{n,m}` | n to m repetitions | `\d{2,4}` matches 2 to 4 digits |
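Here are a few of the patterns from the table, tried on made-up sample strings. Note that {n,m} is greedy. A long run of digits is therefore split into pieces of at most m characters.

```python
import re

# {n,m} is greedy: it takes up to m digits before moving on
print(re.findall(r"\d{2,4}", "7 42 2024 123456"))  # ['42', '2024', '1234', '56']

# ? makes the preceding character optional
print(re.findall(r"colou?r", "color colour"))      # ['color', 'colour']

# . matches any character except a newline
print(re.findall(r"a.b", "a-b aab a\nb"))          # ['a-b', 'aab']

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r"^Hello", "Hello world")))   # True
print(bool(re.search(r"world$", "Hello world")))   # True
```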
4️⃣ Example: Email Validation
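```python
import re

email = "example@gmail.com"
# Simplified pattern: local part, "@", domain, ".", then a 2+ letter top-level domain
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
```
Output
```
Valid email
```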
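One possible refinement, assuming you validate many addresses, is to compile the pattern once with re.compile() and use fullmatch() instead of ^...$. This is a hedged variant, not a complete email validator. The helper name is_valid_email is hypothetical, and the pattern still accepts addresses that may not be deliverable. fullmatch() is also stricter than $, which can match just before a trailing newline. As a result, re.match() with the original pattern accepts "example@gmail.com\n".

```python
import re

# Same simplified pattern as above, compiled once and reused
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_valid_email(address: str) -> bool:
    # Hypothetical helper: fullmatch() requires the whole string to match
    return EMAIL_RE.fullmatch(address) is not None

for addr in ["example@gmail.com", "no-at-sign.com", "user@domain", "example@gmail.com\n"]:
    print(repr(addr), is_valid_email(addr))
```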
5️⃣ Example: Extract All Words
```python
import re

text = "Python, Java, C++"
# \w does not match "+", so "C++" is reduced to "C"
words = re.findall(r"\w+", text)
print(words)
```
Output
```
['Python', 'Java', 'C']
```
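Because \w does not match +, the pattern above turns C++ into C. To keep each comma-separated item intact, you can use a negated character class or re.split(). Which approach fits depends on your input, so treat these as sketches.

```python
import re

text = "Python, Java, C++"

# Word characters only: "+" is dropped
print(re.findall(r"\w+", text))      # ['Python', 'Java', 'C']

# Anything that is not a comma or whitespace: keeps "C++"
items = re.findall(r"[^,\s]+", text)
print(items)                         # ['Python', 'Java', 'C++']

# Split on a comma followed by optional whitespace
parts = re.split(r",\s*", text)
print(parts)                         # ['Python', 'Java', 'C++']
```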
✅ Summary
| Function | Purpose |
|---|---|
| `re.search()` | Search for a pattern anywhere in the string |
| `re.match()` | Match a pattern at the start of the string |
| `re.findall()` | Find all matches |
| `re.sub()` | Replace matched patterns |
✅ Key Points
- Use raw strings (`r"pattern"`) to avoid escaping backslashes.
- Regex is powerful for validation, searching, splitting, and replacing.
- Learn common symbols like `\d`, `\w`, `+`, `*`, `^`, and `$`.
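This example illustrates the raw-string point. To the regex engine, \b means a word boundary. In a normal Python string literal, however, "\b" is the backspace character, so the pattern silently changes meaning. The sample text is illustrative.

```python
import re

text = "cat concat cat"

# In a raw string, \b reaches the regex engine as a word boundary
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']

# In a normal string, "\b" is the backspace character, so nothing matches
print(re.findall("\bcat\b", text))   # []

# A raw string and a double-escaped normal string give the same pattern
print(r"\d+" == "\\d+")              # True
```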