Python Regex Cheatsheet
Overview
The re module is Python's built-in module for regular expressions. It provides search, match, findall, finditer, sub, and compile, among other functions.
Core Regex APIs
import re
# Find first match anywhere
re.search(pattern, string)
# Match only at start
re.match(pattern, string)
# Return all matches
re.findall(pattern, string)
# Iterator of match objects
re.finditer(pattern, string)
# Replace matches
re.sub(pattern, repl, string)
# Precompile pattern for reuse
pattern = re.compile('pattern')
Pattern Matching Basics
- Literal characters match themselves exactly
- Case-sensitive by default (use re.IGNORECASE for ignoring case)
- Some characters have special meaning and need escaping
- Raw strings (r'pattern') are preferred to avoid double escaping
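A quick check of the raw-string and case-sensitivity bullets (a minimal sketch; the sample strings are made up):

```python
import re

# Without the r prefix, '\b' is the backspace character (\x08), not a word boundary
plain = re.search('\bcat\b', 'a cat here')
# With a raw string, \b reaches the regex engine as a word-boundary assertion
raw = re.search(r'\bcat\b', 'a cat here')

# Matching is case-sensitive unless re.IGNORECASE (re.I) is passed
strict = re.search('cat', 'CAT')
relaxed = re.search('cat', 'CAT', re.IGNORECASE)

print(plain, raw.group(), strict, relaxed.group())  # None cat None CAT
```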
Literal Characters Example
re.search('cat', 'concatenate') # matches 'cat'
Special Characters and Escaping
| Character | Meaning | Literal Match |
|---|---|---|
| . | any character | \. |
| * | 0+ repeats | \* |
| + | 1+ repeats | \+ |
| \ | escape | \\ |
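The escaping rule in the table can be seen directly; re.escape() is the standard-library helper for escaping arbitrary text (a sketch with made-up inputs):

```python
import re

# An unescaped '.' matches any character, so '1.5' also matches '1x5'
loose = re.search(r'1.5', 'version 1x5')
# '\.' matches only a literal dot
exact = re.search(r'1\.5', 'version 1x5')

# re.escape() backslash-escapes metacharacters in arbitrary text
escaped = re.escape('a.b*c')
found = re.search(escaped, 'xx a.b*c yy')

print(loose.group(), exact, escaped, found.group())  # 1x5 None a\.b\*c a.b*c
```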
Metacharacters Overview
- Dot (.) matches any character except newline by default
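- Caret (^) anchors the start of the string, or the start of each line with re.MULTILINE
- Dollar ($) anchors the end of the string (also just before a trailing newline), or the end of each line with re.MULTILINE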
- Pipe (|) acts as OR, low precedence; use parentheses for grouping
Dot (.) Examples
re.search(r'a.b', 'acb') # matches
re.search(r'a.b', 'a\nb', re.S) # DOTALL matches newline
re.search(r'a.b', 'a\nb') # fails by default
Caret (^) Examples
re.search(r'^Hello', 'Hello world') # matches
text = 'Hi\nHello'
re.search(r'^Hello', text, re.M) # matches 'Hello' at the start of line 2
re.search(r'^Hello', 'Say Hello') # fails
Dollar ($) Examples
re.search(r'world$', 'Hello world') # matches
re.search(r'world$', 'world\nhello', re.M) # matches 'world' at the end of line 1
Alternation (|) Examples
re.search(r'cat|dog', 'I like dogs') # matches 'dog'
re.search(r'caterpillar|cat', 'caterpillar') # matches 'caterpillar'
re.search(r'(cat|dog)s?', 'dogs') # matches 'dogs'
re.search(r'cat|caterpillar', 'caterpillar') # matches 'cat', wrong order
Interview Lightning Round (Very Common)
| Question | Expected Answer |
|---|---|
| Difference between re.match and re.search | match anchors at start, search scans entire string |
| Why raw strings? | Avoid Python escape handling |
| Does . match newline? | No (unless DOTALL) |
| Are regex case-sensitive? | Yes, by default |
| What do ^ and $ do? | Anchor start/end |
Best Practices (Interview-Grade)
- Always use raw strings
- Prefer explicit anchors (^, $)
- Use re.compile() for reuse
- Be mindful of flag interactions (re.M, re.S)
- Order alternations from longest to shortest
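One way to apply the last rule when the alternatives come from a list is to sort them longest first and escape each one before joining. This is a sketch; the word list is a made-up example:

```python
import re

words = ['cat', 'caterpillar', 'car']

# Wrong order: 'cat' wins before 'caterpillar' is ever tried
naive = re.findall('|'.join(words), 'caterpillar car')

# Sort longest first (sorted() stays stable for equal lengths) and escape each word
ordered = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
fixed = re.findall(ordered, 'caterpillar car')

print(naive)    # ['cat', 'car']
print(ordered)  # caterpillar|cat|car
print(fixed)    # ['caterpillar', 'car']
```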
Regex: Character Classes & Sets
Overview
Character classes match a single character from a set of possible characters. They include simple sets, ranges, predefined classes, and negated sets.
Basic Character Classes
- [abc] — matches one character that is a, b, or c
- [a-z], [A-Z], [0-9] — matches characters in the specified range
- [a-zA-Z0-9_] — matches letters, digits, and underscore (similar to \w)
- [^abc] — matches any character except a, b, or c
Character Classes Examples
re.findall(r'[abc]', 'cat') # ['c', 'a']
re.findall(r'[0-9]', 'Age: 25') # ['2', '5']
re.findall(r'[^abc]', 'abcXYZ') # ['X', 'Y', 'Z']
Predefined Character Classes
- \d → digit (0–9), \D → non-digit
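- \w → word characters: letters, digits, underscore ([a-zA-Z0-9_] with re.ASCII; Unicode letters and digits by default), \W → non-word characters
- \s → whitespace (space, tab, newline, and other whitespace characters), \S → non-whitespace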
- Unicode vs ASCII: By default Python regex is Unicode-aware; re.ASCII restricts to ASCII-only
Predefined Character Classes Examples
re.findall(r'\d', 'Room 42') # ['4', '2']
re.findall(r'\w', 'Hi!') # ['H', 'i']
re.findall(r'\s', 'A B\nC') # [' ', '\n']
re.findall(r'\w', 'café', re.ASCII) # ['c', 'a', 'f']
Character Class Edge Cases
- Hyphen (-) at start or end → literal hyphen, in the middle → range indicator
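- Escaping needed inside [ ]: ] (unless it comes first), \, ^ (only when first), - (when a literal hyphen sits between other characters)
- Inside [ ], . is a literal dot; outside, . matches any character except newline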
- [\s\S] matches everything, including newline (useful for multi-line matching)
Edge Case Examples
re.findall(r'[-a-z]', 'abc-') # ['a', 'b', 'c', '-']
re.findall(r'[a-z-]', 'abc-') # ['a', 'b', 'c', '-']
re.findall(r'[\[\]\\]', '[]\\') # ['[', ']', '\\']
re.findall(r'[\s\S]', 'A\nB') # ['A', '\n', 'B']
Regex: Quantifiers & Repetition
Overview
In a regex pattern, quantifiers specify the number of times a character, group, or character class should appear.
Standard Quantifiers
- * → 0 or more occurrences (e.g., a* matches '', 'a', 'aaa')
- + → 1 or more occurrences (e.g., a+ matches 'a', 'aaa', not empty)
- ? → 0 or 1 occurrence (optional, e.g., colou?r matches 'color' or 'colour')
- {n} → exactly n times (e.g., \d{4} matches 4 digits)
- {n,} → at least n times (e.g., \d{2,} matches 2 or more digits)
- {n,m} → between n and m times (e.g., \d{2,4} matches 2 to 4 digits)
Standard Quantifier Examples
re.findall(r'a*', 'aaab') # ['aaa', '', ''] (empty matches at 'b' and at the end)
re.findall(r'a+', 'aaab') # ['aaa']
re.findall(r'colou?r', 'color colour') # ['color', 'colour']
re.findall(r'\d{4}', 'Year: 2025') # ['2025']
re.findall(r'\d{2,}', '42 123') # ['42', '123']
re.findall(r'\d{2,4}', '1 12 123 12345') # ['12', '123', '1234']
Greedy vs Lazy (Non-Greedy)
- Greedy quantifiers (default) try to match as much as possible (.*)
- Lazy quantifiers (minimal match) use ? (.*?), matches as little as possible
- Common trap: greedy .* can match too much (e.g., <.*> matches entire content, use <.*?> for individual tags)
Greedy vs Lazy Examples
text = '<b>one</b><i>two</i>'
re.findall(r'<.*?>', text) # ['<b>', '</b>', '<i>', '</i>']
re.findall(r'<.*>', text) # ['<b>one</b><i>two</i>']
Greedy vs Lazy Summary
| Type | Example | Behavior |
|---|---|---|
| Greedy | .* | Matches maximum content |
| Lazy | .*? | Matches minimum content |
Possessive Quantifiers (Conceptual)
- Possessive quantifiers match as much as possible and do NOT backtrack
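- Python supports possessive quantifiers (*+, ++, ?+, {m,n}+) only since Python 3.11; older versions raise re.error
- Atomic groups (?>...), also new in Python 3.11, give the same behavior: once matched, characters are not given back
- Can improve performance by cutting off unnecessary backtracking in complex patterns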
Atomic Group Example
re.findall(r'(?>\d+)', '123 456') # ['123', '456']; atomic groups require Python 3.11+
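The "no backtracking" behavior is easiest to see where an ordinary group would succeed only by giving characters back. This sketch is guarded by a version check because atomic groups and possessive quantifiers need Python 3.11 or later:

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Ordinary group: a+ backtracks and gives up one 'a' so the final 'a' can match
    backtracking = re.fullmatch(r'a+a', 'aaa')
    # Atomic group: (?>a+) keeps all three 'a's, leaving nothing for the final 'a'
    atomic = re.fullmatch(r'(?>a+)a', 'aaa')
    # Possessive quantifier: a++ behaves like the atomic group
    possessive = re.fullmatch(r'a++a', 'aaa')
    print(backtracking.group(), atomic, possessive)  # aaa None None
else:
    print('Atomic groups and possessive quantifiers require Python 3.11+')
```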
Regex: Anchors & Boundaries
Overview
Anchors and boundaries match positions in the text rather than characters. ^ and $ are anchors; \b marks a word boundary and \B a position that is not a word boundary.
Anchors
- ^ → matches the start of a string (or line with re.MULTILINE)
- $ → matches the end of a string (or line with re.MULTILINE)
Anchor Examples
re.findall(r'^Hello', 'Hello World') # ['Hello']
re.findall(r'World$', 'Hello World') # ['World']
# Multiline behavior
text = 'Hello\nWorld'
re.findall(r'^World', text, re.MULTILINE) # ['World']
Word Boundaries
- \b → matches a word boundary (between a \w and a \W character, or between a \w character and the start or end of the string)
- \B → matches a position that is not a word boundary
- Python regex handles Unicode characters as part of \w, so accented letters are included
- Common pitfall: digits and underscore are considered \w, so \b behaves accordingly
Word Boundary Examples
re.findall(r'\bcat\b', 'concatenate cat') # ['cat']
re.findall(r'\bvar1\b', 'var1_var2') # [] ('_' is a word character, so there is no boundary after 'var1')
Regex: Grouping & Capturing
Overview
Groups help extract, organize, or reuse parts of a regex match. Python supports capturing groups, non-capturing groups, named groups, and backreferences.
Capturing Groups
- Parentheses () capture matched content
- Groups are numbered by order of opening parentheses
- Nested groups get their own numbers
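A small sketch of the numbering rule with one outer group wrapping two nested ones (the input string is made up):

```python
import re

m = re.match(r'((\w+)-(\d+))', 'abc-42 rest')
# Group 1 is the outer parentheses; groups 2 and 3 are nested inside it
outer = m.group(1)
all_groups = m.groups()
print(outer)       # abc-42
print(all_groups)  # ('abc-42', 'abc', '42')
```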
Capturing Groups Example
match = re.match(r'(\w+) (\w+)', 'Hello World')
match.groups() # ('Hello', 'World')
Non-Capturing Groups
- (?:pattern) groups for structuring regex without capturing
- Improves performance and clarity when capturing is unnecessary
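Capturing versus non-capturing matters most with findall(), which returns group contents whenever the pattern has groups. A sketch using the same input as the example that follows:

```python
import re

text = 'cats dog'
# With a capturing group, findall returns only the group's text
captured = re.findall(r'(cat|dog)s?', text)
# With a non-capturing group, findall returns the whole match
whole = re.findall(r'(?:cat|dog)s?', text)
print(captured)  # ['cat', 'dog']
print(whole)     # ['cats', 'dog']
```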
Non-Capturing Group Example
re.findall(r'(?:cat|dog)s?', 'cats dog') # ['cats', 'dog']
Named Capturing Groups
- (?P<name>pattern) assigns a name to a group for readability
- Access matched groups via groupdict()
- Easier to read and maintain, avoids confusion with group numbers in complex patterns
Named Capturing Group Example
match = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'John Doe')
match.groupdict() # {'first': 'John', 'last': 'Doe'}
Backreferences & Reuse
- Backreferences refer to a previously matched group
- Enable pattern reuse in regex for repeated structures
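Captured text can also be reused in a re.sub() replacement string, with \1-style numbers or \g<name> for named groups. A minimal sketch with made-up inputs:

```python
import re

# In the replacement string, \1 and \2 insert the text captured by each group
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'Hello World')

# Named groups are reused with \g<name>
reordered = re.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', 'John Doe')

print(swapped)    # World Hello
print(reordered)  # Doe, John
```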
Regex: Backreferences & Lookarounds
Overview
Backreferences allow reuse of previously captured groups, and lookarounds assert context without consuming characters. These are commonly asked in interviews for pattern validation and text extraction.
Backreferences
- Numeric backreferences: \1, \2, etc., refer to captured groups by number
- Named backreferences: (?P<name>...) to capture, (?P=name) to refer by name
Backreferences Examples
re.findall(r'(\w+)\s\1', 'hello hello world') # ['hello']
re.findall(r'(?P<word>\w+)\s(?P=word)', 'foo foo bar') # ['foo']
Common Use Cases
- Detect repeated words: (\w+)\s\1
- Match symmetrical patterns: (\d)(\d)\2\1 (e.g., 1221)
- Match paired quotes: (["']).*?\1 (the closing quote must be the same character as the opening one)
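The three use cases, run on made-up inputs. Two details are worth noting: adding \b to the repeated-word pattern, and using finditer() for the quote pattern because findall() would return only group 1:

```python
import re

# Repeated words: \b stops the pattern from pairing the 'is' inside 'this' with the next word
repeated = re.search(r'\b(\w+)\s+\1\b', 'this is is fine').group()

# Symmetrical digits such as 1221
sym_yes = bool(re.fullmatch(r'(\d)(\d)\2\1', '1221'))
sym_no = bool(re.fullmatch(r'(\d)(\d)\2\1', '1234'))

# Paired quotes: finditer + group(0), because findall would return only the quote character
text = 'say "hi" and \'bye\''
quoted = [m.group(0) for m in re.finditer(r'(["\']).*?\1', text)]

print(repeated, sym_yes, sym_no)  # is is True False
print(quoted)                     # ['"hi"', "'bye'"]
```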
Lookarounds
- Lookarounds assert context without consuming characters
- Positive lookahead (?=pattern): pattern must follow
- Negative lookahead (?!pattern): pattern must not follow
- Positive lookbehind (?<=pattern): pattern must precede
- Negative lookbehind (?<!pattern): pattern must not precede
- Python limitation: Lookbehind patterns must be fixed-width
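A sketch of negative lookbehind and of the fixed-width limitation. The exact error message text is not asserted here because it may differ between Python versions:

```python
import re

# Negative lookbehind: whole numbers that are not preceded by '$'
plain_numbers = re.findall(r'(?<!\$)\b\d+', 'cost $100 for 3 items')
print(plain_numbers)  # ['3']

# A variable-width lookbehind is rejected when the pattern is compiled
error = None
try:
    re.compile(r'(?<=\w+)x')
except re.error as exc:
    error = exc
print(error)  # re.error message about the fixed-width requirement
```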
Lookahead Examples
re.findall(r'\w+(?=\d)', 'item1 item2') # ['item', 'item']
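re.findall(r'\w+(?!\d)', 'item1 itemX') # ['item1', 'itemX'] (\w+ consumes the digit, so the lookahead sees ' ')
re.findall(r'item(?!\d)', 'item1 itemX') # ['item'] (only the one in 'itemX')
Lookbehind Examples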
re.findall(r'(?<=\$)\d+', 'Price: $100') # ['100']
Practical Applications
- Password validation: (?=.*[A-Z])(?=.*\d).{8,} used with re.fullmatch (at least 8 characters, one uppercase letter, one digit)
- Conditional matching: use lookarounds to match only if another pattern exists
- Excluding substrings without consuming text: useful in context-sensitive text processing
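A sketch of the password rule and of excluding text without consuming it. The sample passwords and the 'tmp' prefix are made-up examples:

```python
import re

# Password rule from above; fullmatch requires the whole string to satisfy it
PASSWORD_RE = re.compile(r'(?=.*[A-Z])(?=.*\d).{8,}')
checks = {pw: bool(PASSWORD_RE.fullmatch(pw)) for pw in ['Secret123', 'secret123', 'Sec1']}
print(checks)  # {'Secret123': True, 'secret123': False, 'Sec1': False}

# Excluding text without consuming it: words that do not start with 'tmp'
kept = re.findall(r'\b(?!tmp)\w+', 'tmp_file data tmpdir report')
print(kept)  # ['data', 'report']
```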
Regex: Flags & Modifiers
Overview
Regex flags change how a pattern matches. They can be passed as the flags argument of re.compile() and the module-level functions, or set inline within the pattern.
Common Regex Flags
| Flag | Alias | Description | Example |
|---|---|---|---|
| re.IGNORECASE | re.I | Case-insensitive matching | re.search(r'cat', 'CAT', re.I) |
| re.MULTILINE | re.M | ^ and $ match start/end of each line | re.findall(r'^Hi', 'Hi\nHello', re.M) |
| re.DOTALL | re.S | . matches newline as well | re.search(r'a.*b', 'a\nb', re.S) |
| re.VERBOSE | re.X | Allows whitespace & comments inside pattern | pattern = re.compile(r'''\d+ # digits''', re.X) |
| re.ASCII | re.A | \w, \b, \d match ASCII only (not Unicode) | re.findall(r'\w+', 'café', re.A) # ['caf'] |
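Flags can be combined with |. re.VERBOSE pays off once a pattern spans several parts. A sketch with made-up inputs:

```python
import re

# Flags combine with the bitwise OR operator
greetings = re.findall(r'^hi', 'Hi there\nhi again', re.I | re.M)
print(greetings)  # ['Hi', 'hi']

# re.VERBOSE ignores whitespace and allows comments, so long patterns can be laid out
DATE_RE = re.compile(r'''
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
''', re.VERBOSE)
parts = DATE_RE.fullmatch('2025-12-28').groupdict()
print(parts)  # {'year': '2025', 'month': '12', 'day': '28'}
```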
Inline Flags
| Syntax | Meaning | Example |
|---|---|---|
| (?i) | Case-insensitive for entire pattern (must appear at the start of the pattern in Python 3.11+) | re.search(r'(?i)cat', 'CAT') |
| (?m) | Multiline mode for entire pattern (same placement rule) | re.findall(r'(?m)^Hi', 'Hi\nHello') |
| (?i:pattern) | Scoped flag for specific part only | re.search(r'(?i:cat)Dog', 'CATDog') |
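A sketch contrasting the scoped and global inline forms on made-up strings:

```python
import re

# The scoped flag covers only the (?i:...) part; 'Dog' stays case-sensitive
scoped_hit = bool(re.search(r'(?i:cat)Dog', 'CATDog'))
scoped_miss = bool(re.search(r'(?i:cat)Dog', 'CATdog'))

# A global inline flag at the start of the pattern affects everything after it
global_hit = bool(re.search(r'(?i)catdog', 'CATdog'))

print(scoped_hit, scoped_miss, global_hit)  # True False True
```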
Python Regex: re Module Functions
Overview
The `re` module offers module-level functions plus compiled pattern objects for using regex in Python. Interviews expect you to know how the functions differ and when to compile a pattern.
Core re Functions
| Function | Behavior | Example |
|---|---|---|
| re.match(pattern, string, flags=0) | Match only at start | re.match(r'\d+', '123abc') |
| re.search(pattern, string, flags=0) | Match anywhere in string | re.search(r'\d+', 'abc123def') |
| re.findall(pattern, string, flags=0) | Returns all matches as a list (group text, or tuples of groups, if the pattern has capturing groups) | re.findall(r'\d+', '12, 34') → ['12','34'] |
| re.finditer(pattern, string, flags=0) | Returns iterator of Match objects | for m in re.finditer(r'\d+', '12, 34'): print(m.group()) |
| re.sub(pattern, repl, string, count=0, flags=0) | Replace matches with repl | re.sub(r'\d+', '#', 'a1b2') → 'a#b#' |
| re.subn(pattern, repl, string, count=0, flags=0) | Same as sub + returns tuple (new_string, num_subs) | re.subn(r'\d+', '#', 'a1b2') → ('a#b#', 2) |
| re.split(pattern, string, maxsplit=0, flags=0) | Split string on regex matches | re.split(r'\s+', 'a b c') → ['a','b','c'] |
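A few behaviors the table does not show: how findall() handles groups, what finditer() adds, split() with a capturing group, and sub() with a function as the replacement. Inputs are made up:

```python
import re

# findall returns tuples when the pattern has more than one group
pairs = re.findall(r'(\w+)=(\d+)', 'a=1, b=2')
# finditer keeps the full Match objects, including positions
spans = [(m.group(), m.start()) for m in re.finditer(r'\d+', 'a1b22')]
# split keeps the separators when the pattern has a capturing group
pieces = re.split(r'(,)', 'a,b')
# sub also accepts a function that receives each Match and returns the replacement
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b22')

print(pairs)    # [('a', '1'), ('b', '2')]
print(spans)    # [('1', 1), ('22', 3)]
print(pieces)   # ['a', ',', 'b']
print(doubled)  # a2b44
```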
Compiled Patterns
- Use re.compile(pattern, flags=0) to precompile regex for repeated use
- Benefits: Performance (parsed once; module-level functions also cache recently used patterns, so the gain is often modest), Reusability (store pattern), Readability (centralized complex patterns)
Compiled Pattern Examples
pattern = re.compile(r'\d+', re.I)
pattern.search('123abc') # matches '123'
pattern.findall('12, 34') # ['12', '34']
# Reusable compiled pattern
digits = re.compile(r'\d+')
numbers = [digits.findall(s) for s in ['abc12', '34def']]
# [['12'], ['34']]
Summary of Flags and Functions
| Concept | Key Points |
|---|---|
| Flags | re.I = ignore case, re.M = multiline, re.S = dotall, re.X = verbose, re.A = ASCII |
| Inline Flags | (?i) = global, (?i:...) = scoped |
| Core Functions | match vs search, findall vs finditer, sub vs subn, split |
| Compiled Patterns | Use re.compile for repeated patterns for performance & readability |
Regex: Object Methods
Overview
re.compile() returns a Pattern object (re.Pattern) whose methods mirror the re module functions. Calling these methods on a compiled pattern improves readability and keeps a frequently used pattern in one place.
Regex Object Methods
| Method | Description | Example |
|---|---|---|
| .search(string, pos=0, endpos=None) | Scan string for first match anywhere | pattern = re.compile(r'\d+'); pattern.search('abc123def') |
| .match(string, pos=0, endpos=None) | Match only at start (or at pos) | pattern.match('123abc') |
| .fullmatch(string, pos=0, endpos=None) | Entire string must match | pattern.fullmatch('123') → match; pattern.fullmatch('123abc') → None |
| .findall(string, pos=0, endpos=None) | Return all matches as list of strings | pattern.findall('12, 34') → ['12','34'] |
| .finditer(string, pos=0, endpos=None) | Return iterator of match objects | for m in pattern.finditer('12, 34'): print(m.group()) |
| .sub(repl, string, count=0) | Replace matches with repl | pattern.sub('#', 'a1b2') → 'a#b#' |
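The pos and endpos parameters in the table restrict the window the engine looks at without slicing the string. A sketch on a made-up string:

```python
import re

pattern = re.compile(r'\d+')
text = 'abc123def'

# pos/endpos limit the search window without slicing the string
from_pos = pattern.search(text, 4).group()       # window starts at the '2'
before_end = pattern.search(text, 0, 4).group()  # window is 'abc1'
# match() with pos anchors at pos instead of index 0
at_pos = pattern.match(text, 3).group()
at_start = pattern.match(text)

print(from_pos, before_end, at_pos, at_start)  # 23 1 123 None
```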
Regex: Match Object Deep Dive
Overview
A Match object is returned by .search(), .match(), .fullmatch(), and .finditer(). Understanding Match objects is essential for accessing matched text, positions, and handling truthiness in Python.
Accessing Matched Text
| Method | Description | Example |
|---|---|---|
| .group() | Entire matched string | m = re.search(r'\d+', 'abc123'); m.group() → '123' |
| .group(n) | Specific captured group | m = re.search(r'(\d+)-(\w+)', '123-abc'); m.group(2) → 'abc' |
| .groups() | Tuple of all captured groups | m.groups() → ('123', 'abc') |
| .groupdict() | Dictionary of named groups | m = re.search(r'(?P<num>\d+)', 'abc123'); m.groupdict() → {'num': '123'} |
Accessing Match Position
| Method | Description | Example |
|---|---|---|
| .start() | Start index of match (or group) | m = re.search(r'\d+', 'abc123'); m.start() → 3 |
| .end() | End index of match (or group) | m.end() → 6 |
| .span() | Tuple (start, end) of match (or group) | m.span() → (3, 6) |
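A sketch of the slicing use case: the span reproduces the match, and span(1) narrows it to a group. The order string is made up:

```python
import re

text = 'order #4521 shipped'
m = re.search(r'#(\d+)', text)
if m:
    start, end = m.span()
    whole = text[start:end]             # slicing with the span reproduces m.group()
    group_span = m.span(1)              # span of group 1 only
    removed = text[:start] + text[end:]
    print(whole, group_span, repr(removed))  # #4521 (7, 11) 'order  shipped'
```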
Truthiness of Match Objects
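re.search(), re.match(), and re.fullmatch() return a Match object (always truthy) when a match is found, or None when it is not. Example:
m = re.search(r'\d+', 'abc123')
if m:  # Pythonic; prefer this over if m != None
    print(m.group())  # 123
Quick Notes & Pitfalls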
- .group(0) is the same as .group()
- .groups() returns all captured groups; unmatched optional groups → None
- .groupdict() requires named groups (?P<name>...)
- .span() often comes up in string-slicing problems
Regex: Common Interview Scenarios
Overview
Email validation, password strength checking, log parsing, date/time extraction, and URL matching are examples of practical regex scenarios that are frequently asked during interviews. Prioritize edge cases and realistic formats over complete semantic validation.
Email Validation
pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
pattern.fullmatch('test.email+alias@example.com') # matches
# Full RFC 5322 validation with regex is too complex
# Avoid using overly complex regex for interviews
Password Strength Validation
pattern = re.compile(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$')
# Simple patterns without lookaheads may miss some requirements
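Running the password pattern on a few made-up candidates shows which requirement each one misses. It also shows why fullmatch() is stricter than search(), because $ also matches just before a trailing newline:

```python
import re

pattern = re.compile(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$')

results = {pw: bool(pattern.fullmatch(pw)) for pw in ['Passw0rd!', 'password', 'Passw0rd', 'Pa0!']}
print(results)
# {'Passw0rd!': True, 'password': False, 'Passw0rd': False, 'Pa0!': False}

# search() accepts 'Passw0rd!\n' because $ matches before the final newline;
# fullmatch() rejects it because the newline itself is never consumed
search_nl = bool(pattern.search('Passw0rd!\n'))
full_nl = bool(pattern.fullmatch('Passw0rd!\n'))
print(search_nl, full_nl)  # True False
```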
Log File Parsing
pattern = re.compile(r'\[(.*?)\] (\w+) (.*)')
m = pattern.match('[2025-12-28 14:22:10] ERROR Something failed')
m.groups() # ('2025-12-28 14:22:10', 'ERROR', 'Something failed')
# Pitfall: greedy (.*) can run past the first ']' and overmatch
pattern = re.compile(r'\[(.*)\] (\w+) (.*)')
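The overmatch shows up as soon as the message contains its own brackets. In the made-up log line below, the greedy version swallows the level and part of the message into the timestamp group:

```python
import re

line = '[2025-12-28 14:22:10] ERROR [db] Connection failed'
lazy = re.compile(r'\[(.*?)\] (\w+) (.*)')
greedy = re.compile(r'\[(.*)\] (\w+) (.*)')

lazy_groups = lazy.match(line).groups()
greedy_groups = greedy.match(line).groups()
print(lazy_groups)    # ('2025-12-28 14:22:10', 'ERROR', '[db] Connection failed')
print(greedy_groups)  # ('2025-12-28 14:22:10] ERROR [db', 'Connection', 'failed')
```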
Date/Time Extraction
text = 'Deployed 2025-12-28, reviewed 28/12/2025'
# YYYY-MM-DD
re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text) # ['2025-12-28']
# DD/MM/YYYY
re.findall(r'\b\d{2}/\d{2}/\d{4}\b', text) # ['28/12/2025']
# Regex alone cannot validate month/day ranges; do not rely on it for full validation
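One common split of responsibilities is to let the regex find candidates and let the standard datetime module reject impossible dates. In this sketch, is_real_date is a hypothetical helper and the input text is made up:

```python
import re
from datetime import datetime

text = 'Valid: 2025-12-28, impossible: 2025-02-30'
candidates = re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text)

def is_real_date(s):
    """Hypothetical helper: let datetime reject impossible dates such as February 30."""
    try:
        datetime.strptime(s, '%Y-%m-%d')
        return True
    except ValueError:
        return False

valid = [d for d in candidates if is_real_date(d)]
print(candidates)  # ['2025-12-28', '2025-02-30']
print(valid)       # ['2025-12-28']
```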
URL Matching Pitfalls
pattern = re.compile(r'https?://[^\s/$.?#][^\s]*') # rejects hosts starting with /, $, ., ?, or #
naive = r'https?://\S+' # also accepts invalid URLs like 'http://...'; both patterns keep trailing punctuation
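Both patterns absorb sentence punctuation that follows a URL. One simple heuristic is to trim it after matching. This is an assumption, not a standard rule, and it would also strip a legitimate trailing '?' or '.':

```python
import re

URL_RE = re.compile(r'https?://[^\s/$.?#][^\s]*')
text = 'See https://example.com/docs. Also http://test.org/a?b=1, thanks.'

# Heuristic post-processing: trim common trailing punctuation from each match
urls = [m.group().rstrip('.,;:!?') for m in URL_RE.finditer(text)]
print(urls)  # ['https://example.com/docs', 'http://test.org/a?b=1']
```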
Parsing vs Validating Input
| Task | Regex Suitability |
|---|---|
| Parsing log lines | Excellent |
| Extracting dates from text | Good |
| Validating emails/URLs strictly | Limited |
| Password policy enforcement | With lookaheads |
| Semantic validation (e.g., leap year dates) | Not recommended |
Quick Interview Takeaways
- Use non-greedy quantifiers when parsing structured text.
- Use lookaheads/lookbehinds for complex validations.
- Understand regex limitations—interviewers often ask this.
- Always test edge cases, e.g., optional fields, unusual input.
- Explain why your pattern works—not just the code.
Regex: Testing & Debugging
Overview
For accuracy, readability, and proving comprehension in interviews, it is essential to test regex patterns before using them.
Using re.DEBUG
pattern = re.compile(r'\d{2}-\d{2}-\d{4}', re.DEBUG)
# Shows a breakdown of regex tokens
# Skipping re.DEBUG hides how the pattern is parsed internally
Incremental Pattern Building
Example:
# Step 1:
Date : r'\d{4}-\d{2}-\d{2}'
# Step 2:
Date + Time : r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
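The two steps as runnable code, with a quick check after each step (a sketch; the sample strings are made up):

```python
import re

# Step 1: date only; check it before moving on
date_pattern = r'\d{4}-\d{2}-\d{2}'
step1_ok = re.fullmatch(date_pattern, '2025-12-28') is not None

# Step 2: reuse step 1 and append the time component
datetime_pattern = date_pattern + r' \d{2}:\d{2}:\d{2}'
step2_ok = re.fullmatch(datetime_pattern, '2025-12-28 14:22:10') is not None
date_only_rejected = re.fullmatch(datetime_pattern, '2025-12-28') is None

print(step1_ok, step2_ok, date_only_rejected)  # True True True
```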
Unit Testing Regex Patterns
import re
import unittest
pattern = re.compile(r'\d{2}-\d{2}-\d{4}')
class TestRegex(unittest.TestCase):
    def test_valid_date(self):
        self.assertTrue(pattern.fullmatch('28-12-2025'))
    def test_invalid_date(self):
        self.assertIsNone(pattern.fullmatch('2025-12-28'))
if __name__ == '__main__':
    unittest.main()
# Skipping unit tests may lead to undetected edge-case failures
Regex: Pitfalls & Design Thinking
Overview
Interviewers look for candidates who recognize typical mistakes and follow a disciplined design strategy. Regex design reflects problem-solving and clarity, not just syntax.
Common Pitfalls Interviewers Look For
| Pitfall | Description / Example |
|---|---|
| match() vs search() | match() only at start, search() anywhere |
| Forgetting raw strings | "\b" is a backspace character while r"\b" is a word boundary; without r you must double the backslash ("\\b", "\\d+") |
| Overusing .* | Greedy match eats more than intended → use .*? or anchors |
| Misunderstanding word boundaries | \b is a position, not a character |
| Incorrect lookbehind width | Python requires fixed-width lookbehind ((?<=...)) |
Regex Design Thinking
Problem Decomposition
- Break problem into smaller matching units
- Example: Log line → [timestamp][level][message] → build regex component by component
Step-by-Step Pattern Building
- Identify literals first
- Identify variable parts → use character classes / quantifiers
- Add anchors (^, $) or word boundaries
- Wrap in capturing or named groups
- Test each step incrementally
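The steps above, applied to the log line example from Problem Decomposition. This is a sketch: the group names and the uppercase-only level are assumptions for illustration.

```python
import re

# Steps 1-2: literals ('[', ']') plus variable parts as classes/quantifiers
# Steps 3-4: anchor with ^/$ and name each group; re.VERBOSE allows the comments
LOG_RE = re.compile(r'''
    ^\[(?P<timestamp>[^\]]+)\]   # timestamp: everything up to the first ']'
    \s+(?P<level>[A-Z]+)         # level: uppercase word such as ERROR
    \s+(?P<message>.*)$          # message: rest of the line
''', re.VERBOSE)

# Step 5: test against sample lines
parsed = LOG_RE.match('[2025-12-28 14:22:10] ERROR Something failed').groupdict()
rejected = LOG_RE.match('no brackets here')
print(parsed)
# {'timestamp': '2025-12-28 14:22:10', 'level': 'ERROR', 'message': 'Something failed'}
print(rejected)  # None
```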
Balancing Correctness, Performance, Readability
- Performance: Avoid unnecessary backtracking (e.g., .* inside repeated groups).
- Readability: Use re.VERBOSE, named groups, and comments.
Interview Takeaways
- Always build regex iteratively
- Explain thought process, not just code
- Avoid shortcuts that break readability or correctness
- Know limitations of regex; use parsing libraries when appropriate
- Include unit tests or example cases when possible
