Advanced | Regular Expressions In Python
Lookbehinds and Lookaheads
A lookbehind lets you qualify an expression A by requiring that some other expression B does, or does not, come immediately before A.
Similarly, a lookahead lets you qualify an expression A by requiring that some other expression B does, or does not, come immediately after A.
In both cases B is only checked; it is not included in the match.
Pattern | Name | Description | Example |
---|---|---|---|
`(?<=2)dog` | positive lookbehind | Match dog with 2 before it | 1dog2dog3dog |
`(?<!2)dog` | negative lookbehind | Match dog without 2 before it | 1dog2dog3dog |
`dog(?=2)` | positive lookahead | Match dog with 2 after it | 1dog2dog3dog |
`dog(?!2)` | negative lookahead | Match dog without 2 after it | 1dog2dog3dog |
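To see which occurrence of dog each pattern selects, the four patterns from the table can be run against the example string. This is a minimal sketch; the names `text`, `patterns`, and `spans` are illustrative and not part of the original material.

```python
import re

text = "1dog2dog3dog"

# The four lookaround patterns from the table above
patterns = {
    "positive lookbehind": r"(?<=2)dog",
    "negative lookbehind": r"(?<!2)dog",
    "positive lookahead": r"dog(?=2)",
    "negative lookahead": r"dog(?!2)",
}

# Record the (start, end) span of every match for each pattern
spans = {
    name: [m.span() for m in re.finditer(pattern, text)]
    for name, pattern in patterns.items()
}

for name, found in spans.items():
    print(f"{name}: {found}")
# positive lookbehind: [(5, 8)]
# negative lookbehind: [(1, 4), (9, 12)]
# positive lookahead: [(1, 4)]
# negative lookahead: [(5, 8), (9, 12)]
```

Every span has length 3, which shows that the 2 being tested for is never part of the match itself. The last dog satisfies the negative lookahead because nothing follows it.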
Lazy Search Operator
By default, regular expression quantifiers are greedy, meaning they match as many characters as possible while still allowing the overall pattern to match.
To make a quantifier non greedy, append the non greedy qualifier, `?`, to it. For example, given
"dogcatmouserat"
the pattern `dog.*a` matches:
- `dog`
- followed by any character with zero or more repetitions, `.*`
- followed by `a`
There are multiple substrings that meet these criteria!
dogcatmousera <- greedy (ends at the last a)
dogca <- non greedy (ends at the first a)
By default, Python's regex engine returns the greedy result.
import re
re.search(pattern="dog.*a", string="dogcatmouserat")
# <re.Match object; span=(0, 13), match='dogcatmousera'>
If you want the non greedy result, use `*?` instead of `*`.
re.search(pattern="dog.*?a", string="dogcatmouserat")
# <re.Match object; span=(0, 5), match='dogca'>
The non greedy qualifier can be appended to any of these quantifiers:
Greedy | Non Greedy |
---|---|
`*` | `*?` |
`+` | `+?` |
`?` | `??` |
`{m,n}` | `{m,n}?` |
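The effect of each pair in the table can be compared side by side. The sketch below is illustrative only: the test string `"aaaa"`, the concrete bounds `{2,3}`, and the names `pairs` and `results` are assumptions chosen for the example.

```python
import re

text = "aaaa"

# (greedy, non greedy) pattern pairs, one per row of the table above
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
]

# Match each pattern at the start of the string and keep the matched text
results = {}
for greedy, lazy in pairs:
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()
    print(greedy, repr(results[greedy]), lazy, repr(results[lazy]))
# a* 'aaaa' a*? ''
# a+ 'aaaa' a+? 'a'
# a? 'a' a?? ''
# a{2,3} 'aaa' a{2,3}? 'aa'
```

In each row the greedy form takes as many repetitions as its bounds allow, while the non greedy form takes as few as possible, which for `*?` and `??` is zero characters.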