Beginner | Regular Expressions In Python
What's a regular expression?
A regular expression (aka regex) is a special syntax that lets you match strings based on conditions. For example, the regular expression `\d+\s[a-z]+` matches strings that have

- one or more digits (`\d+`)
- followed by a single whitespace character (`\s`)
- followed by one or more lowercase letters between a and z (`[a-z]+`)
In the sentence below, this pattern matches "20 quick", "2 lazy", "8 sleepy", and "4 loud".

20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets.
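The example pattern can be checked directly in Python. This is a minimal sketch using `re.findall` on the sentence above; the variable names `sentence` and `matches` are only illustrative.

```python
import re

sentence = "20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets."

# One or more digits, one whitespace character, then one or more lowercase letters
matches = re.findall(r"\d+\s[a-z]+", sentence)
print(matches)
# ['20 quick', '2 lazy', '8 sleepy', '4 loud']
```

The pattern is written as a raw string (`r"..."`) so that Python does not try to interpret the backslashes itself.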
Table of regular expression patterns
Pattern | Description |
---|---|
`[abc]` | a or b or c |
`[^abc]` | not (a or b or c) |
`[a-z]` | a or b ... or y or z |
`[1-9]` | 1 or 2 ... or 8 or 9 |
`\d` | digits `[0-9]` |
`\D` | non-digits `[^0-9]` |
`\s` | whitespace `[ \t\n\r\f\v]` |
`\S` | non-whitespace `[^ \t\n\r\f\v]` |
`\w` | alphanumeric or underscore `[a-zA-Z0-9_]` |
`\W` | non-alphanumeric `[^a-zA-Z0-9_]` |
`.` | any character (except a newline, by default) |
`x*` | zero or more repetitions of x |
`x+` | one or more repetitions of x |
`x?` | zero or one repetitions of x |
`x{m}` | exactly m repetitions of x |
`x{m,n}` | m to n repetitions of x |
`\\`, `\.`, `\*` | literal backslash, period, asterisk |
`\b` | word boundary |
`^hello` | starts with hello |
`bye$` | ends with bye |
`(...)` | capture group |
`(po|go)` | po or go |
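A few of the patterns in the table can be tried out with `re.findall` and `re.search`. The sketch below uses made-up input strings and variable names chosen only for illustration.

```python
import re

# [^abc]: any character other than a, b, or c
not_abc = re.findall(r"[^abc]", "aXbYc")                 # ['X', 'Y']

# x?: the "u" is optional, so both spellings match
colors = re.findall(r"colou?r", "color colour")          # ['color', 'colour']

# {m}: exactly three digits at a time
triples = re.findall(r"\d{3}", "555-1234")               # ['555', '123']

# \b: match "cat" only as a whole word
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']

# \. matches a literal period rather than "any character"
periods = re.findall(r"\.", "a.b.c")                     # ['.', '.']

# (po|go): either alternative; findall returns the captured group
po_or_go = re.findall(r"(po|go)", "pogo stick")          # ['po', 'go']

# ^ and $ anchor the match to the start and end of the string
starts = re.search(r"^hello", "hello world") is not None  # True
ends = re.search(r"bye$", "goodbye") is not None          # True

print(not_abc, colors, triples, whole_words, periods, po_or_go, starts, ends)
```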
How do regular expressions work in Python?
In Python, regular expressions are handled by the built-in `re` module.
Table of regular expression functions in Python
Function | Description | Return Value |
---|---|---|
`re.findall(pattern, string, flags=0)` | Find all non-overlapping occurrences of pattern in string | list of strings, or list of tuples if > 1 capture group |
`re.finditer(pattern, string, flags=0)` | Find all non-overlapping occurrences of pattern in string | iterator yielding match objects |
`re.search(pattern, string, flags=0)` | Find first occurrence of pattern in string | match object or None |
`re.split(pattern, string, maxsplit=0, flags=0)` | Split string by occurrences of pattern | list of strings |
`re.sub(pattern, repl, string, count=0, flags=0)` | Replace pattern with repl | new string with the replacement(s) |
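To make the return values in the table concrete, here is a short sketch that calls each function once. The input strings and variable names are assumptions made up for this example.

```python
import re

text = "Hi, I'm Bob."
pattern = r"[A-Z][a-z]+"  # one uppercase letter, then one or more lowercase letters

# findall: list of matched strings
found = re.findall(pattern, text)                               # ['Hi', 'Bob']

# findall with two capture groups: list of tuples
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")                 # [('x', '1'), ('y', '22')]

# finditer: iterator of match objects, which also carry positions
positions = [(m.group(), m.start()) for m in re.finditer(pattern, text)]
# [('Hi', 0), ('Bob', 8)]

# search: first match object, or None when nothing matches
first = re.search(pattern, text).group()                        # 'Hi'
missing = re.search(r"\d", text)                                # None

# split: break the string wherever the pattern occurs
parts = re.split(r",\s*", "a, b,c")                             # ['a', 'b', 'c']

# sub: return a new string with every match replaced
masked = re.sub(r"\d+", "#", "room 101, floor 3")               # 'room #, floor #'

print(found, pairs, positions, first, missing, parts, masked)
```

Because `re.search` returns `None` when there is no match, it is usually worth checking the result before calling `.group()` on it.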
What about re.compile()?
The following regular expression searches have equivalent logic...
import re
pat = re.compile(r"[A-Z][a-z]+")  # one uppercase letter followed by one or more lowercase letters
pat.findall("Hi, I'm Bob.")
# ['Hi', 'Bob']
import re
re.findall(pattern="[A-Z][a-z]+", string="Hi, I'm Bob.")
# ['Hi', 'Bob']
but the first version compiles the regular expression into a `re.Pattern` object.
type(pat) # <class 're.Pattern'>
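This can boost performance when you use the same regular expression repeatedly, because the pattern is parsed only once. The gain is often modest, since the `re` module also caches recently used patterns.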
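As an illustration of reusing one compiled pattern, the sketch below applies the same `re.Pattern` object to several strings. The list `lines` and the names `capitalized_pat` and `results` are hypothetical.

```python
import re

# Compile once, outside the loop
capitalized_pat = re.compile(r"[A-Z][a-z]+")

lines = ["Hi, I'm Bob.", "Say hello to Alice.", "no capitals here"]

# Reuse the compiled pattern for every line
results = [capitalized_pat.findall(line) for line in lines]
print(results)
# [['Hi', 'Bob'], ['Say', 'Alice'], []]

# The original pattern string stays available on the compiled object
print(capitalized_pat.pattern)
# [A-Z][a-z]+
```

A compiled pattern offers the same operations as the module-level functions (`findall`, `finditer`, `search`, `split`, `sub`) as methods, without the `pattern` argument.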