Beginner | Regular Expressions In Python

What's a regular expression?

A regular expression (aka regex) is a special syntax for describing text patterns. It lets you match strings, or parts of strings, that meet certain conditions. For example, the regular expression `\d+\s[a-z]+` matches text that has

one or more digits (\d+)
followed by a single whitespace character, such as a space (\s)
followed by one or more lowercase letters between a and z ([a-z]+)

For example, in the sentence below it matches `20 quick`, `2 lazy`, `8 sleepy`, and `4 loud`:

20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets.
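A quick way to confirm which pieces of the sentence above match is `re.findall()`, which returns every non-overlapping match. A minimal sketch, using that sentence as input:

```python
import re

sentence = "20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets."

# r"..." is a raw string, so the backslashes reach the regex engine unchanged
matches = re.findall(r"\d+\s[a-z]+", sentence)
print(matches)  # ['20 quick', '2 lazy', '8 sleepy', '4 loud']
```

Note that `[a-z]+` stops at the first space, so only the word right after each number is captured.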

Table of regular expression patterns

Pattern	Description
`[abc]`	a or b or c
`[^abc]`	not (a or b or c)
`[a-z]`	a or b ... or y or z
`[1-9]`	1 or 2 ... or 8 or 9
`\d`	digits `[0-9]`
`\D`	non-digits `[^0-9]`
`\s`	whitespace `[ \t\n\r\f\v]`
`\S`	non-whitespace `[^ \t\n\r\f\v]`
`\w`	word character: letter, digit, or underscore `[a-zA-Z0-9_]`
`\W`	non-word character `[^a-zA-Z0-9_]`
`.`	any character except a newline
`x*`	zero or more repetitions of x
`x+`	one or more repetitions of x
`x?`	zero or one repetitions of x
`x{m}`	exactly m repetitions of x
`x{m,n}`	m to n repetitions of x
`\\`, `\.`, `\*`	a literal backslash, period, or asterisk
`\b`	word boundary
`^hello`	starts with hello
`bye$`	ends with bye
`(...)`	capture group
`(po|go)`	po or go
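Several rows of the table above can be tried directly with `re.findall()` and `re.search()`. The sketch below runs one small example per pattern family; the sample strings are made up for illustration.

```python
import re

# [abc] and [^abc]: one character from (or not from) a set
vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
consonants = re.findall(r"[^aeiou]", "regex")  # ['r', 'g', 'x']

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", "snake_case, kebab-case!")  # ['snake_case', 'kebab', 'case']

# . matches any character except a newline
dots = re.findall(r"c.t", "cat cot c\nt")  # ['cat', 'cot']

# x? : the "u" is optional
spellings = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

# \b with {m,n}: whole numbers that have 2 or 3 digits
numbers = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")  # ['22', '333']

# \. : a literal period
decimals = re.findall(r"\d+\.\d+", "pi is 3.14, not 3,14")  # ['3.14']

# ^ and $ : anchor to the start and end of the string
starts = re.search(r"^hello", "hello world") is not None  # True
ends = re.search(r"bye$", "say bye") is not None          # True
miss = re.search(r"^hello", "say hello")                  # None

# (po|go) : alternation inside a capture group
m = re.search(r"(po|go)t", "I got a pot")
whole, group = m.group(0), m.group(1)  # 'got', 'go'
```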

How do regular expressions work in Python?

In Python, regular expressions are handled by the built-in `re` module.

Table of regular expression functions in Python

Function	Description	Return Value
`re.findall(pattern, string, flags=0)`	Find all non-overlapping occurrences of pattern in string	list of strings, or list of tuples if > 1 capture group
`re.finditer(pattern, string, flags=0)`	Find all non-overlapping occurrences of pattern in string	iterator yielding match objects
`re.search(pattern, string, flags=0)`	Find first occurrence of pattern in string	match object or `None`
`re.split(pattern, string, maxsplit=0, flags=0)`	Split string by occurrences of pattern	list of strings
`re.sub(pattern, repl, string, count=0, flags=0)`	Replace occurrences of pattern with repl (all of them when count=0)	new string with the replacement(s)
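Each function in the table above can be seen on one made-up string. This is a minimal sketch; it also shows the list-of-tuples result that `re.findall()` returns when the pattern has more than one capture group.

```python
import re

text = "Order 12 apples, 3 pears, and 45 plums."

# findall: every non-overlapping match, as strings
nums = re.findall(r"\d+", text)  # ['12', '3', '45']

# findall with two capture groups: one tuple per match
pairs = re.findall(r"(\d+) (\w+)", text)
# [('12', 'apples'), ('3', 'pears'), ('45', 'plums')]

# finditer: an iterator of match objects, which also record positions
spans = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
# [('12', 6), ('3', 17), ('45', 30)]

# search: the first match only, or None if nothing matches
first = re.search(r"\d+", text)
print(first.group())                # 12
missing = re.search(r"kiwi", text)  # None

# split: break a string wherever the pattern matches
colors = re.split(r"\s*,\s*", "red, green ,blue")  # ['red', 'green', 'blue']

# sub: replace every match (count=0), or only the first `count` matches
masked = re.sub(r"\d+", "#", text)
# 'Order # apples, # pears, and # plums.'
masked_once = re.sub(r"\d+", "#", text, count=1)
# 'Order # apples, 3 pears, and 45 plums.'
```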

What about `re.compile()`?

The following two searches are equivalent:

import re

# one uppercase letter followed by one or more lowercase letters
pat = re.compile("[A-Z][a-z]+")
pat.findall("Hi, I'm Bob.")
# ['Hi', 'Bob']

import re
re.findall(pattern="[A-Z][a-z]+", string="Hi, I'm Bob.")
# ['Hi', 'Bob']

The difference is that the first version compiles the regular expression into a `re.Pattern` object.

type(pat)  # <class 're.Pattern'>

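This can improve performance when you use the same regular expression repeatedly, because the pattern is compiled only once. (The `re` module also caches recently used patterns, so the speedup is often modest.)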
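A typical reuse case is applying one pattern to many strings. The sketch below compiles the same `[A-Z][a-z]+` pattern once and calls its `findall()` method in a loop; the input lines are made up for illustration. A `re.Pattern` offers the same methods as the module-level functions (`search`, `findall`, `sub`, and so on), just without the pattern argument, and any flags such as `re.IGNORECASE` are fixed when you compile.

```python
import re

# Compile once, outside the loop
word_pat = re.compile(r"[A-Z][a-z]+")

lines = ["Hi, I'm Bob.", "Alice met Carol in Paris.", "no capitals here"]

# Reuse the compiled pattern for every line
results = [word_pat.findall(line) for line in lines]
# [['Hi', 'Bob'], ['Alice', 'Carol', 'Paris'], []]

# Flags are passed to re.compile() and stored on the Pattern
ci = re.compile(r"bob", flags=re.IGNORECASE)
bob = ci.search("Hi, I'm Bob.").group()  # 'Bob'
```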
