The Matrix Problem
Here's a quote from *The Matrix*:
```python
quote = """The title bar reads: "Combat Series 10 of 12," file\ncategories flashing beneath it: Savate, Jujitsu, Ken Po,\n Drunken Boxing..."""
```
Help Tank, Neo's operator, extract the list of martial arts Neo has learnt.
Expected result:

```python
expected = ['Savate', 'Jujitsu', 'Ken Po', 'Drunken Boxing']
```
Note that these are the comma-separated strings that follow the colon after the word "categories". Try to accomplish the task with a single regex pattern whose matches are exactly these items, so that one call returns the whole list.
Note that you will need the PyPI `regex` module, rather than the built-in `re` module, to achieve this. Install it by running `pip install regex` (or `pip3 install regex` if you are working on Linux) in your console/terminal.
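Before writing the single pattern, it can help to confirm the target output with a two-step baseline that uses only the built-in `re` module: first isolate the text after the colon that follows "categories", then split it on commas. This is deliberately *not* the single-pattern solution the exercise asks for; it is a sketch that only reproduces `expected` so you have something to check your own pattern against. It assumes the list ends at the first period (the trailing "..."), as it does in this quote.

```python
import re

quote = """The title bar reads: "Combat Series 10 of 12," file\ncategories flashing beneath it: Savate, Jujitsu, Ken Po,\n Drunken Boxing..."""
expected = ['Savate', 'Jujitsu', 'Ken Po', 'Drunken Boxing']

# Step 1: capture everything after the first colon following "categories",
# up to the first period; [^.]+ also spans the "\n" inside the list
segment = re.search(r"categories[^:]*:([^.]+)", quote).group(1)

# Step 2: drop the leading space, then split on commas plus any surrounding whitespace
martial_arts = re.split(r"\s*,\s*", segment.strip())

print(martial_arts)              # ['Savate', 'Jujitsu', 'Ken Po', 'Drunken Boxing']
print(martial_arts == expected)  # True
```

The single-pattern version should produce the same list from one `findall` call, without the intermediate `segment`.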
Regex Functions
Function | Description | Return Value |
---|---|---|
`re.findall(pattern, string, flags=0)` | Find all non-overlapping occurrences of pattern in string | list of strings, or list of tuples if > 1 capture group |
`re.finditer(pattern, string, flags=0)` | Find all non-overlapping occurrences of pattern in string | iterator yielding match objects |
`re.search(pattern, string, flags=0)` | Find the first occurrence of pattern in string | match object or `None` |
`re.split(pattern, string, maxsplit=0, flags=0)` | Split string by occurrences of pattern | list of strings (captured separators are included if the pattern has capture groups) |
`re.sub(pattern, repl, string, count=0, flags=0)` | Replace occurrences of pattern with repl (all of them when `count=0`) | new string with the replacement(s) |
Regex Patterns
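Pattern | Description |
---|---|
`[abc]` | a or b or c |
`[^abc]` | not (a or b or c) |
`[a-z]` | a or b ... or y or z |
`[1-9]` | 1 or 2 ... or 8 or 9 |
`\d` | digit: `[0-9]` |
`\D` | non-digit: `[^0-9]` |
`\s` | whitespace: `[ \t\n\r\f\v]` |
`\S` | non-whitespace: `[^ \t\n\r\f\v]` |
`\w` | word character (alphanumeric or underscore): `[a-zA-Z0-9_]` |
`\W` | non-word character: `[^a-zA-Z0-9_]` |
`.` | any character except a newline (newlines too with `re.DOTALL`) |
`x*` | zero or more repetitions of x |
`x+` | one or more repetitions of x |
`x?` | zero or one repetition of x |
`x{m}` | exactly m repetitions of x |
`x{m,n}` | m to n repetitions of x |
`\\`, `\.`, `\*` | literal backslash, period, asterisk (escaped special characters) |
`\b` | word boundary |
`^hello` | starts with hello (or any line starts with hello, with `re.MULTILINE`) |
`bye$` | ends with bye |
`(...)` | capture group |
`(po\|go)` | po or go |

For `str` patterns, `\d`, `\s` and `\w` (and their negations) are Unicode-aware by default, so they also match non-ASCII digits, whitespace and letters; pass `re.ASCII` to restrict them to the ASCII sets listed above.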
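Several rows of the pattern table can be tried against the quote itself. The sketch below uses the built-in `re` module; among other things it shows why `.` and `^` need flags once the text spans a line break, as this quote does.

```python
import re

quote = """The title bar reads: "Combat Series 10 of 12," file\ncategories flashing beneath it: Savate, Jujitsu, Ken Po,\n Drunken Boxing..."""

# \d and + : runs of one or more digits
numbers = re.findall(r"\d+", quote)

# \b, [A-Z], \w and * : words that start with a capital letter
capitalised = re.findall(r"\b[A-Z]\w*", quote)

# x{m,n} between word boundaries: whole words of 2 to 3 word characters
short_words = re.findall(r"\b\w{2,3}\b", quote)

# [^...] : characters that are neither word characters nor whitespace
punctuation = sorted(set(re.findall(r"[^\w\s]", quote)))

# (a|b) : with one capture group, findall returns the group's text
arts = re.findall(r"(Savate|Ken Po)", quote)

# . does not match "\n" unless re.DOTALL is passed
no_dotall = re.search(r"file.categories", quote)
with_dotall = re.search(r"file.categories", quote, flags=re.DOTALL)

# ^ anchors to the start of the string; re.MULTILINE also lets it match after each "\n"
at_start = re.search(r"^categories", quote)
per_line = re.search(r"^categories", quote, flags=re.MULTILINE)

# \. escapes the dot and $ anchors to the end: does the quote end with "..."?
ends_with_ellipsis = re.search(r"\.\.\.$", quote) is not None

print(numbers)             # ['10', '12']
print(capitalised)         # ['The', 'Combat', 'Series', 'Savate', 'Jujitsu', 'Ken', 'Po', 'Drunken', 'Boxing']
print(short_words)         # ['The', 'bar', '10', 'of', '12', 'it', 'Ken', 'Po']
print(punctuation)         # ['"', ',', '.', ':']
print(arts)                # ['Savate', 'Ken Po']
print(no_dotall, with_dotall is not None)  # None True
print(at_start, per_line is not None)      # None True
print(ends_with_ellipsis)  # True
```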