Groundhog Day Problem
Here's a quote from Groundhog Day.
quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""
Find all substrings that, ignoring case:
- begin with one of these words:
['the', 'this', 'to', 'in']
- end with one of these words:
['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
- have 30 or fewer characters between the begin word and the end word (spaces and newline characters count toward the 30).
Keep the earliest identified, non-overlapping, non-nested substrings when scanning from left to right.
starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
Expected result
expected = [
'to this tiny\nhamlet in Pennsylvania',
'to watch a master',
'The master',
"the world's most famous weatherman",
'the\ngroundhog',
'this is Phil'
]
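Note that the result includes 'to watch a master' and 'The master' but not 'to watch a master at work. The master'. The longer substring also satisfies all three conditions (29 characters separate its begin and end words), but the shorter one ends first, so it is the one identified first when scanning from left to right. The longer substring would contain it and is therefore dropped.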
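One possible approach to the problem is sketched below. It rests on a few assumptions that the problem statement does not spell out: a "word" is treated as a whole word delimited by `\b`, the gap is matched with `[\s\S]` so that newlines count as characters, and the gap quantifier is lazy (`{0,30}?`) so that the shortest candidate from each starting point wins, as in the note above. The helper name `find_spans` and its `max_gap` parameter are hypothetical, introduced only for illustration.

```python
import re

quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']


def find_spans(text, starters, enders, max_gap=30):
    """Hypothetical helper: substrings from a starter word to an ender word.

    Assumes whole-word matching and a lazy gap of at most max_gap characters.
    """
    # Escape each word so it is matched literally, then join into alternations.
    start_alt = "|".join(re.escape(w) for w in starters)
    end_alt = "|".join(re.escape(w) for w in enders)
    pattern = (
        rf"\b(?:{start_alt})\b"       # a starter as a whole word
        rf"[\s\S]{{0,{max_gap}}}?"    # up to max_gap characters, newlines included, lazy
        rf"\b(?:{end_alt})\b"         # an ender as a whole word
    )
    # findall scans left to right and returns non-overlapping matches.
    return re.findall(pattern, text, flags=re.IGNORECASE)


result = find_spans(quote, starters, enders)
```

With the quote above, `result` equals the `expected` list. The word boundaries matter here: without them, the `in` inside `Reporting` would start a match that ends at `Phil`, displacing 'this is Phil'. Making the quantifier greedy would instead produce 'to watch a master at work. The master'.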
Regex Functions
Function | Description | Return Value |
---|---|---|
re.findall(pattern, string, flags=0) | Find all non-overlapping occurrences of pattern in string | list of strings, or list of tuples if > 1 capture group |
re.finditer(pattern, string, flags=0) | Find all non-overlapping occurrences of pattern in string | iterator yielding match objects |
re.search(pattern, string, flags=0) | Find first occurrence of pattern in string | match object or None |
re.split(pattern, string, maxsplit=0, flags=0) | Split string by occurrences of pattern | list of strings |
re.sub(pattern, repl, string, count=0, flags=0) | Replace pattern with repl | new string with the replacement(s) |
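To make the return values in the table concrete, here is a small sketch that calls each function on the same string. The sample sentence and the variable names are made up for illustration.

```python
import re

text = "Phil saw Phil's shadow in 1993"  # made-up sample sentence

# findall: a list of strings when the pattern has no capture groups
names = re.findall(r"Phil", text)                        # ['Phil', 'Phil']

# findall: a list of tuples when the pattern has more than one capture group
pairs = re.findall(r"(\w+) (\d+)", text)                 # [('in', '1993')]

# finditer: an iterator of match objects, each with position information
spans = [m.span() for m in re.finditer(r"Phil", text)]   # [(0, 4), (9, 13)]

# search: the first match object, or None when nothing matches
first_number = re.search(r"\d+", text).group()           # '1993'
no_match = re.search(r"groundhog", text)                 # None

# split: the pieces of the string between matches
words = re.split(r"\s+", text)

# sub: a new string; the original is left unchanged
replaced = re.sub(r"Phil", "Rita", text)                 # "Rita saw Rita's shadow in 1993"
```

Because `re.search` returns `None` on failure, calling `.group()` directly is only safe when a match is guaranteed, so real code would normally check the result first.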
Regex Patterns
Pattern | Description |
---|---|
[abc] | a or b or c |
[^abc] | not (a or b or c) |
[a-z] | a or b ... or y or z |
[1-9] | 1 or 2 ... or 8 or 9 |
\d | digits [0-9] |
\D | non-digits [^0-9] |
\s | whitespace [ \t\n\r\f\v] |
\S | non-whitespace [^ \t\n\r\f\v] |
\w | alphanumeric [a-zA-Z0-9_] |
\W | non-alphanumeric [^a-zA-Z0-9_] |
. | any character except newline (any character at all with re.DOTALL) |
x* | zero or more repetitions of x |
x+ | one or more repetitions of x |
x? | zero or one occurrence of x |
x{m} | exactly m repetitions of x |
x{m,n} | m to n repetitions of x |
\\ , \. , \* | backslash, period, asterisk |
\b | word boundary |
^hello | starts with hello |
bye$ | ends with bye |
(...) | capture group |
(po\|go) | po or go |
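A few of these patterns behave in ways that matter for multi-line text like the quote. The sketch below uses short made-up strings to show how `.` treats a newline, what `\b` changes, and how a bounded repetition and an alternation group match.

```python
import re

two_lines = "the\ngroundhog"  # made-up sample containing a newline

# By default "." does not match a newline, so this finds nothing.
dot_default = re.findall(r"the.groundhog", two_lines)                   # []

# With re.DOTALL, "." matches a newline as well.
dot_all = re.findall(r"the.groundhog", two_lines, flags=re.DOTALL)      # ['the\ngroundhog']

# \s matches whitespace, which includes the newline.
whitespace = re.findall(r"the\sgroundhog", two_lines)                   # ['the\ngroundhog']

phrase = "tiny hamlet in Pennsylvania"

# Without word boundaries, "in" is also found inside "tiny".
loose = re.findall(r"in", phrase)                                       # ['in', 'in']

# \b restricts the match to "in" as a whole word.
whole_word = re.findall(r"\bin\b", phrase)                              # ['in']

# {m,n} is greedy: it takes as many repetitions as it can, up to n.
runs = re.findall(r"a{2,3}", "a aa aaa aaaa")                           # ['aa', 'aaa', 'aaa']

# An alternation inside a capture group; findall returns the group's text.
syllables = re.findall(r"(po|go)", "pogo")                              # ['po', 'go']
```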