Dog Costumes Problem¶
You own an ecommerce store that sells costumes for dogs . You'd like to analyze your web traffic by measuring the count of unique visitors amongst your web sessions data. Additionally, you want the ability to filter by date, so you can answer questions like
How many visitors visited my site?
and
How many visitors visited my site on February 20th, 2022?
You come up with the following count_visitors()
function that you place inside measures.py
.
def count_visitors(sessions, date=None):
"""
Count the number of unique visitors
:param sessions: list of sessions were each session has a visitor_id and date
:param date: optional filter by date
:return: number of unique visitors
"""
if date is None:
visitors = {x.visitor_id for x in sessions}
else:
visitors = {x.visitor_id for x in sessions if x.date == date}
return len(visitors)
To test it, you create test_measures.py
as follows.
from datetime import date
from types import SimpleNamespace
import time
from measures import count_visitors
def load_sessions():
"""Super expensive function that loads the sessions data"""
print("loading sessions data.. hold on a sec")
time.sleep(5)
sessions = [
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-02')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-04')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-10')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-11')),
SimpleNamespace(visitor_id=128, date=date.fromisoformat('2022-02-15')),
SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-17')),
SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-17'))
]
return sessions
def test_count_visitors():
"""Confirm that count_visitors works as expected"""
sessions = load_sessions()
assert count_visitors(sessions) == 4
def test_count_visitors_on_date():
"""Confirm that count_visitors works as expected when a date filter is used"""
sessions = load_sessions()
assert count_visitors(sessions, date=date.fromisoformat('2022-02-17')) == 2
You have a problem.. Both of your test functions call the load_sessions()
function in order to load the sessions data,
but this data is really expensive (slow) to fetch. (In theory, load_sessions()
might connect to a database and run an
expensive query.) See if you can cut your tests runtime in half .
Run with messages
Notice the print()
statement inside the load_sessions()
function. To view the output of this print statement in the
console, run the tests with pytest -s
as opposed to simply pytest
. The-s
flag tells pytest not to capture the
result of standard out as it normally does.
pytest
output:
pytest -s
output:
Directory Structure¶
dog_costumes/
measures.py
test_measures.py