OB-GYM Problem
You own a gym for pregnant women called “OB-GYM” and you recently opened a second location. You’d like to analyze its performance, but your reporting software has given you the sales data in an awkward format.
import numpy as np
import pandas as pd

# Seeded generator so the sample data is reproducible
generator = np.random.default_rng(314)

# One row per (date, store): 5 days x 2 stores
sales = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=5).repeat(2),
    'store_id': np.tile([1, 2], 5),
    'sales1': np.round(generator.normal(loc=750, scale=20, size=10), 2),
    'sales2': np.round(generator.normal(loc=650, scale=40, size=10), 2),
    'members': generator.integers(low=20, high=25, size=10)
})

# Each store only reports its own sales column; blank out the other one
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan

print(sales)
#         date  store_id  sales1  sales2  members
# 0 2020-01-01         1  737.54     NaN       22
# 1 2020-01-01         2     NaN  629.00       20
# 2 2020-01-02         1  750.75     NaN       23
# 3 2020-01-02         2     NaN  699.01       22
# 4 2020-01-03         1  750.60     NaN       20
# 5 2020-01-03         2     NaN  640.20       24
# 6 2020-01-04         1  752.65     NaN       21
# 7 2020-01-04         2     NaN  695.64       22
# 8 2020-01-05         1  747.02     NaN       20
# 9 2020-01-05         2     NaN  632.40       22
Reshape it into a DataFrame like this:
#             sales_1  sales_2  members_1  members_2
# date
# 2020-01-01   737.54   629.00         22         20
# 2020-01-02   750.75   699.01         23         22
# 2020-01-03   750.60   640.20         20         24
# 2020-01-04   752.65   695.64         21         22
# 2020-01-05   747.02   632.40         20         22
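One possible approach, offered as a sketch rather than the intended solution: since every row has exactly one non-missing value across sales1 and sales2, the two columns can be merged into a single column and the frame then pivoted from long to wide on store_id. The names `tidy`, `wide`, and the merged `sales` column are introduced here for illustration and are not part of the problem statement.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the problem statement
generator = np.random.default_rng(314)
sales = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=5).repeat(2),
    'store_id': np.tile([1, 2], 5),
    'sales1': np.round(generator.normal(loc=750, scale=20, size=10), 2),
    'sales2': np.round(generator.normal(loc=650, scale=40, size=10), 2),
    'members': generator.integers(low=20, high=25, size=10)
})
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan

# Assumption: exactly one of sales1/sales2 is present per row,
# so filling sales1 from sales2 yields one complete 'sales' column
tidy = sales.assign(sales=sales['sales1'].fillna(sales['sales2']))

# Long to wide: one row per date, one column per (measure, store)
wide = tidy.pivot(index='date', columns='store_id', values=['sales', 'members'])

# Flatten the (measure, store_id) column MultiIndex into names like 'sales_1'
wide.columns = [f'{name}_{store}' for name, store in wide.columns]

# Fix the column order explicitly instead of relying on pivot's ordering
wide = wide[['sales_1', 'sales_2', 'members_1', 'members_2']]

print(wide)
```

Other routes, such as splitting the frame by store and merging the pieces on date, should give the same layout; the pivot version is shown only because it scales to more stores without extra code.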