OB-GYM Problem


You own a gym 💪🏾 for pregnant women 🤰🏾 called “OB-GYM” and you recently opened a second location. You’d like to analyze its performance, but your reporting software has given you the sales data in an awkward format.

import numpy as np
import pandas as pd

generator = np.random.default_rng(314)

sales = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=5).repeat(2),
    'store_id': np.tile([1, 2], 5),
    'sales1': np.round(generator.normal(loc=750, scale=20, size=10), 2),
    'sales2': np.round(generator.normal(loc=650, scale=40, size=10), 2),
    'members': generator.integers(low=20, high=25, size=10)
})
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan

print(sales)
#         date  store_id  sales1  sales2  members
# 0 2020-01-01         1  737.54     NaN       22
# 1 2020-01-01         2     NaN  629.00       20
# 2 2020-01-02         1  750.75     NaN       23
# 3 2020-01-02         2     NaN  699.01       22
# 4 2020-01-03         1  750.60     NaN       20
# 5 2020-01-03         2     NaN  640.20       24
# 6 2020-01-04         1  752.65     NaN       21
# 7 2020-01-04         2     NaN  695.64       22
# 8 2020-01-05         1  747.02     NaN       20
# 9 2020-01-05         2     NaN  632.40       22

Reshape it into a DataFrame like this, with one row per date and a separate sales and members column for each store:

#               sales_1  sales_2  members_1  members_2
# date
# 2020-01-01    737.54    629.00         22         20
# 2020-01-02    750.75    699.01         23         22
# 2020-01-03    750.60    640.20         20         24
# 2020-01-04    752.65    695.64         21         22
# 2020-01-05    747.02    632.40         20         22
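
One possible approach, offered as a sketch rather than the intended solution: merge the two half-empty sales columns into one, pivot on store_id, then flatten the resulting two-level column labels. To keep the example self-contained, the frame below is rebuilt from the printed values above instead of the random generator, and the names tidy and wide are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sales table from the printed values (assumption: typed in by hand,
# not regenerated from the seeded random generator).
sales = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=5).repeat(2),
    'store_id': np.tile([1, 2], 5),
    'sales1': [737.54, np.nan, 750.75, np.nan, 750.60, np.nan, 752.65, np.nan, 747.02, np.nan],
    'sales2': [np.nan, 629.00, np.nan, 699.01, np.nan, 640.20, np.nan, 695.64, np.nan, 632.40],
    'members': [22, 20, 23, 22, 20, 24, 21, 22, 20, 22],
})

# Each row has exactly one non-missing sales figure, so fill sales1 from sales2.
tidy = sales.assign(sales=sales['sales1'].fillna(sales['sales2']))

# Spread the stores across columns; this yields two-level labels like ('sales', 1).
wide = tidy.pivot(index='date', columns='store_id', values=['sales', 'members'])

# Flatten the two-level labels into 'sales_1', 'members_2', and so on.
wide.columns = [f'{name}_{store}' for name, store in wide.columns]

# Fix the column order explicitly and restore integer member counts,
# since pivoting mixed columns together can upcast them to float.
wide = wide[['sales_1', 'sales_2', 'members_1', 'members_2']]
wide = wide.astype({'members_1': int, 'members_2': int})

print(wide)
```

Filling sales1 from sales2 works here only because every row has exactly one of the two values; if a store could report in both columns, that step would need rethinking.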
