Class Transitions Problem

You have a DataFrame called schedules that represents the daily schedule of each student in a school. For example, if Ryan attends four classes (math, english, history, and chemistry), your schedules DataFrame will have four rows for Ryan, in the order he attends each class.

import numpy as np
import pandas as pd

generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']

schedules = pd.DataFrame({
    'student_id': np.repeat(np.arange(100), 4),
    'class': generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
schedules['grade'] = generator.integers(101, size=schedules.shape[0])

print(schedules)
#      student_id        class  grade
# 0             0  engineering     86
# 3             0    chemistry     75
# 4             1         math     85
# 5             1  engineering      0
# 6             1      english     73
# ..          ...          ...    ...
# 394          98      writing     16
# 395          98       civics     89
# 396          99  engineering     90
# 398          99         math     55
# 399          99      history     31
# 
# [339 rows x 3 columns]

You have this theory that the sequence of class-to-class transitions affects students' grades. For instance, you suspect Ryan would do better in his Chemistry class if it immediately followed his Math class instead of his History class.

Determine the average and median Chemistry grade for groups of students based on the class they have immediately prior to Chemistry. Also report how many students fall into each group.
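One possible way to approach this is sketched below; it is not necessarily the intended solution. It assumes the rows for each student are already in attendance order, as described above, and looks up each row's preceding class with a grouped shift. The names prev_class, chem, summary, mean_grade, median_grade, and num_students are illustrative choices, not part of the problem. Students who take Chemistry first have no prior class; this sketch keeps them as a separate missing-value group rather than dropping them, which is also an assumption.

```python
import numpy as np
import pandas as pd

# Rebuild the schedules DataFrame exactly as in the problem setup
generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']

schedules = pd.DataFrame({
    'student_id': np.repeat(np.arange(100), 4),
    'class': generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
schedules['grade'] = generator.integers(101, size=schedules.shape[0])

# For every row, find the class the same student attended immediately before it.
# The first class of each student gets a missing value.
prev_class = schedules.groupby('student_id')['class'].shift(1)

# Keep only the chemistry rows, carrying along the preceding class
chem = schedules.assign(prev_class=prev_class)
chem = chem.loc[chem['class'] == 'chemistry']

# Summarize chemistry grades by preceding class.
# dropna=False keeps the group of students whose first class is chemistry.
summary = (
    chem.groupby('prev_class', dropna=False)['grade']
    .agg(mean_grade='mean', median_grade='median', num_students='count')
)

print(summary)
```

Because drop_duplicates() removes repeated (student_id, class) pairs, each student has at most one chemistry row, so counting rows per group is the same as counting students.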
