Class Transitions Problem
You have a DataFrame called schedules that represents the daily schedule of each student in a school. For example, if Ryan attends four classes (math, english, history, and chemistry), your schedules DataFrame will have four rows for Ryan, in the order he attends each class.
import numpy as np
import pandas as pd
generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']
schedules = pd.DataFrame({
    'student_id': np.repeat(np.arange(100), 4),
    'class': generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
schedules['grade'] = generator.integers(101, size=schedules.shape[0])
print(schedules)
#      student_id        class  grade
# 0             0  engineering     86
# 3             0    chemistry     75
# 4             1         math     85
# 5             1  engineering      0
# 6             1      english     73
# ..          ...          ...    ...
# 394          98      writing     16
# 395          98       civics     89
# 396          99  engineering     90
# 398          99         math     55
# 399          99      history     31
#
# [339 rows x 3 columns]
You have this theory that the sequence of class-to-class transitions affects students' grades. For instance, you suspect Ryan would do better in his Chemistry class if it immediately followed his Math class instead of his History class.
Determine the average and median Chemistry grade for groups of students based on the class they have immediately prior to Chemistry. Also report how many students fall into each group.
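One possible way to approach this is sketched below on a small, hand-made table rather than on the generated schedules data. The names toy, prev_class, chem, summary, and the '(none)' label are hypothetical and chosen only for illustration. The sketch assumes rows are already in attendance order within each student, that each student takes chemistry at most once (so counting rows counts students), and that students whose first class is chemistry should be reported as their own group; the problem statement does not say how to treat them.

```python
import pandas as pd

# Hypothetical toy data (not the generated `schedules`): rows are listed
# in the order each student attends class.
toy = pd.DataFrame({
    'student_id': [0, 0, 0, 1, 1, 2, 2, 3],
    'class': ['math', 'chemistry', 'gym', 'history', 'chemistry',
              'math', 'chemistry', 'chemistry'],
    'grade': [90, 80, 70, 60, 50, 40, 100, 30],
})

# Class attended immediately before each row, within the same student.
# The first class of every student has no predecessor, so it becomes NaN.
toy['prev_class'] = toy.groupby('student_id')['class'].shift(1)

# Keep only the chemistry rows.
chem = toy.loc[toy['class'] == 'chemistry'].copy()

# Assumption: label students with no prior class explicitly, because
# groupby drops NaN keys by default.
chem['prev_class'] = chem['prev_class'].fillna('(none)')

# Mean grade, median grade, and number of students per prior class.
summary = chem.groupby('prev_class')['grade'].agg(
    mean_grade='mean',
    median_grade='median',
    n_students='count',
)
print(summary)
```

The same steps should carry over to schedules by swapping it in for toy, since drop_duplicates() in the setup code guarantees that no student has the same class twice.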