I have the following dataset:
   Subject    Student ID                                         Student Number
0  Cit11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  45
1  EngLang11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...  45
2  EngLit11   [S110, S111, S112, S113, S114, S115, S116, S11...  21
3  Fre11      [S95, S96, S97, S99, S100, S101, S102, S103, S...  26
4  Ger11      [S114, S115, S116, S117, S118, S124, S125, S12...  13
5  His11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  45
6  Mat11      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  45
7  Spa11      [S95, S97, S98, S99, S100, S102, S103, S104, S...  23
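Here 'Student Number' is the total number of 'Student ID' entries for each 'Subject'.
Let's say the maximum 'Student Number' should be 30 (the value held by classroom_Max_Capacity). The code below returns the indices of the rows where 'Student Number' exceeds that maximum.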
idx = filtered_Group[filtered_Group['Student Number'] > classroom_Max_Capacity].index.tolist()
Output: [0, 1, 5, 6]
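I would like to know whether these rows can be split into two parts, changing the 'Subject' name and the 'Student ID' list so that each part fits within the maximum number of students; for example,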
    Subject      Student ID                                         Student Number
0   Cit11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  30
1   Cit11_2      [S110, S115, S116...                               15
2   EngLang11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...  30
3   EngLang11_2  [S110, S115, S116...                               15
4   EngLit11     [S110, S111, S112, S113, S114, S115, S116, S11...  21
5   Fre11        [S95, S96, S97, S99, S100, S101, S102, S103, S...  26
6   Ger11        [S114, S115, S116, S117, S118, S124, S125, S12...  13
7   His11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  30
8   His11_2      [S110, S115, S116...                               15
9   Mat11_1      [S95, S96, S97, S98, S99, S100, S101, S102, S1...  30
10  Mat11_2      [S110, S115, S116...                               15
11  Spa11        [S95, S97, S98, S99, S100, S102, S103, S104, S...  23
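Is it possible to do this without writing out the modified 'Subject' names by hand and adding them to the dataframe?
Edit:
I tried to solve the problem by doing something like this: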
filtered = filtered_Group.iloc[idx]
student_list = filtered['Student ID'].explode().str.split(', ')
subject_list = filtered['Subject']
for i in idx:
    for number in range(classroom_Max_Capacity):
        df.append({temp_subject_list[i]: temp_student_list[number]})
However, this of course does not work, so any help would be appreciated.
Answer 0 (score: 0)
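You can use explode to get one row per student, number the students within each subject, and then use groupby to collect them into sections: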
import numpy as np
import pandas as pd

# random data: each subject gets between 3 and 9 distinct student numbers
np.random.seed(1)
df = pd.DataFrame({
    'Subject': list('abcdef'),
    'Student Number': [np.random.choice(np.arange(20),
                                        np.random.randint(3, 10),
                                        replace=False)
                       for _ in range(6)]
})
# maximum number of students allowed per section
max_students = 5
# one row per student, then a 1-based section number from the
# student's position within its subject
(df.explode('Student Number')
   .assign(section=lambda x: x.groupby('Subject')
                              .cumcount() // max_students + 1
   )
   .groupby(['Subject', 'section'])
   ['Student Number'].agg([list, 'count'])
)
Output:
                 list                  count
Subject section
a       1        [15, 10, 3, 18, 17]   5
        2        [14, 16, 4]           3
b       1        [3, 2, 5, 8, 17]      5
        2        [13, 10]              2
c       1        [11, 18, 2, 12, 16]   5
        2        [17, 0, 4]            3
d       1        [16, 19, 11]          3
e       1        [16, 5, 4, 12, 15]    5
        2        [19]                  1
f       1        [18, 17, 3, 0, 1]     5
        2        [9, 14, 13]           3
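To get closer to the layout asked for in the question, where subjects over capacity are renamed with a numeric suffix (Cit11_1, Cit11_2) and the others keep their name, the same explode/cumcount idea can be extended roughly as follows. This is only a minimal sketch: the small frame, the capacity of 3, and the helper names long, n_sections and suffix are assumptions made for illustration, not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame; 'Student ID' holds lists.
filtered_Group = pd.DataFrame({
    "Subject": ["Cit11", "EngLit11", "Ger11"],
    "Student ID": [
        [f"S{i}" for i in range(1, 8)],  # 7 students, over capacity
        [f"S{i}" for i in range(1, 4)],  # 3 students, exactly at capacity
        [f"S{i}" for i in range(3, 5)],  # 2 students
    ],
})
classroom_Max_Capacity = 3  # assumed small value to keep the example readable

# One row per (Subject, student), with a fresh unique index.
long = (filtered_Group[["Subject", "Student ID"]]
        .explode("Student ID")
        .reset_index(drop=True))

# Position of each student within its subject, cut into 1-based sections.
long["section"] = long.groupby("Subject").cumcount() // classroom_Max_Capacity + 1

# Only subjects that needed more than one section get a "_<n>" suffix.
n_sections = long.groupby("Subject")["section"].transform("max")
suffix = "_" + long["section"].astype(str)
long["Subject"] = long["Subject"].where(n_sections == 1, long["Subject"] + suffix)

# Collapse back to one row per (possibly renamed) subject.
result = (long.groupby("Subject", sort=False)["Student ID"]
              .agg([list, "count"])
              .rename(columns={"list": "Student ID", "count": "Student Number"})
              .reset_index())
print(result)
```

With this toy data, Cit11 is split into Cit11_1, Cit11_2 and Cit11_3, while EngLit11 and Ger11 keep their original names. Unlike the two-way split in the question, a subject here is cut into as many sections as the capacity requires.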