I have a dataframe like this:
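As you can see, students 1 and 3 each scored very high in one particular subject but have poor overall scores, while student 2 did not get the top mark in any subject but has the highest overall score.
overallScore = subject111Mark * subject111Weight + subject222Mark * subject222Weight
So I want to check whether a student is an "all-rounder": that is, whether the student has the highest overall score without having the highest mark in any single subject. If that condition is met, the student should be flagged as an "all-rounder",
and the df should look like this: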
studentID  subjectID  subjectMark  subjectWeight  Rank  overallScore
1          111        100          0.4            3     40
1          222        0            0.6            3     40
2          111        90           0.4            1     90
2          222        90           0.6            1     90
3          111        0            0.4            2     60
3          222        100          0.6            2     60
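For anyone who wants to reproduce the sample data, here is a minimal sketch that rebuilds the frame and derives overallScore from the formula in the question. It assumes pandas, and it assumes Rank is a dense rank of overallScore with 1 as the best; the helper name `weighted` is only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "studentID": [1, 1, 2, 2, 3, 3],
    "subjectID": [111, 222, 111, 222, 111, 222],
    "subjectMark": [100, 0, 90, 90, 0, 100],
    "subjectWeight": [0.4, 0.6, 0.4, 0.6, 0.4, 0.6],
})

# Weighted mark of each row, summed per student and broadcast back to every row
weighted = df["subjectMark"] * df["subjectWeight"]
df["overallScore"] = weighted.groupby(df["studentID"]).transform("sum")

# Assumption: Rank is a dense rank of overallScore, 1 = highest score
df["Rank"] = df["overallScore"].rank(method="dense", ascending=False).astype(int)

print(df)
```

Using `transform("sum")` keeps one value per row, so both rows of a student carry the same overallScore and Rank, as in the table.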
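I have a follow-up question.
The answer given solves the problem for the last dataframe, but what if I want to do this for every class in the following dataframe?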
studentID  subjectID  subjectMark  subjectWeight  Rank  overallScore  AR
1          111        100          0.4            3     40            F
1          222        0            0.6            3     40            F
2          111        90           0.4            1     90            T
2          222        90           0.6            1     90            T
3          111        0            0.4            2     60            F
3          222        100          0.6            2     60            F
Answer 0 (score: 2)
You can check it like this:
# True where a row holds the highest mark for its subject
s1 = df.groupby('subjectID').subjectMark.transform('max').eq(df.subjectMark)
# True where a row belongs to the student with the highest overall score
s2 = df.overallScore.eq(df.overallScore.max())
# True only when the student has the top overall score and no top mark in any subject
s2 & ((~s1).groupby(df['studentID']).transform('all'))
Out[1066]:
0 False
1 False
2 True
3 True
4 False
5 False
dtype: bool
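To turn a boolean result like the one above into the T/F column named AR from the question, one option is to map the mask onto the two labels. This is a sketch under the assumption that the frame already has an overallScore column; the name `mask` is introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question, with overallScore already computed
df = pd.DataFrame({
    "studentID": [1, 1, 2, 2, 3, 3],
    "subjectID": [111, 222, 111, 222, 111, 222],
    "subjectMark": [100, 0, 90, 90, 0, 100],
    "subjectWeight": [0.4, 0.6, 0.4, 0.6, 0.4, 0.6],
    "overallScore": [40, 40, 90, 90, 60, 60],
})

# Same two conditions as in the answer
s1 = df.groupby("subjectID")["subjectMark"].transform("max").eq(df["subjectMark"])
s2 = df["overallScore"].eq(df["overallScore"].max())

# Combine them, then translate True/False into the "T"/"F" labels
mask = s2 & (~s1).groupby(df["studentID"]).transform("all")
df["AR"] = mask.map({True: "T", False: "F"})

print(df)
```

Keeping the boolean mask around and only mapping it at the end makes it easy to filter with `df[mask]` as well.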
Answer 1 (score: 0)
import pandas as pd

list_of_all_rounder_per_class = []
for class_id in data['classID'].unique():
    # Rows that belong to the current class only
    that_class = data.loc[data.classID == class_id]
    # True where a row holds the highest mark for its subject within this class
    condition1 = that_class.groupby('subjectID').subjectMark.transform('max').eq(that_class.subjectMark)
    # True where a row belongs to the student with the highest overall score in this class
    condition2 = that_class.overallScore.eq(that_class.overallScore.max())
    # Both conditions must hold: top overall score and no top mark in any subject
    list_of_all_rounder_per_class.append(condition2 & ((~condition1).groupby(that_class['studentID']).transform('all')))
# Name each per-class result and stack them; the original row index is preserved
total_result = [result_for_each_class.to_frame('all_rounder') for result_for_each_class in list_of_all_rounder_per_class]
all_rounder = pd.concat(total_result)
data = data.join(all_rounder, how='outer')
I came up with this workaround, although it may not be the best (most concise) way to achieve the goal.
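A possibly more concise alternative is to skip the loop and add classID to the grouping keys, so each maximum is taken within a class. This is only a sketch: the two-class sample data is hypothetical (class A reuses the question's students, class B is made up), and it assumes overallScore is already present.

```python
import pandas as pd

# Hypothetical two-class data; class A mirrors the question, class B is invented
data = pd.DataFrame({
    "classID": ["A"] * 6 + ["B"] * 4,
    "studentID": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "subjectID": [111, 222] * 5,
    "subjectMark": [100, 0, 90, 90, 0, 100, 80, 70, 60, 60],
    "subjectWeight": [0.4, 0.6] * 5,
    "overallScore": [40, 40, 90, 90, 60, 60, 74, 74, 60, 60],
})

# Highest mark per subject, computed within each class
top_in_subject = (
    data.groupby(["classID", "subjectID"])["subjectMark"].transform("max")
    .eq(data["subjectMark"])
)

# Highest overall score, computed within each class
top_overall = (
    data.groupby("classID")["overallScore"].transform("max")
    .eq(data["overallScore"])
)

# A student qualifies only if none of their rows is a subject maximum;
# grouping by classID as well guards against student IDs reused across classes
no_top_subject = (
    (~top_in_subject)
    .groupby([data["classID"], data["studentID"]])
    .transform("all")
)

data["all_rounder"] = top_overall & no_top_subject
print(data)
```

In this sample, student 2 is flagged in class A, while class B has no all-rounder because its top student (4) also holds the highest mark in both subjects.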