How to find the all-rounder in pandas

Time: 2018-10-22 15:33:47

Tags: python pandas dataframe

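I have a dataframe with one row per student and subject, with the columns studentID, subjectID, subjectMark, subjectWeight, Rank and overallScore.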

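As you can see, students 1 and 3 each scored very high in one particular subject, but their overall scores are poor, whereas student 2 did not get the top mark in any subject but has the highest overall score.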

overallScore = subject111Mark * subject111Weight + subject222Mark * subject222Weight
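As a minimal sketch of this formula, the overall score and rank could be derived from the per-subject rows as follows. The column names come from the question; the sample values match the table shown further down, and using a dense rank is an assumption about how Rank was produced.

```python
import pandas as pd

df = pd.DataFrame({
    "studentID":     [1, 1, 2, 2, 3, 3],
    "subjectID":     [111, 222, 111, 222, 111, 222],
    "subjectMark":   [100, 0, 90, 90, 0, 100],
    "subjectWeight": [0.4, 0.6, 0.4, 0.6, 0.4, 0.6],
})

# Weighted mark per row, summed per student and broadcast back to every row.
weighted = df["subjectMark"] * df["subjectWeight"]
df["overallScore"] = weighted.groupby(df["studentID"]).transform("sum")

# Assumption: Rank is a dense rank where the highest overall score gets rank 1.
df["Rank"] = df["overallScore"].rank(method="dense", ascending=False).astype(int)

print(df)
```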

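So I want to check whether a student is an "all-rounder": the student's overall score is the highest, but the student does not hold the highest mark in any single subject. If this condition is met, the student should be flagged as an "all-rounder".

And the df should look like this: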

studentID  subjectID  subjectMark  subjectWeight  Rank  overallScore
1          111        100          0.4            3     40
1          222        0            0.6            3     40
2          111        90           0.4            1     90
2          222        90           0.6            1     90
3          111        0            0.4            2     60
3          222        100          0.6            2     60

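I have a follow-up question.
The answer given solves the problem for the last dataframe, but what if I want to do this for every class in the following dataframe?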

studentID  subjectID  subjectMark  subjectWeight  Rank  overallScore  AR
1          111        100          0.4            3     40            F
1          222        0            0.6            3     40            F
2          111        90           0.4            1     90            T
2          222        90           0.6            1     90            T
3          111        0            0.4            2     60            F
3          222        100          0.6            2     60            F

2 Answers:

Answer 0: (score: 2)

You can check both conditions with boolean masks:

# True where the row holds the highest mark of its subject
s1 = df.groupby('subjectID').subjectMark.transform('max').eq(df.subjectMark)
# True where the student's overall score is the highest overall score
s2 = df.overallScore.eq(df.overallScore.max())
# True only if the student has the top overall score and no subject-top mark on any row
s2 & ((~s1).groupby(df['studentID']).transform('all'))
Out[1066]: 
0    False
1    False
2     True
3     True
4    False
5    False
dtype: bool
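To store the result as the AR column from the question, the mask can be assigned back to the frame. The following is a self-contained sketch using the sample rows from the question; the descriptive variable names and the use of `np.where` to produce "T"/"F" labels are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "studentID":    [1, 1, 2, 2, 3, 3],
    "subjectID":    [111, 222, 111, 222, 111, 222],
    "subjectMark":  [100, 0, 90, 90, 0, 100],
    "overallScore": [40, 40, 90, 90, 60, 60],
})

# True where the row holds the top mark of its subject.
is_subject_top = (
    df.groupby("subjectID")["subjectMark"].transform("max").eq(df["subjectMark"])
)
# True where the student has the best overall score.
is_overall_top = df["overallScore"].eq(df["overallScore"].max())
# True where the student tops no subject on any of their rows.
no_subject_top = (~is_subject_top).groupby(df["studentID"]).transform("all")

# Label the rows the way the question's AR column does.
df["AR"] = np.where(is_overall_top & no_subject_top, "T", "F")

print(df)
```

With this data only student 2 meets both conditions, which matches the AR column in the question.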

Answer 1: (score: 0)

import pandas as pd

list_of_all_rounder_per_class = []

for classid in data['classID'].unique():
    # rows belonging to the current class only
    that_class = data.loc[data.classID == classid]
    # True where the row holds the highest mark of its subject within this class
    condition1 = that_class.groupby(['subjectID']).subjectMark.transform('max').eq(that_class.subjectMark)
    # True where the student's overall score is the highest in this class
    condition2 = that_class.overallScore.eq(that_class.overallScore.max())
    # both conditions must hold for the student to be an all-rounder
    list_of_all_rounder_per_class.append(condition2 & ((~condition1).groupby(that_class['studentID']).transform('all')))

total_result = [result_for_each_class.to_frame('all_rounder') for result_for_each_class in list_of_all_rounder_per_class]
all_rounder = pd.concat(total_result)

# attach the flag to the original rows via the shared index
data = data.join(all_rounder, how='outer')

I figured out a way to solve it, even though this is probably not the best (most concise) way to achieve the goal.
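A more concise alternative might be to skip the loop and group by class directly. This is only a sketch: the sample rows and the class labels "A" and "B" are made up for illustration, and the `classID` column is assumed to exist as in the loop above.

```python
import pandas as pd

# Hypothetical data: class A repeats the question's rows, class B is invented.
data = pd.DataFrame({
    "classID":      ["A"] * 6 + ["B"] * 4,
    "studentID":    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "subjectID":    [111, 222] * 5,
    "subjectMark":  [100, 0, 90, 90, 0, 100, 80, 70, 60, 95],
    "overallScore": [40, 40, 90, 90, 60, 60, 74, 74, 81, 81],
})

# True where the row holds the top mark of its subject within its class.
is_subject_top = (
    data.groupby(["classID", "subjectID"])["subjectMark"]
    .transform("max")
    .eq(data["subjectMark"])
)
# True where the student has the best overall score within the class.
is_overall_top = (
    data.groupby("classID")["overallScore"].transform("max").eq(data["overallScore"])
)
# True where the student tops no subject; grouping by class as well as student
# guards against student IDs that repeat across classes.
no_subject_top = (
    (~is_subject_top).groupby([data["classID"], data["studentID"]]).transform("all")
)

data["all_rounder"] = is_overall_top & no_subject_top

print(data)
```

In this made-up data, student 2 is the all-rounder of class A, while class B has none because its top student (5) also holds the top mark in subject 222.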