Data
Sentence Score_Unigram Score_Bigram versionId
0 As of Dat 5 1 269004158
1 Date Docum 4 3 269004158
2 As of Dat 4 1 269004158
3 Date Docum 5 3 345973060
4 x Indicate 4 1 372529352
5 Date Docum 5 3 372529352
6 1 Financial 9 1 372529352
7 020 per shar 2 0 372529352
8 Date $ in 8 1 372529352
9 Date $ in 9 4 372529352
10 4 --------- 4 1 372529352
11 Date Begin 1 0 372529352
Required output
Sentence Score_Unigram Score_Bigram versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
9 Date $ in 9 4 372529352
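Objective
Group by versionId and get the row with the maximum Score_Unigram. If that gives more than one row, check the Score_Bigram column and take the row with the highest value (if there are several such rows, return all of them).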
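What I tried
# a is the DataFrame shown above; this loop scans the whole frame
# and does not group by versionId
maximum = 0
score_bigram = float('-inf')
index_to_pick = []
for index, row_data in a.iterrows():
    if row_data['Score_Unigram'] > maximum:
        # New best unigram score: restart the candidate list
        maximum = row_data['Score_Unigram']
        score_bigram = row_data['Score_Bigram']
        index_to_pick = [index]
    elif row_data['Score_Unigram'] == maximum:
        if row_data['Score_Bigram'] > score_bigram:
            # Same unigram score but a better bigram score: restart the list
            score_bigram = row_data['Score_Bigram']
            index_to_pick = [index]
        elif row_data['Score_Bigram'] == score_bigram:
            # Full tie on both scores: keep this row as well
            index_to_pick.append(index)
# Return every tied row, not only the first one
a.loc[index_to_pick]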
Output
Sentence Score_Unigram Score_Bigram versionId
5 Date $ in 9 4 372529352
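Well, I guess this approach isn't very good (the data is large), so I'm looking for an efficient way.
I tried idxmax, but it only returns the first one. This may be a duplicate, but I couldn't find one. Thanks for your help!!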
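The limitation shows up in the versionId 372529352 group from the sample data, where rows 6 and 9 both have Score_Unigram 9. Here is a minimal sketch, using only that group's Score_Unigram values. `Series.idxmax` returns only the label of the first maximum. Comparing the Series with its own max keeps every tied label.

```python
import pandas as pd

# Score_Unigram for versionId 372529352 from the sample data (rows 4-11)
uni = pd.Series([4, 5, 9, 2, 8, 9, 4, 1], index=range(4, 12))

# idxmax reports only the first label holding the maximum
first_only = uni.idxmax()

# A boolean mask against the max keeps every tied label
all_ties = uni[uni == uni.max()].index.tolist()
```

Here `first_only` is 6, while `all_ties` contains both 6 and 9. Row 9 is the one the objective wants, because it also has the higher Score_Bigram.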
Answer 0 (score: 2)
Use double filtering with boolean indexing. First filter by the max of the first column, Score_Unigram, then apply a second filter on Score_Bigram.
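The answer's code did not survive, so here is a minimal sketch of the described double filter. It assumes both maxima are taken per versionId group with `groupby(...).transform('max')`. The variable names are illustrative, and the Sentence column is left out because it plays no role in the selection.

```python
import pandas as pd

# Score columns and versionId from the sample data (index 0-11)
df = pd.DataFrame({
    'Score_Unigram': [5, 4, 4, 5, 4, 5, 9, 2, 8, 9, 4, 1],
    'Score_Bigram':  [1, 3, 1, 3, 1, 3, 1, 0, 1, 4, 1, 0],
    'versionId': [269004158] * 3 + [345973060] + [372529352] * 8,
})

# Stage 1: keep rows whose Score_Unigram equals the max of their versionId group
max_uni = df.groupby('versionId')['Score_Unigram'].transform('max')
stage1 = df[df['Score_Unigram'] == max_uni]

# Stage 2: among those rows, keep the ones whose Score_Bigram equals the group max
max_bi = stage1.groupby('versionId')['Score_Bigram'].transform('max')
result = stage1[stage1['Score_Bigram'] == max_bi]
```

`transform` returns a Series aligned to the input rows, so each comparison is a plain boolean mask. Ties at either stage survive, and no Python-level loop is needed. On the sample data this keeps rows 0, 3 and 9, matching the required output.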
Answer 1 (score: 1)
Try this on your df:
df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)
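As written, `head(1)` returns a single row for the whole frame. It therefore neither groups by versionId nor keeps ties. Below is a sketch of a per-group, tie-preserving variant of the same sorting idea; it is an adaptation, not the answerer's code. After sorting, the first row of each group carries the best (Score_Unigram, Score_Bigram) pair, and every row matching that pair is kept.

```python
import pandas as pd

# Score columns and versionId from the sample data (index 0-11)
df = pd.DataFrame({
    'Score_Unigram': [5, 4, 4, 5, 4, 5, 9, 2, 8, 9, 4, 1],
    'Score_Bigram':  [1, 3, 1, 3, 1, 3, 1, 0, 1, 4, 1, 0],
    'versionId': [269004158] * 3 + [345973060] + [372529352] * 8,
})

# Best scores first within each versionId
ordered = df.sort_values(['versionId', 'Score_Unigram', 'Score_Bigram'],
                         ascending=[True, False, False])

# Broadcast each group's first (best) pair back onto every row of the group
best = ordered.groupby('versionId')[['Score_Unigram', 'Score_Bigram']].transform('first')

# Keep all rows that tie with the best pair, then restore the original order
result = ordered[(ordered['Score_Unigram'] == best['Score_Unigram']) &
                 (ordered['Score_Bigram'] == best['Score_Bigram'])].sort_index()
```

This sorts the whole frame once, which costs more than the two `transform('max')` passes of a pure filtering approach. It does, however, stay close to the sorting idea of this answer.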
Output:
Sentence Score_Unigram Score_Bigram versionId
5 Date $ in 9 4 372529352
Answer 2 (score: 1)
I believe you don't need to sort the data; just compare against the max values of these two columns:
df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) &
(df['Score_Bigram'] == df['Score_Bigram'].max()) ]
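This compares each column with its own whole-frame maximum, so it only finds a row when the same row holds both maxima. It also ignores versionId. The sketch below uses two hypothetical rows where the two maxima sit on different rows. Taking the Score_Bigram maximum only among the Score_Unigram winners gives the intended row.

```python
import pandas as pd

# Hypothetical two-row frame: the Unigram winner is not the Bigram winner
df = pd.DataFrame({'Score_Unigram': [9, 5], 'Score_Bigram': [1, 4]})

# Independent column maxima: no row matches both, so the result is empty
independent = df[(df['Score_Unigram'] == df['Score_Unigram'].max()) &
                 (df['Score_Bigram'] == df['Score_Bigram'].max())]

# Sequential maxima: restrict to Unigram winners, then take their Bigram max
top = df[df['Score_Unigram'] == df['Score_Unigram'].max()]
sequential = top[top['Score_Bigram'] == top['Score_Bigram'].max()]
```

`independent` is empty, while `sequential` keeps row 0.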