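How do I group, sort, and then get the row with the second-highest value in Pandas?

Time: 2017-08-11 07:09:48

Tags: python pandas sorting group-by

I want to group the data by ID, sort each group by the Offer column in descending order, and then take the second row. How can I do this with pandas?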

     ID             Vehicle              Auction       Offer
0   3580845  2005 Volvo XC90 V8               Copart    215
1   3580845  2005 Volvo XC90 V8  Manheim Salvage API    170
2   3580845  2005 Volvo XC90 V8       Merged Salvage    195
3   3580845  2005 Volvo XC90 V8      Manheim Salvage    390
4   3580845  2005 Volvo XC90 V8                  IAA    270
5   3580845  2005 Volvo XC90 V8                  SVP    175
6   3580789   2003 Lexus ES 300               Copart    180
7   3580789   2003 Lexus ES 300       Merged Salvage    190
8   3580789   2003 Lexus ES 300      Manheim Salvage    355
9   3580789   2003 Lexus ES 300                  IAA    270
10  3580789   2003 Lexus ES 300                  SVP    180

Expected:

     ID             Vehicle              Auction       Offer
0   3580845  2005 Volvo XC90 V8                  IAA    270
1   3580789   2003 Lexus ES 300                  IAA    270

2 answers:

Answer 0 (score: 3)

First sort with sort_values, then use cumcount to number the rows within each group, and finally filter with boolean indexing:

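# Sort by ID and Offer, both descending, so the highest offer comes first within each ID
df = df.sort_values(['ID','Offer'], ascending=False)
# cumcount numbers the rows of each group from 0, so 1 marks the second row
df1 = df[df.groupby('ID').cumcount() == 1]
print(df1)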
        ID             Vehicle Auction  Offer
4  3580845  2005 Volvo XC90 V8     IAA    270
9  3580789   2003 Lexus ES 300     IAA    270
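
For a self-contained check of the approach above, the sketch below rebuilds the question's data by hand and prints the rows found at position 1. The names `ordered`, `position`, and `second` are mine, not from the answer. It sorts by `Offer` only, which is enough here because `groupby` preserves the order of rows within each group.

```python
import pandas as pd

# The question's data, typed in by hand.
df = pd.DataFrame({
    "ID": [3580845] * 6 + [3580789] * 5,
    "Vehicle": ["2005 Volvo XC90 V8"] * 6 + ["2003 Lexus ES 300"] * 5,
    "Auction": ["Copart", "Manheim Salvage API", "Merged Salvage",
                "Manheim Salvage", "IAA", "SVP",
                "Copart", "Merged Salvage", "Manheim Salvage", "IAA", "SVP"],
    "Offer": [215, 170, 195, 390, 270, 175, 180, 190, 355, 270, 180],
})

# Sort by Offer, highest first; each ID's rows keep this order inside their group.
ordered = df.sort_values("Offer", ascending=False)

# cumcount numbers the rows of each group 0, 1, 2, ... in their current order,
# so position 1 is the second-highest offer of each ID.
position = ordered.groupby("ID").cumcount()
second = ordered[position == 1]
print(second)
```

A group with only one row has no position 1, so it simply drops out of the result.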

Answer 1 (score: 1)

You can also combine groupby with rank.

from io import StringIO
import pandas as pd

data = pd.read_table(StringIO("""ID Vehicle Auction Offer
3580845 2005VolvoXC90V8 Copart 215
3580845 2005VolvoXC90V8 ManheimSalvageAPI 170
3580845 2005VolvoXC90V8 MergedSalvage 195
3580845 2005VolvoXC90V8 ManheimSalvage 390
3580845 2005VolvoXC90V8 IAA 270
3580845 2005VolvoXC90V8 SVP 175
3580789 2003LexusES300 Copart 180
3580789 2003LexusES300 MergedSalvage 190
3580789 2003LexusES300 ManheimSalvage 355
3580789 2003LexusES300 IAA 270
3580789 2003LexusES300 SVP 180"""), sep=' ')

# Rank only the numeric Offer column within each ID; rank 2 selects the second-highest offer for each ID
offer_rank_by_id = data.groupby('ID')['Offer'].rank(method='min', ascending=False) == 2

data.loc[offer_rank_by_id,:]

#         ID          Vehicle Auction  Offer
# 4  3580845  2005VolvoXC90V8     IAA    270
# 9  3580789   2003LexusES300     IAA    270
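
One thing to watch with the rank approach is ties. With `method='min'`, two offers tied for first place both get rank 1 and the next offer gets rank 3, so no row in that group has rank 2. Below is a minimal sketch comparing `'min'` with `'dense'`; the frame `tied` and its values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: ID 1 has two offers tied for the highest value.
tied = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Offer": [300, 300, 250, 100, 90],
})

grouped = tied.groupby("ID")["Offer"]
rank_min = grouped.rank(method="min", ascending=False)
rank_dense = grouped.rank(method="dense", ascending=False)

# 'min' skips rank 2 after a tie for first place; 'dense' does not.
second_min = tied[rank_min == 2]
second_dense = tied[rank_dense == 2]
print(second_min)
print(second_dense)
```

Which method fits depends on whether "second highest" means the second row or the second distinct value. The cumcount approach in the other answer always picks the second row after sorting.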