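I want to group the data by ID, sort each group in descending order on the Offer column, and then take the second row of each group. How can I do this with pandas?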
ID Vehicle Auction Offer
0 3580845 2005 Volvo XC90 V8 Copart 215
1 3580845 2005 Volvo XC90 V8 Manheim Salvage API 170
2 3580845 2005 Volvo XC90 V8 Merged Salvage 195
3 3580845 2005 Volvo XC90 V8 Manheim Salvage 390
4 3580845 2005 Volvo XC90 V8 IAA 270
5 3580845 2005 Volvo XC90 V8 SVP 175
6 3580789 2003 Lexus ES 300 Copart 180
7 3580789 2003 Lexus ES 300 Merged Salvage 190
8 3580789 2003 Lexus ES 300 Manheim Salvage 355
9 3580789 2003 Lexus ES 300 IAA 270
10 3580789 2003 Lexus ES 300 SVP 180
Expected:
ID Vehicle Auction Offer
0 3580845 2005 Volvo XC90 V8 IAA 270
1 3580789 2003 Lexus ES 300 IAA 270
Answer 0 (score: 3)
First sort with sort_values, then use cumcount to number the rows within each ID group, and finally filter with boolean indexing:
df = df.sort_values(['ID','Offer'], ascending=False)
df1 = df[df.groupby('ID').cumcount() == 1]
print (df1)
ID Vehicle Auction Offer
4 3580845 2005 Volvo XC90 V8 IAA 270
9 3580789 2003 Lexus ES 300 IAA 270
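The steps above can also be tried as a self-contained script. This is only a sketch: the DataFrame below is a reduced copy of the question's data (the Vehicle column is left out for brevity), and the names ordered and second are illustrative, not part of the original answer.

```python
import pandas as pd

# Reduced copy of the question's data; the Vehicle column is omitted (assumption).
df = pd.DataFrame({
    "ID": [3580845] * 6 + [3580789] * 5,
    "Auction": ["Copart", "Manheim Salvage API", "Merged Salvage",
                "Manheim Salvage", "IAA", "SVP",
                "Copart", "Merged Salvage", "Manheim Salvage", "IAA", "SVP"],
    "Offer": [215, 170, 195, 390, 270, 175, 180, 190, 355, 270, 180],
})

# Sort so that, within each ID, the highest offer comes first.
ordered = df.sort_values(["ID", "Offer"], ascending=False)

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order,
# so position 1 is the second row of each ID after sorting.
second = ordered[ordered.groupby("ID").cumcount() == 1]
print(second)
```

Because cumcount works on row position, tied offers still occupy separate positions, so every ID with at least two rows yields exactly one result row.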
Answer 1 (score: 1)
You can also combine groupby with rank.
from io import StringIO
import pandas as pd
data = pd.read_table(StringIO("""ID Vehicle Auction Offer
3580845 2005VolvoXC90V8 Copart 215
3580845 2005VolvoXC90V8 ManheimSalvageAPI 170
3580845 2005VolvoXC90V8 MergedSalvage 195
3580845 2005VolvoXC90V8 ManheimSalvage 390
3580845 2005VolvoXC90V8 IAA 270
3580845 2005VolvoXC90V8 SVP 175
3580789 2003LexusES300 Copart 180
3580789 2003LexusES300 MergedSalvage 190
3580789 2003LexusES300 ManheimSalvage 355
3580789 2003LexusES300 IAA 270
3580789 2003LexusES300 SVP 180"""), sep=' ')
# Rank Offer within each ID, highest first; rank 2 marks the second-highest offer for each ID.
# Selecting the Offer column before ranking avoids ranking the string columns as well.
offer_rank_by_id = data.groupby('ID')['Offer'].rank(method='min', ascending=False) == 2
data.loc[offer_rank_by_id,:]
# ID Vehicle Auction Offer
# 4 3580845 2005VolvoXC90V8 IAA 270
# 9 3580789 2003LexusES300 IAA 270
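One thing to watch with the rank approach is how ties are handled. The sketch below uses made-up data (the tied frame is hypothetical, not from the question) in which the top offer appears twice. With method='min' both tied rows get rank 1 and the next row gets rank 3, so nothing has rank 2; method='dense' would instead give rank 2 to the next distinct value.

```python
import pandas as pd

# Hypothetical data: the highest offer is tied within the single ID.
tied = pd.DataFrame({"ID": [1, 1, 1], "Offer": [300, 300, 200]})
grouped = tied.groupby("ID")["Offer"]

# method='min': tied values share the lowest rank of the tie, giving 1, 1, 3.
min_rank = grouped.rank(method="min", ascending=False)

# method='dense': ranks have no gaps after ties, giving 1, 1, 2.
dense_rank = grouped.rank(method="dense", ascending=False)

rows_min = tied[min_rank == 2]      # empty: no row has rank 2
rows_dense = tied[dense_rank == 2]  # the row with Offer 200
print(rows_min)
print(rows_dense)
```

Which behaviour is right depends on whether "second" means the second row or the second distinct offer value; the question's data has no tie at the top, so both give the same result there.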