I have a table like the following:
City_code City_name Site_code Site_capacity
AAA100 City_A Site001 300
AAA100 City_A Site002 600
AAA100 City_A Site003 500
AAA200 City_B Site004 350
AAA200 City_B Site005 250
AAA300 City_C Site006 800
AAA300 City_C Site007 150
AAA300 City_C Site008 450
AAA400 City_D Site009 300
AAA400 City_D Site0010 400
For each city, I want to select the site with the highest Site_capacity value.
I tried the following code:
df.groupby(['City_code', 'City_name'])['Site_capacity'].max()
This produces the following output:
City_code City_name
AAA100 City_A 600
AAA200 City_B 350
AAA300 City_C 800
AAA400 City_D 400
How can I produce output like the following instead?
City_code City_name Site_code Site_capacity
AAA100 City_A Site002 600
AAA200 City_B Site004 350
AAA300 City_C Site006 800
AAA400 City_D Site0010 400
Answer 0: (score: 3)
We can use sort_values + drop_duplicates:
s = df.sort_values('Site_capacity').drop_duplicates(['City_code', 'City_name'], keep='last')
s
Out[334]:
City_code City_name Site_code Site_capacity
3 AAA200 City_B Site004 350
9 AAA400 City_D Site0010 400
1 AAA100 City_A Site002 600
5 AAA300 City_C Site006 800
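Note that the result above comes back ordered by capacity rather than by city. A minimal self-contained sketch of the same approach, rebuilding the question's table by hand and adding a sort_index() call to restore the original row order, could look like this. The variable name top_sites is only illustrative and not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

# Sort ascending by capacity, keep the last (largest) row per city,
# then restore the original row order with sort_index().
top_sites = (
    df.sort_values("Site_capacity")
      .drop_duplicates(["City_code", "City_name"], keep="last")
      .sort_index()
)
print(top_sites)
```

If the original index is not needed, appending .reset_index(drop=True) gives a clean 0..n-1 index.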
Answer 1: (score: 1)
Try idxmax() together with .loc:
print(df.loc[df.groupby(['City_code', 'City_name'])['Site_capacity'].idxmax()])
City_code City_name Site_code Site_capacity
1 AAA100 City_A Site002 600
3 AAA200 City_B Site004 350
5 AAA300 City_C Site006 800
9 AAA400 City_D Site0010 400
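For anyone who wants to run this end to end, here is a small sketch that splits the one-liner into two steps, assuming the question's table is rebuilt by hand. The names best_idx and result are illustrative only. When several sites in a city tie for the maximum, idxmax() returns the first one it encounters, so only one row per city is kept.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

# Step 1: for each city, find the index label of the row with the largest capacity.
best_idx = df.groupby(["City_code", "City_name"])["Site_capacity"].idxmax()

# Step 2: pull those full rows out of the original frame and renumber them.
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

Because the lookup goes through .loc on the original frame, every column (including Site_code) is preserved, which is what the plain .max() call in the question loses.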
Answer 2: (score: 0)
Try this:
df.sort_values(by=['City_name', 'Site_capacity'], ascending=[True, False], inplace=True)
df = df.drop_duplicates('City_name', keep='first')
print(df)
Result:
City_code City_name Site_code Site_capacity
AAA100 City_A Site002 600
AAA200 City_B Site004 350
AAA300 City_C Site006 800
AAA400 City_D Site0010 400
Or, if you want to keep the lowest value per city instead, replace the drop_duplicates call above with:
df = df.drop_duplicates('City_name', keep='last')
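As a rough sketch of how the two variants relate, the snippet below sorts once without inplace=True, so the original df is left untouched, and then derives both results from the same sorted frame. It assumes the question's table is rebuilt by hand; the names ordered, highest and lowest are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

# Sort by city (ascending), then by capacity (descending) within each city.
ordered = df.sort_values(by=["City_name", "Site_capacity"], ascending=[True, False])

# The first row per city is now the largest site, the last row the smallest.
highest = ordered.drop_duplicates("City_name", keep="first")
lowest = ordered.drop_duplicates("City_name", keep="last")

print(highest)
print(lowest)
```

Both calls start from the same sorted frame, which avoids the pitfall of running the second drop_duplicates on data that has already been reduced to one row per city.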