I have a DataFrame like the following:
year state candidate candidatevotes
0 1976 Alabama Carter, Jimmy 659170
1 1976 Alabama Ford, Gerald 504070
7 1976 Alaska Ford, Gerald 71555
8 1976 Alaska Carter, Jimmy 44058
11 1976 Arizona Ford, Gerald 418642
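I need to add a binary column that marks the winner in each state. For example, the winner in Alabama is Jimmy Carter, so the output should look like this: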
year state candidate candidatevotes winner
0 1976 Alabama Carter, Jimmy 659170 1
1 1976 Alabama Ford, Gerald 504070 0
7 1976 Alaska Ford, Gerald 71555 1
8 1976 Alaska Carter, Jimmy 44058 0
11 1976 Arizona Ford, Gerald 418642 1
What is the most efficient way to do this?
Answer 0: (score: 2)
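The usual approach is transform: take the maximum candidatevotes within each (year, state) group, broadcast it back to every row, and compare it with each row's own value. The comparison yields a bool column; append astype(int) to it if you want 0/1 instead.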
s=df.groupby(['year','state']).candidatevotes.transform('max')
df['winner']=df.candidatevotes==s
df
Out[40]:
year state candidate candidatevotes winner
0 1976 Alabama Carter,Jimmy 659170 True
1 1976 Alabama Ford,Gerald 504070 False
7 1976 Alaska Ford,Gerald 71555 True
8 1976 Alaska Carter,Jimmy 44058 False
11 1976 Arizona Ford,Gerald 418642 True
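
For a 0/1 column like the one in the question, the same idea with astype(int) applied might look like the sketch below. The DataFrame is rebuilt by hand from the sample rows in the question, and the name group_max is only illustrative.

```python
import pandas as pd

# Sample rows rebuilt from the question, keeping the original index labels.
df = pd.DataFrame(
    {
        "year": [1976, 1976, 1976, 1976, 1976],
        "state": ["Alabama", "Alabama", "Alaska", "Alaska", "Arizona"],
        "candidate": [
            "Carter, Jimmy",
            "Ford, Gerald",
            "Ford, Gerald",
            "Carter, Jimmy",
            "Ford, Gerald",
        ],
        "candidatevotes": [659170, 504070, 71555, 44058, 418642],
    },
    index=[0, 1, 7, 8, 11],
)

# Highest vote count per (year, state), aligned with the original rows.
group_max = df.groupby(["year", "state"])["candidatevotes"].transform("max")

# Compare each row with its group's maximum, then cast bool to 0/1.
df["winner"] = (df["candidatevotes"] == group_max).astype(int)

print(df)
```

Note that if two candidates tie for the most votes in a state, this comparison marks both rows with 1.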