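I'm looking for a pandas expression that finds the margin of victory, as a percentage, between the top two candidates in each county: find the candidate with more votes, compute each candidate's share of the vote, and subtract the smaller share from the larger. The shares should be taken out of the two candidates' combined votes, ignoring third-party candidates. I want to go from this: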
YEAR STATE County CANDIDATE VOTES
2016 Ohio Medina County Donald Trump 184211
2016 Ohio Medina County Hillary Clinton 398271
2016 Ohio Medina County Gary Johnson 12993
2016 Ohio Cuyahoga County Donald Trump 54810
2016 Ohio Cuyahoga County Hillary Clinton 32182
2016 Ohio Cuyahoga County Gary Johnson 2975
..to this:
YEAR STATE County CANDIDATE VOTES MARGIN OF VICTORY
2016 Ohio Medina County Donald Trump 184211 Hillary Clinton +35.1%
2016 Ohio Medina County Hillary Clinton 398271 Hillary Clinton +35.1%
2016 Ohio Medina County Gary Johnson 12993 Hillary Clinton +35.1%
2016 Ohio Cuyahoga County Donald Trump 54810 Donald Trump +24.6%
2016 Ohio Cuyahoga County Hillary Clinton 32182 Donald Trump +24.6%
2016 Ohio Cuyahoga County Gary Johnson 2975 Donald Trump +24.6%
Answer 0: (score: 0)
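I'm not sure how you intend to calculate the margin. Below is an outline of an approach you can adapt to get the right answer.
You first need to aggregate the data at the YEAR, STATE, County level: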
# Total votes per county; the sample data names the column 'County', not 'COUNTY'
df_agg = df.groupby(by=['YEAR', 'STATE', 'County'])['VOTES'].sum().reset_index().rename(columns={'VOTES': 'AGG_VOTES'})
You can join this df back to the original df and use the AGG_VOTES column to generate the required statistics:
import pandas as pd
df = pd.merge(df, df_agg, on=['YEAR', 'STATE', 'County'])
df['Candidate_Percentage'] = df['VOTES'] * 100 / df['AGG_VOTES']
You can then aggregate df further to get a consolidated df with the desired margin of victory.
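That last step can be sketched as follows. This is only one possible reading of the question: it assumes the margin is the gap between the top two candidates divided by their combined votes, so third-party votes are excluded. The names `top_two`, `summary`, `WINNER_VOTES`, `TWO_WAY_TOTAL` and `MARGIN_PCT` are made up for illustration. Under that definition the sample rows give percentages that differ from the illustrative figures in the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'YEAR': [2016] * 6,
    'STATE': ['Ohio'] * 6,
    'County': ['Medina County'] * 3 + ['Cuyahoga County'] * 3,
    'CANDIDATE': ['Donald Trump', 'Hillary Clinton', 'Gary Johnson'] * 2,
    'VOTES': [184211, 398271, 12993, 54810, 32182, 2975],
})

keys = ['YEAR', 'STATE', 'County']

# Keep only the two candidates with the most votes in each county.
top_two = (df.sort_values('VOTES', ascending=False)
             .groupby(keys, sort=False)
             .head(2))

# One row per county: the winner, the winner's votes, and the two-candidate total.
# Rows are sorted by votes (descending), so 'first' picks the winner.
summary = top_two.groupby(keys).agg(
    WINNER=('CANDIDATE', 'first'),
    WINNER_VOTES=('VOTES', 'first'),
    TWO_WAY_TOTAL=('VOTES', 'sum'),
).reset_index()

# (winner - runner_up) / total equals (2 * winner - total) / total.
summary['MARGIN_PCT'] = (
    (2 * summary['WINNER_VOTES'] - summary['TWO_WAY_TOTAL'])
    * 100 / summary['TWO_WAY_TOTAL']
)
summary['MARGIN OF VICTORY'] = (
    summary['WINNER'] + ' +' + summary['MARGIN_PCT'].round(1).astype(str) + '%'
)

# Attach the label to every row of the original frame, third parties included.
result = df.merge(summary[keys + ['MARGIN OF VICTORY']], on=keys)
print(result)
```

To measure the margin against all votes cast instead, replace the two-candidate total with the full county total from `df_agg`.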
Answer 1: (score: 0)
This is a rather awkward solution, but it works. The function do_county processes one county at a time.
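def do_county(data):
    # Sort candidates by votes, keeping only the numeric VOTES column
    votes = data.set_index('CANDIDATE')[['VOTES']].sort_values('VOTES')
    # Normalize by the county total (third-party votes included),
    # then take the diff between the top two
    return (votes / data['VOTES'].sum()).diff().tail(1)

df.groupby(['County', 'STATE', 'YEAR']).apply(do_county)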
# VOTES
#County STATE YEAR CANDIDATE
#Cuyahoga County Ohio 2016 Donald Trump 0.251514
#Medina County Ohio 2016 Hillary Clinton 0.359478
I'm sure there is a better way to solve the problem.
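One possible alternative that avoids the per-county apply is sketched below. It keeps the same definition as do_county (the gap between the top two candidates divided by all votes in the county, third parties included); the variable names `totals`, `top2`, `gap` and `margin` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'YEAR': [2016] * 6,
    'STATE': ['Ohio'] * 6,
    'County': ['Medina County'] * 3 + ['Cuyahoga County'] * 3,
    'CANDIDATE': ['Donald Trump', 'Hillary Clinton', 'Gary Johnson'] * 2,
    'VOTES': [184211, 398271, 12993, 54810, 32182, 2975],
})

keys = ['YEAR', 'STATE', 'County']
grouped = df.groupby(keys)['VOTES']

# All votes cast in each county.
totals = grouped.sum()

# The two largest vote counts per county; the result is indexed by the
# group keys plus the original row label.
top2 = grouped.nlargest(2)

# Gap between the top two counts, taken per county.
by_county = top2.groupby(level=keys)
gap = by_county.max() - by_county.min()

# Margin as a fraction of the county total.
margin = gap / totals
print(margin)
```

For the sample rows this should reproduce the values printed above (about 0.2515 for Cuyahoga County and 0.3595 for Medina County), but as a Series indexed by YEAR, STATE and County rather than by candidate.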