My df looks like this:
year state party percentage
1976 Arizona republican 43.34
1976 Arizona third party 0.21
1976 Arizona democrat 54.01
1976 Arizona third party 0.99
1976 Arizona third party 0.45
1978 Alabama third party 6.01
1978 Alabama republican 43.32
1978 Alabama third party 0.82
1978 Alabama democrat 55.06
1978 Alabama democrat 93.99
1978 Alabama third party 0.80
I want to use .groupby, but only sum the rows where the party is "third party". Here is my code:
g = df_senate.groupby(['year','state','party'], as_index=False)
g.apply(lambda x: x[x['party'] == 'third party']['percentage'].sum())
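It almost works, but the other parties' values come out as 0, and the other parties' rows are aggregated too. I don't want to aggregate the "democrat" and "republican" rows for each year/state; I only want to sum the "third party" rows. This is what I get instead: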
year state party percentage
1976 Arizona democrat 0.00
republican 0.00
third party 1.65
1978 Alabama democrat 0.00
republican 0.00
third party 7.63
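A short sketch of why the attempt above produces zeros, assuming the question's rows are rebuilt by hand as a DataFrame: because 'party' is one of the grouping keys, every group already holds a single party. For a democrat or republican group, the filter inside the lambda keeps nothing, and the sum of an empty Series is 0.0. The same grouping also puts both 1978 Alabama democrat rows into one group, which is why they get merged.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "year": [1976] * 5 + [1978] * 6,
    "state": ["Arizona"] * 5 + ["Alabama"] * 6,
    "party": ["republican", "third party", "democrat", "third party", "third party",
              "third party", "republican", "third party", "democrat", "democrat",
              "third party"],
    "percentage": [43.34, 0.21, 54.01, 0.99, 0.45,
                   6.01, 43.32, 0.82, 55.06, 93.99, 0.80],
})

keys = ["year", "state", "party"]

# With 'party' as a key, each group contains exactly one distinct party.
parties_per_group = df.groupby(keys)["party"].nunique()
print(parties_per_group)

# Both 1978 Alabama democrat rows fall into the same group (size 2).
rows_per_group = df.groupby(keys).size()
print(rows_per_group)

# Inside a democrat group, filtering to 'third party' leaves an empty Series,
# and summing an empty Series gives 0.0.
dem = df[(df["year"] == 1976) & (df["party"] == "democrat")]
zero_sum = dem[dem["party"] == "third party"]["percentage"].sum()
print(zero_sum)
```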
Also, how do I keep the result as a DataFrame? Putting as_index=False in .groupby doesn't work. What I want to end up with is:
year state party percentage
1976 Arizona republican 43.34
1976 Arizona third party 1.65
1976 Arizona democrat 54.01
1978 Alabama third party 7.63
1978 Alabama republican 43.32
1978 Alabama democrat 55.06
1978 Alabama democrat 93.99
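(In case you're wondering, this is Senate election data: sometimes a state has to elect two senators instead of just one, and I don't want my vote totals to exceed 100%, because that would be weird.)
Thanks in advance!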
Answer (score: 1)
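df = (
    df.groupby(['year', 'state', 'party'])['percentage'].apply(
        # x.name is the (year, state, party) key of the current group:
        # third-party groups collapse to one summed value, other groups keep every row
        lambda x: [x.sum()] if x.name[2] == 'third party' else list(x))
    .explode()                           # one row per list element
    .reset_index()                       # group keys become columns again -> DataFrame
    .astype({'percentage': float})       # explode leaves an object column; cast back to float
)
print(df)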
Prints:
year state party percentage
0 1976 Arizona democrat 54.01
1 1976 Arizona republican 43.34
2 1976 Arizona third party 1.65
3 1978 Alabama democrat 55.06
4 1978 Alabama democrat 93.99
5 1978 Alabama republican 43.32
6 1978 Alabama third party 7.63
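An alternative sketch that avoids building and exploding lists, assuming the same columns as in the question: sum only the third-party rows with groupby(..., as_index=False), which already returns a DataFrame, then concatenate the untouched democrat and republican rows back in. The sample frame is rebuilt from the question's data, and the names is_third, third and result are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "year": [1976] * 5 + [1978] * 6,
    "state": ["Arizona"] * 5 + ["Alabama"] * 6,
    "party": ["republican", "third party", "democrat", "third party", "third party",
              "third party", "republican", "third party", "democrat", "democrat",
              "third party"],
    "percentage": [43.34, 0.21, 54.01, 0.99, 0.45,
                   6.01, 43.32, 0.82, 55.06, 93.99, 0.80],
})

keys = ["year", "state", "party"]
is_third = df["party"] == "third party"

# Collapse only the third-party rows: one summed row per (year, state).
third = df[is_third].groupby(keys, as_index=False)["percentage"].sum()

# Keep democrat/republican rows as they are, stack both parts,
# and sort by the key columns for a tidy result.
result = (
    pd.concat([df[~is_third], third], ignore_index=True)
    .sort_values(keys)
    .reset_index(drop=True)
)
print(result)
```

Because the democrat and republican rows never pass through groupby, the two 1978 Alabama democrat rows stay separate, and percentage keeps its float dtype without a cast.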