I have the following DataFrame:
In [1]:
import pandas as pd
pd.DataFrame({"AAA":["x1","x1","x1","x1"],
"BBB":["y1","y1","y1","y2"],
"CCC":["t1","t2","t3","t1"],
"DDD":[10,11,18,17]})
Out[1]:
AAA BBB CCC DDD
0 x1 y1 t1 10
1 x1 y1 t2 11
2 x1 y1 t3 18
3 x1 y2 t1 17
I want to sum the values in column "DDD" over the groups defined by groupby(["AAA","BBB"]).
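So:
row 0 (x1, y1, t1, 10), row 1 (x1, y1, t2, 11), and row 2 (x1, y1, t3, 18) form one group.
I would like a new column that holds the result of the group-by operation, so these rows become:
row 0 (x1, y1, t1, 10, 39), row 1 (x1, y1, t2, 11, 39), row 2 (x1, y1, t3, 18, 39).
I want the following DataFrame: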
In [2]:
pd.DataFrame({"AAA":["x1","x1","x1","x1"],
"BBB":["y1","y1","y1","y2"],
"CCC":["t1","t2","t3","t1"],
"DDD":[10,11,18,17],
"AAA_BBB_sum":[39,39,39,17]})
Out[2]:
AAA AAA_BBB_sum BBB CCC DDD
0 x1 39 y1 t1 10
1 x1 39 y1 t2 11
2 x1 39 y1 t3 18
3 x1 17 y2 t1 17
What is the best way to do this?
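One approach I thought of (but am struggling to implement) is: create a column AAABBB that concatenates AAA and BBB so that each combination is unique; group by that column and sum DDD, so that I can still select by AAABBB; then map the sum of DDD back onto the rows through the AAABBB column.
I am sure there must be a better way. Any suggestions?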
Answer 0 (score: 4)
One way is to use:
df['AAA_BBB sum'] = df.groupby(['AAA', 'BBB'])['DDD'].transform(lambda x: x.sum())
This gives:
AAA BBB CCC DDD AAA_BBB sum
0 x1 y1 t1 10 39
1 x1 y1 t2 11 39
2 x1 y1 t3 18 39
3 x1 y2 t1 17 17
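For completeness, here is a self-contained sketch of the same idea. It assumes the question's frame is bound to a variable named df (the question never assigns it) and uses the column name AAA_BBB_sum from the desired output. Passing the string "sum" to transform should give the same result as the lambda above. The second half shows an aggregate-then-merge alternative; the names group_sums, merged and AAA_BBB_sum_merged are made up for illustration.

```python
import pandas as pd

# Rebuild the frame from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "AAA": ["x1", "x1", "x1", "x1"],
    "BBB": ["y1", "y1", "y1", "y2"],
    "CCC": ["t1", "t2", "t3", "t1"],
    "DDD": [10, 11, 18, 17],
})

# transform returns one value per original row, aligned to df's index,
# so the per-group sum can be assigned directly as a new column.
df["AAA_BBB_sum"] = df.groupby(["AAA", "BBB"])["DDD"].transform("sum")

# Alternative sketch: compute one sum per (AAA, BBB) group,
# then merge it back onto the rows by the two key columns.
group_sums = (
    df.groupby(["AAA", "BBB"], as_index=False)["DDD"]
      .sum()
      .rename(columns={"DDD": "AAA_BBB_sum_merged"})  # hypothetical name
)
merged = df.merge(group_sums, on=["AAA", "BBB"], how="left")

print(merged)
```

Grouping on the two columns directly avoids building a concatenated AAABBB key at all, since groupby accepts a list of column names.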