I have a dataframe that looks like this:
allHoldingsFund
BrokerBestRate notional_current DistanceBestRate
0 CITI 7.859426e+05 0.023194
1 WFPBS 3.609674e+06 -0.023041
2 WFPBS 1.488828e+06 -0.023041
3 JPM 3.484168e+05 -0.106632
4 CITI 6.088499e+05 0.023194
5 WFPBS 8.665558e+06 -0.023041
6 WFPBS 4.219563e+05 -0.023041
I'm trying to do a sum-product and a group by in one step (without creating an extra sum-product column).
I tried this line of code:
allHoldingsFund.groupby(['BrokerBestRate'])['notional_current']*['DistanceBestRate'].sum()
How can I compute the sum-product and then group by?
Expected output after aggregation (placeholder values):
BrokerBestRate product of (notional_current and DistanceBestRate)
CITI 654654645665466
JPM 453454534545367
WFPBS 345345345345435
Thanks a lot.
Answer 0: (score: 3)
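The simplest, but usually slowest, way is to use apply (here df is the question's allHoldingsFund). Select the two numeric columns first so that the string key column is not pulled into the row-wise product, which raises a TypeError on pandas versions that pass the grouping column to apply:
In [43]: df.groupby("BrokerBestRate")[["notional_current", "DistanceBestRate"]].apply(lambda x: x.prod(axis=1).sum())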
Out[43]:
BrokerBestRate
CITI 32350.817245
JPM -37152.380218
WFPBS -326860.001568
dtype: float64
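But you can also compute the row-wise product first and then call groupby on the result, grouping by the key column of df. Both variants below return a Series and add no column to df: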
In [44]: df.eval("notional_current * DistanceBestRate").groupby(df.BrokerBestRate).sum()
Out[44]:
BrokerBestRate
CITI 32350.817245
JPM -37152.380218
WFPBS -326860.001568
dtype: float64
In [45]: df[["notional_current", "DistanceBestRate"]].prod(axis=1).groupby(df["BrokerBestRate"]).sum()
Out[45]:
BrokerBestRate
CITI 32350.817245
JPM -37152.380218
WFPBS -326860.001568
dtype: float64
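The snippets above assume a frame named df that holds the question's data. As a minimal self-contained sketch, the same result can be reproduced with the values copied from the question and the product written as a plain Series multiplication. The names product and sumproduct are only illustrative.

```python
import pandas as pd

# Data copied from the question; the answers call this frame `df`.
df = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# The line in the question fails because ['DistanceBestRate'] is a plain
# Python list (it has no .sum() method), not a column of the frame.

# Row-wise product as a temporary Series; nothing is added to df.
product = df["notional_current"] * df["DistanceBestRate"]

# Group that Series by the key column and sum within each broker.
sumproduct = product.groupby(df["BrokerBestRate"]).sum()
print(sumproduct)
```

Passing df["BrokerBestRate"] to groupby works because the product Series shares the same index as df, so the keys line up row by row.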
Answer 1: (score: 2)
You can use assign to create the product column on the fly before the groupby:
df.assign(col=df.notional_current*df.DistanceBestRate).groupby('BrokerBestRate',as_index=False).col.sum()
Out[372]:
BrokerBestRate col
0 CITI 32350.817245
1 JPM -37152.380218
2 WFPBS -326860.001568
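Since the question asks to avoid an extra column, it may help to confirm that assign returns a new frame and leaves the original untouched. The sketch below rebuilds the question's data and checks this; the column name col follows the answer, and result is an illustrative name.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# assign() builds a copy with the extra column; df itself is not modified.
result = (
    df.assign(col=df["notional_current"] * df["DistanceBestRate"])
      .groupby("BrokerBestRate", as_index=False)["col"]
      .sum()
)

print(result)
print("col" in df.columns)  # the temporary column never reaches df
```

With as_index=False the broker names stay in a regular column instead of becoming the index, which is convenient if the result is merged or exported afterwards.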