Sum product and groupby

Time: 2018-06-13 16:12:49

Tags: python python-3.x pandas group-by sum

I have a DataFrame that looks like this:

allHoldingsFund

      BrokerBestRate  notional_current  DistanceBestRate
0           CITI          7.859426e+05          0.023194
1           WFPBS         3.609674e+06         -0.023041
2           WFPBS         1.488828e+06         -0.023041
3           JPM           3.484168e+05         -0.106632
4           CITI          6.088499e+05          0.023194
5           WFPBS         8.665558e+06         -0.023041
6           WFPBS         4.219563e+05         -0.023041

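I am trying to compute a sum product and a group-by in a single step (without creating an extra sum-product column).

I tried this line of code: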

allHoldingsFund.groupby(['BrokerBestRate'])['notional_current']*['DistanceBestRate'].sum() 

How can I compute the sum product and then aggregate it with group by?

Desired output:

BrokerBestRate      product of (notional_current  and DistanceBestRate)
   CITI              654654645665466
   JPM               453454534545367
  WFPBS              345345345345435

Thanks a lot.

2 answers:

Answer 0: (score: 3)

The simplest, but usually slowest, approach is to use apply:

In [43]: df.groupby("BrokerBestRate").apply(lambda x: x.prod(axis=1).sum())
Out[43]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64

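But you can also compute the row-wise product first and then call groupby on the result, passing the broker column as the grouping key: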

In [44]: df.eval("notional_current * DistanceBestRate").groupby(df.BrokerBestRate).sum()
Out[44]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64

In [45]: df[["notional_current", "DistanceBestRate"]].prod(axis=1).groupby(df["BrokerBestRate"]).sum()
Out[45]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64
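
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the frame from the question and computes the grouped sum of products both ways. The names product, sumproduct, group_sumproduct and via_apply are only illustrative. The apply variant here names the two numeric columns explicitly, on the assumption that this is safer than a bare x.prod(axis=1), whose handling of the string key column may differ between pandas versions.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# Vectorized route: row-wise product as a temporary Series (no column is
# added to df), grouped by the broker column, which aligns on the index.
product = df["notional_current"] * df["DistanceBestRate"]
sumproduct = product.groupby(df["BrokerBestRate"]).sum()

# apply route: same result, one Python-level call per group.
def group_sumproduct(g):
    return (g["notional_current"] * g["DistanceBestRate"]).sum()

via_apply = (
    df.groupby("BrokerBestRate")[["notional_current", "DistanceBestRate"]]
      .apply(group_sumproduct)
)

print(sumproduct)
print(via_apply)
```

Both routes leave df unchanged, which is what the question asked for; the product only exists as an intermediate Series.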

Answer 1: (score: 2)

You can build the product column before the groupby:
df.assign(col=df.notional_current*df.DistanceBestRate).groupby('BrokerBestRate',as_index=False).col.sum()
Out[372]: 
  BrokerBestRate            col
0           CITI   32350.817245
1            JPM  -37152.380218
2          WFPBS -326860.001568
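
As a rough sketch of what as_index=False changes here, the snippet below runs the same assign-then-groupby chain on the question's data with and without it. The column name sumproduct is just a placeholder chosen for this example. assign returns a new frame, so the original df is not modified.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# Temporary frame with the product column; df itself is left untouched.
with_product = df.assign(sumproduct=df["notional_current"] * df["DistanceBestRate"])

# as_index=False: the broker stays a regular column, result is a DataFrame.
as_frame = with_product.groupby("BrokerBestRate", as_index=False)["sumproduct"].sum()

# Default (as_index=True): the broker becomes the index, result is a Series.
as_series = with_product.groupby("BrokerBestRate")["sumproduct"].sum()

print(as_frame)
print(as_series)
```

The frame form is handy if the result is going to be merged or exported; the Series form is convenient for lookups by broker name.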