Using the result of groupby().sum() to manipulate the original DataFrame

Time: 2015-05-21 19:20:00

Tags: pandas

I have a DataFrame of transactions like this:

    branch      daqu  from    to       style  color  size  amount
5  huadong  shanghai  C30C  C30F  EEBW52301M     39   165       3
8  huadong  shanghai  C30F  C306  EEBW52301M     51   160       2
2  huadong  shanghai  C30G  C306  EEBW52301M     39   165      10
9  huadong  shanghai  C30G  C30C  EEBW52301M     51   170       1
1  huadong  shanghai  C30G  C30F  EEBW52301M     39   160       7
7  huadong  shanghai  C30J  C30D  EEBW52301M     39   170       2
6  huadong  shanghai  C30J  C30F  EEBW52301M     39   170       4
3  huadong  shanghai  C30K  C306  EEBW52301M     39   165       1
0  huadong  shanghai  C30K  C30F  EEBW52301M     39   160       7
4  huadong  shanghai  C30K  C30F  EEBW52301M     39   165       6

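Each row means we have to send 'amount' units of the given style/color/size product from the 'from' store to the 'to' store.

I then grouped by 'from' and 'to' so I can see how many products go into each box.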

print(dh_final[['from', 'to', 'amount']].groupby(['from', 'to']).sum())

            amount
from to          
C30C C30F       3
C30F C306       2
C30G C306      10
     C30C       1
     C30F       7
C30J C30D       2
     C30F       4
C30K C306       1
     C30F      13

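Finally, if fewer than 5 products in total are shipped from one store to another, I want to cancel every transaction for that pair of stores. That means I have to delete the corresponding rows from the original DataFrame. If I do it by hand, the result should look like this: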

    branch      daqu  from    to       style  color  size  amount
2  huadong  shanghai  C30G  C306  EEBW52301M     39   165      10
1  huadong  shanghai  C30G  C30F  EEBW52301M     39   160       7
0  huadong  shanghai  C30K  C30F  EEBW52301M     39   160       7
4  huadong  shanghai  C30K  C30F  EEBW52301M     39   165       6

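Is there a simple way to do this? How can I use the result of groupby().sum() to manipulate the original DataFrame?

1 Answer:

Answer 0: (score: 1)

If I understand you correctly, you want this: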

In [53]:
df['sum'] = df.groupby(['from', 'to'])['amount'].transform('sum')
df[df['sum'] > 5]

Out[53]:
    branch      daqu  from    to       style  color  size  amount  sum
2  huadong  shanghai  C30G  C306  EEBW52301M     39   165      10   10
1  huadong  shanghai  C30G  C30F  EEBW52301M     39   160       7    7
0  huadong  shanghai  C30K  C30F  EEBW52301M     39   160       7   13
4  huadong  shanghai  C30K  C30F  EEBW52301M     39   165       6   13

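So I call transform on the groupby object, which returns a Series aligned with the original df. I add that as a 'sum' column and can then filter the df as usual.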
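To see why transform fits here where a plain sum() does not, here is a minimal sketch on a small made-up frame (the four rows are an assumption for illustration, not the asker's full data). sum() gives one row per (from, to) pair, while transform('sum') repeats each pair's total on every original row, so it keeps the original index:

```python
import pandas as pd

# Hypothetical mini-frame; only the columns needed for the illustration.
df = pd.DataFrame({
    "from": ["C30G", "C30K", "C30K", "C30J"],
    "to": ["C306", "C30F", "C30F", "C30D"],
    "amount": [10, 7, 6, 2],
})

grouped = df.groupby(["from", "to"])["amount"]

# sum() collapses to one value per (from, to) pair, indexed by the group keys.
per_pair = grouped.sum()

# transform('sum') broadcasts each pair's total back onto every original row,
# so the result has the same index as df and can be compared row by row.
per_row = grouped.transform("sum")

print(per_pair)
print(per_row.tolist())
```

Because per_row shares df's index, a comparison such as per_row > 5 can be used directly as a boolean mask on df.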

Edit

Actually, I think you can just do this:

In [67]:
df[df.groupby(['from', 'to'])['amount'].transform('sum') > 5]

Out[67]:
    branch      daqu  from    to       style  color  size  amount
2  huadong  shanghai  C30G  C306  EEBW52301M     39   165      10
1  huadong  shanghai  C30G  C30F  EEBW52301M     39   160       7
0  huadong  shanghai  C30K  C30F  EEBW52301M     39   160       7
4  huadong  shanghai  C30K  C30F  EEBW52301M     39   165       6
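For anyone who wants to run this end to end, below is a self-contained sketch that rebuilds the frame from the question and compares three ways of getting the same rows. A few assumptions to flag: the name MIN_UNITS is mine, and I use >= 5 to match the "fewer than 5" wording in the question (the > 5 above would also drop a pair totalling exactly 5, although no such pair exists in this data). The second and third variants are alternatives I am suggesting, not part of the answer above.

```python
import pandas as pd

# Rows copied from the question; the index labels match the original frame.
df = pd.DataFrame(
    {
        "branch": ["huadong"] * 10,
        "daqu": ["shanghai"] * 10,
        "from": ["C30C", "C30F", "C30G", "C30G", "C30G",
                 "C30J", "C30J", "C30K", "C30K", "C30K"],
        "to": ["C30F", "C306", "C306", "C30C", "C30F",
               "C30D", "C30F", "C306", "C30F", "C30F"],
        "style": ["EEBW52301M"] * 10,
        "color": [39, 51, 39, 51, 39, 39, 39, 39, 39, 39],
        "size": [165, 160, 165, 170, 160, 170, 170, 165, 160, 165],
        "amount": [3, 2, 10, 1, 7, 2, 4, 1, 7, 6],
    },
    index=[5, 8, 2, 9, 1, 7, 6, 3, 0, 4],
)

MIN_UNITS = 5  # assumed threshold name: keep pairs shipping at least this many

# 1) Boolean mask from transform, as in the answer (with >= instead of >).
totals_per_row = df.groupby(["from", "to"])["amount"].transform("sum")
kept_transform = df[totals_per_row >= MIN_UNITS]

# 2) groupby().filter keeps or drops whole groups based on a predicate.
kept_filter = df.groupby(["from", "to"]).filter(
    lambda g: g["amount"].sum() >= MIN_UNITS
)

# 3) Literally reuse the groupby().sum() result: pick the qualifying
#    (from, to) pairs, then test each row's pair for membership.
totals = df.groupby(["from", "to"])["amount"].sum()
good_pairs = totals[totals >= MIN_UNITS].index
row_pairs = pd.MultiIndex.from_frame(df[["from", "to"]])
kept_isin = df[row_pairs.isin(good_pairs)]

print(kept_transform)
```

All three keep the rows labelled 2, 1, 0 and 4. The transform mask is the shortest; filter reads closer to the plain-English rule but calls a Python function per group, so it may be slower on frames with many groups.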