How do I use Pandas aggregate functions on this DataFrame?

Time: 2019-05-16 07:03:16

Tags: python pandas dataframe

Here is the table:

order_id    product_id  reordered   department_id
2           33120       1           16
2           28985       1           4
2           9327        0           13
2           45918       1           13
3           17668       1           16
3           46667       1           4
3           17461       1           12
3           32665       1           3
4           46842       0           3
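
For anyone who wants to reproduce this, the table above could be built as a DataFrame roughly as follows. The variable name df is an assumption, chosen to match the snippet later in the question.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "order_id":      [2, 2, 2, 2, 3, 3, 3, 3, 4],
    "product_id":    [33120, 28985, 9327, 45918, 17668, 46667, 17461, 32665, 46842],
    "reordered":     [1, 1, 0, 1, 1, 1, 1, 1, 0],
    "department_id": [16, 4, 13, 13, 16, 4, 12, 3, 3],
})
print(df)
```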

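I want to group by department_id and count the number of orders from each department, as well as the number of orders in that department where reordered == 0. The resulting table should look like this: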

department_id     number_of_orders     number_of_reordered_0
3                 2                    1
4                 2                    0
12                1                    0
13                2                    1
16                2                    0

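I know this can be done in SQL (I have forgotten what the query would look like, so if someone can refresh my memory, that would be great too). But what is the Pandas way to do it?

I know it starts with df.groupby('department_id').sum(). I am not sure how to flesh out the rest.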

2 answers:

Answer 0: (score: 1)

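Use GroupBy.agg with DataFrameGroupBy.size for the order count, together with a lambda function that compares the values to 0 with Series.eq and counts the True values with sum (True is treated as 1):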

df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders','size'), ('number_of_reordered_0',lambda x: x.eq(0).sum())])
         .reset_index())
print (df1)
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0
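
The same aggregation can also be written with keyword-style named aggregation, which is available in pandas 0.25 and later. This is a minimal sketch rather than part of the original answer; the sample data is rebuilt from the question's table so the block runs on its own.

```python
import pandas as pd

# Only the two columns the aggregation needs, copied from the question's table.
df = pd.DataFrame({
    "reordered":     [1, 1, 0, 1, 1, 1, 1, 1, 0],
    "department_id": [16, 4, 13, 13, 16, 4, 12, 3, 3],
})

df1 = (
    df.groupby("department_id")["reordered"]
      .agg(
          # Each keyword becomes an output column name.
          number_of_orders="size",
          # eq(0) gives a boolean Series; summing it counts the True values.
          number_of_reordered_0=lambda x: x.eq(0).sum(),
      )
      .reset_index()
)
print(df1)
```

The keyword form keeps each output column name next to its function, which some readers find easier to scan than a list of tuples.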

If the column contains only the values 1 and 0, you can use sum instead and subtract it from the size at the end:

df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders','size'), ('number_of_reordered_0','sum')])
         .reset_index())

df1['number_of_reordered_0'] = df1['number_of_orders'] - df1['number_of_reordered_0']
print (df1)
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0
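
The subtraction shortcut relies on reordered holding nothing but 0 and 1. A small sanity check along these lines could be run first; it is an addition for illustration, not part of the original answer, and the sample data is rebuilt from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "reordered":     [1, 1, 0, 1, 1, 1, 1, 1, 0],
    "department_id": [16, 4, 13, 13, 16, 4, 12, 3, 3],
})

# The shortcut is only valid when every value is 0 or 1.
assert df["reordered"].isin([0, 1]).all()

grouped = df.groupby("department_id")["reordered"]

# Zeros per group via the shortcut: group size minus the sum of ones.
zeros_by_subtraction = grouped.size() - grouped.sum()

# Zeros per group via explicit comparison, for cross-checking.
zeros_by_comparison = df["reordered"].eq(0).groupby(df["department_id"]).sum()

print((zeros_by_subtraction == zeros_by_comparison).all())
```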

Answer 1: (score: 1)

In SQL, it would be a simple aggregation:

SELECT good_id, serial_number, SUM(CASE WHEN serial_number IS NULL THEN count ELSE 1 END)
FROM invoice
GROUP BY good_id, serial_number
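
The query above uses column names (good_id, serial_number) that do not appear in the question's table. A sketch of the same CASE-inside-SUM idea applied to the question's columns, run through Python's built-in sqlite3 module, might look like this. The table name orders is hypothetical, since the question never names the table.

```python
import sqlite3

# Rows copied from the question: (order_id, product_id, reordered, department_id).
rows = [
    (2, 33120, 1, 16), (2, 28985, 1, 4), (2, 9327, 0, 13), (2, 45918, 1, 13),
    (3, 17668, 1, 16), (3, 46667, 1, 4), (3, 17461, 1, 12), (3, 32665, 1, 3),
    (4, 46842, 0, 3),
]

conn = sqlite3.connect(":memory:")
# `orders` is a hypothetical table name used only for this sketch.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, product_id INTEGER, reordered INTEGER, department_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

query = """
SELECT department_id,
       COUNT(*) AS number_of_orders,
       SUM(CASE WHEN reordered = 0 THEN 1 ELSE 0 END) AS number_of_reordered_0
FROM orders
GROUP BY department_id
ORDER BY department_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

COUNT(*) plays the role of size in the pandas answer, and the CASE expression turns each row into 1 or 0 so that SUM counts only the rows where reordered is 0.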