How do I count the number of entries belonging to each group?

Time: 2017-01-31 15:40:51

Tags: python pandas

This is my DataFrame:

df = 

GROUP    GRADE   TOTAL_SERVICE_TIME    TOTAL_WAIT_TIME
AAA      1       45                    20
AAA      4       40                    23
AAA      5       35                    21
BBB      2       30                    24
BBB      3       55                    22

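I want to group the entries by GROUP and GRADE, compute the mean TOTAL_SERVICE_TIME and the mean TOTAL_WAIT_TIME for each group, and count the number of entries that belong to each group.

I don't know how to do the counting: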

output = (
    df.groupby(['GROUP', 'GRADE'])
      .agg({'TOTAL_SERVICE_TIME': 'mean', 'TOTAL_WAIT_TIME': 'mean'})
      .value_counts()  # attempted count; does not give the number of entries per group
      .reset_index()
)

I also tried adding 'COUNT' : 'count' to the dictionary, but that requires a column named COUNT to already exist.

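2 Answers:

Answer 0: (score: 2)

You're close; the agg documentation covers this. Pass a list of functions for the column you want counted: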

df.groupby(['GROUP','GRADE']).agg({'TOTAL_SERVICE_TIME' : 'mean',
                                   'TOTAL_WAIT_TIME' : ['mean', 'count']})
Out[45]: 
            TOTAL_WAIT_TIME       TOTAL_SERVICE_TIME
                       mean count               mean
GROUP GRADE                                         
AAA   1                  20     1                 45
      4                  23     1                 40
      5                  21     1                 35
BBB   2                  24     1                 30
      3                  22     1                 55
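As a self-contained sketch of the same approach, the snippet below rebuilds the question's DataFrame and then flattens the two-level column index into single names. The flattened names (for example TOTAL_WAIT_TIME_count) and the use of a one-element list for TOTAL_SERVICE_TIME are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 23, 21, 24, 22],
})

# Dict keys must be existing columns; the values list the functions to apply.
out = df.groupby(["GROUP", "GRADE"]).agg({
    "TOTAL_SERVICE_TIME": ["mean"],
    "TOTAL_WAIT_TIME": ["mean", "count"],
})

# The result has two-level columns such as ("TOTAL_WAIT_TIME", "count").
# Join the levels to get flat names (naming scheme is an assumption).
out.columns = ["_".join(col) for col in out.columns]
out = out.reset_index()
print(out)
```

Because every group in this sample has exactly one row, each count is 1.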

Answer 1: (score: 1)

I'd like to extend @Boud's great answer with another example in which you supply custom column names:

In [57]: funcs = {
    ...:   'TOTAL_SERVICE_TIME': {'mean_service':'mean', 'count_service':'size'},
    ...:   'TOTAL_WAIT_TIME' : {'mean_wait':'mean', 'count_wait':'size'}
    ...: }
    ...:

In [58]: df
Out[58]:
  GROUP  GRADE  TOTAL_SERVICE_TIME  TOTAL_WAIT_TIME
0   AAA      1                  45               20
1   AAA      1                 100              100
2   AAA      4                  40               23
3   AAA      5                  35               21
4   BBB      2                  30               24
5   BBB      3                  55               22

In [59]: df.groupby(['GROUP','GRADE']).agg(funcs)
Out[59]:
            TOTAL_SERVICE_TIME               TOTAL_WAIT_TIME
                  mean_service count_service      count_wait mean_wait
GROUP GRADE
AAA   1                   72.5             2               2        60
      4                   40.0             1               1        23
      5                   35.0             1               1        21
BBB   2                   30.0             1               1        24
      3                   55.0             1               1        22
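The dictionary above uses 'size' where the previous answer used 'count'. The two agree on this data, but they can differ when values are missing: 'count' skips NaN, while 'size' counts every row in the group. A minimal sketch of the difference, using a small made-up frame with one missing wait time (the data is hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second AAA/1 row has no recorded wait time.
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "BBB"],
    "GRADE": [1, 1, 2],
    "TOTAL_WAIT_TIME": [20.0, np.nan, 24.0],
})

# 'count' ignores NaN values; 'size' counts all rows in the group.
counts = df.groupby(["GROUP", "GRADE"])["TOTAL_WAIT_TIME"].agg(["count", "size"])
print(counts)
```

If the goal is the number of entries per group regardless of missing values, 'size' is probably the safer choice.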

Now you can drop the top column level:

x = df.groupby(['GROUP','GRADE']).agg(funcs)
x.columns = x.columns.droplevel(0)


In [63]: x
Out[63]:
             mean_service  count_service  count_wait  mean_wait
GROUP GRADE
AAA   1              72.5              2           2         60
      4              40.0              1           1         23
      5              35.0              1           1         21
BBB   2              30.0              1           1         24
      3              55.0              1           1         22

In [64]: x.reset_index()
Out[64]:
  GROUP  GRADE  mean_service  count_service  count_wait  mean_wait
0   AAA      1          72.5              2           2         60
1   AAA      4          40.0              1           1         23
2   AAA      5          35.0              1           1         21
3   BBB      2          30.0              1           1         24
4   BBB      3          55.0              1           1         22
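Note that the nested-dictionary renaming used in funcs was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError. On pandas 0.25 or later, named aggregation should give the same flat result without the droplevel step. The sketch below assumes such a version and reuses the column names from this answer:

```python
import pandas as pd

# Same six-row sample as in the transcript above.
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 100, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 100, 23, 21, 24, 22],
})

# Named aggregation: output_name=(input_column, function).
x = (
    df.groupby(["GROUP", "GRADE"])
      .agg(
          mean_service=("TOTAL_SERVICE_TIME", "mean"),
          count_service=("TOTAL_SERVICE_TIME", "size"),
          mean_wait=("TOTAL_WAIT_TIME", "mean"),
          count_wait=("TOTAL_WAIT_TIME", "size"),
      )
      .reset_index()
)
print(x)
```

The output columns appear in the order the keyword arguments are given, so the column order here differs slightly from the transcript.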