This is my DataFrame:
df =
GROUP GRADE TOTAL_SERVICE_TIME TOTAL_WAIT_TIME
AAA 1 45 20
AAA 4 40 23
AAA 5 35 21
BBB 2 30 24
BBB 3 55 22
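To reproduce the examples, the frame above can be built as follows. This is a sketch that assumes GRADE and the two time columns are plain integers, as they appear in the printout.

```python
import pandas as pd

# Sample data from the question (integer dtypes assumed from the printout)
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 23, 21, 24, 22],
})
print(df)
```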
I want to group the entries by GROUP and GRADE, compute the mean TOTAL_SERVICE_TIME and the mean TOTAL_WAIT_TIME for each group, and count how many entries belong to each group.
I don't know how to do the counting:
output = (
    df.groupby(['GROUP','GRADE'])
    .agg({'TOTAL_SERVICE_TIME' : 'mean', 'TOTAL_WAIT_TIME' : 'mean'})
    .value_counts()  # does not give the number of rows per group
    .reset_index()
)
I also tried adding 'COUNT' : 'count' to the dict, but that only works if a column COUNT already exists.
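The 'COUNT' : 'count' attempt fails because the keys of the dict passed to agg must name columns that already exist in df. One alternative, sketched here, is to compute the means first and then attach the number of rows per group from GroupBy.size(); COUNT is simply the column name the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 23, 21, 24, 22],
})

grouped = df.groupby(["GROUP", "GRADE"])

# Per-group means of the two time columns
means = grouped[["TOTAL_SERVICE_TIME", "TOTAL_WAIT_TIME"]].mean()

# size() counts rows per group; assignment aligns on the (GROUP, GRADE) index
means["COUNT"] = grouped.size()

output = means.reset_index()
print(output)
```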
Answer 0 (score: 2)
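You were close; the documentation for agg makes this simple. Give a column a list of functions to get several aggregations of it: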
df.groupby(['GROUP','GRADE']).agg({'TOTAL_SERVICE_TIME' : 'mean',
'TOTAL_WAIT_TIME' : ['mean', 'count']})
Out[45]:
TOTAL_WAIT_TIME TOTAL_SERVICE_TIME
mean count mean
GROUP GRADE
AAA 1 20 1 45
4 23 1 40
5 21 1 35
BBB 2 24 1 30
3 22 1 55
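Because one column gets a list of functions, the result has two column levels, (column, function). If you want flat names without losing which column a statistic came from, one option is to join the levels; the underscore separator is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 23, 21, 24, 22],
})

out = df.groupby(["GROUP", "GRADE"]).agg(
    {"TOTAL_SERVICE_TIME": "mean", "TOTAL_WAIT_TIME": ["mean", "count"]}
)

# Columns are tuples like ("TOTAL_WAIT_TIME", "count"); join them into one name
out.columns = ["_".join(col) for col in out.columns]
out = out.reset_index()
print(out)
```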
Answer 1 (score: 1)
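I'd like to extend this great answer by @Boud with another example in which you can supply custom column names (note: this nested-dict renaming was deprecated in pandas 0.20 and raises a SpecificationError from pandas 1.0 on):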
In [57]: funcs = {
...: 'TOTAL_SERVICE_TIME': {'mean_service':'mean', 'count_service':'size'},
...: 'TOTAL_WAIT_TIME' : {'mean_wait':'mean', 'count_wait':'size'}
...: }
...:
In [58]: df
Out[58]:
GROUP GRADE TOTAL_SERVICE_TIME TOTAL_WAIT_TIME
0 AAA 1 45 20
1 AAA 1 100 100
2 AAA 4 40 23
3 AAA 5 35 21
4 BBB 2 30 24
5 BBB 3 55 22
In [59]: df.groupby(['GROUP','GRADE']).agg(funcs)
Out[59]:
TOTAL_SERVICE_TIME TOTAL_WAIT_TIME
mean_service count_service count_wait mean_wait
GROUP GRADE
AAA 1 72.5 2 2 60
4 40.0 1 1 23
5 35.0 1 1 21
BBB 2 30.0 1 1 24
3 55.0 1 1 22
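This answer counts with 'size', whereas the previous one uses 'count'. On this data they agree, but they differ when a column has missing values: 'count' skips NaN, while 'size' counts every row in the group. A minimal sketch with a hypothetical NaN in group (AAA, 1):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second (AAA, 1) row has a missing wait time
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "BBB"],
    "GRADE": [1, 1, 2],
    "TOTAL_WAIT_TIME": [20, np.nan, 24],
})

g = df.groupby(["GROUP", "GRADE"])["TOTAL_WAIT_TIME"]
counts = g.count()  # non-missing values per group
sizes = g.size()    # all rows per group, NaN included
print(counts, sizes, sep="\n")
```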
Now you can drop the top column level:
x = df.groupby(['GROUP','GRADE']).agg(funcs)
x.columns = x.columns.droplevel(0)
In [63]: x
Out[63]:
mean_service count_service count_wait mean_wait
GROUP GRADE
AAA 1 72.5 2 2 60
4 40.0 1 1 23
5 35.0 1 1 21
BBB 2 30.0 1 1 24
3 55.0 1 1 22
In [64]: x.reset_index()
Out[64]:
GROUP GRADE mean_service count_service count_wait mean_wait
0 AAA 1 72.5 2 2 60
1 AAA 4 40.0 1 1 23
2 AAA 5 35.0 1 1 21
3 BBB 2 30.0 1 1 24
4 BBB 3 55.0 1 1 22
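On pandas 0.25 and later, the same flat table can be built with named aggregation, where each keyword is an output column name and its value is an (input column, function) tuple, so no droplevel step is needed. A sketch using the df from this answer:

```python
import pandas as pd

# df from this answer, including the extra (AAA, 1) row
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 100, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 100, 23, 21, 24, 22],
})

# Named aggregation: output_name=(input_column, function)
x = (
    df.groupby(["GROUP", "GRADE"])
    .agg(
        mean_service=("TOTAL_SERVICE_TIME", "mean"),
        count_service=("TOTAL_SERVICE_TIME", "size"),
        count_wait=("TOTAL_WAIT_TIME", "size"),
        mean_wait=("TOTAL_WAIT_TIME", "mean"),
    )
    .reset_index()
)
print(x)
```

The keyword order determines the column order of the result.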