pandas: sum rows based on multiple groups

Time: 2017-11-03 18:00:53

Tags: python pandas pandas-groupby

I have this DataFrame:

df1

name         triggerid description                      time                            
srvjboss03   30708     Access URL A failed              01:19:23
srvjboss03   30708     Access URL A failed              01:18:21
srvglass01   32942     Service Glassfish OFFLINE        00:35:00
srvglass01   32942     Service Glassfish OFFLINE        00:35:00
srvglass01   22725     Access URL B failed              00:36:04
srvglass01   22725     Access URL B failed              00:36:07
srvglass01   22725     Access URL B failed              00:06:04
srvglass01   22725     Access URL B failed              00:06:04

The desired output is:

name         triggerid description                      time                            
srvjboss03   30708     Access URL A failed              02:37:44
srvglass01   32942     Service Glassfish OFFLINE        01:10:00
srvglass01   22725     Access URL B failed              01:24:19

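The time is the sum of the times of all rows that share the same name, triggerid, and description.

I tried setting the name, triggerid, and description columns as the index and then grouping, but I got this: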

df1.set_index(['name', 'triggerid', 'description'], inplace=True)

df1.groupby(df1.index)['time'].sum()


name         triggerid description                      time
srvjboss03   30708     Access URL A failed              01:19:23
                       Access URL A failed              01:18:21
srvglass01   32942     Service Glassfish OFFLINE        00:35:00
                       Service Glassfish OFFLINE        00:35:00
srvglass01   22725     Access URL B failed              00:36:04
                       Access URL B failed              00:36:07
                       Access URL B failed              00:06:04
                       Access URL B failed              00:06:04

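The time column is set to timedelta64. Why doesn't pandas group description the same way it groups name and triggerid? How can I get the desired output?

1 Answer:

Answer 0 (score: 4)

Let's try this. First, convert the time column to timedelta.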

df['time'] = pd.to_timedelta(df['time'])

df.groupby(['name','triggerid','description'])['time'].sum()\
  .reset_index()

Output:

         name  triggerid                description     time
0  srvglass01      22725        Access URL B failed 01:24:19
1  srvglass01      32942  Service Glassfish OFFLINE 01:10:00
2  srvjboss03      30708        Access URL A failed 02:37:44
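
For anyone who wants to check this end to end, here is a minimal sketch that rebuilds the frame from the sample rows in the question and runs the same groupby. The variable name `out` and the `sort=False` argument are my additions, not part of the answer above. `sort=False` keeps the groups in order of first appearance, which happens to match the row order of the desired output.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["srvjboss03"] * 2 + ["srvglass01"] * 6,
    "triggerid": [30708] * 2 + [32942] * 2 + [22725] * 4,
    "description": ["Access URL A failed"] * 2
                   + ["Service Glassfish OFFLINE"] * 2
                   + ["Access URL B failed"] * 4,
    "time": ["01:19:23", "01:18:21", "00:35:00", "00:35:00",
             "00:36:04", "00:36:07", "00:06:04", "00:06:04"],
})

# The times start out as strings here, so convert them to timedelta
# before summing.
df["time"] = pd.to_timedelta(df["time"])

# sort=False is an assumption about the wanted row order (first appearance);
# drop it to get the keys sorted, as in the output shown above.
out = (df.groupby(["name", "triggerid", "description"], sort=False)["time"]
         .sum()
         .reset_index())

print(out)
```

On these rows the three totals are 02:37:44, 01:10:00, and 01:24:19. How they are printed (with or without a leading `0 days`) depends on the pandas version.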

Other alternatives:

df2 = df.set_index(['name','triggerid','description'])
df2.groupby(df2.index)['time'].sum()

Output:

(srvglass01, 22725, Access URL B failed)         01:24:19
(srvglass01, 32942, Service Glassfish OFFLINE)   01:10:00
(srvjboss03, 30708, Access URL A failed)         02:37:44
Name: time, dtype: timedelta64[ns]

Or

df2.groupby(level=[0,1,2])['time'].sum()

Output:

name        triggerid  description              
srvglass01  22725      Access URL B failed         01:24:19
            32942      Service Glassfish OFFLINE   01:10:00
srvjboss03  30708      Access URL A failed         02:37:44
Name: time, dtype: timedelta64[ns]
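
To compare the three variants side by side, here is a small sketch on made-up data (the rows, the `keys` list, and the `by_*` names are hypothetical and only there for illustration). It groups once by columns, once by index levels, and once by the index object itself, then prints each result so the totals and the shape of the resulting index can be compared.

```python
import pandas as pd

# Hypothetical toy data: two rows share a key, one row stands alone.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "triggerid": [1, 1, 2],
    "description": ["x", "x", "y"],
    "time": pd.to_timedelta(["00:00:30", "00:00:45", "00:01:00"]),
})
keys = ["name", "triggerid", "description"]
df2 = df.set_index(keys)

# 1) Group by the key columns directly.
by_columns = df.groupby(keys)["time"].sum()

# 2) Group by the index levels after set_index.
by_level = df2.groupby(level=[0, 1, 2])["time"].sum()

# 3) Group by the index object; each group key is one (name, triggerid,
#    description) tuple.
by_index = df2.groupby(df2.index)["time"].sum()

for label, result in [("columns", by_columns),
                      ("level", by_level),
                      ("index", by_index)]:
    print(label)
    print(result)
    print()
```

All three give the same totals per group. They differ only in how the keys come back, so the column-based form followed by `reset_index()` is probably the most convenient when a flat table is the goal.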