I have this dataframe:
df1
name        triggerid  description                time
srvjboss03  30708      Access URL A failed        01:19:23
srvjboss03  30708      Access URL A failed        01:18:21
srvglass01  32942      Service Glassfish OFFLINE  00:35:00
srvglass01  32942      Service Glassfish OFFLINE  00:35:00
srvglass01  22725      Access URL B failed        00:36:04
srvglass01  22725      Access URL B failed        00:36:07
srvglass01  22725      Access URL B failed        00:06:04
srvglass01  22725      Access URL B failed        00:06:04
The desired output is:
name        triggerid  description                time
srvjboss03  30708      Access URL A failed        02:37:44
srvglass01  32942      Service Glassfish OFFLINE  01:10:00
srvglass01  22725      Access URL B failed        01:24:19
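Here, time is the sum of the times of all rows that share the same name, triggerid, and description.
I tried setting the name, triggerid, and description columns as the index and then grouping, but I got this: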
df1.set_index(['name', 'triggerid', 'description'], inplace=True)
df1.groupby(df1.index)['time'].sum()
name        triggerid  description                time
srvjboss03  30708      Access URL A failed        01:19:23
                       Access URL A failed        01:18:21
srvglass01  32942      Service Glassfish OFFLINE  00:35:00
                       Service Glassfish OFFLINE  00:35:00
srvglass01  22725      Access URL B failed        00:36:04
                       Access URL B failed        00:36:07
                       Access URL B failed        00:06:04
                       Access URL B failed        00:06:04
The time column is of dtype timedelta64. Why doesn't pandas group the rows by name, triggerid, and description? How can I get the desired output?
Answer 0 (score: 4)
Let's give this a try. First, convert the time column to timedelta, then group by the three key columns and sum.
df['time'] = pd.to_timedelta(df['time'])
df.groupby(['name','triggerid','description'])['time'].sum()\
.reset_index()
Output:
name triggerid description time
0 srvglass01 22725 Access URL B failed 01:24:19
1 srvglass01 32942 Service Glassfish OFFLINE 01:10:00
2 srvjboss03 30708 Access URL A failed 02:37:44
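For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the time values start out as plain "HH:MM:SS" strings; the `rows` list and the `result` name are only illustrative, with the data copied from the question.

```python
import pandas as pd

# Sample rows copied from the question; `rows` is an illustrative name.
rows = [
    ("srvjboss03", 30708, "Access URL A failed", "01:19:23"),
    ("srvjboss03", 30708, "Access URL A failed", "01:18:21"),
    ("srvglass01", 32942, "Service Glassfish OFFLINE", "00:35:00"),
    ("srvglass01", 32942, "Service Glassfish OFFLINE", "00:35:00"),
    ("srvglass01", 22725, "Access URL B failed", "00:36:04"),
    ("srvglass01", 22725, "Access URL B failed", "00:36:07"),
    ("srvglass01", 22725, "Access URL B failed", "00:06:04"),
    ("srvglass01", 22725, "Access URL B failed", "00:06:04"),
]
df = pd.DataFrame(rows, columns=["name", "triggerid", "description", "time"])

# Assumption: the column holds strings, so convert it to timedelta
# before summing it as durations.
df["time"] = pd.to_timedelta(df["time"])

# Group on the three key columns and sum the durations in each group,
# then turn the group keys back into ordinary columns.
result = (
    df.groupby(["name", "triggerid", "description"])["time"]
    .sum()
    .reset_index()
)
print(result)
```

Note that groupby sorts the group keys by default, which is why srvglass01 comes before srvjboss03 in the result.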
Other alternatives, if you prefer to set the index first:
df2 = df.set_index(['name','triggerid','description'])
df2.groupby(df2.index)['time'].sum()
Output:
(srvglass01, 22725, Access URL B failed) 01:24:19
(srvglass01, 32942, Service Glassfish OFFLINE) 01:10:00
(srvjboss03, 30708, Access URL A failed) 02:37:44
Name: time, dtype: timedelta64[ns]
Or:
df2.groupby(level=[0,1,2])['time'].sum()
Output:
name        triggerid  description
srvglass01  22725      Access URL B failed          01:24:19
            32942      Service Glassfish OFFLINE    01:10:00
srvjboss03  30708      Access URL A failed          02:37:44
Name: time, dtype: timedelta64[ns]
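As a small variation on the index-based approach, the levels can also be referred to by name instead of by position, which may read more clearly. The sketch below rebuilds the question's data so it runs on its own; `keys`, `by_position`, `by_name`, and `flat` are illustrative names, not part of the original code.

```python
import pandas as pd

# Same data as in the question, built column by column.
df = pd.DataFrame(
    {
        "name": ["srvjboss03"] * 2 + ["srvglass01"] * 6,
        "triggerid": [30708] * 2 + [32942] * 2 + [22725] * 4,
        "description": ["Access URL A failed"] * 2
        + ["Service Glassfish OFFLINE"] * 2
        + ["Access URL B failed"] * 4,
        "time": pd.to_timedelta(
            ["01:19:23", "01:18:21", "00:35:00", "00:35:00",
             "00:36:04", "00:36:07", "00:06:04", "00:06:04"]
        ),
    }
)

keys = ["name", "triggerid", "description"]
df2 = df.set_index(keys)

# Group by index level position, as in the answer above ...
by_position = df2.groupby(level=[0, 1, 2])["time"].sum()
# ... or by index level name, which gives the same grouping.
by_name = df2.groupby(level=keys)["time"].sum()

# reset_index() moves the index levels back into regular columns.
flat = by_name.reset_index()
print(flat)
```

Both calls group on the same three index levels, so they should produce the same totals; `reset_index()` is only needed if you want a flat dataframe like the desired output.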