Generating regular time series from irregular time series in pandas

Asked: 2016-01-03 19:44:26

Tags: python numpy pandas time-series data-analysis

I have a data analysis task in which I want to analyze real-time service logs. Could you help me work out how to do this in pandas?

My initial DataFrame looks like this: [image]

I want to generate a time series for each service name and run a correlation analysis on them.

How can I split this DataFrame into separate DataFrames, one per service and indexed by time slot, by aggregating the rows of each service, like this? [image]

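P.S.: I have seen similar questions, but I believe mine is different because I want to generate many time series from a single DataFrame. Apologies in advance if this is easy; I am new to pandas :)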

Here is my DataFrame as text:

              ERRORCODE  ERRORTEXT  SERVICENAME  REQTDURATION  RESPTDURATION  HOSTDURATION
10:00:27:000        NaN        NaN     serviceA             0              1          4612
10:00:27:822        NaN        NaN     serviceB             0              1         14994
10:01:27:622         -1  'Timeout'     serviceA             1              0          7695
10:01:27:323        NaN        NaN     serviceD             0              1          2612
10:01:27:755        NaN        NaN     serviceA             0              1          1612
10:02:27:666         -5  'Timeout'     serviceA             0              1         11612
10:02:27:111        NaN        NaN     serviceB             0              1        111112
10:02:27:333        NaN        NaN     serviceC             0              1           412

1 answer:

Answer 0: (score: 2)

Starting from:

                 ERRORCODE  ERRORTEXT SERVICENAME  REQTDURATION  RESPTDURATION  \
10:00:27:000        NaN        NaN    serviceA             0              1   
10:00:27:822        NaN        NaN    serviceB             0              1   
10:01:27:622         -1  'Timeout'    serviceA             1              0   
10:01:27:323        NaN        NaN    serviceD             0              1   
10:01:27:755        NaN        NaN    serviceA             0              1   
10:02:27:666         -5  'Timeout'    serviceA             0              1   
10:02:27:111        NaN        NaN    serviceB             0              1   
10:02:27:333        NaN        NaN    serviceC             0              1   

              HOSTDURATION  
10:00:27:000          4612  
10:00:27:822         14994  
10:01:27:622          7695  
10:01:27:323          2612  
10:01:27:755          1612  
10:02:27:666         11612  
10:02:27:111        111112  
10:02:27:333           412 
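
To make the steps that follow reproducible, the frame above can be rebuilt by hand. This is only a sketch: the rows are transcribed from the printed table, and the helper names `rows` and `columns` are illustrative rather than part of the original post. The index is deliberately left as plain strings so that the conversion step still applies.

```python
import numpy as np
import pandas as pd

# Rows transcribed from the table above:
# (timestamp, ERRORCODE, ERRORTEXT, SERVICENAME,
#  REQTDURATION, RESPTDURATION, HOSTDURATION)
rows = [
    ("10:00:27:000", np.nan, np.nan, "serviceA", 0, 1, 4612),
    ("10:00:27:822", np.nan, np.nan, "serviceB", 0, 1, 14994),
    ("10:01:27:622", -1, "'Timeout'", "serviceA", 1, 0, 7695),
    ("10:01:27:323", np.nan, np.nan, "serviceD", 0, 1, 2612),
    ("10:01:27:755", np.nan, np.nan, "serviceA", 0, 1, 1612),
    ("10:02:27:666", -5, "'Timeout'", "serviceA", 0, 1, 11612),
    ("10:02:27:111", np.nan, np.nan, "serviceB", 0, 1, 111112),
    ("10:02:27:333", np.nan, np.nan, "serviceC", 0, 1, 412),
]
columns = ["ERRORCODE", "ERRORTEXT", "SERVICENAME",
           "REQTDURATION", "RESPTDURATION", "HOSTDURATION"]

# The first tuple element becomes the (string) index, the rest the data.
df = pd.DataFrame([r[1:] for r in rows],
                  index=[r[0] for r in rows],
                  columns=columns)
print(df)
```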

Convert the index to a DatetimeIndex:

import pandas as pd
df.index = pd.to_datetime(df.index, format='%H:%M:%S:%f')
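
As a small illustration of what that conversion does, the sketch below parses two labels in the same `HH:MM:SS:mmm` layout. The variable names `labels` and `idx` are made up for this example. Because the strings carry no date, pandas fills in 1900-01-01, which is harmless here since only the time of day is used for binning.

```python
import pandas as pd

# Two sample labels in the same layout as the log index.
labels = ["10:00:27:000", "10:01:27:622"]
idx = pd.to_datetime(labels, format="%H:%M:%S:%f")

# The date part defaults to 1900-01-01 because none was supplied.
print(idx)

# Flooring to the minute shows which one-minute slot each row falls into.
print(idx.floor("min").time)
```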

Then loop over the SERVICENAME groups:

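for service, data in df.groupby('SERVICENAME'):
    # One-minute bins; pd.Grouper(freq=...) replaces pd.TimeGrouper, which is gone from recent pandas versions
    per_minute = data.groupby(pd.Grouper(freq='min'))
    # size() counts every row in a bin (not only rows with an ERRORCODE); mean() averages the durations
    service_result = pd.concat([per_minute.size(), per_minute[['REQTDURATION', 'RESPTDURATION', 'HOSTDURATION']].mean()], axis=1)
    service_result.columns = ['ERRORCOUNT', 'AVGREQTURATION', 'AVGRESPTDURATION', 'AVGHOSTDURATION']
    service_result.index = service_result.index.time
    print(service, service_result, sep='\n')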

which yields:

serviceA
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1             0.0               1.0           4612.0
10:01:00           2             0.5               0.5           4653.5
10:02:00           1             0.0               1.0          11612.0

serviceB
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1               0                 1            14994
10:01:00           0             NaN               NaN              NaN
10:02:00           1               0                 1           111112

serviceC
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:02:00           1               0                 1              412

serviceD
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:01:00           1               0                 1             2612
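
Since the stated goal is a correlation analysis across services, it may help to have one wide table per metric instead of one frame per service. The following is a sketch of that alternative, not the approach used in the answer above. It assumes that a non-null ERRORCODE marks an error, and the column name `ISERROR` as well as the variables `grouped`, `avg_host` and `error_count` are introduced here purely for illustration. Only a subset of the columns is rebuilt to keep the example short.

```python
import numpy as np
import pandas as pd

# Stand-in for the log, using a subset of the columns shown above.
df = pd.DataFrame(
    {
        "ERRORCODE": [np.nan, np.nan, -1, np.nan, np.nan, -5, np.nan, np.nan],
        "SERVICENAME": ["serviceA", "serviceB", "serviceA", "serviceD",
                        "serviceA", "serviceA", "serviceB", "serviceC"],
        "HOSTDURATION": [4612, 14994, 7695, 2612, 1612, 11612, 111112, 412],
    },
    index=pd.to_datetime(
        ["10:00:27:000", "10:00:27:822", "10:01:27:622", "10:01:27:323",
         "10:01:27:755", "10:02:27:666", "10:02:27:111", "10:02:27:333"],
        format="%H:%M:%S:%f",
    ),
)

# Assumption: a row counts as an error when ERRORCODE is present.
df["ISERROR"] = df["ERRORCODE"].notna().astype(int)

# Group by service and by one-minute slot in a single pass.
grouped = df.groupby(["SERVICENAME", pd.Grouper(freq="min")])

# Pivot the service level into columns: one row per slot, one column per service.
avg_host = grouped["HOSTDURATION"].mean().unstack("SERVICENAME")
error_count = (grouped["ISERROR"].sum()
               .unstack("SERVICENAME")
               .fillna(0)
               .astype(int))

print(avg_host)
print(error_count)

# Pairwise correlation between services; pairs with fewer than two
# overlapping slots come out as NaN.
print(avg_host.corr())
```

With only three one-minute slots the correlations are not meaningful; on a real log the same pattern applies with a longer time range or a different `freq`.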