Question

My data is recorded at 10-second intervals over 24 hours:

2015-10-14 15:01:10 3956.58 0   19  6.21    105.99  42  59.24  
2015-10-14 15:01:20 3956.58 0   1   0.81    121.57  42  59.24  
2015-10-14 15:01:30 3956.58 0   47  8.29    115.53  42  59.24  
2015-10-14 15:01:40 3956.58 0   79  12.19   107.64  42  59.24 
..   
..   
..     
2015-10-15 13:01:10     3956.58 0   79  8.02    107.64  42  59.24   
2015-10-15 13:01:10     3956.58 0   79  7.95    108.98  42  59.24
2015-10-15 13:01:10     3956.58 0   79  7.07    110.58  42  59.24

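For each hourly group, I want to check whether there are any gaps longer than 10 seconds. How can I get the gaps for each group and print them? So far I have the following: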

df = pd.read_csv('convertcsv.csv', parse_dates = True, index_col=0,
                 names=['date', 'hole_depth', 'rop', 'rotary',
                        'torque', 'hook_load', 'azimuth', 'inclin'])
df['num_gaps'] = df.groupby(df.index.date)
df.groupby(df.index.time)['num_gaps'].sum()

I would like the output to be:

timestamp, num_of_gaps  
2015-10-15 06:00, 5  
2015-10-15 07:00, 0   
...

Answer 1

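This is a great answer to get you started. Your situation is a little different, because you want to group by hour first and then look for differences greater than 10 seconds (which also avoids the day-difference problem mentioned in that answer).

So you could try the following, assuming the DataFrame has a DatetimeIndex: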

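import pandas as pd

df['tvalue'] = df.index
# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
time_groups = df.groupby(pd.Grouper(freq='h'))
for hour, data in time_groups:
    data = data.copy()  # work on a copy so the assignments below do not warn
    # time since the previous row within the same hour (0 for the first row)
    data['delta'] = data['tvalue'].diff().fillna(pd.Timedelta(0))
    # express the difference in seconds so it can be compared with 10 directly
    data['delta_sec'] = data['delta'].dt.total_seconds()
    print(data[data.delta_sec > 10])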
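
If what you want is the `timestamp, num_of_gaps` table from the question rather than the offending rows, one possible variant is sketched below. It is only a sketch under a few assumptions: the helper name `count_gaps_per_hour` and the demo data are made up for illustration, the index is a sorted DatetimeIndex, and the difference is taken over the whole series before grouping (unlike the loop above), so that a gap straddling an hour boundary is also counted, in the hour where the late sample arrives.

```python
import pandas as pd

def count_gaps_per_hour(df, expected="10s"):
    """Count, per hour, the consecutive-sample differences longer than `expected`.

    Hypothetical helper; assumes `df` has a sorted DatetimeIndex.
    """
    # Difference between each timestamp and the previous one over the whole
    # series (the first row has no predecessor, so its difference is NaT).
    deltas = df.index.to_series().diff()
    # NaT compares as False, so the first row is never flagged.
    is_gap = deltas > pd.Timedelta(expected)
    # Summing the boolean flags per hour gives the number of gaps in that hour.
    return is_gap.groupby(pd.Grouper(freq="h")).sum().rename("num_of_gaps")

# Made-up demo data: two hours at a 10-second cadence with four samples removed.
idx = pd.date_range("2015-10-14 15:00:00", periods=720, freq="10s")
dropped = pd.to_datetime([
    "2015-10-14 15:10:10", "2015-10-14 15:10:20",  # one 30-second gap
    "2015-10-14 15:30:00",                         # one 20-second gap
    "2015-10-14 16:00:00",                         # gap across the hour boundary
])
kept = idx.drop(dropped)
df = pd.DataFrame({"rotary": 0}, index=kept)

gaps = count_gaps_per_hour(df)
print(gaps)
```

Note that this counts gaps, not missing samples: two consecutive missing rows show up as a single 30-second gap.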

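Just saw your edit: you could of course also simply count the rows in each hour and check whether the count falls below the expected 360. In other words,

print(df.groupby(pd.Grouper(freq='h')).size())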
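
To turn those counts into a shortfall per hour, something along these lines might work. Again a sketch with assumptions: `EXPECTED_PER_HOUR`, `missing_per_hour` and the demo data are illustrative names and values, not part of the original post. Keep in mind that a partial first or last hour will always look short, and duplicated timestamps (like the repeated 13:01:10 rows in the sample) inflate the count.

```python
import pandas as pd

# 3600 seconds per hour at a 10-second cadence gives 360 expected rows.
EXPECTED_PER_HOUR = 3600 // 10

def missing_per_hour(df):
    """Return, per hour, how many rows are missing relative to 360.

    Hypothetical helper; assumes `df` has a DatetimeIndex.
    """
    counts = df.groupby(pd.Grouper(freq="h")).size()
    return (EXPECTED_PER_HOUR - counts).rename("missing")

# Made-up demo data: two full hours with four samples removed.
idx = pd.date_range("2015-10-14 15:00:00", periods=720, freq="10s")
dropped = pd.to_datetime([
    "2015-10-14 15:10:10", "2015-10-14 15:10:20",
    "2015-10-14 15:30:00",
    "2015-10-14 16:00:00",
])
df = pd.DataFrame({"rotary": 0}, index=idx.drop(dropped))

missing = missing_per_hour(df)
print(missing)
```

This tells you how many rows are absent in each hour, but not where they are or whether they are contiguous; for that you still need the timestamp differences.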

Finding the number of gaps per hour in 10-second interval data
