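Extract the maximum time gap in a timestamp column within specific time periods

Time: 2019-09-09 13:27:46

Tags: python pandas dataframe

I believe my problem is really simple and there must be a very easy way to solve it, but since I'm fairly new to Python, and to pandas in particular, I couldn't work it out on my own.

I made up the following DataFrame, which represents a simpler version of the scenario I'm working on. I'm looking for a way to collect, for every 10-minute window, the maximum time gap between consecutive rows. I'm designing a filter, so I want to visualize the maximum time difference in each 10-minute window.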

            Timestamp      Category  ...       Class           Speed
0     2013-08-14 22:00:00         1  ...          1               1
1     2013-08-14 22:00:01         1  ...          2               1
2     2013-08-14 22:00:05         1  ...          0               1.1
3     2013-08-14 22:00:07         1  ...          1               1.2
4     2013-08-14 22:00:14         1  ...          3               1.2
5     2013-08-14 22:00:15         1  ...          0               1.2
6     2013-08-14 22:00:16         1  ...          1               1.2
7     2013-08-14 22:00:27         1  ...          2               1.2
8     2013-08-14 22:00:38         1  ...          1               1.2

3000  2013-08-23 22:59:59         0  ...          1               2.3
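To make the example reproducible, here is a minimal sketch that rebuilds only the rows shown above (indices 0 to 8 and 3000) as a DataFrame. The rows elided between 8 and 3000 and the columns hidden behind "..." are left out, so the name df and the reduced column set are assumptions for illustration.

```python
import pandas as pd

# Assumed sample input: only the rows visible in the question. The rows between
# index 8 and 3000 and the columns hidden behind "..." are not reconstructed.
df = pd.DataFrame(
    {
        'Timestamp': pd.to_datetime([
            '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
            '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
            '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
            '2013-08-23 22:59:59',
        ]),
        'Category': [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
        'Class': [1, 2, 0, 1, 3, 0, 1, 2, 1, 1],
        'Speed': [1, 1, 1.1, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 2.3],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 3000],
)
print(df)
```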

The result I expect looks something like this:

     Timestamp       Max time gap                                            
2013-08-14 22:00:00    13.416600 
2013-08-14 22:10:00    14.088200    
2013-08-14 22:20:00    7.187153    
2013-08-14 22:30:00    16.444224      
2013-08-14 22:40:00    11.780500        
2013-08-14 22:50:00    12.051639        

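I hope this is concise and clear. Thanks a lot for any help!

3 answers:

Answer 0 (score: 1)

If you need the maximum difference between consecutive timestamps in each 10-minute window: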

import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df = (df.resample('10Min', on='Timestamp')['Timestamp']
        .apply(lambda x: x.diff().dt.total_seconds().max())
        .reset_index(name='Max time gap'))

print (df)
               Timestamp  Max time gap
0    2013-08-14 22:00:00          11.0
1    2013-08-14 22:10:00           NaN
2    2013-08-14 22:20:00           NaN
3    2013-08-14 22:30:00           NaN
4    2013-08-14 22:40:00           NaN
                 ...           ...
1297 2013-08-23 22:10:00           NaN
1298 2013-08-23 22:20:00           NaN
1299 2013-08-23 22:30:00           NaN
1300 2013-08-23 22:40:00           NaN
1301 2013-08-23 22:50:00           NaN

[1302 rows x 2 columns]
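A self-contained sketch of the same idea, run on only the visible sample rows (an assumption; the full data set is not available). It resamples a Series of timestamps indexed by themselves instead of using on='Timestamp', which should be equivalent. With these rows it should give 1302 windows, 11.0 seconds in the first one and NaN in all the others, in line with the printed output.

```python
import pandas as pd

# Assumed minimal input: only the timestamps visible in the question
# (rows 0-8 and 3000); the elided rows are not reconstructed.
df = pd.DataFrame(
    {'Timestamp': [
        '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
        '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
        '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
        '2013-08-23 22:59:59',
    ]},
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 3000],
)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# Series of timestamps indexed by themselves, so it can be resampled directly
ts = df.set_index('Timestamp', drop=False)['Timestamp']

# Per 10-minute window: largest gap (in seconds) between consecutive timestamps.
# Windows with no rows or a single row have no gap and come out as NaN.
result = (ts.resample('10min')
            .apply(lambda x: x.diff().dt.total_seconds().max())
            .reset_index(name='Max time gap'))
print(result.head())
```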

Test:

df['new'] = df.groupby(pd.Grouper(key='Timestamp', freq='10Min'))['Timestamp'].diff()
print (df)
               Timestamp  Category  Class  Speed      new
0    2013-08-14 22:00:00         1      1    1.0      NaT
1    2013-08-14 22:00:01         1      2    1.0 00:00:01
2    2013-08-14 22:00:05         1      0    1.1 00:00:04
3    2013-08-14 22:00:07         1      1    1.2 00:00:02
4    2013-08-14 22:00:14         1      3    1.2 00:00:07
5    2013-08-14 22:00:15         1      0    1.2 00:00:01
6    2013-08-14 22:00:16         1      1    1.2 00:00:01
7    2013-08-14 22:00:27         1      2    1.2 00:00:11
8    2013-08-14 22:00:38         1      1    1.2 00:00:11
3000 2013-08-23 22:59:59         0      1    2.3      NaT
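The per-row gaps from the test can then be reduced to one maximum per window. Below is a minimal sketch on the visible sample rows (assumed input), using groupby with pd.Grouper to form the 10-minute bins; a fresh Grouper is built for each call rather than reusing one object.

```python
import pandas as pd

# Assumed minimal input: the timestamps visible in the question (rows 0-8 and 3000)
df = pd.DataFrame(
    {'Timestamp': pd.to_datetime([
        '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
        '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
        '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
        '2013-08-23 22:59:59',
    ])},
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 3000],
)

# Gap to the previous row inside the same 10-minute window
# (NaT for the first row of every window)
df['new'] = df.groupby(pd.Grouper(key='Timestamp', freq='10min'))['Timestamp'].diff()

# Largest of those gaps per window; max() skips the NaT entries
per_window = df.groupby(pd.Grouper(key='Timestamp', freq='10min'))['new'].max()
print(per_window.head())
```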

Answer 1 (score: 1)

You can resample the data into 10-minute windows and apply an aggregation function to find the maximum time difference:

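import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.set_index(df['Timestamp'], inplace=True)

# '10min' gives 10-minute windows ('10m' is not the minute alias).
# The result is the span of each window: last minus first timestamp.
df['Timestamp'].resample('10min').agg(lambda x: x.max() - x.min())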
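Max minus min measures how long each window's data spans, which is not the same as the largest gap between consecutive rows that the question asks about. A small sketch on the question's first nine timestamps (assumed sample input, all inside one window) contrasts the two: the span of the 22:00 window is 38 seconds, while the largest consecutive gap is 11 seconds.

```python
import pandas as pd

# Assumed sample input: the first nine timestamps from the question
times = pd.to_datetime([
    '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
    '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
    '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
])
ts = pd.Series(times, index=times)  # timestamps indexed by themselves

# This answer's aggregation: last minus first timestamp in each window
span = ts.resample('10min').agg(lambda x: x.max() - x.min())

# Largest gap between consecutive timestamps in each window
max_gap = ts.resample('10min').agg(lambda x: x.diff().max())

print(span.iloc[0], max_gap.iloc[0])
```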

Answer 2 (score: 1)

Input dataset (dataset.csv):

Number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1

Procedure:

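import pandas as pd

dataset = pd.read_csv('dataset.csv')
timestampField = pd.to_datetime(dataset['Timestamp'])
startDate = pd.to_datetime('2013-08-14 22:00:00')  # start of the first 10-minute window
episode = pd.Timedelta('10 minutes')                # window length
maxInterval = pd.Timedelta('0 second')              # largest gap seen in the current window
for index in range(1, len(timestampField)):
    if timestampField[index] >= startDate + episode:
        # Row lies beyond the current window: report that window's maximum, then advance
        print(startDate, maxInterval.total_seconds())
        startDate = startDate + episode
        # Skip empty windows until the row falls inside the current one
        while timestampField[index] >= startDate + episode:
            startDate = startDate + episode
        maxInterval = pd.Timedelta('0 second')
    else:
        # Gap to the previous row, tracked only while still inside the same window
        localInterval = timestampField[index] - timestampField[index - 1]
        if localInterval > maxInterval:
            maxInterval = localInterval
# Note: the last window is never printed, because printing only happens when a later row arrives.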

Output:
2013-08-14 22:00:00 11.0
2013-08-14 22:40:00 300.0
2013-08-14 22:50:00 120.0
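For comparison, a vectorized sketch of the same computation using pandas resample instead of the explicit loop (a swapped-in technique, not this answer's own code). The CSV is inlined via io.StringIO so nothing needs to be on disk. Like the loop, it only counts gaps between rows in the same window, so the 40-minute jump from 22:00:38 to 22:40:38 is ignored; on this data it should yield the same three values as the loop.

```python
import io

import pandas as pd

# The input dataset from this answer, inlined so the sketch runs without dataset.csv
CSV = """Number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1
"""

dataset = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])
ts = dataset.set_index('Timestamp', drop=False)['Timestamp']

# Largest gap (seconds) between consecutive rows inside each 10-minute window;
# windows with fewer than two rows give NaN and are dropped.
max_gap = (ts.resample('10min')
             .apply(lambda x: x.diff().dt.total_seconds().max())
             .dropna())
print(max_gap)
```

One difference in design: the loop reports 0.0 for a window holding a single row (unless it is the last one), whereas this version drops such windows as NaN.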