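I'm sure my question is a simple one and that there must be a very simple way to solve it, but since I'm fairly new to Python, and to pandas in particular, I can't work it out on my own.
I made up the following DataFrame, which represents a simplified version of the scenario I'm working on. I'm looking for a way to collect the maximum gap between consecutive timestamps (from one index to the next) for every 10 minutes. I'm designing a filter, so I would like to visualize the maximum time difference in each 10-minute window.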
Timestamp Category ... Class Speed
0 2013-08-14 22:00:00 1 ... 1 1
1 2013-08-14 22:00:01 1 ... 2 1
2 2013-08-14 22:00:05 1 ... 0 1.1
3 2013-08-14 22:00:07 1 ... 1 1.2
4 2013-08-14 22:00:14 1 ... 3 1.2
5 2013-08-14 22:00:15 1 ... 0 1.2
6 2013-08-14 22:00:16 1 ... 1 1.2
7 2013-08-14 22:00:27 1 ... 2 1.2
8 2013-08-14 22:00:38 1 ... 1 1.2
3000 2013-08-23 22:59:59 0 ... 1 2.3
The result I expect looks something like this:
Timestamp Max time gap
2013-08-14 22:00:00 13.416600
2013-08-14 22:10:00 14.088200
2013-08-14 22:20:00 7.187153
2013-08-14 22:30:00 16.444224
2013-08-14 22:40:00 11.780500
2013-08-14 22:50:00 12.051639
I hope I was clear and concise. Any help with this would be much appreciated!
Answer 0 (score: 1)
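If you need the maximum difference within each 10-minute bin:
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# Bin the rows into 10-minute windows, then take the largest gap (in seconds)
# between consecutive timestamps inside each window.
df = (df.resample('10min', on='Timestamp')['Timestamp']
        .apply(lambda x: x.diff().dt.total_seconds().max())
        .reset_index(name='Max time gap'))
print(df)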
Timestamp Max time gap
0 2013-08-14 22:00:00 11.0
1 2013-08-14 22:10:00 NaN
2 2013-08-14 22:20:00 NaN
3 2013-08-14 22:30:00 NaN
4 2013-08-14 22:40:00 NaN
... ...
1297 2013-08-23 22:10:00 NaN
1298 2013-08-23 22:20:00 NaN
1299 2013-08-23 22:30:00 NaN
1300 2013-08-23 22:40:00 NaN
1301 2013-08-23 22:50:00 NaN
[1302 rows x 2 columns]
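Most rows above are NaN because resampling emits every 10-minute bin between the first and the last timestamp, and a bin with fewer than two rows has no gap to report. A minimal sketch of that effect, assuming only the ten sample rows shown in the question (the names `gap_s`, `all_bins` and `non_empty` are made up for illustration):

```python
import pandas as pd

# The ten sample rows from the question (rows 0-8 and row 3000).
df = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2013-08-14 22:00:00", "2013-08-14 22:00:01", "2013-08-14 22:00:05",
    "2013-08-14 22:00:07", "2013-08-14 22:00:14", "2013-08-14 22:00:15",
    "2013-08-14 22:00:16", "2013-08-14 22:00:27", "2013-08-14 22:00:38",
    "2013-08-23 22:59:59",
])})

ts = df["Timestamp"]
# Gap to the previous row inside the same 10-minute bin, in seconds.
gap_s = ts.groupby(ts.dt.floor("10min")).diff().dt.total_seconds()

# Index the gaps by time so resample() can bin them; empty bins become NaN.
gap_s.index = pd.DatetimeIndex(ts)
all_bins = gap_s.resample("10min").max()

# Keep only the bins that actually have a gap to report.
non_empty = all_bins.dropna()
print(len(all_bins), len(non_empty))
print(non_empty)
```

Calling `dropna()` on the result is one way to hide the empty bins; whether they should be hidden or shown as missing depends on what the plot is meant to convey.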
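Test (run on the original DataFrame, before it is overwritten above):
# Gap to the previous row within the same 10-minute bin; the first row of each bin is NaT.
df['new'] = df['Timestamp'].groupby(df['Timestamp'].dt.floor('10min')).diff()
print(df)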
Timestamp Category Class Speed new
0 2013-08-14 22:00:00 1 1 1.0 NaT
1 2013-08-14 22:00:01 1 2 1.0 00:00:01
2 2013-08-14 22:00:05 1 0 1.1 00:00:04
3 2013-08-14 22:00:07 1 1 1.2 00:00:02
4 2013-08-14 22:00:14 1 3 1.2 00:00:07
5 2013-08-14 22:00:15 1 0 1.2 00:00:01
6 2013-08-14 22:00:16 1 1 1.2 00:00:01
7 2013-08-14 22:00:27 1 2 1.2 00:00:11
8 2013-08-14 22:00:38 1 1 1.2 00:00:11
3000 2013-08-23 22:59:59 0 1 2.3 NaT
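In the test output above, the first row of every bin is NaT, so a gap that straddles a bin boundary is never counted. If such gaps should count, one possible variation (an assumption about the desired behaviour, not part of the answer) is to take the difference over the whole column first and then assign each gap to the bin of the later timestamp. The helper name `max_gap_including_boundaries` and the five sample timestamps are invented for illustration:

```python
import pandas as pd

def max_gap_including_boundaries(timestamps, freq="10min"):
    """Largest gap (seconds) per bin, counting gaps that cross bin boundaries."""
    ts = pd.to_datetime(pd.Series(list(timestamps)))
    ts = ts.sort_values().reset_index(drop=True)
    # Difference to the previous row over the whole column, not per bin.
    gap_s = ts.diff().dt.total_seconds()
    # Each gap is credited to the bin that contains the later timestamp.
    return gap_s.groupby(ts.dt.floor(freq)).max().dropna()

example = max_gap_including_boundaries([
    "2013-08-14 22:00:00",
    "2013-08-14 22:00:05",
    "2013-08-14 22:09:58",
    "2013-08-14 22:10:03",  # 5 s after the previous row, but in the next bin
    "2013-08-14 22:10:04",
])
print(example)
```

With a per-bin difference the 22:10 bin would report 1 second here; with the whole-column difference it reports the 5-second gap that crosses the boundary.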
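Answer 1 (score: 1)
You can resample the data every 10 minutes and apply an aggregation function to find the maximum time difference:
import pandas as pd
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp', drop=False)  # index by time so resample() can bin the rows
# Span of each 10-minute bin (last minus first timestamp), not the largest consecutive gap.
df['Timestamp'].resample('10min').agg(lambda x: x.max() - x.min())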
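Note that subtracting the earliest timestamp from the latest one in a bin gives the span covered by that bin, which is not necessarily the largest gap between consecutive rows. A minimal sketch comparing the two on the first nine sample rows from the question (the column names `span_s` and `max_gap_s` are made up for illustration):

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2013-08-14 22:00:00", "2013-08-14 22:00:01", "2013-08-14 22:00:05",
    "2013-08-14 22:00:07", "2013-08-14 22:00:14", "2013-08-14 22:00:15",
    "2013-08-14 22:00:16", "2013-08-14 22:00:27", "2013-08-14 22:00:38",
]))

bins = ts.dt.floor("10min")   # start of the 10-minute bin for each row
grouped = ts.groupby(bins)

# Span: last timestamp minus first timestamp in each bin.
span_s = (grouped.max() - grouped.min()).dt.total_seconds()
# Largest gap between consecutive rows inside each bin.
max_gap_s = grouped.diff().dt.total_seconds().groupby(bins).max()

summary = pd.DataFrame({"span_s": span_s, "max_gap_s": max_gap_s})
print(summary)
```

For this bin the span is 38 seconds while the largest consecutive gap is 11 seconds, so the two measures answer different questions.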
Answer 2 (score: 1)
Input dataset:
Number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1
Process:
import pandas as pd

dataset = pd.read_csv('dataset.csv')
timestampField = pd.to_datetime(dataset['Timestamp'])
startDate = pd.to_datetime('2013-08-14 22:00:00')  # start of the first window
episode = pd.Timedelta('10 minutes')               # window length
maxInterval = pd.Timedelta('0 second')             # largest gap seen in the current window

for index in range(1, len(timestampField)):
    if timestampField[index] >= startDate + episode:
        # This row belongs to a later window: report the finished one.
        print(startDate, maxInterval.total_seconds())
        startDate = startDate + episode
        # Skip over windows that contain no rows.
        while timestampField[index] >= startDate + episode:
            startDate = startDate + episode
        maxInterval = pd.Timedelta('0 second')
    else:
        # Same window: compare the gap to the previous row with the current maximum.
        localInterval = timestampField[index] - timestampField[index - 1]
        if localInterval > maxInterval:
            maxInterval = localInterval
# Note: the window that is still open when the loop ends is never printed.
Output:
2013-08-14 22:00:00 11.0
2013-08-14 22:40:00 300.0
2013-08-14 22:50:00 120.0
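The loop above never prints the window that is still open when the data ends, which is why the single row from 2013-08-23 does not appear in the output. A sketch of the same idea as a function that returns its results and also flushes the final window; `max_gap_per_window` is a hypothetical name, and the windows are assumed to be aligned to 10-minute clock boundaries rather than to a hand-picked start date:

```python
import pandas as pd

def max_gap_per_window(timestamps, window="10min"):
    """Return (window_start, max_gap_seconds) pairs, one per non-empty window."""
    ts = pd.to_datetime(pd.Series(list(timestamps)))
    ts = ts.sort_values().reset_index(drop=True)
    if ts.empty:
        return []
    step = pd.Timedelta(window)
    start = ts[0].floor(window)      # window containing the first row
    max_gap = pd.Timedelta(0)
    rows = []
    for i in range(1, len(ts)):
        if ts[i] >= start + step:
            # Row i opens a later window: store the finished one and jump
            # directly to the window that contains row i.
            rows.append((start, max_gap.total_seconds()))
            start = ts[i].floor(window)
            max_gap = pd.Timedelta(0)
        else:
            # Same window: keep the largest gap to the previous row.
            max_gap = max(max_gap, ts[i] - ts[i - 1])
    # Flush the window that is still open when the data ends.
    rows.append((start, max_gap.total_seconds()))
    return rows

rows = max_gap_per_window([
    "2013-08-14 22:00:00", "2013-08-14 22:00:01", "2013-08-14 22:00:05",
    "2013-08-14 22:00:07", "2013-08-14 22:00:14", "2013-08-14 22:00:15",
    "2013-08-14 22:00:16", "2013-08-14 22:00:27", "2013-08-14 22:00:38",
    "2013-08-14 22:40:38", "2013-08-14 22:45:38", "2013-08-14 22:49:38",
    "2013-08-14 22:50:38", "2013-08-14 22:52:38", "2013-08-23 22:59:59",
])
for window_start, seconds in rows:
    print(window_start, seconds)
```

On the input dataset of this answer, the first three pairs match the output listed above; the extra fourth pair is the final window, which holds a single row and therefore reports a gap of 0 seconds.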