Here is a sample of my dataset. It has a timestamp column and two temperature columns (Temperature1, Temperature2):
Timestamp            Temperature1  Temperature2
09/01/2016 00:00:08  53.4          45.5
09/01/2016 00:00:38  53.5          45.2
09/01/2016 00:01:08  54.6          43.2
09/01/2016 00:01:38  55.2          46.3
09/01/2016 00:02:08  54.5          45.5
09/01/2016 00:04:08  54.2          35.5
09/01/2016 00:05:08  52.4          45.7
09/01/2016 00:05:38  53.4          45.2
My data arrives every 30 seconds, but some timestamps are missing, so some data points are absent. How can I find the missing data points and insert rows filled with NaN for them? Please help.
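Answer 0: (score: 3)
You can use the resample('30S', base=8) method (on pandas versions before 2.0, where the base argument is still available):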
In [20]: x.resample('30S', base=8).mean()
Out[20]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
The solution above assumes that Timestamp has a datetime dtype and has been set as the index.
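For readers on newer pandas, here is a minimal sketch of the same idea using the offset argument, which has been available since pandas 1.1 and covers what base did. The frame x is rebuilt from the rows in the question, and the month/day/year reading of the timestamps is an assumption taken from the output shown above.

```python
import pandas as pd

# Rows copied from the question; 00:02:38, 00:03:08, 00:03:38 and 00:04:38 are absent.
rows = [
    ("09/01/2016 00:00:08", 53.4, 45.5),
    ("09/01/2016 00:00:38", 53.5, 45.2),
    ("09/01/2016 00:01:08", 54.6, 43.2),
    ("09/01/2016 00:01:38", 55.2, 46.3),
    ("09/01/2016 00:02:08", 54.5, 45.5),
    ("09/01/2016 00:04:08", 54.2, 35.5),
    ("09/01/2016 00:05:08", 52.4, 45.7),
    ("09/01/2016 00:05:38", 53.4, 45.2),
]
x = pd.DataFrame(rows, columns=["Timestamp", "Temperature1", "Temperature2"])

# Assumption: the strings are month/day/year, matching the 2016-09-01 output above.
x["Timestamp"] = pd.to_datetime(x["Timestamp"], format="%m/%d/%Y %H:%M:%S")
x = x.set_index("Timestamp")

# offset="8s" shifts the 30-second bin edges to :08 and :38 (requires pandas >= 1.1).
filled = x.resample("30s", offset="8s").mean()
print(filled)
```

Bins that contain no rows come back as NaN, so the gaps show up as all-NaN rows.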
If Timestamp is a regular column (rather than the index), then starting with Pandas 0.19.0 we can use the on='column_name' parameter to resample on a regular column (which must be of datetime dtype):
In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
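A self-contained sketch of the on= variant, in case it helps to try it on a tiny frame. The three-row frame df is made up for illustration, and offset="8s" is used in place of base=8 on the assumption that pandas 1.1 or later is installed.

```python
import pandas as pd

# Hypothetical three-row frame; the 00:01:08 reading is deliberately left out.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-09-01 00:00:08",
        "2016-09-01 00:00:38",
        "2016-09-01 00:01:38",
    ]),
    "Temperature1": [53.4, 53.5, 55.2],
    "Temperature2": [45.5, 45.2, 46.3],
})

# on="Timestamp" resamples on the column; the result is indexed by that column.
out = df.resample("30s", on="Timestamp", offset="8s").mean()
print(out)
```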
If you need to find the base value dynamically, you can do it like this:
In [21]: x.index[0].second
Out[21]: 8
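On pandas 1.1 or later the same dynamic alignment can be sketched in two ways: derive an offset from the first timestamp, or pass origin="start" so the bins begin at the first observation. The small frame below is made up for illustration.

```python
import pandas as pd

idx = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:00:38",
    "2016-09-01 00:01:38",
])
x = pd.DataFrame({"Temperature1": [53.4, 53.5, 55.2]}, index=idx)

# Option 1: compute the shift from the seconds of the first timestamp.
offset = pd.Timedelta(seconds=x.index[0].second % 30)
by_offset = x.resample("30s", offset=offset).mean()

# Option 2: anchor the bins at the first timestamp of the series.
by_origin = x.resample("30s", origin="start").mean()

print(by_offset)
print(by_origin)
```

Option 1 only looks at the seconds field, so it suits frequencies that divide a minute evenly; origin="start" does not have that restriction.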
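From the docs:
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0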
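To make the "origin" idea concrete, here is a plain-Python illustration of how a shift moves the bin edges. It is only a sketch of my reading of the quoted description, not pandas' actual implementation, and bin_start is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def bin_start(ts, freq, base):
    """Left edge of the bin holding ts, with edges at midnight + base + k * freq."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight - base
    # timedelta // timedelta floors to an integer number of whole bins.
    return midnight + base + (elapsed // freq) * freq


ts = datetime(2016, 9, 1, 0, 2, 50)
freq = timedelta(seconds=30)

print(bin_start(ts, freq, timedelta(seconds=0)))  # edges at :00 and :30
print(bin_start(ts, freq, timedelta(seconds=8)))  # edges at :08 and :38
```

With a shift of 8 seconds the edges line up with the timestamps in the question, which is why base=8 is used above.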
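Answer 1: (score: 2)
Assuming the timestamps have already been converted to datetime, if you set the index to the Timestamp column and then reindex against a date range, the missing values show up: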
In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df
Out[94]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
In [96]:
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))
Out[96]:
Temperature1 Temperature2
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
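This assumes that the timestamps are regular. Here we build a date range from the first and last timestamp values with a frequency of 30 seconds: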
In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')
Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
'2016-09-01 00:01:08', '2016-09-01 00:01:38',
'2016-09-01 00:02:08', '2016-09-01 00:02:38',
'2016-09-01 00:03:08', '2016-09-01 00:03:38',
'2016-09-01 00:04:08', '2016-09-01 00:04:38',
'2016-09-01 00:05:08', '2016-09-01 00:05:38'],
dtype='datetime64[ns]', freq='30S')
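Since reindex matches labels exactly, it can be worth checking the regularity assumption first. A minimal sketch, using a made-up index in which the last timestamp is deliberately 3 seconds off the 30-second grid:

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:00:38",
    "2016-09-01 00:01:41",  # hypothetical irregular reading
])

# Seconds elapsed since the first timestamp, as plain floats.
seconds_from_start = (idx - idx[0]).total_seconds()

# A timestamp is on the grid when it is a whole number of 30 s steps from the start.
on_grid = np.asarray(seconds_from_start % 30 == 0)
off_grid = idx[~on_grid]
print(off_grid)
```

Any timestamp listed in off_grid would not match a label in the generated date range, so its row would be dropped by reindex rather than kept.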
When you reindex against this range, any index labels that are missing from the original data become rows of NaN values.
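If you also want the list of missing timestamps themselves, one option is to take the set difference between the full range and the existing index. A minimal sketch on a made-up three-row frame:

```python
import pandas as pd

idx = pd.to_datetime([
    "2016-09-01 00:02:08",
    "2016-09-01 00:04:08",
    "2016-09-01 00:05:08",
])
df = pd.DataFrame({"Temperature1": [54.5, 54.2, 52.4]}, index=idx)

# Every timestamp we expect to see, 30 seconds apart.
full = pd.date_range(start=df.index[0], end=df.index[-1], freq="30s")

# Labels in the expected range that are absent from the data.
missing = full.difference(df.index)

# Same frame with NaN rows inserted at the missing labels.
filled = df.reindex(full)

print(missing)
print(filled)
```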