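I have a time series in which a timestamp is recorded whenever an event occurs, so there is no fixed frequency. The timestamps have millisecond precision. Since the data runs to thousands of rows and many variables, I would like to create a new time range with a given frequency (here 5ms) and interpolate the values at those times. So I tried this: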
import pandas as pd
a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
'2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
'2016-01-23 00:00:00.024']),
"Value": [1,2,3,4]})
a = a.set_index(a["Time"])
b = pd.date_range(start='2016-01-23 00:00:00.00',
end='2016-01-23 00:00:00.025', freq='5ms')
c = a.reindex(b).interpolate(method="time")
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT NaN
2016-01-23 00:00:00.010 NaT NaN
2016-01-23 00:00:00.015 NaT NaN
2016-01-23 00:00:00.020 NaT NaN
2016-01-23 00:00:00.025 NaT NaN
d=a.resample('5ms').interpolate()
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT NaN
2016-01-23 00:00:00.010 NaT NaN
2016-01-23 00:00:00.015 NaT NaN
2016-01-23 00:00:00.020 NaT NaN
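I suppose neither of these approaches can work if the new time grid does not contain the original timestamps? I eventually solved the problem as follows: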
e = a.reindex(a.index.union(b)).interpolate(method='time').reindex(b)
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT 1.333333
2016-01-23 00:00:00.010 NaT 1.749995
2016-01-23 00:00:00.015 NaT 2.400031
2016-01-23 00:00:00.020 NaT 3.333348
2016-01-23 00:00:00.025 NaT 4.000000
But this looks heavy and inefficient to me. I would have expected to be able to use the interpolation function directly. Any ideas?
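The hunch above, that reindexing onto a grid sharing no timestamps with the data leaves nothing to interpolate from, can be checked with a small sketch. It uses a Series instead of the DataFrame so that no Time column is carried along; the names s, grid, shared, on_grid and via_union are only illustrative.

```python
import numpy as np
import pandas as pd

# Same four samples as in the question, stored as a Series indexed by time.
times = pd.to_datetime([
    "2016-01-23 00:00:00.001", "2016-01-23 00:00:00.013",
    "2016-01-23 00:00:00.018", "2016-01-23 00:00:00.024",
])
s = pd.Series([1, 2, 3, 4], index=times, name="Value")

# Target grid with a 5ms step.
grid = pd.date_range(start="2016-01-23 00:00:00.000",
                     end="2016-01-23 00:00:00.025", freq="5ms")

# reindex only keeps values whose labels exist in the original index.
shared = grid.intersection(s.index)
on_grid = s.reindex(grid)

# Keeping the original timestamps alongside the grid gives interpolate()
# real data points to work from; the final reindex drops them again.
via_union = (s.reindex(s.index.union(grid))
               .interpolate(method="time")
               .reindex(grid))

print(len(shared))           # number of timestamps common to both indexes
print(on_grid.isna().all())  # whether the plain reindex is all NaN
print(via_union)
```

Since the grid and the data have no timestamp in common here, the plain reindex has only NaN to interpolate between, which is consistent with the empty results shown above.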
Answer 0 (score: 1)
If you set the time as the index, you can use resample:
a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
'2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
'2016-01-23 00:00:00.024']),
"Value": [1,2,3,4]})
a.set_index('Time', inplace=True)
print(a.resample('1ms').interpolate().resample('5ms').first())
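It is still a bit of a workaround, but it is something! Resampling directly to 5ms gives a much coarser interpolation. The code above prints: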
Value
Time
2016-01-23 00:00:00.000 1.000000
2016-01-23 00:00:00.005 1.333333
2016-01-23 00:00:00.010 1.750000
2016-01-23 00:00:00.015 2.400000
2016-01-23 00:00:00.020 3.333333
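If the intermediate 1ms grid feels wasteful for long recordings, one possible alternative (not part of the answer above, just a sketch) is to interpolate straight onto the target grid with numpy.interp. This assumes the timestamps are sorted in increasing order; the names grid, t0, one_ms, x_old, x_new and result are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"Time": pd.to_datetime(['2016-01-23 00:00:00.001',
                                          '2016-01-23 00:00:00.013',
                                          '2016-01-23 00:00:00.018',
                                          '2016-01-23 00:00:00.024']),
                  "Value": [1, 2, 3, 4]}).set_index("Time")

# Target grid with a 5ms step.
grid = pd.date_range(start='2016-01-23 00:00:00.000',
                     end='2016-01-23 00:00:00.025', freq='5ms')

# Express both time axes as milliseconds elapsed since the first grid point,
# so that np.interp can work on plain floats.
t0 = grid[0]
one_ms = pd.Timedelta(milliseconds=1)
x_old = ((a.index - t0) / one_ms).to_numpy(dtype=float)
x_new = ((grid - t0) / one_ms).to_numpy(dtype=float)

# Linear interpolation; left/right=NaN marks grid points outside the data
# range instead of repeating the first or last sample. x_old must be increasing.
values = np.interp(x_new, x_old, a["Value"].to_numpy(dtype=float),
                   left=np.nan, right=np.nan)

result = pd.Series(values, index=grid, name="Value")
print(result)
```

With several variables, the same x_old and x_new can be reused and np.interp called once per column. Note that, unlike the resample version, grid points before the first sample or after the last one come out as NaN here.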