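I have a small problem with time-accurate interpolation.
A separate file gives me a time array like the one below. For clarity, both time columns in this example cover the same time range: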
import pandas as pd
import numpy as np
df_time = pd.DataFrame({
'TIMETAG': ['13:52:41.562', '13:52:41.640', '13:52:41.749', '13:52:41.838',\
'13:52:41.948', '13:52:42.048', '13:52:42.138',\
'13:52:42.258', '13:52:42.398', '13:52:42.584', '13:52:42.584',\
'13:52:42.692', '13:52:42.879', '13:52:42.957',\
'13:52:43.066', '13:52:43.176', '13:52:43.269', '13:52:43.363',\
'13:52:43.472', '13:52:43.597', '13:52:43.722',\
'13:52:43.815', '13:52:43.987', '13:52:44.065', '13:52:44.190',\
'13:52:44.299', '13:52:44.392', '13:52:44.486',\
'13:52:44.595', '13:52:44.673', '13:52:44.798', '13:52:44.970',\
'13:52:45.001', '13:52:45.094', '13:52:45.235']})
I convert it to milliseconds with the following commands:
timerange = pd.to_datetime(df_time['TIMETAG'])
timeit = timerange.astype('int64')//(10**6)
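As an aside, the time strings carry no date, so the same conversion can also be sketched with timedeltas, which gives milliseconds since midnight. This is only an alternative under that assumption, not what the code above does; the names `tags` and `ms_since_midnight` are made up for the illustration.

```python
import pandas as pd

# Hypothetical two-element sample in the same HH:MM:SS.fff format as TIMETAG.
tags = pd.Series(['13:52:41.562', '13:52:41.640'])

# Parse as durations since midnight, then floor-divide by one millisecond
# to get integer milliseconds regardless of the internal time resolution.
ms_since_midnight = pd.to_timedelta(tags) // pd.Timedelta(milliseconds=1)
```

The absolute values differ from the epoch-based ones above, but the spacing between samples is the same, which is all the interpolation depends on.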
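The timestamp column originally has more than 500 rows with a sampling interval close to 100 ms, but I still resample it so that every time step is exactly 100 ms: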
timerange = np.arange(np.amin(timeit), np.amax(timeit), 100)
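I need to interpolate a separate dataframe that already has a time column but only 58 elements (so its sampling rate is roughly 9 times lower). This dataframe must be interpolated onto the regular grid built from the timeit array.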
df = pd.DataFrame({
'TIMETAG2' : ['13:52:41.562', '13:52:42.238', '13:52:42.558' ,\
'13:52:42.879', '13:52:43.176' ,\
'13:52:43.597', '13:52:44.299', '13:52:44.595' ,\
'13:52:44.970', '13:52:45.235'],
'350.0' : [13.108239, 12.398412, 13.020835, 14.030805, 13.852628 ,\
13.901151, 13.050930, 12.642002, 11.864150, 11.297425 ],
'400.0' : [22.551765, 22.186752, 22.603124, 24.662806, 24.108199 ,\
24.057507, 23.258363, 22.721349, 21.300732, 20.733452 ],
'450.0' : [32.221240, 32.621537, 32.367137, 35.565543, 34.632190 ,\
34.444403, 34.098969, 33.486451, 31.556474, 31.584678 ],
'500.0' : [33.460819, 34.410052, 33.755817, 36.839105, 35.827079 ,\
35.691536, 35.732444, 35.349296, 33.618491, 34.132295 ],
'550.0' : [ 32.423253, 33.517339, 32.708333, 35.677932, 34.682384 ,\
34.515653, 34.753437, 34.456637, 32.790737, 33.458967 ],
'600.0' : [ 28.563580, 29.187609, 28.715661, 31.343185, 30.541189 ,\
30.366380, 30.278298, 29.895978, 28.392532, 28.646102]
})
Again, I convert the time column to milliseconds:
df_timetag = pd.to_datetime(df['TIMETAG2'])
df_timeit = df_timetag.astype('int64')//(10**6)
With the timestamp set as the index, I tried to interpolate with the following commands:
df['TIMETAG2'] = df_timeit
df1 = df.set_index('TIMETAG2')
df2 = df1.reindex(timerange)
df2 is filled with NaN except for the first row.
df3 = df2.interpolate(axis=0, limit_direction='both')
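After the interpolation, all values in each column are in fact equal.
The same thing happens when I use the full time arrays of length 530 versus 60 (from the dataframes). This is a simplified example.
My question is: how can I successfully interpolate in time when the time range is the same but one array has many more elements than the other?
Answer 0: (score: 0)
The problem is in the reindex line. df1.reindex(timerange) keeps only the labels that appear in timerange, so every original sample that does not fall exactly on the 100 ms grid is discarded. Only the first row survives, and interpolate then repeats that single value. Reindex on the union of both indexes instead: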
df2 = df1.reindex(index=df1.index.union(timerange))
df3 = df2.interpolate(axis=0, limit_direction='both')
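The difference between the two reindex calls can be seen on a toy frame. This is a minimal sketch with made-up data; `sparse`, `grid` and the other names are hypothetical and stand in for df1 and timerange.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three irregular samples, times in milliseconds.
sparse = pd.DataFrame({"value": [10.0, 20.0, 40.0]}, index=[0, 250, 650])
grid = np.arange(0, 651, 100)  # regular 100 ms grid: 0, 100, ..., 600

# Plain reindex keeps only labels that are in `grid`; 250 and 650 are dropped.
plain = sparse.reindex(grid)
plain_valid = int(plain["value"].notna().sum())

# Reindexing on the union keeps all three original samples as anchors.
merged = sparse.reindex(sparse.index.union(grid))
merged_valid = int(merged["value"].notna().sum())

# With a single anchor, interpolation can only repeat that one value.
plain_filled = plain.interpolate(axis=0, limit_direction="both")
```

Here `plain` has one non-NaN row and `merged` has three, which mirrors what happens with df1 in the question.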
If you only need the rows whose index values are in the timerange array:
df4 = df3.loc[timerange]
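One caveat worth checking: with the default method='linear', interpolate treats the rows as equally spaced and ignores the index values. If the result should be weighted by the actual time differences, method='index' is one option. The sketch below uses made-up data and hypothetical names (`sparse`, `grid`, `on_grid`) and assumes a numeric millisecond index like the one built above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three irregular samples, times in milliseconds.
sparse = pd.DataFrame({"value": [10.0, 20.0, 40.0]}, index=[0, 250, 650])
grid = np.arange(0, 651, 100)  # regular 100 ms grid: 0, 100, ..., 600

# Keep the original samples, add the grid points, then interpolate using
# the index values (the times) as the x-coordinates.
combined = sparse.reindex(sparse.index.union(grid))
on_grid = combined.interpolate(method="index", limit_direction="both").loc[grid]
```

With this toy data the value at 100 ms is 14.0, which is 100/250 of the way from 10.0 to 20.0; the default linear method would give a different number because it only counts rows.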