I would like to convert a dataframe in the following format, for example:
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:07 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 4.0
2019-08-10 12:03:10 NaN
2019-08-10 12:03:11 NaN
2019-08-10 12:03:12 5.0
2019-08-10 12:03:13 NaN
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 NaN
2019-08-10 12:03:16 NaN
2019-08-10 12:03:17 6.0
into this one:
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 1.667
2019-08-10 12:03:07 2.333
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 3.667
2019-08-10 12:03:10 4.333
2019-08-10 12:03:11 5.0
2019-08-10 12:03:12 3.667
2019-08-10 12:03:13 2.333
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 2.667
2019-08-10 12:03:16 4.333
2019-08-10 12:03:17 6.0
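To reproduce the example, the input frame can be built directly. The values are copied from the first printout. A regular one-row-per-second index is assumed because that is what the printout shows.

```python
import numpy as np
import pandas as pd

# One row per second from 12:03:05 to 12:03:17 (13 rows), as in the printout.
idx = pd.date_range('2019-08-10 12:03:05', periods=13, freq='s')
df = pd.DataFrame({'vals': [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5,
                            np.nan, 1, np.nan, np.nan, 6]}, index=idx)
print(df)
```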
The idea is to first align the dataframe to something like the following (taking the nearest value at every third second):
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:07 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 NaN
2019-08-10 12:03:10 NaN
2019-08-10 12:03:11 5.0
2019-08-10 12:03:12 NaN
2019-08-10 12:03:13 NaN
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 NaN
2019-08-10 12:03:16 NaN
2019-08-10 12:03:17 6.0
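and then linearly interpolate between those values to produce the final dataframe. If the gap is longer than 2 seconds, I only want to interpolate between those 2 values.
This is what I have tried so far: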
df.resample('3s').nearest()
which produces:
>>> df.resample('3s').nearest()
vals
2019-08-10 12:03:03 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:09 4.0
2019-08-10 12:03:12 5.0
2019-08-10 12:03:15 NaN
Also:
>>> df.resample('2s').nearest()
vals
2019-08-10 12:03:04 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:10 NaN
2019-08-10 12:03:12 5.0
2019-08-10 12:03:14 1.0
2019-08-10 12:03:16 NaN
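Clearly nearest is a complete lie, or at least a misnomer, since the value nearest to 12:03:10 is obviously 4. Also, the final value at 2019-08-10 12:03:16 should surely be 6.0.
This is only an attempt to align the values to every third second; after that, the rest seems to work.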
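A likely explanation: the pandas documentation for Resampler.nearest says it fills based on the index value and that missing values already present in the original data are not modified. At each bin label it therefore takes the row with the closest timestamp, even when that row holds NaN; it does not look for the closest non-NaN value. The sketch below checks this on the example data by comparing against a plain reindex(..., method='nearest') at the same bin labels. Treating the two as equivalent is an assumption about pandas' behaviour, checked here only for this data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-08-10 12:03:05', periods=13, freq='s')
df = pd.DataFrame({'vals': [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5,
                            np.nan, 1, np.nan, np.nan, 6]}, index=idx)

# Default origin: '3s' bin labels fall on multiples of 3 s since midnight
# (12:03:03, :06, :09, :12, :15).
res = df.resample('3s').nearest()

# Reindexing at the same labels picks the row with the closest timestamp,
# whether or not that row's value is NaN (e.g. 12:03:06 stays NaN).
manual = df.reindex(res.index, method='nearest')

print(res)
print(res.equals(manual))
```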
Thanks for any help.
Answer 0 (score: 1)
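I think you need the base parameter to shift the offset of the resampling bins, set to the second of the first index value modulo 3 (because of the 3-second frequency), together with Resampler.first: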
# base= is only accepted by pandas < 2.0
df['new'] = df['vals'].resample('3s', base=df.index[0].second % 3).first()
print (df)
vals new
2019-08-10 12:03:05 1.0 1.0
2019-08-10 12:03:06 NaN NaN
2019-08-10 12:03:07 NaN NaN
2019-08-10 12:03:08 3.0 3.0
2019-08-10 12:03:09 4.0 NaN
2019-08-10 12:03:10 NaN NaN
2019-08-10 12:03:11 NaN 5.0
2019-08-10 12:03:12 5.0 NaN
2019-08-10 12:03:13 NaN NaN
2019-08-10 12:03:14 1.0 1.0
2019-08-10 12:03:15 NaN NaN
2019-08-10 12:03:16 NaN NaN
2019-08-10 12:03:17 6.0 6.0
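The base argument was deprecated in pandas 1.1 and removed in pandas 2.0. A sketch of the same alignment step for newer versions, assuming pandas 1.1 or later: origin='start' anchors the 3-second bins at the first timestamp, which replaces the manual second % 3 calculation. This is a substitution for base, not part of the original answer.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-08-10 12:03:05', periods=13, freq='s')
df = pd.DataFrame({'vals': [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5,
                            np.nan, 1, np.nan, np.nan, 6]}, index=idx)

# origin='start' puts the first bin edge at df.index[0] (12:03:05), so the
# bins are [:05, :08), [:08, :11), [:11, :14), [:14, :17), [:17, :20).
# first() keeps the first non-NaN value inside each bin (5.0 at :12 for the :11 bin).
aligned = df['vals'].resample('3s', origin='start').first()

# Assigning back aligns on the index; rows that are not bin labels get NaN.
df['new'] = aligned
print(df)
```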
Then interpolate:
df['new'] = df['new'].interpolate()
print (df)
vals new
2019-08-10 12:03:05 1.0 1.000000
2019-08-10 12:03:06 NaN 1.666667
2019-08-10 12:03:07 NaN 2.333333
2019-08-10 12:03:08 3.0 3.000000
2019-08-10 12:03:09 4.0 3.666667
2019-08-10 12:03:10 NaN 4.333333
2019-08-10 12:03:11 NaN 5.000000
2019-08-10 12:03:12 5.0 3.666667
2019-08-10 12:03:13 NaN 2.333333
2019-08-10 12:03:14 1.0 1.000000
2019-08-10 12:03:15 NaN 2.666667
2019-08-10 12:03:16 NaN 4.333333
2019-08-10 12:03:17 6.0 6.000000
Test with the index shifted by 2 seconds:
df.index += pd.Timedelta(2, 's')
df['new'] = df['vals'].resample('3s', base=df.index[0].second % 3).first()
print (df)
vals new
2019-08-10 12:03:07 1.0 1.0
2019-08-10 12:03:08 NaN NaN
2019-08-10 12:03:09 NaN NaN
2019-08-10 12:03:10 3.0 3.0
2019-08-10 12:03:11 4.0 NaN
2019-08-10 12:03:12 NaN NaN
2019-08-10 12:03:13 NaN 5.0
2019-08-10 12:03:14 5.0 NaN
2019-08-10 12:03:15 NaN NaN
2019-08-10 12:03:16 1.0 1.0
2019-08-10 12:03:17 NaN NaN
2019-08-10 12:03:18 NaN NaN
2019-08-10 12:03:19 6.0 6.0
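The shifted test suggests the approach works for any start time. A minimal sketch that wraps both steps in a hypothetical helper, align_and_interpolate (the name is illustrative, not from the answer), using origin='start' instead of base so it runs on pandas 1.1 or later. It assumes a regular 1-second index like the example, so every bin label is also a row of the original index.

```python
import numpy as np
import pandas as pd


def align_and_interpolate(s: pd.Series, step: str = '3s') -> pd.Series:
    """Hypothetical helper: keep the first non-NaN value in each `step`-wide
    bin (bins anchored at the first timestamp), then linearly interpolate."""
    anchors = s.resample(step, origin='start').first()
    # Put the anchor values back on the original index; other rows become NaN.
    aligned = anchors.reindex(s.index)
    # Positional linear interpolation between consecutive anchors.
    return aligned.interpolate()


vals = [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5, np.nan, 1, np.nan, np.nan, 6]
for start in ['2019-08-10 12:03:05', '2019-08-10 12:03:07']:
    s = pd.Series(vals, index=pd.date_range(start, periods=13, freq='s'))
    print(align_and_interpolate(s).round(3).tolist())
```

Both start times give the same interpolated values, because only the position relative to the first timestamp matters.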
Answer 1 (score: 1)
df1 = df.set_index(['Time']).interpolate(method='linear').reset_index()
print(df1)
Output
Time vals
0 2019-08-10 12:03:05 1.000000
1 2019-08-10 12:03:06 1.666667
2 2019-08-10 12:03:07 2.333333
3 2019-08-10 12:03:08 3.000000
4 2019-08-10 12:03:09 4.000000
5 2019-08-10 12:03:10 4.333333
6 2019-08-10 12:03:11 4.666667
7 2019-08-10 12:03:12 5.000000
8 2019-08-10 12:03:13 3.000000
9 2019-08-10 12:03:14 1.000000
10 2019-08-10 12:03:15 2.666667
11 2019-08-10 12:03:16 4.333333
12 2019-08-10 12:03:17 6.000000
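The code above assumes the timestamps live in a column named Time. If they are already the index, as in the question, a sketch of the same idea needs no set_index/reset_index. method='time' (swapped in here) weights by elapsed time; on this regular 1-second index it gives the same numbers as method='linear'. Like the output above, this interpolates between every original value, so 4.0 at 12:03:09 is kept rather than dropped by the alignment step.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-08-10 12:03:05', periods=13, freq='s')
df = pd.DataFrame({'vals': [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5,
                            np.nan, 1, np.nan, np.nan, 6]}, index=idx)

# Interpolate directly on the DatetimeIndex, weighting by time between points.
df1 = df.interpolate(method='time')
print(df1)
```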
Answer 2 (score: 0)
If you want to replace the NaN values with the nearest value, you can use interpolation:
data['value'] = data['value'].interpolate(method='nearest')
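interpolate(method='nearest') is delegated to SciPy, so SciPy has to be installed. A pandas-only alternative (a swapped-in technique, not what this answer uses) is to reindex the non-NaN values onto the full index with method='nearest', which picks the value with the closest timestamp. The column name value follows the answer; the data is the question's example. Ties, such as 12:03:13 sitting exactly halfway between 12:03:12 and 12:03:14, may be broken differently from SciPy, so check that case if it matters.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-08-10 12:03:05', periods=13, freq='s')
data = pd.DataFrame({'value': [1, np.nan, np.nan, 3, 4, np.nan, np.nan, 5,
                               np.nan, 1, np.nan, np.nan, 6]}, index=idx)

# Keep only the observed points, then look up the closest observed
# timestamp for every row of the original index.
observed = data['value'].dropna()
data['nearest'] = observed.reindex(data.index, method='nearest')
print(data)
```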