Resampling a time series in pandas with interpolation

Date: 2017-10-13 10:49:13

Tags: python pandas interpolation

I have a time series sampled every 0.9 seconds:

import pandas as pd
index = pd.date_range('1/1/2000', periods=9, freq='0.9S')
series = pd.Series(range(9), index=index)

>>> series
2000-01-01 00:00:00.000    0
2000-01-01 00:00:00.900    1
2000-01-01 00:00:01.800    2
2000-01-01 00:00:02.700    3
2000-01-01 00:00:03.600    4
2000-01-01 00:00:04.500    5
2000-01-01 00:00:05.400    6
2000-01-01 00:00:06.300    7
2000-01-01 00:00:07.200    8
Freq: 900L, dtype: int64

Now I resample it to a 0.5-second frequency:

>>> series.resample(rule='0.5S').head(100)
2000-01-01 00:00:00.000    0.0
2000-01-01 00:00:00.500    1.0
2000-01-01 00:00:01.000    NaN
2000-01-01 00:00:01.500    2.0
2000-01-01 00:00:02.000    NaN
2000-01-01 00:00:02.500    3.0
2000-01-01 00:00:03.000    NaN
2000-01-01 00:00:03.500    4.0
2000-01-01 00:00:04.000    NaN
2000-01-01 00:00:04.500    5.0
2000-01-01 00:00:05.000    6.0
2000-01-01 00:00:05.500    NaN
2000-01-01 00:00:06.000    7.0
2000-01-01 00:00:06.500    NaN
2000-01-01 00:00:07.000    8.0
Freq: 500L, dtype: float64

That is what I expected. But when I interpolate, I get:

>>> series.resample(rule='0.5S').interpolate(method='linear')
2000-01-01 00:00:00.000    0.000000
2000-01-01 00:00:00.500    0.555556
2000-01-01 00:00:01.000    1.111111
2000-01-01 00:00:01.500    1.666667
2000-01-01 00:00:02.000    2.222222
2000-01-01 00:00:02.500    2.777778
2000-01-01 00:00:03.000    3.333333
2000-01-01 00:00:03.500    3.888889
2000-01-01 00:00:04.000    4.444444
2000-01-01 00:00:04.500    5.000000
2000-01-01 00:00:05.000    5.000000
2000-01-01 00:00:05.500    5.000000
2000-01-01 00:00:06.000    5.000000
2000-01-01 00:00:06.500    5.000000
2000-01-01 00:00:07.000    5.000000
Freq: 500L, dtype: float64

I expected the last value to still be 8.0, and the value at the 6.5 s timestamp to still be 7.0. What is going wrong?
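A likely explanation, sketched here under the assumption that Resampler.interpolate first reindexes to the new grid in the way asfreq() does (which is how the pandas documentation describes it for the versions this question concerns): only original samples that fall exactly on the 0.5 s grid survive, and everything else becomes NaN before interpolation starts. The snippet below uses the 'ms' frequency alias instead of '0.9S' so that it also runs on recent pandas releases.

```python
import pandas as pd

# Same data as in the question, written with the 'ms' alias
index = pd.date_range('2000-01-01', periods=9, freq='900ms')
series = pd.Series(range(9), index=index)

# asfreq() keeps only the original samples that sit exactly on the new grid
on_grid = series.resample('500ms').asfreq()

# Only the samples at 0.0 s and 4.5 s are left; the other 13 rows are NaN
print(on_grid.dropna())
```

With only those two points available, linear interpolation runs from 0 to 5 in nine equal steps (hence 0.555556, 1.111111, ...), and the rows after 4.5 s are filled with the last valid value, 5.0.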

1 answer:

Answer 0 (score: 3)

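One approach that is at least partially correct is to call mean() between the two methods. (On real data the results were not great; I had better success with scipy's interp1d.)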

>>> series.resample(rule='0.5S').mean().interpolate(method='linear')
2000-01-01 00:00:00.000    0.0
2000-01-01 00:00:00.500    1.0
2000-01-01 00:00:01.000    1.5
2000-01-01 00:00:01.500    2.0
2000-01-01 00:00:02.000    2.5
2000-01-01 00:00:02.500    3.0
2000-01-01 00:00:03.000    3.5
2000-01-01 00:00:03.500    4.0
2000-01-01 00:00:04.000    4.5
2000-01-01 00:00:04.500    5.0
2000-01-01 00:00:05.000    6.0
2000-01-01 00:00:05.500    6.5
2000-01-01 00:00:06.000    7.0
2000-01-01 00:00:06.500    7.5
2000-01-01 00:00:07.000    8.0
Freq: 500L, dtype: float64
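Note that mean() assigns each original sample to the 0.5 s bin it falls in, so the values above are shifted slightly in time. A pandas-only alternative, offered as a minimal sketch and not as the method used in this answer, is to reindex onto the union of the original and target timestamps, interpolate with method='time', and then keep only the target grid. The names target, combined and resampled are illustrative.

```python
import pandas as pd

# Same data as in the question, written with the 'ms' alias
index = pd.date_range('2000-01-01', periods=9, freq='900ms')
series = pd.Series(range(9), index=index, dtype='float64')

# Target grid: every 0.5 s across the span of the original data
target = pd.date_range(series.index[0], series.index[-1], freq='500ms')

# Keep every original sample by reindexing onto the union of both grids
combined = series.reindex(series.index.union(target))

# method='time' weights by the actual time gaps, not by row position
resampled = combined.interpolate(method='time').reindex(target)

print(resampled)
```

Because this interpolates against the true timestamps, the value at 6.5 s comes out near 7.22 and the value at 7.0 s near 7.78, not 7.0 and 8.0: the original sample with value 8 sits at 7.2 s, which lies beyond the last point of the 0.5 s grid.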