I have some inhomogeneous, roughly once-per-second data with a time series index like this:
import numpy as np
import pandas as pd
from datetime import datetime  # pd.datetime was removed in pandas 2.0
dates = [datetime(2012, 2, 5, 17, 0, 35, 327000), datetime(2012, 2, 5, 17, 0, 37, 325000), datetime(2012, 2, 5, 17, 0, 37, 776000), datetime(2012, 2, 5, 17, 0, 38, 233000),
         datetime(2012, 2, 5, 17, 0, 40, 946000), datetime(2012, 2, 5, 17, 0, 41, 327000), datetime(2012, 2, 5, 17, 0, 42, 6000), datetime(2012, 2, 5, 17, 0, 44, 99000),
         datetime(2012, 2, 5, 17, 0, 44, 99000), datetime(2012, 2, 5, 17, 0, 46, 289000), datetime(2012, 2, 5, 17, 0, 49, 96000), datetime(2012, 2, 5, 17, 0, 53, 240000)]
inhomogeneous_secondish_series = pd.Series(np.random.randn(len(dates)), name='some_col', index=pd.DatetimeIndex(dates))
In [26]: inhomogeneous_secondish_series
Out[26]:
2012-02-05 17:00:35.327000 -0.903398
2012-02-05 17:00:37.325000 0.535798
2012-02-05 17:00:37.776000 0.847231
2012-02-05 17:00:38.233000 -1.280244
2012-02-05 17:00:40.946000 1.330232
2012-02-05 17:00:41.327000 2.287555
2012-02-05 17:00:42.006000 -1.469432
2012-02-05 17:00:44.099000 -1.174953
2012-02-05 17:00:44.099000 -1.020135
2012-02-05 17:00:46.289000 -0.200043
2012-02-05 17:00:49.096000 -0.665699
2012-02-05 17:00:53.240000 0.748638
Name: some_col
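I want to resample to, say, '5s'. Normally I would do this:
In [28]: inhomogeneous_secondish_series.resample('5s', label='right').mean()
This produces nicely resampled 5s data anchored at second 0: every item in the resulting index is a multiple of 5 seconds from the start of the minute: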
2012-02-05 17:00:40 -0.200153
2012-02-05 17:00:45 -0.009347
2012-02-05 17:00:50 -0.432871
2012-02-05 17:00:55 0.748638
Freq: 5S
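A minimal sketch reproducing the default binning above, assuming current pandas, where resample() returns a resampler (so an aggregation such as .mean() must be called explicitly) and 5 s bins are labelled by their left edge unless label='right' is passed. The values 0.0 to 11.0 are hypothetical stand-ins for np.random.randn so that the bin means can be checked by hand:

```python
from datetime import datetime

import pandas as pd

# Same timestamps as in the question.
dates = [
    datetime(2012, 2, 5, 17, 0, 35, 327000), datetime(2012, 2, 5, 17, 0, 37, 325000),
    datetime(2012, 2, 5, 17, 0, 37, 776000), datetime(2012, 2, 5, 17, 0, 38, 233000),
    datetime(2012, 2, 5, 17, 0, 40, 946000), datetime(2012, 2, 5, 17, 0, 41, 327000),
    datetime(2012, 2, 5, 17, 0, 42, 6000), datetime(2012, 2, 5, 17, 0, 44, 99000),
    datetime(2012, 2, 5, 17, 0, 44, 99000), datetime(2012, 2, 5, 17, 0, 46, 289000),
    datetime(2012, 2, 5, 17, 0, 49, 96000), datetime(2012, 2, 5, 17, 0, 53, 240000),
]
# Hypothetical deterministic values 0.0, 1.0, ..., 11.0 instead of random data.
series = pd.Series([float(i) for i in range(len(dates))],
                   index=pd.DatetimeIndex(dates), name='some_col')

# Bins start on whole multiples of 5 s; label='right' names each bin by its end,
# giving the :40, :45, :50, :55 labels shown in the question.
default_bins = series.resample('5s', label='right').mean()
print(default_bins)
```

The first bin averages the four samples before 17:00:40 (0 to 3, mean 1.5) and the last bin holds only the final sample.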
How can I anchor the resampled data on the time of the latest sample instead, so that the index looks like this:
...
2012-02-05 17:00:38.240000 (some correct resample value)
2012-02-05 17:00:43.240000 (some correct resample value)
2012-02-05 17:00:48.240000 (some correct resample value)
2012-02-05 17:00:53.240000 (some correct resample value)
Freq: 5S
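I suspect the answer lies in resample()'s loffset parameter, but I wonder whether there is an easier way than computing loffset before resampling. Do I have to look at the latest sample, work out how far it is from the nearest regular 5s boundary, and feed that into loffset?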
Answer:
loffset only changes the labels; it does not change how the data is grouped into the new frequency. Using your example:
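from datetime import timedelta
max_date = max(dates)
offset = timedelta(seconds=(max_date.second % 5) - 5, microseconds=max_date.microsecond - 1)
# pandas < 2.0: inhomogeneous_secondish_series.resample('5s', loffset=offset)
# loffset was removed in pandas 2.0; shifting the resulting index does the same relabelling:
resampled = inhomogeneous_secondish_series.resample('5s', label='right').mean()
resampled.index = resampled.index + offset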
This gives you:
2012-02-05 17:00:38.239999 -0.200153
2012-02-05 17:00:43.239999 -0.009347
2012-02-05 17:00:48.239999 -0.432871
2012-02-05 17:00:53.239999 0.748638
Freq: 5S
As I understand it, this is not what you want: the last value should be the mean of the last two values in the dataset, not just the last value.
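A minimal sketch of that point, assuming current pandas (no loffset) and hypothetical values 0.0 to 11.0 in place of the random data. The offset arithmetic is the answer's: (53 % 5) - 5 = -2 s plus 239999 µs, i.e. a shift of -1.760001 s. Shifting the labels moves 17:00:55 to 17:00:53.239999, but the bin contents do not change, so the last bin still holds only the final sample:

```python
from datetime import datetime, timedelta

import pandas as pd

dates = [
    datetime(2012, 2, 5, 17, 0, 35, 327000), datetime(2012, 2, 5, 17, 0, 37, 325000),
    datetime(2012, 2, 5, 17, 0, 37, 776000), datetime(2012, 2, 5, 17, 0, 38, 233000),
    datetime(2012, 2, 5, 17, 0, 40, 946000), datetime(2012, 2, 5, 17, 0, 41, 327000),
    datetime(2012, 2, 5, 17, 0, 42, 6000), datetime(2012, 2, 5, 17, 0, 44, 99000),
    datetime(2012, 2, 5, 17, 0, 44, 99000), datetime(2012, 2, 5, 17, 0, 46, 289000),
    datetime(2012, 2, 5, 17, 0, 49, 96000), datetime(2012, 2, 5, 17, 0, 53, 240000),
]
# Hypothetical deterministic values instead of np.random.randn.
series = pd.Series([float(i) for i in range(len(dates))], index=pd.DatetimeIndex(dates))

max_date = max(dates)
# Same computation as the answer: -2 s + 239999 us = -1.760001 s.
offset = timedelta(seconds=(max_date.second % 5) - 5,
                   microseconds=max_date.microsecond - 1)

binned = series.resample('5s', label='right').mean()
shifted = binned.copy()
shifted.index = shifted.index + offset  # relabel only; grouping is untouched

print(shifted)
```

The last shifted value is 11.0 (just the final sample) rather than 10.5, the mean of the last two samples.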
To change where the bins themselves are anchored, use base instead. Because base must be an integer, express the frequency in microseconds, for example:
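freq_base = (max_date.second % 5) * 1000000 + max_date.microsecond
# pandas < 2.0: inhomogeneous_secondish_series.resample('5000000U', base=freq_base)
# base was removed in pandas 2.0; offset takes the same shift as a Timedelta:
inhomogeneous_secondish_series.resample('5s', offset=pd.Timedelta(microseconds=freq_base), closed='right', label='right').mean()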
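In newer pandas, origin='end' can anchor the bins on the last timestamp directly, so the offset arithmetic is not needed. A minimal sketch, assuming pandas 1.3 or later and the same hypothetical 0.0 to 11.0 values; closed='right' and label='right' are passed explicitly so that each bin ends on its label and the final bin is (17:00:48.24, 17:00:53.24]. For comparison it also computes the offset variant with freq_base = 3 * 1000000 + 240000 = 3240000 µs:

```python
from datetime import datetime

import pandas as pd

dates = [
    datetime(2012, 2, 5, 17, 0, 35, 327000), datetime(2012, 2, 5, 17, 0, 37, 325000),
    datetime(2012, 2, 5, 17, 0, 37, 776000), datetime(2012, 2, 5, 17, 0, 38, 233000),
    datetime(2012, 2, 5, 17, 0, 40, 946000), datetime(2012, 2, 5, 17, 0, 41, 327000),
    datetime(2012, 2, 5, 17, 0, 42, 6000), datetime(2012, 2, 5, 17, 0, 44, 99000),
    datetime(2012, 2, 5, 17, 0, 44, 99000), datetime(2012, 2, 5, 17, 0, 46, 289000),
    datetime(2012, 2, 5, 17, 0, 49, 96000), datetime(2012, 2, 5, 17, 0, 53, 240000),
]
# Hypothetical deterministic values instead of np.random.randn.
series = pd.Series([float(i) for i in range(len(dates))], index=pd.DatetimeIndex(dates))

# Bins counted backwards from the last timestamp (17:00:53.24).
anchored_end = series.resample('5s', origin='end', closed='right', label='right').mean()

# Equivalent shift from midnight: 53.24 s mod 5 s = 3.24 s = 3240000 us.
max_date = max(dates)
freq_base = (max_date.second % 5) * 1000000 + max_date.microsecond
anchored_offset = series.resample('5s', offset=pd.Timedelta(microseconds=freq_base),
                                  closed='right', label='right').mean()

print(anchored_end)
```

Both calls give labels 17:00:38.24, 43.24, 48.24 and 53.24, and the last bin averages the final two samples (10.0 and 11.0) to 10.5.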