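It seems that resample() on 1Min bar data has a bug for any sampling frequency that is a multiple of 8. The code below illustrates the problem when resampling at [3, 5, 6, 8, 16] Min. For frequencies 3, 5 and 6, the first entry of the resampled DataFrame's index starts at the base timestamp (9:30 in this case), whereas for frequencies 8 and 16 the resampled index starts at 9:26 and 9:18, respectively.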
import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
for freq in [3, 5, 6, 8, 16]:
    print(freq)
    # .first() replaces the old how='first'; `base` exists only in pandas < 2.0
    print(df.resample(str(freq) + 'Min', base=30).first().head(2))
This produces the following output:
3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:26:00  0
2014-09-01 09:34:00  4
16
                     A
2014-09-01 09:18:00  0
2014-09-01 09:34:00  4
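The 9:26 and 9:18 labels can be reproduced with plain arithmetic. This is a sketch based on my reading of the old pandas behaviour, not on the library's documented contract: `base` is assumed to be reduced modulo the bin width, and the bins are assumed to sit on a grid starting at midnight + (base % freq) minutes. `first_bin_minute` is a hypothetical helper introduced only for illustration.

```python
# Hypothetical helper (not a pandas API). Assumes old-style `base` is reduced
# modulo the bin width and that bins lie on a grid starting at midnight + anchor.
def first_bin_minute(freq, base, start_minute):
    anchor = base % freq
    # Step back from the first timestamp to the nearest grid point at or before it
    return start_minute - (start_minute - anchor) % freq

start = 9 * 60 + 30  # 09:30 expressed as minutes after midnight
labels = {}
for freq in [3, 5, 6, 8, 16]:
    m = first_bin_minute(freq, 30, start)
    labels[freq] = f"{m // 60:02d}:{m % 60:02d}"
    print(freq, labels[freq])
```

Under that assumption the grid for 8 Min starts at 00:06 and the one for 16 Min at 00:14, and 09:30 is not on either grid, so the first bin opens a few minutes earlier. For 3, 5 and 6, 30 % freq is 0 and 570 minutes is a multiple of freq, so 09:30 is a grid point.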
Answer 0 (score: 0)
I think resample anchors its bins at 00:00:00, so I shift the index so that it starts at 00:00, resample, and then shift it back.
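The midnight anchoring is easy to observe by resampling without `base`, `offset` or `origin`. This is a sketch using the same 1-minute data as the question. It assumes default resample settings, where the bin grid starts at midnight of the first day. For 8 Min bins, 570 % 8 = 2, so the first bin should open two minutes before 09:30.

```python
import numpy as np
import pandas as pd

tt = pd.date_range("2014-09-01 09:30", "2014-09-01 16:00", freq="1min")
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=["A"])

# No base/offset/origin: bin edges fall on a grid that starts at midnight
first_labels = {}
for freq in [5, 8, 16]:
    out = df.resample(f"{freq}min").first()
    first_labels[freq] = out.index[0]
    print(freq, out.index[0].time())
```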
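Method 1: shift the index so that 09:30 falls on 00:00, resample, then shift the labels back.
import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)
tt = pd.date_range(datetime_start, datetime_end, freq='1min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

# Distance of the session start (09:30) from midnight
offsets = pd.offsets.Hour(9) + pd.offsets.Minute(30)
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    # Work on a copy so each frequency resamples the original 1-minute data
    out = df.copy()
    # Move 09:30 to 00:00 so the midnight-anchored bins start at the session open
    out.index = out.index - offsets
    out = out.resample(str(freq) + 'min').agg({'A': 'first'})
    # Shift the bin labels back to wall-clock time
    out.index = out.index + offsets
    print(out.head(2))
Method 2: use `base` equal to the index offset (9*60 + 30 = 570 minutes).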
import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)
tt = pd.date_range(datetime_start, datetime_end, freq='1min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    # base=570 minutes anchors the bins at 09:30; `base` requires pandas < 2.0
    # (deprecated in 1.1 in favour of `offset`/`origin`)
    out = df.resample(str(freq) + 'min', base=9 * 60 + 30).agg({'A': 'first'})
    print(out.head(2))
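On pandas 1.1 and later, the same anchoring can be expressed without `base`. The sketch below assumes pandas >= 1.1, where `resample` accepts `offset` (shift the midnight grid) and `origin="start"` (anchor the grid at the first timestamp). These are swapped-in alternatives, not part of the answer's original methods. For this data both should yield bins that open at 09:30 for every frequency.

```python
import numpy as np
import pandas as pd

tt = pd.date_range("2014-09-01 09:30", "2014-09-01 16:30", freq="1min")
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=["A"])

results = {}
for freq in [1, 3, 5, 6, 8, 16]:
    # offset shifts the midnight-anchored grid by 9h30min (stand-in for base=570)
    by_offset = df.resample(f"{freq}min", offset="9h30min").agg({"A": "first"})
    # origin="start" anchors the grid at the first timestamp of the index instead
    by_origin = df.resample(f"{freq}min", origin="start").agg({"A": "first"})
    results[freq] = (by_offset, by_origin)
    print(freq)
    print(by_offset.head(2))
```

`offset` mirrors the answer's Method 2 most closely. `origin="start"` avoids hard-coding the session open, at the cost of depending on the first row being exactly 09:30.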