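Pandas intraday 8Min resampling bug?

Time: 2014-10-08 20:17:56

Tags: python pandas

There seems to be a bug in resample() on 1Min bar data whenever the sampling frequency is a multiple of 8. The code below illustrates the problem when resampling at [3, 5, 6, 8, 16] Min. For frequencies 3, 5 and 6, the first entry of the resampled DataFrame's index starts at the base timestamp (9:30 in this case), whereas for frequencies 8 and 16 the resampled index starts at 9:26 and 9:18 respectively.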

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

for freq in [3, 5, 6, 8, 16]:
    print(freq)
    # Take the first value in each bin; base=30 is the legacy argument in question
    print(df.resample(str(freq) + 'Min', base=30).first().head(2))

This produces the following output:

3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:26:00  0
2014-09-01 09:34:00  4
16
                     A
2014-09-01 09:18:00  0
2014-09-01 09:34:00  4

1 answer:

Answer 0: (score: 0)

I think resample anchors its bins at 00:00:00, so I offset the index to 00:00 and then resample.
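The anchoring idea can be sketched with plain arithmetic. The helper `first_bin_start` below is hypothetical and not part of pandas. It assumes that bin edges are laid out from midnight plus `base` reduced modulo the bin width; under that assumption it reproduces the first index labels shown in the question.

```python
import datetime as dt


def first_bin_start(start, freq_min, base_min=0):
    """Left edge of the bin containing `start`.

    Assumption: bins are freq_min minutes wide and anchored at
    midnight + (base_min % freq_min) minutes.
    """
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = int((start - midnight).total_seconds() // 60)
    shift = base_min % freq_min
    # Round down to the nearest bin edge on the shifted grid
    edge = (minutes - shift) // freq_min * freq_min + shift
    return midnight + dt.timedelta(minutes=edge)


start = dt.datetime(2014, 9, 1, 9, 30)
for freq in [3, 5, 6, 8, 16]:
    # With base=30: 09:30 for 3, 5 and 6; 09:26 for 8; 09:18 for 16
    print(freq, first_bin_start(start, freq, base_min=30).time())
```

Under the same assumption, passing the full distance from midnight (9 * 60 + 30 = 570 minutes) instead of 30 puts an edge exactly on 09:30 for every frequency, because 570 minus (570 % freq) is always a multiple of freq.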

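Method 1: shift the index back so that it starts at 00:00, resample, then shift it forward again.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

# Distance from midnight to the first timestamp (09:30)
offsets = pd.offsets.Hour(9) + pd.offsets.Minute(30)
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    # Work on a copy so every frequency resamples the original 1-minute data
    shifted = df.copy()
    shifted.index = shifted.index - offsets  # 09:30 becomes 00:00
    res = shifted.resample(str(freq) + 'min').agg({'A': 'first'})
    res.index = res.index + offsets  # restore the original clock times
    print(res.head(2))

Method 2: use a base equal to the index offset.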

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

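for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    # base is given in minutes: 9 * 60 + 30 = 570, i.e. 09:30 after midnight.
    # Keep the result in a new name so df stays at 1-minute resolution.
    res = df.resample(str(freq) + 'min', base=9 * 60 + 30).agg({'A': 'first'})
    print(res.head(2))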
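On newer pandas releases the base argument is no longer available. A possible equivalent, sketched here on the assumption that pandas 1.1 or later is installed, is the `origin` argument of resample(): `origin='start'` anchors the bins at the first timestamp of the data rather than at midnight. The `first_labels` dictionary is only an illustrative name used to collect the first index label per frequency.

```python
import datetime as dt

import numpy as np
import pandas as pd

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)

tt = pd.date_range(datetime_start, datetime_end, freq='1min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

first_labels = {}  # illustrative: first bin label for each frequency
for freq in [1, 3, 5, 6, 8, 16]:
    # origin='start' (assumes pandas >= 1.1) anchors bins at the first timestamp
    res = df.resample(str(freq) + 'min', origin='start').first()
    first_labels[freq] = res.index[0]
    print(freq)
    print(res.head(2))
```

With this anchoring, the first label should be 09:30 for every frequency, including 8 and 16.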