I have OHLC data at 1-minute frequency:
                        Open     High      Low    Close
DateTime
2005-09-06 18:00:00  1230.25  1231.50  1230.25  1230.25
2005-09-06 18:01:00  1230.50  1231.75  1229.25  1230.50
.
.
2005-09-07 15:59:00  1234.50  1235.50  1234.25  1234.50
2005-09-07 16:00:00  1234.25  1234.50  1234.25  1234.25
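I need to do a "custom" resample that fits futures trading hours, where:
Once the resample is done, the output should be:
                        Open     High      Low    Close
DateTime
2005-09-07 16:00:00  1230.25  1235.50  1229.25  1234.25
Where:
I have tried:
I used the following 'how':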
conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
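The attempt itself is not shown above. As a minimal sketch, assuming it was a plain calendar-day resample (an assumption, not something the question confirms), the following shows why such a dict alone is not enough: daily bins are cut at midnight, so one 18:00 to 16:00 session is split across two rows. Only the four rows visible in the question are used; the elided rows are not reconstructed.

```python
import pandas as pd

# The four sample rows shown in the question (elided rows are left out).
idx = pd.DatetimeIndex(
    ['2005-09-06 18:00:00', '2005-09-06 18:01:00',
     '2005-09-07 15:59:00', '2005-09-07 16:00:00'],
    name='DateTime')
df = pd.DataFrame(
    {'Open':  [1230.25, 1230.50, 1234.50, 1234.25],
     'High':  [1231.50, 1231.75, 1235.50, 1234.50],
     'Low':   [1230.25, 1229.25, 1234.25, 1234.25],
     'Close': [1230.25, 1230.50, 1234.50, 1234.25]},
    index=idx)

conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}

# Assumption: a plain daily resample, whose bins start at midnight.
naive = df.resample('D').agg(conversion)

# Two rows come back (2005-09-06 and 2005-09-07) instead of one session bar.
print(naive)
```

The session boundary therefore has to be encoded somewhere else, either by shifting the timestamps or by shifting the bin origin.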
Answer 0 (score: 3)
import pandas as pd
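# split columns on runs of two or more spaces; a regex separator needs the python engine
df = pd.read_table('data', sep=r'\s{2,}', engine='python')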
# Make sure the index is a DatetimeIndex
df.index = pd.DatetimeIndex(df.index)
# discard rows whose time falls strictly between 16:00 and 18:00
# (both endpoints are kept by default; the include_start/include_end
# keywords were removed in pandas 2.0)
df = df.between_time('18:00', '16:00')
proxy = df.index + pd.DateOffset(hours=6)
result = df.groupby(proxy.date).agg(
{'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
result = result.reindex(columns=['Open','High','Low','Close'])
print(result)
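yields
               Open    High      Low    Close
2005-09-07  1230.25  1235.5  1229.25  1234.25
The code above creates a proxy date, computed by adding 6 hours to each datetime in the index. Adding 6 hours moves 18:00 to midnight of the following calendar day, so every row of an 18:00 to 16:00 session lands on the same proxy date. This proxy date is then used as the groupby key.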
In [112]: proxy = pd.DatetimeIndex(df.index) + pd.DateOffset(hours=6)
To see how the proxy values correspond to the index:
In [116]: pd.Series(proxy.date, index=df.index)
Out[116]:
DateTime
2005-09-06 18:00:00 2005-09-07
2005-09-06 18:01:00 2005-09-07
2005-09-07 15:59:00 2005-09-07
2005-09-07 16:00:00 2005-09-07
dtype: object
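
The result above is labelled with a plain date, while the question asks for the session's 16:00 close as the timestamp. A minimal sketch of one way to get there, assuming every session closes at 16:00 and reusing only the four rows visible in the question (the in-memory frame and the relabelling step are illustrative additions, not part of the original answer):

```python
import pandas as pd

# The four sample rows shown in the question (elided rows are left out).
idx = pd.DatetimeIndex(
    ['2005-09-06 18:00:00', '2005-09-06 18:01:00',
     '2005-09-07 15:59:00', '2005-09-07 16:00:00'],
    name='DateTime')
df = pd.DataFrame(
    {'Open':  [1230.25, 1230.50, 1234.50, 1234.25],
     'High':  [1231.50, 1231.75, 1235.50, 1234.50],
     'Low':   [1230.25, 1229.25, 1234.25, 1234.25],
     'Close': [1230.25, 1230.50, 1234.50, 1234.25]},
    index=idx)

conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}

# Keep the 18:00 -> 16:00 session; both endpoints are included by default.
session = df.between_time('18:00', '16:00')

# Shift by 6 hours so 18:00 becomes midnight of the next calendar day.
proxy = session.index + pd.Timedelta(hours=6)
result = session.groupby(proxy.date).agg(conversion)

# Assumption: relabel each session with its 16:00 close time.
result.index = pd.to_datetime(result.index) + pd.Timedelta(hours=16)
result.index.name = 'DateTime'
print(result)
```

On the sample rows this gives a single bar stamped 2005-09-07 16:00:00, matching the output requested in the question.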