Data:
ohlc_dict = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Last': 'last',
    'Volume': 'sum',
}
data['hod'] = [r.hour for r in data.index]
data.head(10)
Out[61]:
                        Open     High      Low     Last  Volume  hod       dow
Timestamp
2014-05-08 08:00:00  136.230  136.290  136.190  136.290    7077    8  Thursday
2014-05-08 08:15:00  136.290  136.300  136.240  136.250    3881    8  Thursday
2014-05-08 08:30:00  136.240  136.270  136.230  136.230    2540    8  Thursday
2014-05-08 08:45:00  136.230  136.260  136.230  136.250    2293    8  Thursday
2014-05-08 09:00:00  136.250  136.360  136.240  136.360   15014    9  Thursday
2014-05-08 09:15:00  136.350  136.360  136.260  136.270   11697    9  Thursday
2014-05-08 09:30:00  136.270  136.270  136.190  136.200   15600    9  Thursday
2014-05-08 09:45:00  136.200  136.270  136.200  136.240    9025    9  Thursday
2014-05-08 10:00:00  136.240  136.270  136.240  136.260    7128   10  Thursday
2014-05-08 10:15:00  136.250  136.260  136.200  136.200    6100   10  Thursday
Question:
Both of the following convert the timeframe from 15-minute to 1-hour intervals:
Method 1:
data.loc['2016'].groupby('hod').Volume.mean().head()
hod
8 8452.597
9 16485.398
10 15619.626
11 14132.666
12 11470.058
Name: Volume, dtype: float64
Method 2:
df_h1 = data.resample('1h').agg(ohlc_dict).dropna()
df_h1['hod'] = [r.hour for r in df_h1.index]
df_h1.loc['2016'].groupby('hod')['Volume'].mean()
Timestamp
2014-05-08 08:00:00 15791.000
2014-05-08 09:00:00 51336.000
2014-05-08 10:00:00 28855.000
2014-05-08 11:00:00 56543.000
2014-05-08 12:00:00 25249.000
Name: Volume, dtype: float64
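The gap between the two results seems to come down to the order of operations: Method 1 averages the individual 15-minute volumes within each hour of day, while Method 2 first sums them into hourly bars and only then averages. A minimal sketch of that difference, using just the ten Volume values from the sample above (the names `volume`, `mean_of_bars`, `hourly_sum` and `mean_of_hourly` are illustrative and not part of the original code):

```python
import pandas as pd

# The ten 15-minute Volume values from the sample shown earlier
idx = pd.date_range("2014-05-08 08:00", periods=10, freq="15min", name="Timestamp")
volume = pd.Series(
    [7077, 3881, 2540, 2293, 15014, 11697, 15600, 9025, 7128, 6100],
    index=idx,
    name="Volume",
)

# Method 1 style: mean of the individual 15-minute volumes per hour of day
mean_of_bars = volume.groupby(volume.index.hour).mean()

# Method 2 style: sum into hourly bars first, then average per hour of day
hourly_sum = volume.resample("1h").sum()
mean_of_hourly = hourly_sum.groupby(hourly_sum.index.hour).mean()

print(mean_of_bars)
print(mean_of_hourly)
```

For an hour made up of four complete 15-minute bars, the second figure is four times the first, which is consistent with Method 1 reporting much smaller numbers.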
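Only Method 2 gives the correct Volume output.
How can I change Method 1 so that it produces the same Volume output as Method 2, but using groupby instead of resample? I'm not sure how to use ohlc_dict in Method 1, and I think it is required.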
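One possible approach, offered as a sketch rather than a definitive answer: group on the timestamps floored to the hour, so that `groupby` builds the same hourly buckets as `resample`, and pass `ohlc_dict` to `agg`. The frame below is rebuilt from the ten sample rows in the question; the names `hourly`, `volume_by_hod` and `resampled` are illustrative.

```python
import pandas as pd

ohlc_dict = {"Open": "first", "High": "max", "Low": "min",
             "Last": "last", "Volume": "sum"}

# The ten 15-minute bars from the sample in the question
idx = pd.date_range("2014-05-08 08:00", periods=10, freq="15min", name="Timestamp")
data = pd.DataFrame(
    {
        "Open": [136.23, 136.29, 136.24, 136.23, 136.25,
                 136.35, 136.27, 136.20, 136.24, 136.25],
        "High": [136.29, 136.30, 136.27, 136.26, 136.36,
                 136.36, 136.27, 136.27, 136.27, 136.26],
        "Low": [136.19, 136.24, 136.23, 136.23, 136.24,
                136.26, 136.19, 136.20, 136.24, 136.20],
        "Last": [136.29, 136.25, 136.23, 136.25, 136.36,
                 136.27, 136.20, 136.24, 136.26, 136.20],
        "Volume": [7077, 3881, 2540, 2293, 15014,
                   11697, 15600, 9025, 7128, 6100],
    },
    index=idx,
)

# Step 1: bucket every bar by its timestamp floored to the hour,
# then aggregate each bucket with ohlc_dict (groupby instead of resample)
hourly = data.groupby(data.index.floor("h")).agg(ohlc_dict)

# Step 2: average the hourly bars by hour of day
hourly["hod"] = hourly.index.hour
volume_by_hod = hourly.groupby("hod")["Volume"].mean()

# For comparison: the resample-based version from Method 2
resampled = data.resample("1h").agg(ohlc_dict).dropna()

print(hourly["Volume"])
print(volume_by_hod)
```

Because `groupby` only creates groups for hours that actually contain bars, the `dropna()` needed after `resample` (which also emits empty bins for hours with no data) is not required here. On the full dataset, a year filter such as `.loc['2016']` would be applied to `hourly` before the second `groupby`.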