Given US market hours:
In [220]: market_hours = pandas.date_range(date + ' 09:30:00', date + ' 16:00:00', freq='15min', tz='US/Eastern').tz_convert('UTC')
In [221]: market_hours
Out[221]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-04-29 13:30:00+00:00, ..., 2014-04-29 20:00:00+00:00]
Length: 27, Freq: 15T, Timezone: UTC
I can resample() a single field and restrict the result to those market hours:
In [222]: df.set_index('localtime')['size'].resample('15min', how='sum')[market_hours]
Out[222]:
2014-04-29 13:30:00+00:00 1093142
2014-04-29 13:45:00+00:00 556664
2014-04-29 14:00:00+00:00 467662
2014-04-29 14:15:00+00:00 460966
2014-04-29 14:30:00+00:00 275805
2014-04-29 14:45:00+00:00 192709
2014-04-29 15:00:00+00:00 226375
2014-04-29 15:15:00+00:00 175065
2014-04-29 15:30:00+00:00 181047
2014-04-29 15:45:00+00:00 129644
2014-04-29 16:00:00+00:00 193330
2014-04-29 16:15:00+00:00 170046
2014-04-29 16:30:00+00:00 130674
2014-04-29 16:45:00+00:00 107118
2014-04-29 17:00:00+00:00 156699
2014-04-29 17:15:00+00:00 153912
2014-04-29 17:30:00+00:00 180449
2014-04-29 17:45:00+00:00 223318
2014-04-29 18:00:00+00:00 211324
2014-04-29 18:15:00+00:00 152374
2014-04-29 18:30:00+00:00 121876
2014-04-29 18:45:00+00:00 90891
2014-04-29 19:00:00+00:00 138222
2014-04-29 19:15:00+00:00 167571
2014-04-29 19:30:00+00:00 264658
2014-04-29 19:45:00+00:00 492528
2014-04-29 20:00:00+00:00 8354
Freq: 15T, Name: size, dtype: int64
However, if I try to resample() a list of fields, I get an error:
In [223]: df.set_index('localtime')[['size']].resample('15min', how='sum')[market_hours]
...
KeyError: "['2014-04-29T09:30:00.000000000-0400' '2014-04-29T09:45:00.000000000-0400'\n '2014-04-29T10:00:00.000000000-0400' '2014-04-29T10:15:00.000000000-0400'\n '2014-04-29T10:30:00.000000000-0400' '2014-04-29T10:45:00.000000000-0400'\n '2014-04-29T11:00:00.000000000-0400' '2014-04-29T11:15:00.000000000-0400'\n '2014-04-29T11:30:00.000000000-0400' '2014-04-29T11:45:00.000000000-0400'\n '2014-04-29T12:00:00.000000000-0400' '2014-04-29T12:15:00.000000000-0400'\n '2014-04-29T12:30:00.000000000-0400' '2014-04-29T12:45:00.000000000-0400'\n '2014-04-29T13:00:00.000000000-0400' '2014-04-29T13:15:00.000000000-0400'\n '2014-04-29T13:30:00.000000000-0400' '2014-04-29T13:45:00.000000000-0400'\n '2014-04-29T14:00:00.000000000-0400' '2014-04-29T14:15:00.000000000-0400'\n '2014-04-29T14:30:00.000000000-0400' '2014-04-29T14:45:00.000000000-0400'\n '2014-04-29T15:00:00.000000000-0400' '2014-04-29T15:15:00.000000000-0400'\n '2014-04-29T15:30:00.000000000-0400' '2014-04-29T15:45:00.000000000-0400'\n '2014-04-29T16:00:00.000000000-0400'] not in index"
Is there a way to select rows of the resulting DataFrame by a date range? This does not seem to be related to time zones.
Answer 0 (score: 1)
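In the first case you are indexing a Series. In the second case (df[['size']].resample(..), note the double square brackets) you are working with a DataFrame. Basic indexing on a DataFrame (df[labels]) selects columns, not rows (see http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics). That is why you get an error saying the labels are not in the (column) index.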
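The difference can be reproduced on a tiny example. This is only a sketch with made-up data (the values and the names idx, s, frame and wanted are illustrative, not taken from the question), written against a recent pandas version:

```python
import pandas as pd

# Hypothetical data: four 15-minute bins with made-up sizes.
idx = pd.date_range("2014-04-29 13:30", periods=4, freq="15min", tz="UTC")
s = pd.Series([10, 20, 30, 40], index=idx, name="size")
frame = s.to_frame()   # one-column DataFrame with the same row index
wanted = idx[:2]       # the timestamps we want to keep

# On a Series, [] with a list of labels looks them up in the (row) index.
from_series = s[wanted]

# On a DataFrame, [] with a list of labels looks them up in the columns,
# so timestamps are not found and a KeyError is raised.
try:
    frame[wanted]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

print(from_series)
print("DataFrame [] raised KeyError:", column_lookup_failed)
```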
To fix this, you can use loc (assuming result holds the resampled DataFrame):
result.loc[market_hours, :]
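For completeness, here is a self-contained sketch of the whole flow. The trade data is synthetic (one trade of size 1 per minute, an assumption made purely so the example runs), and it is written for recent pandas versions, where the how='sum' argument shown in the question is no longer accepted and the aggregation is spelled .resample('15min').sum() instead:

```python
import numpy as np
import pandas as pd

date = "2014-04-29"

# Market hours in US/Eastern, converted to UTC, as in the question.
market_hours = pd.date_range(
    date + " 09:30:00", date + " 16:00:00", freq="15min", tz="US/Eastern"
).tz_convert("UTC")

# Synthetic stand-in for the question's df: one trade of size 1 per minute
# over the whole UTC day (assumed data, not the original trades).
localtime = pd.date_range(date + " 00:00:00", date + " 23:59:00",
                          freq="1min", tz="UTC")
df = pd.DataFrame({"localtime": localtime,
                   "size": np.ones(len(localtime), dtype="int64")})

# Resample a list of fields (the result is a DataFrame), then pick rows with loc.
result = df.set_index("localtime")[["size"]].resample("15min").sum()
in_hours = result.loc[market_hours, :]

print(in_hours.head())
```

Because loc takes the row labels first and the column labels second, the same expression keeps working if more columns are added to the list.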