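I am extracting pandas DataFrames from HDF5 files and running analyses on them. For some reason, the index of one of my DataFrames undergoes a seemingly random type conversion after I apply a filter: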
(Pdb) ORD_ticks
Codes Price Size
Time
2015-02-12 11:35:28-05:00 OC 148.200 0
2015-02-12 14:51:25-05:00 OC 148.870 0
2015-02-12 14:55:21-05:00 OC 146.550 0
2015-02-12 14:55:57-05:00 OC 148.230 0
2015-02-12 14:58:27-05:00 OC 148.542 0
2015-02-12 15:01:28-05:00 OC 148.200 0
2015-02-12 15:07:32-05:00 OC 148.400 0
... ... ... ...
2015-05-19 11:35:14-04:00 OC 152.000 0
2015-05-19 14:51:26-04:00 OC 151.980 0
2015-05-19 14:55:21-04:00 OC 151.500 0
2015-05-19 14:55:56-04:00 OC 151.800 0
2015-05-19 14:58:32-04:00 OC 151.966 0
2015-05-19 15:01:32-04:00 OC 152.110 0
2015-05-19 15:07:39-04:00 OC 152.000 0
[462 rows x 3 columns]
(Pdb) type(ORD_ticks.index)
<class 'pandas.tseries.index.DatetimeIndex'>
I then apply the following filter to ORD_ticks to get ORD_prices:
ORD_prices = ORD_ticks.ix[indicator.index.map(lambda t: ORD_ticks.index.asof(t)).tolist()].groupby(level=0).last()
After this, ORD_prices looks like this:
(Pdb) ORD_prices
Codes Price Size
1.423772e+18 OC 148.40 0
1.423858e+18 OC 148.29 0
1.424204e+18 OC 146.15 0
1.424290e+18 OC 146.51 0
1.424376e+18 OC 146.22 0
1.424463e+18 OC 145.08 0
1.424722e+18 OC 147.72 0
... ... ... ...
1.431371e+18 OC 149.95 0
1.431458e+18 OC 145.58 0
1.431544e+18 OC 145.22 0
1.431630e+18 OC 148.01 0
1.431717e+18 OC 148.91 0
1.431976e+18 OC 148.89 0
1.432062e+18 OC 152.00 0
[63 rows x 3 columns]
(Pdb) type(ORD_prices.index)
<class 'pandas.core.index.Float64Index'>
The strange thing is that I run exactly the same operation on about 100 different datasets, and it happens only with this one! What is going on?
Here is indicator:
(Pdb) indicator
Empty DataFrame
Columns: []
Index: [2015-02-09 15:30:00-05:00, 2015-02-10 15:30:00-05:00, 2015-02-11 15:30:0
0-05:00, 2015-02-12 15:30:00-05:00, 2015-02-13 15:30:00-05:00, 2015-02-17 15:30:
00-05:00, 2015-02-18 15:30:00-05:00, 2015-02-19 15:30:00-05:00, 2015-02-20 15:30
:00-05:00, 2015-02-23 15:30:00-05:00, 2015-02-24 15:30:00-05:00, 2015-02-25 15:3
0:00-05:00, 2015-02-26 15:30:00-05:00, 2015-02-27 15:30:00-05:00, 2015-03-02 15:
30:00-05:00, 2015-03-03 15:30:00-05:00, 2015-03-04 15:30:00-05:00, 2015-03-05 15
:30:00-05:00, 2015-03-06 15:30:00-05:00, 2015-03-09 15:30:00-04:00, 2015-03-10 1
5:30:00-04:00, 2015-03-11 15:30:00-04:00, 2015-03-12 15:30:00-04:00, 2015-03-13
15:30:00-04:00, 2015-03-16 15:30:00-04:00, 2015-03-17 15:30:00-04:00, 2015-03-18
15:30:00-04:00, 2015-03-19 15:30:00-04:00, 2015-03-20 15:30:00-04:00, 2015-03-2
3 15:30:00-04:00, 2015-03-24 15:30:00-04:00, 2015-03-25 15:30:00-04:00, 2015-03-
26 15:30:00-04:00, 2015-03-27 15:30:00-04:00, 2015-03-30 15:30:00-04:00, 2015-03
-31 15:30:00-04:00, 2015-04-01 15:30:00-04:00, 2015-04-07 15:30:00-04:00, 2015-0
4-08 15:30:00-04:00, 2015-04-09 15:30:00-04:00, 2015-04-10 15:30:00-04:00, 2015-
04-13 15:30:00-04:00, 2015-04-14 15:30:00-04:00, 2015-04-15 15:30:00-04:00, 2015
-04-16 15:30:00-04:00, 2015-04-17 15:30:00-04:00, 2015-04-20 15:30:00-04:00, 201
5-04-21 15:30:00-04:00, 2015-04-22 15:30:00-04:00, 2015-04-23 15:30:00-04:00, 20
15-04-24 15:30:00-04:00, 2015-04-27 15:30:00-04:00, 2015-04-28 15:30:00-04:00, 2
015-04-29 15:30:00-04:00, 2015-05-04 15:30:00-04:00, 2015-05-05 15:30:00-04:00,
2015-05-06 15:30:00-04:00, 2015-05-07 15:30:00-04:00, 2015-05-08 15:30:00-04:00,
2015-05-11 15:30:00-04:00, 2015-05-12 15:30:00-04:00, 2015-05-13 15:30:00-04:00
, 2015-05-14 15:30:00-04:00, 2015-05-15 15:30:00-04:00, 2015-05-18 15:30:00-04:0
0, 2015-05-19 15:30:00-04:00]
Answer 0 (score: 2)
Using .reindex(method='nearest') does essentially the same lookup you are doing (but much faster). It requires pandas 0.16.0 or later.
In [41]: df = pd.DataFrame({'A' : range(10) },index=pd.date_range('20130101',freq='2s',periods=10,tz='US/Eastern'))
In [42]: df
Out[42]:
A
2013-01-01 00:00:00-05:00 0
2013-01-01 00:00:02-05:00 1
2013-01-01 00:00:04-05:00 2
2013-01-01 00:00:06-05:00 3
2013-01-01 00:00:08-05:00 4
2013-01-01 00:00:10-05:00 5
2013-01-01 00:00:12-05:00 6
2013-01-01 00:00:14-05:00 7
2013-01-01 00:00:16-05:00 8
2013-01-01 00:00:18-05:00 9
In [43]: idx = pd.date_range('20130101 00:00:00',periods=20,freq='5s',tz='US/Eastern')
In [44]: df.reindex(idx,method='nearest')
Out[44]:
A
2013-01-01 00:00:00-05:00 0
2013-01-01 00:00:05-05:00 3
2013-01-01 00:00:10-05:00 5
2013-01-01 00:00:15-05:00 8
2013-01-01 00:00:20-05:00 9
2013-01-01 00:00:25-05:00 9
2013-01-01 00:00:30-05:00 9
2013-01-01 00:00:35-05:00 9
2013-01-01 00:00:40-05:00 9
2013-01-01 00:00:45-05:00 9
2013-01-01 00:00:50-05:00 9
2013-01-01 00:00:55-05:00 9
2013-01-01 00:01:00-05:00 9
2013-01-01 00:01:05-05:00 9
2013-01-01 00:01:10-05:00 9
2013-01-01 00:01:15-05:00 9
2013-01-01 00:01:20-05:00 9
2013-01-01 00:01:25-05:00 9
2013-01-01 00:01:30-05:00 9
2013-01-01 00:01:35-05:00 9
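One possible explanation for the float index, which nothing in the post confirms: indicator starts on 2015-02-09, while the first tick shown in ORD_ticks is 2015-02-12, so asof has no earlier tick to return for the first few timestamps. A list that mixes timestamps with missing values may then have been coerced to a float index by the pandas version in use. Note also that asof picks the last label at or before the given time, which corresponds to method='ffill' rather than method='nearest' (the latter can pick a later tick). The sketch below uses small made-up stand-ins for ORD_ticks and indicator (the names ticks and marks and the values are hypothetical) to show the missing lookup and a reindex that keeps a DatetimeIndex.

```python
import pandas as pd

# Hypothetical stand-ins for ORD_ticks and indicator.index (values invented for illustration).
ticks = pd.DataFrame(
    {"Price": [148.2, 148.87, 146.55]},
    index=pd.DatetimeIndex(
        ["2015-02-12 11:35:28", "2015-02-12 14:51:25", "2015-02-13 14:55:21"],
        tz="US/Eastern",
    ),
)
marks = pd.DatetimeIndex(
    ["2015-02-11 15:30:00", "2015-02-12 15:30:00", "2015-02-13 15:30:00"],
    tz="US/Eastern",
)

# The first mark precedes every tick, so asof() has nothing to return
# and yields a missing value instead of a timestamp.
first_lookup = ticks.index.asof(marks[0])
print(pd.isna(first_lookup))

# reindex(method="ffill") takes the last tick at or before each mark and
# keeps the result indexed by a DatetimeIndex; marks with no earlier tick get NaN.
prices = ticks.reindex(marks, method="ffill")
print(type(prices.index).__name__)

# Drop the marks that had no earlier tick.
print(prices.dropna())
```

If this is what happens in your data, checking indicator.index.min() against ORD_ticks.index.min() for the one dataset that misbehaves should show the gap.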