Unexpected behavior when slicing a pandas DataFrame by timestamp

Time: 2016-08-04 17:41:28

Tags: python pandas slice

I have a DataFrame like this:

In [29]: data = pd.DataFrame(np.random.random((72000,3)), columns=list('uvw'), 
           index=pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50L'))

In [30]: print(data[['u', 'v', 'w']])
                             u      v      w
Timestamp                                   
2013-11-08 10:00:00.000  2.375 -5.206 -0.103
2013-11-08 10:00:00.050  2.493 -5.098 -0.018
2013-11-08 10:00:00.100  2.263 -5.114  0.014
2013-11-08 10:00:00.150  2.210 -5.235 -0.012
2013-11-08 10:00:00.200  2.158 -5.174 -0.112
2013-11-08 10:00:00.250  2.334 -5.279 -0.092
...                        ...    ...    ...
2013-11-08 10:59:59.700  5.065 -4.453  0.424
2013-11-08 10:59:59.750  5.262 -4.703  0.126
2013-11-08 10:59:59.800  5.323 -4.882  0.242
2013-11-08 10:59:59.850  5.344 -5.119  0.457
2013-11-08 10:59:59.900  5.281 -5.261  0.599
2013-11-08 10:59:59.950  5.235 -4.801  0.362

[72000 rows x 3 columns]

But when I slice it with string timestamps, this happens:

In [33]: print(data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00.000', ['u','v','w']])
                             u      v      w
Timestamp                                   
2013-11-08 10:15:00.000  2.634 -4.351  0.107
2013-11-08 10:15:00.050  2.869 -4.249  0.040
2013-11-08 10:15:00.100  3.320 -4.326 -0.079
2013-11-08 10:15:00.150  2.759 -4.339 -0.007
2013-11-08 10:15:00.200  2.748 -4.128 -0.038
2013-11-08 10:15:00.250  3.149 -4.074 -0.387
...                        ...    ...    ...
2013-11-08 10:17:00.700  3.910 -4.698 -0.366
2013-11-08 10:17:00.750  3.824 -4.535 -0.313
2013-11-08 10:17:00.800  3.758 -4.353 -0.116
2013-11-08 10:17:00.850  3.761 -4.454 -0.010
2013-11-08 10:17:00.900  3.546 -4.766 -0.433
2013-11-08 10:17:00.950  3.238 -4.601 -0.378

[2420 rows x 3 columns]

That is, I expected the last row of the output to be 2013-11-08 10:17:00.000, but it is 2013-11-08 10:17:00.950, as if my command had been data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00']. Is this expected?

Some useful output:

In [32]: print(type(data.index))
<class 'pandas.tseries.index.DatetimeIndex'>

In [33]: print(data.index.inferred_freq)
50L

Edit

I found that when I specify the end timestamp with a datetime object instead of a string, it works as expected:

In [15]: data.loc['2013-11-08 10:15:00.000':datetime(2013,11,8,10,17,0,0), ['u','v','w']]
Out[15]: 
                                u         v         w
2013-11-08 10:15:00.000  0.982873  0.795108  0.417056
2013-11-08 10:15:00.050  0.224579  0.715234  0.284113
2013-11-08 10:15:00.100  0.991813  0.031380  0.934422
2013-11-08 10:15:00.150  0.535270  0.717672  0.207417
2013-11-08 10:15:00.200  0.272606  0.837425  0.715765
2013-11-08 10:15:00.250  0.254134  0.541588  0.956947
...                           ...       ...       ...
2013-11-08 10:16:59.750  0.165730  0.362087  0.879207
2013-11-08 10:16:59.800  0.532108  0.961432  0.692155
2013-11-08 10:16:59.850  0.722646  0.432374  0.994856
2013-11-08 10:16:59.900  0.091556  0.044398  0.769436
2013-11-08 10:16:59.950  0.195347  0.688370  0.373486
2013-11-08 10:17:00.000  0.068244  0.667574  0.301586

[2401 rows x 3 columns]

So I think this is either a bug, or I am writing the date strings incorrectly.
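
For reference, here is a minimal, self-contained sketch of the workaround from the edit above. It assumes that pd.Timestamp bounds behave like the datetime object I used (both are exact points in time, not strings), and it writes the frequency as '50ms' instead of '50L'. The names start, end and subset are only for illustration. The string behavior may be related to pandas' partial string indexing, where a string coarser than the index resolution selects a whole period, but I am not certain that is the cause here.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question: 72000 rows, one every 50 ms.
index = pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50ms')
data = pd.DataFrame(np.random.random((72000, 3)), columns=list('uvw'), index=index)

# Use explicit Timestamp objects (not strings) for both slice bounds,
# so each bound is matched as an exact point in time.
start = pd.Timestamp('2013-11-08 10:15:00.000')
end = pd.Timestamp('2013-11-08 10:17:00.000')

# Label-based slicing with .loc includes both endpoints.
subset = data.loc[start:end, ['u', 'v', 'w']]

print(subset.index[-1])  # last timestamp in the slice
print(len(subset))       # number of rows in the slice
```

With exact bounds, the slice covers 120 seconds at 20 rows per second plus the inclusive end point, which matches the 2401 rows shown in the datetime example.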

0 answers:

No answers yet