I have a DataFrame like this:
In [29]: data = pd.DataFrame(np.random.random((72000,3)), columns=list('uvw'),
                             index=pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50L'))
In [30]: print(data[['u', 'v', 'w']])
u v w
Timestamp
2013-11-08 10:00:00.000 2.375 -5.206 -0.103
2013-11-08 10:00:00.050 2.493 -5.098 -0.018
2013-11-08 10:00:00.100 2.263 -5.114 0.014
2013-11-08 10:00:00.150 2.210 -5.235 -0.012
2013-11-08 10:00:00.200 2.158 -5.174 -0.112
2013-11-08 10:00:00.250 2.334 -5.279 -0.092
... ... ... ...
2013-11-08 10:59:59.700 5.065 -4.453 0.424
2013-11-08 10:59:59.750 5.262 -4.703 0.126
2013-11-08 10:59:59.800 5.323 -4.882 0.242
2013-11-08 10:59:59.850 5.344 -5.119 0.457
2013-11-08 10:59:59.900 5.281 -5.261 0.599
2013-11-08 10:59:59.950 5.235 -4.801 0.362
[72000 rows x 3 columns]
But when I slice it with string timestamps, this happens:
In [33]: print(data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00.000', ['u','v','w']])
u v w
Timestamp
2013-11-08 10:15:00.000 2.634 -4.351 0.107
2013-11-08 10:15:00.050 2.869 -4.249 0.040
2013-11-08 10:15:00.100 3.320 -4.326 -0.079
2013-11-08 10:15:00.150 2.759 -4.339 -0.007
2013-11-08 10:15:00.200 2.748 -4.128 -0.038
2013-11-08 10:15:00.250 3.149 -4.074 -0.387
... ... ... ...
2013-11-08 10:17:00.700 3.910 -4.698 -0.366
2013-11-08 10:17:00.750 3.824 -4.535 -0.313
2013-11-08 10:17:00.800 3.758 -4.353 -0.116
2013-11-08 10:17:00.850 3.761 -4.454 -0.010
2013-11-08 10:17:00.900 3.546 -4.766 -0.433
2013-11-08 10:17:00.950 3.238 -4.601 -0.378
[2420 rows x 3 columns]
That is, I expect the last row of the output to be 2013-11-08 10:17:00.000, but it is 2013-11-08 10:17:00.950, as if my command had been data.loc['2013-11-08 10:15:00.000':'2013-11-08 10:17:00']. Is this expected?
Some useful output:
In [32]: print(type(data.index))
<class 'pandas.tseries.index.DatetimeIndex'>
In [33]: print(data.index.inferred_freq)
50L
Edit
I found that when I specify the end timestamp with a datetime object instead of a string, it works as I expect:
In [15]: data.loc['2013-11-08 10:15:00.000':datetime(2013,11,8,10,17,0,0), ['u','v','w']]
Out[15]:
u v w
2013-11-08 10:15:00.000 0.982873 0.795108 0.417056
2013-11-08 10:15:00.050 0.224579 0.715234 0.284113
2013-11-08 10:15:00.100 0.991813 0.031380 0.934422
2013-11-08 10:15:00.150 0.535270 0.717672 0.207417
2013-11-08 10:15:00.200 0.272606 0.837425 0.715765
2013-11-08 10:15:00.250 0.254134 0.541588 0.956947
... ... ... ...
2013-11-08 10:16:59.750 0.165730 0.362087 0.879207
2013-11-08 10:16:59.800 0.532108 0.961432 0.692155
2013-11-08 10:16:59.850 0.722646 0.432374 0.994856
2013-11-08 10:16:59.900 0.091556 0.044398 0.769436
2013-11-08 10:16:59.950 0.195347 0.688370 0.373486
2013-11-08 10:17:00.000 0.068244 0.667574 0.301586
[2401 rows x 3 columns]
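For anyone who wants to reproduce this, here is a minimal self-contained sketch of the workaround. It is my own variation, not a confirmed fix: it uses pd.Timestamp objects for both endpoints (instead of the datetime object above), a seeded random generator, and the '50ms' alias in place of '50L'. The names rng, start, stop and subset are just illustrative.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question; '50ms' is the spelled-out
# equivalent of the '50L' alias (both mean 50 milliseconds).
rng = np.random.default_rng(0)
index = pd.date_range('2013-11-08 10:00:00', periods=72000, freq='50ms')
data = pd.DataFrame(rng.random((72000, 3)), columns=list('uvw'), index=index)

# Explicit Timestamp endpoints: each is an exact instant, so the slice
# does not depend on how a string would be interpreted.
start = pd.Timestamp('2013-11-08 10:15:00.000')
stop = pd.Timestamp('2013-11-08 10:17:00.000')

subset = data.loc[start:stop, ['u', 'v', 'w']]
print(subset.index[-1])  # 2013-11-08 10:17:00
print(len(subset))       # 2401 (2 minutes at 20 rows per second, plus the inclusive end)
```

With explicit timestamp objects on both sides, the slice ends exactly at 10:17:00.000, which matches the 2401 rows shown above.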
So I think either this is a bug, or I am writing the date strings incorrectly.