Question: Selecting data between two datetimes from a DataFrame in pandas 0.17.1

Time: 2015-12-10 17:16:26

Tags: python pandas

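I am having trouble selecting data from a pandas DataFrame with between_time. When the start and end datetimes of the query fall on two different days, the result is empty. I am using pandas 0.17.1 (Python 2.7).

I have the following DataFrame: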

from datetime import datetime

import pandas as pd
from pandas import Timestamp

mydf = pd.DataFrame.from_dict({'azi': {Timestamp('2015-05-12 00:00:14.348000'): 109.801,
Timestamp('2015-05-12 00:00:36.125000'): 109.994,
Timestamp('2015-05-12 00:00:57.599000'): 109.60299999999999,
Timestamp('2015-05-12 00:01:14.576000'): 100.2},
'ele': {Timestamp('2015-05-12 00:00:14.348000'): 180.001,
Timestamp('2015-05-12 00:00:36.125000'): 179.999,
Timestamp('2015-05-12 00:00:57.599000'): 179.999,
Timestamp('2015-05-12 00:01:14.576000'): 180.001}})

This gives:

                            azi     ele
2015-05-12 00:00:14.348     109.801     180.001
2015-05-12 00:00:36.125     109.994     179.999
2015-05-12 00:00:57.599     109.603     179.999
2015-05-12 00:01:14.576     100.200     180.001

The following query fails:

mydf['azi'].between_time(datetime(2015, 5, 11, 23, 59, 59, 850000), datetime(2015, 5, 12, 0, 1, 59, 850000))

It returns:

Series([], Name: azi, dtype: float64)

However, the following query works:

mydf['azi'].between_time(datetime(2015, 5, 11, 0, 0, 0, 0), datetime(2015, 5, 12, 0, 1, 59, 850000))

It gives the correct result:

 2015-05-12 00:00:14.348    109.801
 2015-05-12 00:00:36.125    109.994
 2015-05-12 00:00:57.599    109.603
 2015-05-12 00:01:14.576    100.200
 Name: azi, dtype: float64

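Questions

  1. Am I misunderstanding what this function does, or is this a real bug?
  2. Is there a workaround? For background, I really need to process the data in 1-minute chunks whose boundaries do not always coincide with 00:00:00.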

2 answers:

Answer 0 (score: 0)

You can find plenty of information on working with a datetime index in the docs. For your case, you could try loc:

In [147]: mydf['azi'].loc[datetime(2015, 5, 11, 23, 59, 59, 850000): datetime(2015, 5, 12, 0, 1, 59, 850000)]
Out[147]: 
2015-05-12 00:00:14.348    109.801
2015-05-12 00:00:36.125    109.994
2015-05-12 00:00:57.599    109.603
2015-05-12 00:01:14.576    100.200
Name: azi, dtype: float64

That covers the workaround you asked about. As for 1), see the explanation from @Jeff.
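For the 1-minute chunks mentioned in the question, one possible approach is to compare against full timestamps instead of clock times. The following is only a sketch: iter_windows is a hypothetical helper (not part of pandas), the half-open window convention [t, t + width) is an assumption, and it was written with a recent pandas release in mind rather than 0.17.1.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
index = pd.to_datetime([
    "2015-05-12 00:00:14.348",
    "2015-05-12 00:00:36.125",
    "2015-05-12 00:00:57.599",
    "2015-05-12 00:01:14.576",
])
mydf = pd.DataFrame(
    {"azi": [109.801, 109.994, 109.603, 100.2],
     "ele": [180.001, 179.999, 179.999, 180.001]},
    index=index,
)


def iter_windows(df, start, end, width):
    """Hypothetical helper: yield (window_start, rows) for each window [t, t + width)."""
    t = start
    while t < end:
        # Compare full timestamps, so a window may start on one day and end on the next.
        mask = (df.index >= t) & (df.index < t + width)
        yield t, df[mask]
        t += width


start = pd.Timestamp("2015-05-11 23:59:59.850")
end = pd.Timestamp("2015-05-12 00:01:59.850")
chunks = list(iter_windows(mydf, start, end, pd.Timedelta(minutes=1)))

for window_start, rows in chunks:
    print(window_start, len(rows))
```

Because the windows are half-open, a row that falls exactly on a boundary is counted once, in the later window.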

Answer 1 (score: 0)

The docstring says it all.

between_time selects on times of day, regardless of the date.

In [67]: mydf.between_time?
Signature: mydf.between_time(start_time, end_time, include_start=True, include_end=True)
Docstring:
Select values between particular times of the day (e.g., 9:00-9:30 AM)

Parameters
----------
start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

Returns
-------
values_between_time : type of caller
File:      ~/pandas/pandas/core/generic.py
Type:      instancemethod

In [68]: mydf
Out[68]: 
                             azi      ele
2015-05-12 00:00:14.348  109.801  180.001
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999
2015-05-12 00:01:14.576  100.200  180.001

In [70]: mydf.between_time('00:00:30','00:01:00')
Out[70]: 
                             azi      ele
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999
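To make the time-of-day behavior concrete, here is a small sketch on made-up data spanning two dates (the values and timestamps are illustrative only). It assumes a recent pandas release, where the bounds are passed as datetime.time objects or strings, and where a start time later than the end time is treated as a window that wraps around midnight; pandas 0.17.1 may not behave the same way.

```python
import datetime as dt

import pandas as pd

# Illustrative two-day series; the first and third rows share a clock time.
index = pd.to_datetime([
    "2015-05-11 00:00:36.125",
    "2015-05-11 23:59:59.900",
    "2015-05-12 00:00:36.125",
    "2015-05-12 12:00:00.000",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index, name="azi")

# Only the clock time is compared, so matching rows from both dates are returned.
same_clock_window = s.between_time(dt.time(0, 0, 30), dt.time(0, 1, 0))
print(same_clock_window)

# Start later than end: the window runs from 23:59:59 through midnight to 00:01:00.
across_midnight = s.between_time("23:59:59", "00:01:00")
print(across_midnight)
```

Neither call can express "from 23:59:59 on May 11 to 00:01:59 on May 12 only"; that kind of query needs full timestamps, as with loc.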

Separately, you can use partial-string indexing (see here) to select by date; the bounds can be strings or datetimes.

In [73]: mydf.loc['20150512 00:00:30':'20150512 00:01:00']
Out[73]: 
                             azi      ele
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999

I think .between_time should actually raise on objects that are not .time or string-convertible, but IIRC it was done this way for ease of implementation.
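Until the library enforces that, a thin guard in user code can catch the mistake early. The wrapper below is a sketch under assumptions: strict_between_time is a hypothetical name, not a pandas function, and accepting only datetime.time and str is a choice made here for illustration.

```python
import datetime as dt

import pandas as pd


def strict_between_time(obj, start_time, end_time):
    """Hypothetical wrapper: reject bounds that are not datetime.time or str."""
    for name, value in (("start_time", start_time), ("end_time", end_time)):
        # datetime.datetime is not a subclass of datetime.time, so it is rejected here.
        if not isinstance(value, (dt.time, str)):
            raise TypeError(
                f"{name} must be datetime.time or str, got {type(value).__name__}"
            )
    return obj.between_time(start_time, end_time)


index = pd.to_datetime(["2015-05-12 00:00:14.348", "2015-05-12 00:01:14.576"])
s = pd.Series([109.801, 100.2], index=index, name="azi")

print(strict_between_time(s, "00:00:00", "00:01:00"))

try:
    strict_between_time(s, dt.datetime(2015, 5, 11, 23, 59, 59), dt.time(0, 2))
except TypeError as exc:
    print("rejected:", exc)
```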