Question

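I'm new to Python and pandas. I have a DataFrame with a datetime index, and I want to select the rows whose time is > 08:00:00. I tried using the pd.DataFrame.select function, but it fails because the index has duplicate entries.

Am I going about this the right way?

Is there a way to do it?

Is it bad practice to index data with duplicate entries?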

>>> df.head(10)
                            A
time                         
1900-01-01 00:01:01.456170  0
1900-01-01 00:01:01.969600  0
1900-01-01 00:01:04.305494  0
1900-01-01 00:01:13.860365  0
1900-01-01 00:01:19.666371  0
1900-01-01 00:01:24.920744  0
1900-01-01 00:01:24.931466  0
1900-01-01 00:02:07.522741  0
1900-01-01 00:02:13.857793  0
1900-01-01 00:02:34.817765 -7
>>> timeindexvalid = lambda x : x.to_datetime() > datetime(1900, 1, 1, 8)
>>> df.select(timeindexvalid)
Traceback (most recent call last):

    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

Answer 1

You can select the index values you want with a boolean expression, without using select():

In [1]: df
Out[1]:
            A
time
2012-05-01  0
2012-05-02  1
2012-05-02  2

In [2]: df.index
Out[2]:
<class 'pandas.tseries.index.DatetimeIndex'>

In [3]: df.index.is_unique
Out[3]: False

In [4]: df[df.index > datetime(2012,5,1)]
Out[4]:
            A
time
2012-05-02  1
2012-05-02  2

Using select reproduces the error:

In [5]: sel = lambda x: x > datetime(2012,5,1)

In [6]: df.select(sel)
Exception: Reindexing only valid with uniquely valued Index objects
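
The comparison in In [4] filters on full timestamps. If the goal is "time of day later than 08:00:00" regardless of the date, one possible approach is to compare the index's time-of-day component instead. The following is a minimal sketch, not a definitive solution: the sample data and the names mask and late are made up for illustration. It may also be worth noting that DataFrame.select was deprecated in pandas 0.21 and removed in 1.0, so boolean masks like this are the more portable option.

```python
from datetime import time

import pandas as pd

# Hypothetical sample data: one timestamp appears twice on purpose,
# so the index is not unique.
idx = pd.to_datetime([
    "1900-01-01 07:59:59",
    "1900-01-01 08:30:00",
    "1900-01-01 08:30:00",
    "1900-01-02 09:15:00",
])
df = pd.DataFrame({"A": [0, 1, 2, 3]}, index=idx)
df.index.name = "time"

# DatetimeIndex.time holds one datetime.time object per row, so the
# comparison produces one boolean per row and needs no reindexing.
mask = df.index.time > time(8, 0)
late = df[mask]
print(late)
```

Because the mask is positional (one boolean per row), duplicate index values do not get in the way.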

Answer 2

I opened an issue on GitHub so that this can be supported more easily with the between_time method:

https://github.com/pydata/pandas/issues/2826
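
As a rough illustration of what between_time looks like in use, here is a sketch that assumes a pandas version providing DataFrame.between_time. The sample data and the name after_eight are hypothetical. Note that both endpoints are included by default, so this keeps 08:00:00 itself, which differs slightly from the strict "> 08:00:00" in the question.

```python
from datetime import time

import pandas as pd

# Hypothetical sample data with a duplicated timestamp in the index.
idx = pd.to_datetime([
    "1900-01-01 07:59:59",
    "1900-01-01 08:00:00",
    "1900-01-01 08:30:00",
    "1900-01-01 08:30:00",
])
df = pd.DataFrame({"A": [0, 1, 2, 3]}, index=idx)

# Keep rows whose time of day lies between 08:00:00 and the last
# representable time of the day (time.max is 23:59:59.999999).
after_eight = df.between_time(time(8, 0), time.max)
print(after_eight)
```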

Answer 3

You can use indexer_between_time (here, between 1 minute and 2 minutes past midnight):

In [11]: df1.iloc[df1.index.indexer_between_time('00:01:00', '00:02:00')]
Out[11]:
                            A
time
1900-01-01 00:01:01.456170  0
1900-01-01 00:01:01.969600  0
1900-01-01 00:01:04.305494  0
1900-01-01 00:01:13.860365  0
1900-01-01 00:01:19.666371  0
1900-01-01 00:01:24.920744  0
1900-01-01 00:01:24.931466  0
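
To make the two steps in that one-liner easier to see, here is a small self-contained sketch. The data is a shortened, made-up stand-in for the frame in the question, and positions and subset are illustrative names. indexer_between_time returns integer row positions, which is why the duplicate timestamp causes no trouble when they are passed to iloc.

```python
from datetime import time

import pandas as pd

# Shortened, hypothetical stand-in for the question's data;
# 00:01:24 appears twice.
idx = pd.to_datetime([
    "1900-01-01 00:01:01",
    "1900-01-01 00:01:24",
    "1900-01-01 00:01:24",
    "1900-01-01 00:02:34",
])
df1 = pd.DataFrame({"A": [0, 0, 0, -7]}, index=idx)

# Step 1: integer positions of rows whose time of day is in the range.
positions = df1.index.indexer_between_time(time(0, 1), time(0, 2))

# Step 2: positional selection, which does not depend on index uniqueness.
subset = df1.iloc[positions]
print(subset)
```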

Selecting from a DataFrame with a duplicate index
