Question

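I have a pandas DataFrame with a column of date-times such as 2019-01-02 09:00:00 (meaning 9 a.m. on 2 January 2019).

Many rows in the "Date Time" column may share the same date.

In other words, I can have 2019-01-02 09:00:00, 2019-01-02 09:15:00, 2019-01-02 09:30:00, and so on.

Now I need to find the row index of the first occurrence of the date 2019-01-02 in the DataFrame.

I can obviously do this with a loop, but I would like to know whether there is a better way.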

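Using df['Date Time'].str.contains() I can get all the rows that match the given date, but what I need is the index.

The general question: how do we find the index of the first match in a DataFrame column whose cells match a given string pattern?

The more specific question: how do we find the index of the first row whose date-time cell matches a given date, assuming the DataFrame is sorted in ascending chronological order by "Date Time"? That is, 2019-01-02 09:00:00 has an earlier index than 2019-01-02 09:15:00, which is followed by 2019-01-03 09:00:00, and so on.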

Thank you for your input.

Answer 1

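You can use next together with iter to get the first index value that satisfies the condition; the default argument to next prevents a failure when no value matches: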

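import pandas as pd

# 'h' is the hourly frequency alias (the uppercase 'H' is deprecated in recent pandas)
df = pd.DataFrame({'dates': pd.date_range(start='2018-01-01 20:00:00',
                                          end='2018-01-02 02:00:00', freq='h')})
print(df)
#                 dates
# 0 2018-01-01 20:00:00
# 1 2018-01-01 21:00:00
# 2 2018-01-01 22:00:00
# 3 2018-01-01 23:00:00
# 4 2018-01-02 00:00:00
# 5 2018-01-02 01:00:00
# 6 2018-01-02 02:00:00

date = '2018-01-02'
# True for every row on or after the date; because the column is sorted,
# the first True is the first row of that date (or of the next date present)
mask = df['dates'] >= date
# next() takes a default, so an empty result does not raise StopIteration
idx = next(iter(mask.index[mask]), 'not exist')
print(idx)
# 4

date = '2018-01-08'
mask = df['dates'] >= date
idx = next(iter(mask.index[mask]), 'not exist')
print(idx)
# not exist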
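
Note that the >= comparison above returns the first row on or after the date, so if the requested date is absent but a later one exists, the later row is returned. If only an exact calendar-day match should count, one possible variation is sketched below. The helper name first_index_on_date and the sample timestamps are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data with a gap: no rows on 2018-01-02
df = pd.DataFrame({'dates': pd.to_datetime([
    '2018-01-01 20:00:00',
    '2018-01-01 23:00:00',
    '2018-01-03 00:30:00',
    '2018-01-03 01:00:00',
])})

def first_index_on_date(frame, column, date, default='not exist'):
    """Return the index label of the first row whose calendar day equals date."""
    # normalize() resets every timestamp to midnight, so only the day is compared
    mask = frame[column].dt.normalize() == pd.Timestamp(date)
    # next() falls back to the default when the mask selects nothing
    return next(iter(mask.index[mask]), default)

idx_present = first_index_on_date(df, 'dates', '2018-01-03')
idx_missing = first_index_on_date(df, 'dates', '2018-01-02')
print(idx_present)
print(idx_missing)
```

With >= instead of the equality test, the lookup for 2018-01-02 would have returned the first row of 2018-01-03.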

If performance matters, see "Efficiently return the index of the first value satisfying condition in array".
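
Since the question states that the column is sorted in ascending order, a binary search is one way to avoid building a full boolean mask. The sketch below uses Series.searchsorted; it is a different technique from the one in the linked post, the helper name first_index_on_or_after is hypothetical, and the result is only meaningful if the column really is sorted.

```python
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range(start='2018-01-01 20:00:00',
                                          end='2018-01-02 02:00:00', freq='h')})

def first_index_on_or_after(frame, column, date, default='not exist'):
    """Binary-search a column that is ASSUMED to be sorted ascending."""
    # side='left' gives the position of the first value >= the timestamp
    pos = frame[column].searchsorted(pd.Timestamp(date), side='left')
    # pos == len(frame) means every value is earlier than the requested date
    return frame.index[pos] if pos < len(frame) else default

idx_found = first_index_on_or_after(df, 'dates', '2018-01-02')
idx_none = first_index_on_or_after(df, 'dates', '2018-01-08')
print(idx_found)
print(idx_none)
```

Like the >= mask, this returns the first row on or after the date, so compare the date of the returned row if an exact match is required.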

Answer 2

Yes, you can use .loc with a condition to slice df, and then use .iloc to return the index.

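import pandas as pd

# hourly timestamps for 2018; 'h' is the hourly alias ('H' is deprecated in recent pandas)
df = pd.DataFrame({'time': pd.date_range(start='2018-01-01 00:00:00',
                                         end='2018-12-31 00:00:00', freq='h')})

# then use a condition and .iloc to get the first instance;
# >= (rather than >) keeps a row that falls exactly on the boundary
df.loc[df['time'] >= '2018-10-30 01:00:00'].iloc[[0]].index[0]

# if you specify a coarser condition, for instance without a time,
# it is read as midnight, so this returns the first row of that date
# (an IndexError is raised if no row satisfies the condition)
df.loc[df['time'] >= '2018-10-30'].iloc[[0]].index[0]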
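
Selecting position 0 of an empty selection raises an IndexError, so when the date may be missing it can help to check for emptiness first. A minimal sketch, assuming a hypothetical helper named first_index_from and the same sample data as above:

```python
import pandas as pd

df = pd.DataFrame({'time': pd.date_range(start='2018-01-01 00:00:00',
                                         end='2018-12-31 00:00:00', freq='h')})

def first_index_from(frame, column, start):
    """Return the index label of the first row at or after start, or None."""
    matches = frame.loc[frame[column] >= start]
    # .empty avoids the IndexError that .index[0] would raise on no match
    return None if matches.empty else matches.index[0]

idx_hit = first_index_from(df, 'time', '2018-10-30')
idx_miss = first_index_from(df, 'time', '2019-06-01')
print(idx_hit, df.loc[idx_hit, 'time'])
print(idx_miss)
```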

Answer 3

I don't know whether this is the best option, but it works:

(df['Date Time'].dt.strftime('%Y-%m-%d') == '2019-01-02').idxmax()
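
One caveat with idxmax on a boolean mask: when no row matches, every value is False and idxmax still returns the first index label, which is indistinguishable from a real match on the first row. The sketch below guards against that with any(); the helper name first_index_of_date and the sample rows are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date Time': pd.to_datetime([
    '2019-01-01 09:00:00',
    '2019-01-02 09:00:00',
    '2019-01-02 09:15:00',
    '2019-01-03 09:00:00',
])})

def first_index_of_date(frame, column, date_string):
    """Return the index label of the first row on date_string, or None."""
    mask = frame[column].dt.strftime('%Y-%m-%d') == date_string
    # idxmax() alone cannot signal "no match", so test any() first
    return mask.idxmax() if mask.any() else None

found = first_index_of_date(df, 'Date Time', '2019-01-02')
missing = first_index_of_date(df, 'Date Time', '2019-02-01')
# Without the guard, a date that is absent still yields the first label
unguarded = (df['Date Time'].dt.strftime('%Y-%m-%d') == '2019-02-01').idxmax()
print(found, missing, unguarded)
```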

How to find the row index of the first match in a DataFrame cell (containing a date)
