Pandas: select rows based on multiple datetime columns

Time: 2020-03-12 10:09:20

Tags: python pandas dataframe

I have two columns, StartTime and EndTime:

+----+-------------------------------+-------------------------------+
|    | StartTime                     | EndTime                       |
+----+-------------------------------+-------------------------------+
| 25 | 2018-05-17 11:52:21.769491600 | 2018-05-17 23:08:35.731376400 |
| 32 | 2018-05-19 14:22:24.141359000 | 2018-05-19 18:37:04.003643800 |
| 42 | 2018-05-22 08:25:01.015975500 | 2018-05-22 22:32:34.249869500 |
| 43 | 2018-05-22 08:46:06.187427200 | 2018-05-22 21:29:17.397438000 |
| 44 | 2018-05-22 13:38:37.289871700 | 2018-05-22 18:38:36.498623500 |
+----+-------------------------------+-------------------------------+

I need to select the events that occurred between 7-9 and 18-20. What I have tried so far is:

df = df[((df['start_hr']<=7) & (df['end_hr']>=9)) | ((df['start_hr']<=18) & (df['end_hr']>=20))]

I extracted the hours from the data and used them to count the tracks:

{{1}}

Is there a more accurate and faster alternative?

2 answers:

Answer 0 (score: 1)

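This increases memory consumption for a while, but you can do the following: create two temporary columns and use `df.query` on them. Make sure to drop the columns afterwards.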

# Temporary hour-of-day columns derived from the datetime columns
df = df.assign(start_hr=df.StartTime.dt.hour, end_hr=df.EndTime.dt.hour)

df.query('(start_hr <= 7 and end_hr >= 9) or (start_hr <= 18 and end_hr >= 20)')
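If the temporary columns are not wanted at all, the same condition can be written as a boolean mask built directly from the `.dt.hour` accessor. This is a minimal sketch. It assumes the columns are named `StartTime` and `EndTime`, as in the question's table, and that they already hold datetimes. The helper name `covers_window` is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question's table (index values kept).
df = pd.DataFrame(
    {
        "StartTime": pd.to_datetime([
            "2018-05-17 11:52:21.769491600",
            "2018-05-19 14:22:24.141359000",
            "2018-05-22 08:25:01.015975500",
            "2018-05-22 08:46:06.187427200",
            "2018-05-22 13:38:37.289871700",
        ]),
        "EndTime": pd.to_datetime([
            "2018-05-17 23:08:35.731376400",
            "2018-05-19 18:37:04.003643800",
            "2018-05-22 22:32:34.249869500",
            "2018-05-22 21:29:17.397438000",
            "2018-05-22 18:38:36.498623500",
        ]),
    },
    index=[25, 32, 42, 43, 44],
)


def covers_window(frame, lo, hi):
    """Hypothetical helper: True where start hour <= lo and end hour >= hi."""
    return (frame["StartTime"].dt.hour <= lo) & (frame["EndTime"].dt.hour >= hi)


# Same condition as the query, evaluated without adding columns to df.
mask = covers_window(df, 7, 9) | covers_window(df, 18, 20)
selected = df[mask]
```

Because nothing is assigned back to `df`, there are no helper columns to drop afterwards.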

Answer 1 (score: 0)

You can use this:


import pandas as pd

# Make sure both columns are parsed as datetimes
df['StartTime'] = pd.to_datetime(df['StartTime'])
df['EndTime'] = pd.to_datetime(df['EndTime'])

# Extract the hour of day from each column (not the day of month)
df['start_hr'] = df['StartTime'].dt.hour
df['end_hr'] = df['EndTime'].dt.hour

df.loc[((df['start_hr'] <= 7) & (df['end_hr'] >= 9)) | ((df['start_hr'] <= 18) & (df['end_hr'] >= 20))]
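On the "more accurate" part of the question: comparing whole hours treats a start at 07:45 as if it were 07:00. One possible refinement is to compare the time of day at full resolution. This is only a sketch, under three assumptions: "between 7-9" means the event spans the whole 07:00 to 09:00 window (likewise 18:00 to 20:00), each event starts and ends on the same calendar day, and the columns are named `StartTime` and `EndTime`. The helper `spans_window` and the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample rows chosen to show the difference from hour-only checks.
df = pd.DataFrame({
    "StartTime": pd.to_datetime([
        "2018-05-22 06:30:00",
        "2018-05-22 07:45:00",
        "2018-05-22 17:59:59",
    ]),
    "EndTime": pd.to_datetime([
        "2018-05-22 09:15:00",
        "2018-05-22 09:30:00",
        "2018-05-22 20:00:00",
    ]),
})


def spans_window(frame, lo_hour, hi_hour):
    """Hypothetical helper: True where the event covers lo_hour:00 to hi_hour:00."""
    # Time elapsed since midnight, as Timedelta Series.
    start_tod = frame["StartTime"] - frame["StartTime"].dt.normalize()
    end_tod = frame["EndTime"] - frame["EndTime"].dt.normalize()
    return (start_tod <= pd.Timedelta(hours=lo_hour)) & (
        end_tod >= pd.Timedelta(hours=hi_hour)
    )


mask = spans_window(df, 7, 9) | spans_window(df, 18, 20)
precise = df[mask]
```

With these rows, the event starting at 07:45 is excluded, whereas an hour-only comparison (`start hour <= 7`) would keep it. If "between" is meant as "overlaps the window" rather than "covers it", the comparisons would need to be changed accordingly.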