Filtering a DataFrame on multiple date conditions

Time: 2021-07-19 08:00:28

Tags: python pandas datetime

I am working with the following DataFrame:

                 id    slotTime         EDD     EDD-10M
0  1000000101068957  2021-05-12  2021-12-26  2021-02-26
1  1000000100849718  2021-03-20  2021-04-05  2020-06-05
2  1000000100849718  2021-03-20  2021-04-05  2020-06-05
3  1000000100849718  2021-03-20  2021-04-05  2020-06-05
4  1000000100849718  2021-03-20  2021-04-05  2020-06-05

I only want to keep the rows where slotTime falls between EDD-10M and EDD:

df['EDD-10M'] < df['slotTime'] < df['EDD']

I tried the following:

df.loc[df[df['slotTime'] < df['EDD']] & df[df['EDD-10M'] < df['slotTime']]]

But it raises the following error:

TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Please advise.

To reproduce the DataFrame above, use the following snippet:

import pandas as pd
from pandas import Timestamp

df = { 
  'id': {0: 1000000101068957,
  1: 1000000100849718,
  2: 1000000100849718,
  3: 1000000100849718,
  4: 1000000100849718,
  5: 1000000100849718,
  6: 1000000100849718,
  7: 1000000100849718,
  8: 1000000100849718,
  9: 1000000100849718},
  'EDD': {0: Timestamp('2021-12-26 00:00:00'),
  1: Timestamp('2021-04-05 00:00:00'),
  2: Timestamp('2021-04-05 00:00:00'),
  3: Timestamp('2021-04-05 00:00:00'),
  4: Timestamp('2021-04-05 00:00:00'),
  5: Timestamp('2021-04-05 00:00:00'),
  6: Timestamp('2021-04-05 00:00:00'),
  7: Timestamp('2021-04-05 00:00:00'),
  8: Timestamp('2021-04-05 00:00:00'),
  9: Timestamp('2021-04-05 00:00:00')},
 'EDD-10M': {0: Timestamp('2021-02-26 00:00:00'),
  1: Timestamp('2020-06-05 00:00:00'),
  2: Timestamp('2020-06-05 00:00:00'),
  3: Timestamp('2020-06-05 00:00:00'),
  4: Timestamp('2020-06-05 00:00:00'),
  5: Timestamp('2020-06-05 00:00:00'),
  6: Timestamp('2020-06-05 00:00:00'),
  7: Timestamp('2020-06-05 00:00:00'),
  8: Timestamp('2020-06-05 00:00:00'),
  9: Timestamp('2020-06-05 00:00:00')},
 'slotTime': {0: Timestamp('2021-05-12 00:00:00'),
  1: Timestamp('2021-03-20 00:00:00'),
  2: Timestamp('2021-03-20 00:00:00'),
  3: Timestamp('2021-03-20 00:00:00'),
  4: Timestamp('2021-03-20 00:00:00'),
  5: Timestamp('2021-03-20 00:00:00'),
  6: Timestamp('2021-03-20 00:00:00'),
  7: Timestamp('2021-03-20 00:00:00'),
  8: Timestamp('2021-03-20 00:00:00'),
  9: Timestamp('2021-03-20 00:00:00')}}

df = pd.DataFrame(df)

3 answers:

Answer 0 (score: 5)

You just need to wrap each of your two conditions in parentheses:

df[(df['slotTime'] < df['EDD']) & (df['EDD-10M'] < df['slotTime'])]

Otherwise, because & has higher precedence than <, Python tries to evaluate the & first and everything falls apart.
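To make the precedence point concrete, here is a minimal sketch. The integers only illustrate how Python parses the expression, and the three-row frame is made-up sample data that reuses the question's column names (it is not the asker's full data set).

```python
import pandas as pd

# Plain integers show the precedence rule: & is evaluated before <.
without_parens = 4 < 5 & 2 < 3      # parsed as 4 < (5 & 2) < 3, i.e. 4 < 0 < 3
with_parens = (4 < 5) & (2 < 3)     # True & True

# Hypothetical sample data with the same column names as the question.
demo = pd.DataFrame({
    "slotTime": pd.to_datetime(["2021-05-12", "2021-03-20", "2022-01-01"]),
    "EDD": pd.to_datetime(["2021-12-26", "2021-04-05", "2021-04-05"]),
    "EDD-10M": pd.to_datetime(["2021-02-26", "2020-06-05", "2020-06-05"]),
})

# Each parenthesized comparison yields a boolean Series;
# & then combines the two Series element-wise.
mask = (demo["EDD-10M"] < demo["slotTime"]) & (demo["slotTime"] < demo["EDD"])
filtered = demo[mask]

print(without_parens, with_parens)
print(filtered)
```

Note that the attempt in the question also wrapped each comparison in an extra df[...], so & was applied to two filtered DataFrames (datetime columns included) rather than to two boolean masks.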

Alternatively, you may want to use the .between method (assuming you have a datetime Series):

df[df['slotTime'].between(df['EDD-10M'], df['EDD'])]
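One detail worth checking with between: Series.between(left, right) takes the lower bound first and includes both endpoints by default, whereas the < comparisons are strict. A small sketch on made-up rows, assuming pandas 1.3 or later (where the inclusive parameter accepts the string "neither"):

```python
import pandas as pd

# Hypothetical rows: one inside the range, one exactly on EDD, one past EDD.
demo = pd.DataFrame({
    "slotTime": pd.to_datetime(["2021-03-20", "2021-04-05", "2021-06-01"]),
    "EDD-10M": pd.to_datetime(["2020-06-05", "2020-06-05", "2020-06-05"]),
    "EDD": pd.to_datetime(["2021-04-05", "2021-04-05", "2021-04-05"]),
})

# Default: EDD-10M <= slotTime <= EDD, so the row equal to EDD is kept.
closed = demo["slotTime"].between(demo["EDD-10M"], demo["EDD"])

# Strict: EDD-10M < slotTime < EDD, matching the < comparisons.
strict = demo["slotTime"].between(
    demo["EDD-10M"], demo["EDD"], inclusive="neither"
)

print(demo[closed])
print(demo[strict])
```

If the boundary dates should count as "between", the default is probably what you want; otherwise the strict form mirrors the two < conditions.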

Answer 1 (score: 1)

You can use the between() method that someone has already suggested, or try it like this:

df.loc[(df['EDD-10M'] < df['slotTime']) & (df['slotTime'] < df['EDD'])]

You should wrap each condition in ( and ) when combining multiple conditions.

Answer 2 (score: 1)

You can use query:

df.query("(slotTime < EDD) & (`EDD-10M` < slotTime)")
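As a quick sanity check that the query string selects the same rows as the boolean mask, here is a sketch on made-up data. It assumes pandas 1.0 or later, where backticks can quote column names containing characters such as the hyphen in EDD-10M.

```python
import pandas as pd

# Hypothetical sample data reusing the question's column names.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "slotTime": pd.to_datetime(["2021-05-12", "2021-03-20", "2022-01-01"]),
    "EDD": pd.to_datetime(["2021-12-26", "2021-04-05", "2021-04-05"]),
    "EDD-10M": pd.to_datetime(["2021-02-26", "2020-06-05", "2020-06-05"]),
})

# Backticks are needed because EDD-10M is not a valid Python identifier.
via_query = demo.query("(slotTime < EDD) & (`EDD-10M` < slotTime)")

# The equivalent boolean-mask form, for comparison.
via_mask = demo[
    (demo["slotTime"] < demo["EDD"]) & (demo["EDD-10M"] < demo["slotTime"])
]

print(via_query.equals(via_mask))
```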