Filter a DataFrame for rows whose date falls outside a date range

Time: 2017-09-22 10:55:37

Tags: python pandas

I have a DataFrame df whose head looks like this:

    identifier        department    organisation    status change date
1           14           Finance        Accounts            19/09/2018
2           19         Marketing     Advertising            19/09/2016
22         288        Production              IT            03/01/2017
27         352        Facilities         Kitchen            31/01/2017
54         790         Relations           Sales            31/03/2017

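df contains a few thousand records. I also have two date variables, held as strings (they come from command-line arguments), for the start and end dates of the reference period:

referencePeriodStartDate
referencePeriodEndDate

At the moment these are:

referencePeriodStartDate = 01/01/2017
referencePeriodEndDate = 30/03/2017

I am trying to return and log the records from df whose status change date falls outside the reference period defined by referencePeriodStartDate and referencePeriodEndDate.

In the example above, the records with identifiers 14 and 19 would be returned, because their status change dates, 19/09/2018 and 19/09/2016, fall after and before the reference window respectively.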

Example output

    identifier        department    organisation    status change date
1           14           Finance        Accounts            19/09/2018
2           19         Marketing     Advertising            19/09/2016

I have tried the following:

resultdf = (df['status change date'].dt.date > referencePeriodEndDate.dt.date) & (df['status change date'].dt.date < referencePeriodStartDate.dt.date)

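I convert the string dates to type date and then try to apply this logic: if the status change date is < referencePeriodStartDate and the status change date is > referencePeriodEndDate, return that row.

My problem is that nothing is returned. Am I converting the input dates incorrectly? I would be grateful if someone could take a look.

Thanks

2 answers:

Answer 0: (score: 0)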

To compare the dates of a column created by dt.date with a scalar datetime, the scalar needs date() as well:

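import pandas as pd

# The dates are day-first (DD/MM/YYYY), so state that explicitly
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)

# Keep rows after the end of the window OR before its start
resultdf = df[(df['status change date'].dt.date > referencePeriodEndDate.date()) |
              (df['status change date'].dt.date < referencePeriodStartDate.date())]
print(resultdf)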
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31

Or, to compare datetimes directly, just drop the date calls:

df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = '01/01/2017'
referencePeriodEndDate = '30/03/2017'

resultdf = df[(df['status change date'] > referencePeriodEndDate) |
              (df['status change date'] < referencePeriodStartDate)]
print(resultdf)
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31

You can also use between, with ~ to invert the mask:

mask = ~df['status change date'].between(referencePeriodStartDate, referencePeriodEndDate)
resultdf = df[mask]
print(resultdf)
    identifier department organisation status change date
1           14    Finance     Accounts         2018-09-19
2           19  Marketing  Advertising         2016-09-19
54         790  Relations        Sales         2017-03-31
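One caveat worth spelling out: the dates in the question are day-first (DD/MM/YYYY), and pd.to_datetime reads an ambiguous string such as 03/01/2017 as month-first unless told otherwise. Below is a minimal end-to-end sketch, assuming every date string follows the %d/%m/%Y layout. The sample frame is rebuilt from the head shown in the question, and the helper name parse_dmy is hypothetical.

```python
import pandas as pd


def parse_dmy(value):
    """Parse DD/MM/YYYY strings (a scalar or a Series) with an explicit format."""
    return pd.to_datetime(value, format="%d/%m/%Y")


# Sample data copied from the head of df in the question
df = pd.DataFrame(
    {
        "identifier": [14, 19, 288, 352, 790],
        "department": ["Finance", "Marketing", "Production", "Facilities", "Relations"],
        "organisation": ["Accounts", "Advertising", "IT", "Kitchen", "Sales"],
        "status change date": [
            "19/09/2018", "19/09/2016", "03/01/2017", "31/01/2017", "31/03/2017",
        ],
    },
    index=[1, 2, 22, 27, 54],
)

df["status change date"] = parse_dmy(df["status change date"])
start = parse_dmy("01/01/2017")  # stands in for the command-line argument
end = parse_dmy("30/03/2017")    # stands in for the command-line argument

# between() includes both endpoints by default; ~ inverts the mask
outside = df[~df["status change date"].between(start, end)]
print(outside["identifier"].tolist())  # [14, 19, 790]
```

Identifier 790 is included because 31/03/2017 is one day after the end of the window.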


Answer 1: (score: 0)

As Jezrael mentioned, your code is using '&' to slice. Your dates can't be after 'x' and before 'y' at the same time. Convert the strings to a datetime type and then use 'or', i.e. '|'.
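The difference between '&' and '|' can be seen on a tiny series. This is only a sketch using three of the dates from the question, assuming they have already been converted to datetime; the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-09-19", "2017-01-31", "2018-09-19"]))
start = pd.Timestamp("2017-01-01")
end = pd.Timestamp("2017-03-30")

# & asks for dates that are after the end AND before the start: never true
both = (dates > end) & (dates < start)

# | asks for dates that are after the end OR before the start
either = (dates > end) | (dates < start)

print(both.tolist())    # [False, False, False]
print(either.tolist())  # [True, False, True]
```

With '&' the mask is False everywhere, which matches the asker's empty result.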