I have a DataFrame df whose head looks like this:
    identifier  department  organisation  status change date
1           14     Finance      Accounts          19/09/2018
2           19   Marketing   Advertising          19/09/2016
22         288  Production            IT          03/01/2017
27         352  Facilities       Kitchen          31/01/2017
54         790   Relations         Sales          31/03/2017
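df contains several thousand records. I also have two date variables, the start and end dates of a reference period, held as strings (they come from command-line arguments): referencePeriodStartDate and referencePeriodEndDate. Currently they are:
referencePeriodStartDate = '01/01/2017'
referencePeriodEndDate = '30/03/2017'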
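I am trying to return and log the records from df whose status change date falls outside the reference period defined by referencePeriodStartDate and referencePeriodEndDate. In the example above, the records with identifiers 14 and 19 would be returned, because their status change dates (19/09/2018 and 19/09/2016) fall after and before the reference window, respectively.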
Sample output:
    identifier  department  organisation  status change date
1           14     Finance      Accounts          19/09/2018
2           19   Marketing   Advertising          19/09/2016
I tried the following:
resultdf = (df['status change date'].dt.date > referencePeriodEndDate.dt.date) & (df['status change date'].dt.date < referencePeriodStartDate.dt.date)
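I converted the string dates to date type, then tried to apply the logic: if the status change date is less than referencePeriodStartDate and the status change date is greater than referencePeriodEndDate, return the row.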
My problem is that nothing is returned. Am I converting the input dates incorrectly? I would be grateful if someone could take a look.
Thanks
Answer 0 (score: 0)
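If you want to compare the dates of a column created by the to_datetime function with scalar dates, call date() on the scalars as well, and combine the two conditions with | rather than &: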
import pandas as pd

# The dates are day-first (DD/MM/YYYY), so say so explicitly when parsing
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)
# Keep rows dated after the end date OR before the start date
resultdf = df[(df['status change date'].dt.date > referencePeriodEndDate.date()) |
              (df['status change date'].dt.date < referencePeriodStartDate.date())]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
Or, to compare datetimes directly, just remove the date calls (or negate a between mask with ~, shown last):
df['status change date'] = pd.to_datetime(df['status change date'], dayfirst=True)
# Parse the bounds as day-first too, instead of leaving them as ambiguous strings
referencePeriodStartDate = pd.to_datetime('01/01/2017', dayfirst=True)
referencePeriodEndDate = pd.to_datetime('30/03/2017', dayfirst=True)
resultdf = df[(df['status change date'] > referencePeriodEndDate) |
              (df['status change date'] < referencePeriodStartDate)]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
And with ~ and between:
mask = ~df['status change date'].between(referencePeriodStartDate, referencePeriodEndDate)
resultdf = df[mask]
print(resultdf)
    identifier  department  organisation  status change date
1           14     Finance      Accounts          2018-09-19
2           19   Marketing   Advertising          2016-09-19
54         790   Relations         Sales          2017-03-31
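To reproduce this end to end, here is a minimal self-contained sketch. It assumes the five sample rows from the question and that every date is day-first (DD/MM/YYYY), so it parses with an explicit format. The helper name outside_reference_period is hypothetical and made up for illustration; it is not part of pandas.

```python
import pandas as pd

DATE_FORMAT = "%d/%m/%Y"  # assumption: all dates are day-first strings


def outside_reference_period(df, start, end, column="status change date"):
    """Return the rows whose date in `column` is before `start` or after `end`."""
    dates = pd.to_datetime(df[column], format=DATE_FORMAT)
    start_ts = pd.to_datetime(start, format=DATE_FORMAT)
    end_ts = pd.to_datetime(end, format=DATE_FORMAT)
    # between() includes both end points by default, so ~ keeps strictly-outside rows
    inside = dates.between(start_ts, end_ts)
    return df[~inside]


# Sample rows copied from the question
df = pd.DataFrame(
    {
        "identifier": [14, 19, 288, 352, 790],
        "department": ["Finance", "Marketing", "Production", "Facilities", "Relations"],
        "organisation": ["Accounts", "Advertising", "IT", "Kitchen", "Sales"],
        "status change date": [
            "19/09/2018", "19/09/2016", "03/01/2017", "31/01/2017", "31/03/2017",
        ],
    },
    index=[1, 2, 22, 27, 54],
)

resultdf = outside_reference_period(df, "01/01/2017", "30/03/2017")
print(resultdf["identifier"].tolist())  # [14, 19, 790]
```

Identifier 790 is included because 31/03/2017 is one day after the end date of 30/03/2017.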
Answer 1 (score: 0)
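As Jezrael's code shows, the problem is that you are using '&' to slice. A date cannot be after x and before y at the same time. Convert the strings to datetime type, then use "or" logic, which for pandas Series means the '|' operator (the Python keyword or does not work element-wise on a Series).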
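A small sketch of the difference, using three of the sample dates from the question (the variable names here are only for illustration): the '&' mask is False everywhere, the '|' mask flags the out-of-window dates, and the or keyword raises an error on a Series.

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["19/09/2018", "19/09/2016", "03/01/2017"], format="%d/%m/%Y")
)
start = pd.Timestamp(2017, 1, 1)
end = pd.Timestamp(2017, 3, 30)

# No date can be after `end` and before `start` at once, so this mask is all False
both = (dates > end) & (dates < start)
# Element-wise OR flags dates on either side of the window
either = (dates > end) | (dates < start)

print(both.tolist())    # [False, False, False]
print(either.tolist())  # [True, True, False]

# The `or` keyword asks for the truth value of a whole Series, which is ambiguous
try:
    (dates > end) or (dates < start)
    or_failed = False
except ValueError:
    or_failed = True
print(or_failed)  # True
```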