Question

I have a DataFrame df:

doc_date    date_string
2019-06-03  WW0306
2019-06-07  EH0706
2019-08-08  19685
2019-08-09  258
2019-08-10  441573556

doc_date has dtype datetime64 and date_string is a string. I strip the non-digit characters:

s = df['date_string'].str.replace(r'\D+', '', regex=True)

doc_date    date_string
2019-06-03  0306
2019-06-07  0706
2019-08-08  19685
2019-08-09  258
2019-08-10  441573556

s1 = to_datetime(s, errors='ignore', format='%d%m')

doc_date    date_string
2019-06-03  1900-06-03
2019-06-07  1900-06-07
2019-08-08  19685
2019-08-09  258
2019-08-10  441573556

Here I would like to know how to ignore the rows whose date_string cannot be converted to a datetime, so that I can create a boolean mask:

c1 = (df.doc_date.dt.dayofyear - s1.dt.dayofyear).abs().le(180)

The other question is how to make c1 the same length as s, so that any date_string that cannot be converted to a datetime ends up as False in c1.

Answer 1

Use errors='coerce' to convert values that do not match the format to NaT, so that datetime-like functions can still be applied:

s1 = to_datetime(s, errors='coerce', format='%d%m')

Or, in the more common form (pandas 0.24.2, so the output differs):

import pandas as pd

s1 = pd.to_datetime(s, errors='coerce', format='%d%m')
print (s1)
0   1900-06-03
1   1900-06-07
2          NaT
3   1900-08-25
4          NaT
Name: date_string, dtype: datetime64[ns]
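
To make the "ignore the rows that cannot be converted" part explicit, the NaT values produced by errors='coerce' can be turned into a mask of their own. This is only a minimal sketch: the sample values are the digit-only strings from the question, and the name convertible is a hypothetical one introduced here for illustration.

```python
import pandas as pd

# Digit-only strings from the question, after the non-digits were stripped.
s = pd.Series(['0306', '0706', '19685', '258', '441573556'], name='date_string')

# Values that do not match '%d%m' become NaT instead of raising an error.
s1 = pd.to_datetime(s, errors='coerce', format='%d%m')

# Hypothetical name: True where the conversion succeeded, False for NaT.
convertible = s1.notna()

# The mask has the same index and length as s, so it can filter either Series.
print(s1[convertible])
```

Because convertible shares the index of s, it can also be combined with other conditions using & without any realignment.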

All together:

#if necessary
#df['doc_date'] =  pd.to_datetime(df['doc_date'])

s = df['date_string'].str.replace(r'\D+', '', regex=True)

s1 = pd.to_datetime(s, errors='coerce', format='%d%m')

c1 = (df.doc_date.dt.dayofyear - s1.dt.dayofyear).abs().le(180)
print (c1)
0     True
1     True
2    False
3     True
4    False
dtype: bool
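
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the DataFrame from the sample data in the question. It assumes a reasonably recent pandas: regex=True is passed explicitly because, since pandas 2.0, str.replace treats the pattern as a literal string by default. The intermediate name diff is introduced here only for readability.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'doc_date': pd.to_datetime(['2019-06-03', '2019-06-07', '2019-08-08',
                                '2019-08-09', '2019-08-10']),
    'date_string': ['WW0306', 'EH0706', '19685', '258', '441573556'],
})

# Strip every run of non-digit characters.
s = df['date_string'].str.replace(r'\D+', '', regex=True)

# Strings that do not fit day+month become NaT.
s1 = pd.to_datetime(s, errors='coerce', format='%d%m')

# NaT has no day of year, so the difference is NaN for those rows.
diff = (df['doc_date'].dt.dayofyear - s1.dt.dayofyear).abs()

# Comparisons against NaN evaluate to False, so unconvertible rows drop out.
c1 = diff.le(180)

print(len(c1) == len(s))
print(c1)
```

The mask keeps the full length of s because the rows that fail to parse are not removed; they carry NaN through the subtraction and come out as False in the final comparison.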

Pandas: how to ignore column cells that cannot be converted to datetime when computing a time delta
