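I have a pandas DataFrame with a column, "date_col", that holds date strings. I want to filter the DataFrame down to every row whose date string in this column would raise a ValueError if parsed by numpy.datetime64. I am looking for something like the following: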
bad_rows = df[numpy.datetime64(df["date_col"]) is False]
except that instead of checking for False, I want to check whether a ValueError is raised. Is there a way to do this kind of filtering on a pandas DataFrame?
I tried the following:
df = pd.DataFrame({"date_col":("2015-04-31", "2015-04-30")})
result = pd.to_datetime(df["date_col"], errors='coerce')
But I get:
>>> result
0 2015-04-31
1 2015-04-30
Checking the type of each value shows that they are still strings.
>>> result[0]
'2015-04-31'
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
date_col 2 non-null object
dtypes: object(1)
If I try:
>>> result = pd.to_datetime(df["date_col"], errors='coerce' ,format='%Y%m%d')
I get:
Traceback (most recent call last):
File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 330, in _convert_listlike
values, tz = tslib.datetime_to_datetime64(arg)
File "pandas/tslib.pyx", line 1371, in pandas.tslib.datetime_to_datetime64 (pandas/tslib.c:23790)
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 340, in to_datetime
values = _convert_listlike(arg.values, False, format)
File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 333, in _convert_listlike
raise e
File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 307, in _convert_listlike
arg, format, exact=exact, coerce=coerce
File "pandas/tslib.pyx", line 2347, in pandas.tslib.array_strptime (pandas/tslib.c:39562)
ValueError: time data '2015-04-31' does not match format '%Y%m%d' (match)
My pandas version is 0.16.1 and my numpy version is 1.9.2.
This works (for pandas 0.16.1):
>>> df = pd.DataFrame({"date_col":("2015-04-31", "2015-04-30")})
>>> pd.to_datetime(df['date_col'], coerce=True)
0 NaT
1 2015-04-30
Name: date_col, dtype: datetime64[ns]
>>> pd.to_datetime(df['date_col'], coerce=True).isnull()
0 True
1 False
Name: date_col, dtype: bool
Answer 0 (score: 1)
Just do pd.to_datetime(df['date_col'], errors='coerce'). This produces NaT wherever the string is not a valid date.
Example:
In [307]:
df = pd.DataFrame({'date':['2015-02-01', 'sausage', '2011-01-33']})
df
Out[307]:
date
0 2015-02-01
1 sausage
2 2011-01-33
In [308]:
pd.to_datetime(df['date'], errors='coerce')
Out[308]:
0 2015-02-01
1 NaT
2 NaT
Name: date, dtype: datetime64[ns]
A subsequent call to isnull() then yields True wherever the value is invalid:
In [309]:
pd.to_datetime(df['date'], errors='coerce').isnull()
Out[309]:
0 False
1 True
2 True
Name: date, dtype: bool
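The boolean mask above can be used to index the DataFrame directly, which gives the rows the question asks for. The following is a minimal sketch, assuming pandas 0.17 or later (where errors='coerce' is available) and assuming the dates are expected in the form YYYY-MM-DD; the names parsed, bad_rows and good_rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"date_col": ["2015-04-31", "2015-04-30", "sausage"]})

# Unparseable strings become NaT instead of raising.
# The explicit format is an assumption about how the data is laid out.
parsed = pd.to_datetime(df["date_col"], format="%Y-%m-%d", errors="coerce")

# Keep only the rows whose date string failed to parse.
bad_rows = df[parsed.isnull()]

# The complement: rows whose date string parsed cleanly.
good_rows = df[parsed.notnull()]
```

Note that values that were already missing (NaN or None) also come back as NaT, so they land in bad_rows as well. If that matters, the mask can be narrowed to parsed.isnull() & df["date_col"].notnull().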
Edit
Seeing that you are using 0.16.1, the API is a little different; the following should work:
result = pd.to_datetime(df['date_col'], coerce=True)
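If the goal is literally to test whether numpy.datetime64 raises a ValueError, as the question phrases it, an element-wise check is another option. This is only a sketch: raises_value_error is a hypothetical helper, not part of pandas or numpy, and it will be slower than pd.to_datetime on large columns because it calls Python code once per row.

```python
import numpy as np
import pandas as pd


def raises_value_error(text):
    """Hypothetical helper: True if numpy.datetime64 rejects the string."""
    try:
        np.datetime64(text)
    except ValueError:
        return True
    return False


df = pd.DataFrame({"date_col": ["2015-04-31", "2015-04-30"]})

# Apply the check to every value and keep the rows that failed.
bad_rows = df[df["date_col"].apply(raises_value_error)]
```

The two approaches may not always agree. numpy.datetime64 expects ISO 8601 style strings, while pd.to_datetime can accept other layouts, so a string such as "04/30/2015" would likely be flagged here but parsed by pandas.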