I have a pandas DataFrame with two datetime columns, t1 and t2. I need to filter the DataFrame for all rows where t1 <= t2. t2 may be NaN.
Before pandas 0.19.0 I could do this:
import numpy as np
import pandas as pd
from datetime import datetime
dt = datetime.utcnow()
dt64 = np.datetime64(dt)
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])
df[(df.t1 <= df.t2)]
Since pandas 0.19.0, this code fails:
Traceback (most recent call last):
File "workspace/python/MyTests/test1.py", line 87, in <module>
testDfTimeCompare()
File "workspace/python/MyTests/test1.py", line 80, in testDfTimeCompare
df[(df.t1<=df.t2)]
File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 813, in wrapper
return self._constructor(na_op(self.values, other.values),
File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 787, in na_op
y = y.view('i8')
File "anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 367, in _view_is_safe
raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.
What is the best way to achieve this?
Answer 0 (score: 2)
I think you need to convert column t2 with to_datetime, so that None becomes NaT. After that you can use the faster function Series.le, which is the same as <=:
df.t2 = pd.to_datetime(df.t2)
print (df)
t1 t2
0 2016-11-04 07:24:53.372838 NaT
mask = df.t1.le(df.t2)
print (mask)
0 False
dtype: bool
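For reference, the steps from this answer can be put together as one self-contained sketch. The two sample rows and the variable name result are made up for illustration, and t2 is created with dtype=object only to mimic a column that holds a plain None; on a recent pandas version this should run as written.

```python
import pandas as pd

# Illustrative data (not from the question): row 0 has no t2, row 1 has t1 <= t2.
df = pd.DataFrame({
    "t1": pd.to_datetime(["2016-11-04 07:24:53", "2016-11-04 07:24:53"]),
    # dtype=object keeps the None as-is, like the column in the question.
    "t2": pd.Series([None, pd.Timestamp("2016-11-05 00:00:00")], dtype=object),
})

# Convert the object column to a datetime column; None becomes NaT.
df["t2"] = pd.to_datetime(df["t2"])

# Element-wise t1 <= t2; any comparison against NaT evaluates to False.
mask = df["t1"].le(df["t2"])

# Keep only the rows where the comparison holds.
result = df[mask]
print(result)
```

Because a comparison with NaT is False, rows with a missing t2 drop out of the result without any extra notnull check.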
Answer 1 (score: 0)
Build a mask like this:
mask = ((df <= 0).cumsum() > 0).any()
>>> mask
t1 False
t2 True
dtype: bool
Answer 2 (score: 0)
I solved this by explicitly setting the type of the relevant columns.
df.t1=df.t1.astype(datetime)
df.t2=df.t2.astype(datetime)
>>> df[(df.t1<=df.t2)]
Empty DataFrame
Columns: [t1, t2]
Index: []
>>> df
t1 t2
0 2020-02-29 11:00:18.825597 None
I am using pandas 0.19.2.