I need to find the difference between two date columns in a pandas DataFrame and check whether that difference is greater than 120 days:
if working_data['CLAIMS_EVENT_DATE'] - working_data['LAST_LAPSED_DATE'] > 120:
I get the following error:
    invalid_comparison.format(dtype=left.dtype, typ=type(right).__name__))
TypeError: Invalid comparison between dtype=timedelta64[ns] and int
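The error arises because subtracting two datetime64 columns produces a Series of timedelta64 values, and pandas refuses to compare a timedelta with a bare integer such as 120, which carries no unit. The minimal reproduction below uses a small hypothetical DataFrame with made-up dates in place of the real `working_data`. It also shows a second problem: even a valid element-wise comparison returns a boolean Series, and a Series cannot be used directly as the condition of an `if`.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's working_data
working_data = pd.DataFrame({
    "CLAIMS_EVENT_DATE": pd.to_datetime(["2020-06-01", "2020-03-15"]),
    "LAST_LAPSED_DATE": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})

# Subtracting two datetime64 columns yields a timedelta64 Series (dtype kind 'm')
diff = working_data["CLAIMS_EVENT_DATE"] - working_data["LAST_LAPSED_DATE"]
print(diff.dtype)

# A bare int has no unit, so pandas refuses the comparison
type_error = None
try:
    diff > 120
except TypeError as exc:
    type_error = str(exc)
    print("TypeError:", exc)

# A valid comparison still returns a boolean Series, which `if` cannot reduce to one bool
value_error = None
try:
    if diff > pd.Timedelta(days=120):
        pass
except ValueError as exc:
    value_error = str(exc)
    print("ValueError:", exc)
```

Both failures point to the same fix: compare against something with a time unit, then reduce the resulting boolean Series explicitly (for example with `.any()` or `.all()`).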
Answer 0 (score: 0)
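To compare timedeltas there are two options. If you need to test whether at least one value meets the condition, compare the day counts from Series.dt.days with 120 and reduce the result with Series.any: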
s = (working_data['CLAIMS_EVENT_DATE'] - working_data['LAST_LAPSED_DATE'])
if (s.dt.days > 120).any():
    print('At least one value is higher')
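A runnable sketch of the days-based check, assuming a small hypothetical DataFrame in place of the real `working_data`. It shows the integer day counts that `Series.dt.days` produces and how `.any()` collapses the boolean Series into the single bool an `if` statement needs; `.all()` is shown for contrast, for the case where every row must exceed the threshold.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's working_data
working_data = pd.DataFrame({
    "CLAIMS_EVENT_DATE": pd.to_datetime(["2020-06-01", "2020-03-15"]),
    "LAST_LAPSED_DATE": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})

s = working_data["CLAIMS_EVENT_DATE"] - working_data["LAST_LAPSED_DATE"]

# .dt.days converts each timedelta into an integer number of whole days
days = s.dt.days
print(days.tolist())  # [152, 43]

# .any() reduces the boolean Series to a single bool that `if` can use;
# .all() would instead require every row to exceed the threshold
at_least_one = (days > 120).any()
every_row = (days > 120).all()

if at_least_one:
    print("At least one value is higher")
```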
Or compare directly against a pd.Timedelta:
if (s > pd.Timedelta(120, unit='d')).any():
    print('At least one value is higher')
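The two checks are not always equivalent. `Series.dt.days` keeps only the whole-day component of each timedelta, so a gap of 120 days and a few hours yields 120 and fails `> 120`, while the `pd.Timedelta` comparison also counts the extra hours. The sketch below uses hypothetical timestamps that include a time of day to illustrate this; when the columns hold plain dates (all at midnight), the two approaches agree.

```python
import pandas as pd

# Hypothetical timestamps: the gap is 120 days plus 6 hours
working_data = pd.DataFrame({
    "CLAIMS_EVENT_DATE": pd.to_datetime(["2020-05-01 06:00"]),
    "LAST_LAPSED_DATE": pd.to_datetime(["2020-01-02 00:00"]),
})

s = working_data["CLAIMS_EVENT_DATE"] - working_data["LAST_LAPSED_DATE"]

# Whole days only: 120 > 120 is False
by_days = (s.dt.days > 120).any()

# Full timedelta: 120 days 6 hours > 120 days is True
by_timedelta = (s > pd.Timedelta(120, unit="d")).any()

print(by_days, by_timedelta)
```

Which behaviour is correct depends on whether "more than 120 days" should count partial days; the Timedelta comparison is the stricter reading of elapsed time.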
If you need the matching rows instead, use boolean indexing:
df = working_data[s.dt.days > 120]
Or:
df = working_data[s > pd.Timedelta(120, unit='d')]
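A sketch of both boolean-indexing filters on hypothetical data. The `POLICY_ID` column is an illustrative assumption, not part of the question, and one row deliberately has a missing lapse date. A missing date makes the difference `NaT`, which compares as False in both masks, so that row is simply dropped from the result.

```python
import pandas as pd

# Hypothetical data; POLICY_ID is illustrative, and row 3 has a missing lapse date
working_data = pd.DataFrame({
    "POLICY_ID": [1, 2, 3, 4],
    "CLAIMS_EVENT_DATE": pd.to_datetime(["2020-06-01", "2020-03-15", "2020-07-01", "2020-12-31"]),
    "LAST_LAPSED_DATE": pd.to_datetime(["2020-01-01", "2020-02-01", None, "2020-08-01"]),
})

s = working_data["CLAIMS_EVENT_DATE"] - working_data["LAST_LAPSED_DATE"]

# Boolean masks keep only rows where the gap exceeds 120 days; NaT rows compare as False
df_days = working_data[s.dt.days > 120]
df_delta = working_data[s > pd.Timedelta(120, unit="d")]

print(df_days["POLICY_ID"].tolist())   # [1, 4]
print(df_delta["POLICY_ID"].tolist())  # [1, 4]
```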
Answer 1 (score: 0)
import pandas as pd

# Convert both columns to datetime format
working_data['CLAIMS_EVENT_DATE'] = pd.to_datetime(working_data['CLAIMS_EVENT_DATE'])
working_data['LAST_LAPSED_DATE'] = pd.to_datetime(working_data['LAST_LAPSED_DATE'])
# Difference in whole days (same order as the question); a Series needs .dt to reach .days
working_data['Days'] = (working_data['CLAIMS_EVENT_DATE']
                        - working_data['LAST_LAPSED_DATE']).dt.days
# Boolean column 'Greater': True where the difference is greater than 120 days
working_data['Greater'] = working_data['Days'] > 120
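A self-contained sketch of this answer's column-based approach, assuming the dates arrive as ISO-formatted strings (for example, read from a CSV file); the sample values are hypothetical. It parses both columns, stores the whole-day difference via the `.dt` accessor (a Series has no `.days` attribute of its own), and adds a per-row boolean flag.

```python
import pandas as pd

# Hypothetical input: dates stored as strings, e.g. as read from a CSV file
working_data = pd.DataFrame({
    "CLAIMS_EVENT_DATE": ["2020-06-01", "2020-03-15"],
    "LAST_LAPSED_DATE": ["2020-01-01", "2020-02-01"],
})

# Parse the strings into datetime64 columns
for col in ("CLAIMS_EVENT_DATE", "LAST_LAPSED_DATE"):
    working_data[col] = pd.to_datetime(working_data[col])

# Whole-day difference; the .dt accessor exposes .days element-wise
working_data["Days"] = (
    working_data["CLAIMS_EVENT_DATE"] - working_data["LAST_LAPSED_DATE"]
).dt.days

# Boolean flag per row: True where the gap exceeds 120 days
working_data["Greater"] = working_data["Days"] > 120

print(working_data[["Days", "Greater"]])
```

Keeping `Greater` as a real boolean column means it can be used directly as a mask, e.g. `working_data[working_data["Greater"]]`.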