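I have two columns. I want to check whether the difference between them is between 0 and 10 days. One of the fields often contains null values.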
df['Diff'] = (df['Dt1'] - df['Dt2'])

def wdw(x):
    if pd.notnull(x):
        if type(x) != long:
            if type(timedelta(days=10)) != long:
                if x > timedelta(days=10):
                    return 1
                else:
                    return 0

df['Diff'].apply(wdw)
When I run this, I get the following error.
TypeError: can't compare datetime.timedelta to long
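When I look at the values in df['Diff'], they all appear to be timedeltas. Any idea what is going on here? It seems like creating an indicator based on the difference between two date fields should be easier than this...
Answer 0 (score: 1)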
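The values in df['Diff'] are numpy timedelta64s. You can compare them with pd.Timedeltas; see below.
Also, you don't need to call df['Diff'].apply(wdw), which calls wdw once for each value in the Series; you can compare the whole Series with a pd.Timedelta: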
import pandas as pd
import numpy as np
df = pd.DataFrame({'Dt1': pd.date_range('2010-1-1', freq='5D', periods=10),
                   'Dt2': pd.date_range('2010-1-2', freq='3D', periods=10)})
df.iloc[::3, 1] = np.nan
df['Diff'] = df['Dt1'] - df['Dt2']
print(df)
#          Dt1        Dt2    Diff
# 0 2010-01-01        NaT     NaT
# 1 2010-01-06 2010-01-05  1 days
# 2 2010-01-11 2010-01-08  3 days
# 3 2010-01-16        NaT     NaT
# 4 2010-01-21 2010-01-14  7 days
# 5 2010-01-26 2010-01-17  9 days
# 6 2010-01-31        NaT     NaT
# 7 2010-02-05 2010-01-23 13 days
# 8 2010-02-10 2010-01-26 15 days
# 9 2010-02-15        NaT     NaT
mask = (df['Diff'] < pd.Timedelta(days=10)) & (pd.Timedelta(days=0) < df['Diff'])
print(mask)
yields
0    False
1     True
2     True
3    False
4     True
5     True
6    False
7    False
8    False
9    False
Name: Diff, dtype: bool
The code above relies on pd.Timedelta. For older versions of Pandas, here is a workaround that uses np.timedelta64s:
mask = ((df['Diff'] / np.timedelta64(1, 'D') < 10)
        & (df['Diff'] / np.timedelta64(1, 'D') > 0))
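Since the question asks for a 1/0 indicator rather than a boolean, the mask from this answer can be cast to integers. The following is a minimal sketch under a few assumptions: it reuses the same sample data, sets the missing dates with pd.NaT instead of np.nan, and stores the result in a column named InWindow, which is a hypothetical name that does not appear in the thread.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Dt1': pd.date_range('2010-1-1', freq='5D', periods=10),
    'Dt2': pd.date_range('2010-1-2', freq='3D', periods=10),
})
df.iloc[::3, 1] = pd.NaT  # every third Dt2 value is missing

df['Diff'] = df['Dt1'] - df['Dt2']

# Strictly between 0 and 10 days; comparisons against NaT evaluate to False.
mask = (df['Diff'] > pd.Timedelta(days=0)) & (df['Diff'] < pd.Timedelta(days=10))

# 'InWindow' is a hypothetical column name: True becomes 1, False becomes 0.
df['InWindow'] = mask.astype(int)
print(df[['Diff', 'InWindow']])
```

Rows with a missing date end up as 0 here, because the comparison is False for NaT. If missing dates should stay distinguishable from "outside the window", a separate df['Diff'].isna() check would be needed.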
Answer 1 (score: 1)
This also works, but it is not as good as the answer provided by unutbu.
def wdw(x):
    # Return 1 if the difference lies in (0, 10] days, otherwise 0 (including NaT)
    if pd.notnull(x):
        days = x / np.timedelta64(1, 'D')
        if 0 < days <= 10:
            return 1
    return 0

df['Diff'].apply(wdw)
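Answer 2 (score: 0)
Use assign to create the Diff column as the difference between the Dt1 and Dt2 date columns. Then use timedelta to build 0-day and 10-day variables to compare the difference against, and use the resulting mask to filter the output.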
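import numpy as np
import pandas as pd
from datetime import timedelta

df = pd.DataFrame({'Dt1': pd.date_range('2010-1-1', freq='5D', periods=10),
                   'Dt2': pd.date_range('2010-1-2', freq='3D', periods=10)})
df.iloc[::3, 1] = np.nan
print(df)

# Bounds of the window as datetime.timedelta objects
zero_days = timedelta(days=0)
ten_days = timedelta(days=10)
print(zero_days, ten_days)

# assign passes the whole DataFrame to the callable, so no placeholder column is needed
df = df.assign(Diff=lambda d: d['Dt1'] - d['Dt2'])
# Comparisons against NaT are False, so rows with a missing date are filtered out
mask = (df['Diff'] >= zero_days) & (df['Diff'] <= ten_days)
print(df[mask])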
Output:
         Dt1        Dt2   Diff
1 2010-01-06 2010-01-05 1 days
2 2010-01-11 2010-01-08 3 days
4 2010-01-21 2010-01-14 7 days
5 2010-01-26 2010-01-17 9 days
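The two inclusive comparisons in this answer can probably be written more compactly with Series.between, which includes both endpoints by default. This is only a sketch on the same sample data; the names diff and in_window are hypothetical, and pd.NaT is used for the missing dates.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Dt1': pd.date_range('2010-1-1', freq='5D', periods=10),
    'Dt2': pd.date_range('2010-1-2', freq='3D', periods=10),
})
df.iloc[::3, 1] = pd.NaT  # every third Dt2 value is missing

diff = df['Dt1'] - df['Dt2']

# between() keeps values with 0 days <= diff <= 10 days; NaT rows come out False.
in_window = diff.between(pd.Timedelta(days=0), pd.Timedelta(days=10))
print(df[in_window])
```

This selects the same rows (1, 2, 4 and 5) as the mask built from >= and <=.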