I have a DataFrame like the following:
name country Join Date End date
Wrt IND 1-2-2016 8-9-2017
Grt China 3-2-2015 12-6-2018
frt France 8-3-2017 continuing
srt Scottland 9-4-2018 continuing
crt china 9-7-2016 7-8-2018
I am trying to find the difference between the join date and the end date. I tried f9['Num of days'] = f9['End date '] - f9['Join Date'], but I get the following error:
TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'float'
My expected output should be:
name country Join Date End date diff
Wrt IND 1-2-2016 8-9-2017 395
Grt China 3-2-2017 12-6-2018 160
frt France 8-3-2017 continuing continuing
srt Scottland 9-4-2018 continuing continuing
crt china 9-7-2017 7-8-2018 280
Answer 0 (score: 2)
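First convert both columns to datetimes with pd.to_datetime and the parameter errors='coerce', so that invalid dates such as the string continuing become missing values (NaT); if necessary, also add the parameter dayfirst=True. Then subtract the columns, get the days from the resulting timedeltas with Series.dt.days, and finally, if necessary, replace the missing values with Series.fillna: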
f9['Join Date'] = pd.to_datetime(f9['Join Date'], errors='coerce', dayfirst=True)
f9['End date'] = pd.to_datetime(f9['End date'], errors='coerce', dayfirst=True)
f9['Num of days'] = (f9['End date'] - f9['Join Date']).dt.days.fillna('continuing')
print (f9)
name country Join Date End date Num of days
0 Wrt IND 2016-02-01 2017-09-08 585
1 Grt China 2015-02-03 2018-06-12 1225
2 frt France 2017-03-08 NaT continuing
3 srt Scottland 2018-04-09 NaT continuing
4 crt china 2016-07-09 2018-08-07 759
Or, if the dates are month-first, omit dayfirst=True:
f9['Join Date'] = pd.to_datetime(f9['Join Date'], errors='coerce')
f9['End date'] = pd.to_datetime(f9['End date'], errors='coerce')
f9['Num of days'] = (f9['End date'] - f9['Join Date']).dt.days.fillna('continuing')
print (f9)
name country Join Date End date Num of days
0 Wrt IND 2016-01-02 2017-08-09 585
1 Grt China 2015-03-02 2018-12-06 1375
2 frt France 2017-08-03 NaT continuing
3 srt Scottland 2018-09-04 NaT continuing
4 crt china 2016-09-07 2018-07-08 669
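To try both variants on the question's data, here is a minimal self-contained sketch. The construction of f9 and the helper name day_diff are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical reconstruction of the question's table
f9 = pd.DataFrame({
    'name': ['Wrt', 'Grt', 'frt', 'srt', 'crt'],
    'country': ['IND', 'China', 'France', 'Scottland', 'china'],
    'Join Date': ['1-2-2016', '3-2-2015', '8-3-2017', '9-4-2018', '9-7-2016'],
    'End date': ['8-9-2017', '12-6-2018', 'continuing', 'continuing', '7-8-2018'],
})

def day_diff(frame, dayfirst):
    """Return End date minus Join Date in days; unparseable dates give NaN."""
    start = pd.to_datetime(frame['Join Date'], errors='coerce', dayfirst=dayfirst)
    end = pd.to_datetime(frame['End date'], errors='coerce', dayfirst=dayfirst)
    return (end - start).dt.days

day_first = day_diff(f9, dayfirst=True)     # 1-2-2016 read as 1 February 2016
month_first = day_diff(f9, dayfirst=False)  # 1-2-2016 read as 2 January 2016
print(pd.DataFrame({'day first': day_first, 'month first': month_first}))
```

Which interpretation is right depends on how the dates were written, so it is worth checking a few rows by hand before trusting either column.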
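As a last step you can also replace the missing values in the End date column, but the column then loses its datetime dtype and holds strings mixed with datetimes, so datetime-like functions will fail on it afterwards: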
f9['End date'] = f9['End date'].fillna('continuing')
print (f9)
name country Join Date End date Num of days
0 Wrt IND 2016-01-02 2017-08-09 00:00:00 585
1 Grt China 2015-03-02 2018-12-06 00:00:00 1375
2 frt France 2017-08-03 continuing continuing
3 srt Scottland 2018-09-04 continuing continuing
4 crt china 2016-09-07 2018-07-08 00:00:00 669
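One way to avoid the mixed column, sketched here under the assumption that the text is only needed for display, is to leave End date as a datetime column and write the label into a separate column. The column name 'End date label' and the two-row sample are hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample; the column name follows the question
f9 = pd.DataFrame({'End date': ['8-9-2017', 'continuing']})
f9['End date'] = pd.to_datetime(f9['End date'], errors='coerce', dayfirst=True)

# Keep the datetime column intact and build a separate text column for display;
# strftime leaves missing values for NaT, which fillna then replaces
f9['End date label'] = f9['End date'].dt.strftime('%Y-%m-%d').fillna('continuing')

print(f9.dtypes)
print(f9)
```

With this layout, later calls such as f9['End date'].dt.year keep working because the original column is still datetime64.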
EDIT:
You can define several conditions as boolean masks and pass them to numpy.select:
import numpy as np

f9['Join Date'] = pd.to_datetime(f9['Join Date'], errors='coerce')
f9['End date'] = pd.to_datetime(f9['End date'], errors='coerce')
f9['Num of days'] = (f9['End date'] - f9['Join Date']).dt.days
m1 = f9['Num of days'] > 730
m2 = f9['Num of days'].between(365, 730)
m3 = f9['Num of days'] < 365
m4 = f9['Num of days'].isna()
# default='' keeps the result a string array; newer NumPy versions reject
# mixing the integer default 0 with string choices
f9['Status'] = np.select([m1, m2, m3, m4], ['U', 'L', 'N', 'EOL'], default='')
f9[['End date','Num of days']] = f9[['End date','Num of days']].fillna('continuing')
print (f9)
name country Join Date End date Num of days Status
0 Wrt IND 2016-01-02 2017-08-09 00:00:00 585 L
1 Grt China 2015-03-02 2018-12-06 00:00:00 1375 U
2 frt France 2017-08-03 continuing continuing EOL
3 srt Scottland 2018-09-04 continuing continuing EOL
4 crt china 2016-09-07 2018-07-08 00:00:00 669 L
Another idea is to use pandas.cut for binning:
f9['Join Date'] = pd.to_datetime(f9['Join Date'], errors='coerce')
f9['End date'] = pd.to_datetime(f9['End date'], errors='coerce')
f9['Num of days'] = (f9['End date'] - f9['Join Date']).dt.days
# bins ascend, so the labels run from fewest days to most: N, L, U
f9['Status'] = pd.cut(f9['Num of days'], bins=[-np.inf, 365, 730, np.inf], labels=['N', 'L', 'U'])
f9['Status'] = f9['Status'].cat.add_categories(['EOL']).fillna('EOL')
f9[['End date','Num of days']] = f9[['End date','Num of days']].fillna('continuing')
print (f9)
name country Join Date End date Num of days Status
0 Wrt IND 2016-01-02 2017-08-09 00:00:00 585 L
1 Grt China 2015-03-02 2018-12-06 00:00:00 1375 U
2 frt France 2017-08-03 continuing continuing EOL
3 srt Scottland 2018-09-04 continuing continuing EOL
4 crt china 2016-09-07 2018-07-08 00:00:00 669 L
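The two binning approaches can be compared side by side. The sketch below uses hypothetical day counts and orders the pandas.cut labels from fewest days to most, so that both approaches assign the same status.

```python
import numpy as np
import pandas as pd

# Hypothetical day counts, including a missing value for a "continuing" row
days = pd.Series([100, 585, 1375, np.nan])

# Approach 1: explicit boolean masks with numpy.select
conditions = [days > 730, days.between(365, 730), days < 365, days.isna()]
by_select = pd.Series(np.select(conditions, ['U', 'L', 'N', 'EOL'], default=''))

# Approach 2: pandas.cut with ascending bins, so the labels run N, L, U
by_cut = pd.cut(days, bins=[-np.inf, 365, 730, np.inf], labels=['N', 'L', 'U'])
by_cut = by_cut.cat.add_categories(['EOL']).fillna('EOL')

print(pd.DataFrame({'days': days, 'select': by_select, 'cut': by_cut}))
```

The two differ only at exactly 365 days: Series.between includes both endpoints, so the masks give L, while pandas.cut uses right-closed intervals by default and puts 365 in the first bin.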
Answer 1 (score: 1)
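First convert both columns to datetimes with to_datetime and compute the difference, then use .dt.date to keep only the date part.
import pandas as pd

df = pd.DataFrame(data={'name': ['wrt', 'grt', 'frt'],
                        'country': ['ind', 'china', 'france'],
                        'join_date': ['1-2-2016', '3-2-2015', '8-3-2017'],
                        'end_date': ['8-9-2017', '12-6-2018', 'continuing']})
# Parse to datetime64 first; 'continuing' becomes NaT
join = pd.to_datetime(df['join_date'], errors='coerce')
end = pd.to_datetime(df['end_date'], errors='coerce')
# Subtract while the values are still datetime64, so .dt.days is available
# (in pandas 2.x, subtracting the object columns produced by .dt.date
# no longer yields a timedelta dtype)
df['diff'] = (end - join).dt.days
df['join_date'] = join.dt.date
df['end_date'] = end.dt.date
df = df[['join_date', 'end_date', 'diff']].fillna('continuing')
print(df)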
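Answer 2 (score: 0)
What you can do here is convert the "Join Date" and "End date" Series to NumPy arrays with dtype = np.datetime64, subtract them, and store the resulting array of differences in the DataFrame. Also fill the "continuing" cells with the current datetime, or with whatever date you want to use (depending on your situation).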
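A minimal sketch of that idea follows. The two-row sample is hypothetical, and treating "continuing" as today's date is an assumption that may not suit every case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample following the question's columns
f9 = pd.DataFrame({
    'Join Date': ['1-2-2016', '8-3-2017'],
    'End date': ['8-9-2017', 'continuing'],
})

join = pd.to_datetime(f9['Join Date'], errors='coerce', dayfirst=True)
end = pd.to_datetime(f9['End date'], errors='coerce', dayfirst=True)

# Assumption: "continuing" rows are measured up to today
end = end.fillna(pd.Timestamp.today().normalize())

# Convert both Series to NumPy arrays with day precision and subtract
join_arr = join.to_numpy().astype('datetime64[D]')
end_arr = end.to_numpy().astype('datetime64[D]')
f9['Num of days'] = (end_arr - join_arr).astype('int64')
print(f9)
```

Because the second row depends on the current date, its value changes from day to day; only the first row (585 days) is fixed.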
Answer 3 (score: 0)
Here is a working solution that can be run in a Jupyter notebook.