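I'm trying to compute the difference in years between a Pandas dataframe column and a datetime object using a custom function (years_between). Here is what the Pandas dataframe column looks like: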
input_1['dataadmissao'].head(5)
0 2018-02-10
1 2009-08-23
2 2015-05-21
3 2016-12-17
4 2019-02-01
Name: dataadmissao, dtype: datetime64[ns]
Here is my code:
###################### function to return difference in years ####################
def years_between(start_year, end_year):
    start_year = datetime.strptime(start_year, "%d/%m/%Y")
    end_year = datetime.strptime(end_year, "%d/%m/%Y")
    return abs(end_year.year - start_year.year)

input_1['difference_in_years'] = np.vectorize(years_between(input_1['dataadmissao'], datetime.now()))
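Which returns:
TypeError: strptime() argument 1 must be str, not Series
How can I adjust the function to return an integer representing the difference in years between the Pandas dataframe column and datetime.now()?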
Answer 0 (score: 1)
>>> df
0 2018-02-10
1 2009-08-23
2 2015-05-21
3 2016-12-17
4 2019-02-01
Name: 1, dtype: datetime64[ns]
>>> pd.Timestamp.now() - df
0 1089 days 02:41:50.467993
1 4182 days 02:41:50.467993
2 2085 days 02:41:50.467993
3 1509 days 02:41:50.467993
4 733 days 02:41:50.467993
Name: 1, dtype: timedelta64[ns]
# If you want days
>>> (pd.Timestamp.now() - df).dt.days
0 1089
1 4182
2 2085
3 1509
4 733
Name: 1, dtype: int64
# If you want years
>>> (pd.Timestamp.now().year - df.dt.year)
0 3
1 12
2 6
3 5
4 2
Name: 1, dtype: int64
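Note that subtracting the year components ignores the month and day, so a date whose anniversary has not yet passed is still counted as a full year. If completed years are what you need, the following is a minimal sketch of one way to do it. The helper name `full_years_between` and the fixed reference date are assumptions made for illustration (the date is chosen to be consistent with the day counts shown above); they are not part of the original answer.

```python
import pandas as pd

def full_years_between(dates: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Count completed years between each date in `dates` and `ref`.

    Hypothetical helper for illustration; assumes every date is <= ref.
    """
    years = ref.year - dates.dt.year
    # True where the anniversary has not yet been reached in ref's year.
    not_reached = (dates.dt.month > ref.month) | (
        (dates.dt.month == ref.month) & (dates.dt.day > ref.day)
    )
    # Subtract one year for those rows.
    return years - not_reached.astype(int)

dates = pd.Series(pd.to_datetime(
    ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]
))
ref = pd.Timestamp("2021-02-03")  # assumed fixed date so the result is reproducible
result = full_years_between(dates, ref)
print(result.tolist())  # [2, 11, 5, 4, 2]
```

With the plain year subtraction the same input gives 3, 12, 6, 5, 2, so the two approaches differ whenever the anniversary falls later in the year than the reference date.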
Answer 1 (score: 1)
Just subtract the series from datetime.datetime.now(), divide by the duration of one year, and convert to an integer:
import numpy as np
from datetime import datetime

# Note: newer pandas versions may reject the ambiguous 'Y' unit
((datetime.now() - input_1['dataadmissao'])/np.timedelta64(1, 'Y')).astype(int)
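Because a calendar year has no fixed length, some pandas versions may refuse to divide by a `'Y'` unit. A possible alternative, sketched here under the assumption that an average year of 365.25 days is accurate enough, is to divide by an explicit `pd.Timedelta`. The sample data and the fixed reference date below are assumptions for illustration; in practice you would use `pd.Timestamp.now()` and your own column.

```python
import pandas as pd

# Sample data mirroring the column in the question.
dates = pd.Series(pd.to_datetime(
    ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]
))
ref = pd.Timestamp("2021-02-03")  # assumed fixed date; swap in pd.Timestamp.now()

# Approximate one year as 365.25 days (an assumption, not an exact calendar year).
one_year = pd.Timedelta(days=365.25)

# Dividing a timedelta Series by a Timedelta yields floats; astype(int) truncates.
approx_years = ((ref - dates) / one_year).astype(int)
print(approx_years.tolist())  # [2, 11, 5, 4, 2]
```

This is an approximation: results can be off by one for dates within a day or so of an anniversary, so a calendar-based comparison is safer when exact completed years matter.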