Calculating the time difference between a Pandas DataFrame column and a datetime object

Time: 2021-02-02 20:57:09

Tags: python pandas dataframe datetime

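I'm trying to get the difference between a Pandas DataFrame column and a datetime object by using a custom function (years_between). Here is what the Pandas DataFrame column looks like: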

input_1['dataadmissao'].head(5)

0   2018-02-10
1   2009-08-23
2   2015-05-21
3   2016-12-17
4   2019-02-01
Name: dataadmissao, dtype: datetime64[ns]

Here is my code:

###################### function to return difference in years ####################

def years_between(start_year, end_year):
    start_year = datetime.strptime(start_year, "%d/%m/%Y")
    end_year = datetime.strptime(end_year, "%d/%m/%Y")
    return abs(end_year.year - start_year.year)

input_1['difference_in_years'] = np.vectorize(years_between(input_1['dataadmissao'], datetime.now()))

Which returns:

TypeError: strptime() argument 1 must be str, not Series

How can I adjust the function so that it returns an integer representing the difference in years between the Pandas DataFrame column and datetime.now()?

2 answers:

Answer 0 (score: 1)

Use pandas.Timestamp.now, which can be subtracted from the whole column at once:

>>> df
0   2018-02-10
1   2009-08-23
2   2015-05-21
3   2016-12-17
4   2019-02-01
Name: 1, dtype: datetime64[ns]

>>> pd.Timestamp.now() - df

0   1089 days 02:41:50.467993
1   4182 days 02:41:50.467993
2   2085 days 02:41:50.467993
3   1509 days 02:41:50.467993
4    733 days 02:41:50.467993
Name: 1, dtype: timedelta64[ns]

# If you want days
>>> (pd.Timestamp.now() - df).dt.days
0    1089
1    4182
2    2085
3    1509
4     733
Name: 1, dtype: int64

# If you want years
>>> (pd.Timestamp.now().year - df.dt.year)
0     3
1    12
2     6
3     5
4     2
Name: 1, dtype: int64
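
Note that subtracting `.year` values compares calendar years only, so 2018-02-10 counts as 3 years on 2021-02-02 even though the third anniversary has not yet passed. If completed years are what is needed, the following is a minimal sketch of one way to do it. The helper name `completed_years` and the fixed reference date are assumptions made for this example (a fixed date keeps the output reproducible; `pd.Timestamp.now()` could be passed instead).

```python
import pandas as pd


def completed_years(dates: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Full years elapsed between each date and ref (hypothetical helper)."""
    years = ref.year - dates.dt.year
    # Subtract one where the anniversary has not yet occurred in ref's year.
    not_reached = (dates.dt.month > ref.month) | (
        (dates.dt.month == ref.month) & (dates.dt.day > ref.day)
    )
    return years - not_reached.astype(int)


# Sample data copied from the question.
admission = pd.Series(
    pd.to_datetime(
        ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]
    ),
    name="dataadmissao",
)
ref = pd.Timestamp("2021-02-02")  # fixed reference date, assumed for the example

calendar_diff = ref.year - admission.dt.year   # 3, 12, 6, 5, 2
full_years = completed_years(admission, ref)   # 2, 11, 5, 4, 2
print(calendar_diff.tolist(), full_years.tolist())
```

Both versions operate on the whole column, so no custom function applied row by row and no `np.vectorize` is needed.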

Answer 1 (score: 1)

Simply subtract the Series from datetime.datetime.now(), divide by the duration of one year, and convert to integer:

from datetime import datetime
import numpy as np
((datetime.now() - input_1['dataadmissao'])/np.timedelta64(1, 'Y')).astype(int)
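
A year has no fixed length, and newer pandas releases may reject the `'Y'` unit in this kind of division for that reason. As a hedged alternative, here is a small sketch that divides by an explicit `pd.Timedelta` instead. The value of 365.2425 days (the average Gregorian year) and the fixed reference date are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
admission = pd.Series(
    pd.to_datetime(
        ["2018-02-10", "2009-08-23", "2015-05-21", "2016-12-17", "2019-02-01"]
    ),
    name="dataadmissao",
)

ref = pd.Timestamp("2021-02-02")        # fixed reference date, assumed for the example
one_year = pd.Timedelta(days=365.2425)  # assumed average year length

# Timedelta Series / Timedelta gives a float Series; astype(int) truncates it.
elapsed_years = ((ref - admission) / one_year).astype(int)
print(elapsed_years.tolist())  # [2, 11, 5, 4, 2]
```

Because the divisor is an average, the result can be off by one for dates that fall within a day or so of an anniversary.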