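Calculating age from a date/time format in python / pandas

Time: 2017-10-01 03:16:33

Tags: pandas

I am looking for a way in Python to calculate age from the following date/time format.

Example: 1956-07-01T00:00:00Z

I wrote code that does this by extracting the first four characters of the string, converting them to an int, and subtracting the result from 2017, but I would like to know whether there is a more efficient way.

3 answers: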

Answer 0 (score: 9)

Is this what you want?

(pd.to_datetime('today').year-pd.to_datetime('1956-07-01').year)

Out[83]: 61
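Subtracting the years alone ignores whether this year's birthday has already passed, so the result can be one year too high early in the year. A minimal sketch of a birthday-aware variant follows; `age_in_years` and its `today` parameter are hypothetical names introduced here for illustration, not part of the answer above.

```python
import pandas as pd

def age_in_years(birth, today=None):
    """Return the number of completed years between birth and today.

    Hypothetical helper; `today` defaults to the current time.
    """
    birth = pd.Timestamp(birth)
    today = pd.Timestamp.now() if today is None else pd.Timestamp(today)
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

# A fixed reference date keeps the result reproducible.
print(age_in_years('1956-07-01T00:00:00Z', today='2017-10-01'))  # 61
print(age_in_years('1956-07-01T00:00:00Z', today='2017-06-30'))  # 60
```

Only the year, month, and day fields are compared, so the trailing `Z` (UTC) marker in the input does not get in the way.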

Answer 1 (score: 2)

I take the number of days from the timedelta object and divide it by 365.25.

(pd.to_datetime('today') - pd.to_datetime('1956-07-01')).days / 365.25

61.24845995893224
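The same idea can be applied to a whole column at once. The sketch below assumes the dates arrive as ISO strings with a trailing `Z`, as in the question; the sample values, the fixed reference date, and the variable names are made up for illustration. Because 365.25 is only an average year length, the result can be off by one within a day or so of a birthday.

```python
import numpy as np
import pandas as pd

# Illustrative input in the question's format (the second value is invented).
births = pd.Series(['1956-07-01T00:00:00Z', '1990-01-15T00:00:00Z'])

# utc=True yields timezone-aware timestamps, so the reference must be aware too.
parsed = pd.to_datetime(births, utc=True)
reference = pd.Timestamp('2017-10-01', tz='UTC')

# Days elapsed divided by the average year length, then rounded down.
approx_age = (reference - parsed).dt.days / 365.25
whole_years = np.floor(approx_age).astype(int)

print(whole_years.tolist())  # [61, 27]
```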

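Answer 2 (score: 0)

pd.to_datetime raises an error when the data contains an unusual year (for example, 1601), as shown below. The date falls outside the range that a nanosecond-resolution timestamp can represent.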

import pandas as pd

(pd.to_datetime('today').year-pd.to_datetime('1601-07-01').year)

# Traceback (most recent call last):
#   File "/home/kuroyanagi/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 444, in _convert_listlike
#     values, tz = tslib.datetime_to_datetime64(arg)
#   File "pandas/_libs/tslib.pyx", line 1810, in pandas._libs.tslib.datetime_to_datetime64 (pandas/_libs/tslib.c:33275)
# TypeError: Unrecognized value type: <class 'str'>
# During handling of the above exception, another exception occurred:
# Traceback (most recent call last):
#   File "/home/kuroyanagi/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
#     exec(code_obj, self.user_global_ns, self.user_ns)
#   File "<ipython-input-45-829e219d9060>", line 1, in <module>
#     (pd.to_datetime('today').year-pd.to_datetime('1601-07-01').year)
#   File "/home/kuroyanagi/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 518, in to_datetime
#     result = _convert_listlike(np.array([arg]), box, format)[0]
#   File "/home/kuroyanagi/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 447, in _convert_listlike
#     raise e
#   File "/home/kuroyanagi/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 435, in _convert_listlike
#     require_iso8601=require_iso8601
#   File "pandas/_libs/tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas/_libs/tslib.c:46617)
#   File "pandas/_libs/tslib.pyx", line 2538, in pandas._libs.tslib.array_to_datetime (pandas/_libs/tslib.c:45511)
#   File "pandas/_libs/tslib.pyx", line 2506, in pandas._libs.tslib.array_to_datetime (pandas/_libs/tslib.c:44978)
#   File "pandas/_libs/tslib.pyx", line 2500, in pandas._libs.tslib.array_to_datetime (pandas/_libs/tslib.c:44859)
#   File "pandas/_libs/tslib.pyx", line 1517, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:28598)
#   File "pandas/_libs/tslib.pyx", line 1774, in pandas._libs.tslib._check_dts_bounds (pandas/_libs/tslib.c:32752)
# pandas._libs.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1601-07-01 00:00:00

For data that contains such unusual years, you can calculate the elapsed years as follows.

import numpy as np
import pandas as pd

date = pd.Series(['1601-07-01', '1956-07-01'])

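def elapsed_years(date):
    # Take "today" once so that year and month come from the same instant.
    today = pd.to_datetime('today')
    # Slice year and month out of the 'YYYY-MM-DD' strings. Plain float
    # replaces np.float, which was removed in NumPy 1.24.
    year = date.str.slice(0, 4).astype(float)
    month = date.str.slice(5, 7).astype(float)
    # Whole years elapsed at month resolution (the day of month is ignored).
    months = 12 * (today.year - year) + (today.month - month)
    return np.floor(months / 12)

elapsed_years(date)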
# Out[46]: 
# 0    416.0
# 1     61.0
# dtype: float64
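As an alternative to slicing strings, the standard library's `datetime.date` accepts years from 1 to 9999, so it can parse dates such as 1601-07-01 directly. The following is a sketch under that assumption, not the answer's method; `elapsed_whole_years` is a hypothetical helper, and the reference date is fixed so the output is reproducible.

```python
from datetime import date

import pandas as pd

def elapsed_whole_years(iso_date, reference):
    """Completed years from iso_date ('YYYY-MM-DD...') to reference."""
    # Keep only the date part so a trailing time or 'Z' is ignored.
    start = date.fromisoformat(iso_date[:10])
    # Subtract one year if the anniversary has not been reached yet.
    reached = (reference.month, reference.day) >= (start.month, start.day)
    return reference.year - start.year - (0 if reached else 1)

dates = pd.Series(['1601-07-01', '1956-07-01'])
reference = date(2017, 10, 1)

result = dates.map(lambda s: elapsed_whole_years(s, reference))
print(result.tolist())  # [416, 61]
```

Unlike the month-based calculation above, this version also takes the day of the month into account.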