Calculating age between two integer columns

Time: 2021-02-16 00:03:48

Tags: pandas dataframe numpy datetime

I have the following df:

OnlineDate    BDate
20190813      19720116
20190809      19570912
20190807      19600601
20190801      19760919
20190816      19530916

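Both columns are integers holding dates in YYYYMMDD format.

I'm trying to add a new column containing the number of years between the two dates.

So the expected output is: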

OnlineDate    BDate       NewColumn
20190813      19720116       47
20190809      19570912       61
20190807      19600601       59
20190801      19760919       51
20190816      19530916       66

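I can't just subtract the years, because the month and day also determine whether a full year has passed.

Do I have to write a function for this, or can it be done without one?

2 answers:

Answer 0: (score: 1)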

It takes a bit of setup, but you want to convert the columns to datetime, subtract them, and then express the difference in years.

import pandas as pd
import numpy as np

# setup
onlinedate = [20190813, 20190809, 20190807, 20190801, 20190816]
bdate = [19720116, 19570912, 19600601, 19760919, 19530916]

df = pd.DataFrame({"onlinedate":onlinedate, "bdate":bdate})

# convert to dates (%m is the month; %M would be parsed as minutes)
onlinedate_year = pd.to_datetime(df["onlinedate"], format="%Y%m%d")
bdate_year = pd.to_datetime(df["bdate"], format="%Y%m%d")
# Set up the new column with a column-wise operation
# Subtract the two dates, then convert days to years using the
# average Gregorian year length (365.2425 days, the same length NumPy's 'Y' unit uses)
df["NewColumn"] = (onlinedate_year - bdate_year) / np.timedelta64(1, "D") / 365.2425
# convert the float column to int (truncates to whole years)
df["NewColumn"] = df["NewColumn"].astype(int)

print(df)

Output:

   onlinedate     bdate  NewColumn
0    20190813  19720116         47
1    20190809  19570912         61
2    20190807  19600601         59
3    20190801  19760919         42
4    20190816  19530916         65
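
Dividing by an average year length is an approximation and can be off by one when a date falls on or very near the birthday. As a possible alternative, here is a minimal sketch that stays with the integer columns. It assumes every value is a valid YYYYMMDD integer and that OnlineDate is never earlier than BDate. The column names are taken from the question.

```python
import pandas as pd

# Sample data from the question (integers in YYYYMMDD form).
df = pd.DataFrame({
    "OnlineDate": [20190813, 20190809, 20190807, 20190801, 20190816],
    "BDate": [19720116, 19570912, 19600601, 19760919, 19530916],
})

# Floor-dividing the integer difference by 10000 keeps only the year part.
# When the month/day of OnlineDate is earlier than that of BDate, the
# subtraction borrows one from the year, which is exactly the
# "birthday not reached yet" correction.
df["NewColumn"] = (df["OnlineDate"] - df["BDate"]) // 10000

print(df)
```

This needs neither a custom function nor a datetime conversion, but it relies entirely on the YYYYMMDD encoding, so it would not carry over to other date formats.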

Answer 1: (score: 1)

Convert the columns to datetime:

for col in ['OnlineDate','BDate']:
    df[col]=pd.to_datetime(df[col],format="%Y%m%d")

Subtract the years (this compares calendar years only and ignores the month and day):

df['NewColumn']=df['OnlineDate'].dt.year-df['BDate'].dt.year
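
Since the question asks for the month and day to be taken into account, the year subtraction can be adjusted by one wherever the birthday has not yet been reached in the OnlineDate year. The following is only a sketch under that reading of the question. The helper names `online`, `birth` and `before_birthday` are introduced here for illustration and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question (integers in YYYYMMDD form).
df = pd.DataFrame({
    "OnlineDate": [20190813, 20190809, 20190807, 20190801, 20190816],
    "BDate": [19720116, 19570912, 19600601, 19760919, 19530916],
})

# Parse the integers as dates; going through str makes the format explicit.
online = pd.to_datetime(df["OnlineDate"].astype(str), format="%Y%m%d")
birth = pd.to_datetime(df["BDate"].astype(str), format="%Y%m%d")

# True where the month/day of OnlineDate falls before the birthday.
before_birthday = (online.dt.month < birth.dt.month) | (
    (online.dt.month == birth.dt.month) & (online.dt.day < birth.dt.day)
)

# Calendar-year difference, minus one if the birthday is still to come.
df["NewColumn"] = online.dt.year - birth.dt.year - before_birthday.astype(int)

print(df)
```

Everything stays vectorised, so no row-wise function or `apply` is needed.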