I have the following df:
OnlineDate BDate
20190813 19720116
20190809 19570912
20190807 19600601
20190801 19760919
20190816 19530916
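Both columns are integers that represent dates in YYYYMMDD format.
I am trying to create a new column that holds the number of years between these two dates.
So the expected output is: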
OnlineDate BDate NewColumn
20190813 19720116 47
20190809 19570912 61
20190807 19600601 59
20190801 19760919 51
20190816 19530916 66
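I can't just subtract the years, because the month and day also determine how many full years have passed.
Do I have to write a function for this, or can it be done without one?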
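Answer 0: (score: 1)
It takes a little setup, but you want to convert the columns to datetime, subtract one from the other, and then express the difference in years.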
import pandas as pd
# setup
onlinedate = [20190813, 20190809, 20190807, 20190801, 20190816]
bdate = [19720116, 19570912, 19600601, 19760919, 19530916]
df = pd.DataFrame({"onlinedate":onlinedate, "bdate":bdate})
# convert to dates; note %m (month), not %M (minute)
onlinedate_year = pd.to_datetime(df["onlinedate"], format="%Y%m%d")
bdate_year = pd.to_datetime(df["bdate"], format="%Y%m%d")
# Setup new column, columnwise operation
# Subtract the two dates and divide the day count by the average
# Gregorian year length (365.2425 days). Dividing by np.timedelta64(1,'Y')
# may be rejected by newer pandas versions, so it is avoided here.
df["NewColumn"] = (onlinedate_year - bdate_year).dt.days / 365.2425
# convert the float column to int (truncates the fractional year)
df["NewColumn"] = df["NewColumn"].astype(int)
print(df)
Output:
onlinedate bdate NewColumn
0 20190813 19720116 47
1 20190809 19570912 61
2 20190807 19600601 59
3 20190801 19760919 42
4 20190816 19530916 65
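Dividing a time difference by an average year length is an approximation, so it can be off by one when the two dates fall very close to an anniversary. As a possible alternative, here is a minimal sketch that skips the datetime conversion entirely and relies on the YYYYMMDD integer layout. It assumes every value is a valid date and that OnlineDate is never earlier than BDate; the column names follow the question.

```python
import pandas as pd

df = pd.DataFrame({
    "OnlineDate": [20190813, 20190809, 20190807, 20190801, 20190816],
    "BDate": [19720116, 19570912, 19600601, 19760919, 19530916],
})

# In YYYYMMDD form the year occupies the digits above 10**4, and the
# month/day part (MMDD) is always between 0101 and 1231. Floor-dividing
# the integer difference by 10000 therefore yields the year difference,
# reduced by one whenever OnlineDate's MMDD is smaller than BDate's MMDD.
# Assumption: OnlineDate >= BDate for every row.
df["NewColumn"] = (df["OnlineDate"] - df["BDate"]) // 10000

print(df)
```

This counts completed calendar years and needs no helper function, but it does not validate the inputs, so malformed integers would pass through silently.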
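Answer 1: (score: 1)
Convert the data types to datetime:
for col in ['OnlineDate','BDate']:
    df[col]=pd.to_datetime(df[col],format="%Y%m%d")
Subtract the years (note that this ignores the month and day, so the result is one too high whenever the BDate anniversary has not yet been reached in the OnlineDate year):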
df['NewColumn']=df['OnlineDate'].dt.year-df['BDate'].dt.year
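Subtracting the `.dt.year` values alone counts calendar-year boundaries rather than completed years. For the sample data it would give 62, 43 and 66 for the second, fourth and fifth rows. One way to account for the month and day, sketched here under the assumption that both columns hold valid YYYYMMDD integers, is to subtract one wherever the anniversary has not yet occurred. The names `online`, `birth` and `before_birthday` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "OnlineDate": [20190813, 20190809, 20190807, 20190801, 20190816],
    "BDate": [19720116, 19570912, 19600601, 19760919, 19530916],
})

# Parse the integers as dates (cast to str first so the format applies cleanly)
online = pd.to_datetime(df["OnlineDate"].astype(str), format="%Y%m%d")
birth = pd.to_datetime(df["BDate"].astype(str), format="%Y%m%d")

# True where OnlineDate falls before the BDate anniversary in its own year
before_birthday = (online.dt.month < birth.dt.month) | (
    (online.dt.month == birth.dt.month) & (online.dt.day < birth.dt.day)
)

# Year difference, minus one for rows whose anniversary has not been reached
df["NewColumn"] = online.dt.year - birth.dt.year - before_birthday.astype(int)

print(df)
```

The original integer columns are left untouched here, which keeps the printed frame comparable to the question's expected output.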