I have two DataFrames:
print (df1)
ID Birthday
0 A000 1990-01-01
1 A001 1991-05-05
2 A002 1970-10-01
3 A003 1980-07-07
4 A004 1945-08-15
print (df2)
ID Date from
0 A000 2010.01
1 A001 2012.01
2 A002 2010.01
3 A002 2010.01
4 A002 2010.11
5 A003 2009.05
6 A003 2010.01
7 A004 2010.01
8 A005 2007.11
9 A006 2017.01
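df1 consists of ID and Birthday, and df2 contains ID and Date from. Some values in df2.ID are not in df1.ID (namely A005 and A006).
What I am trying to do:
If a df2.ID exists in df1.ID, I want to compute the difference between df1.Birthday and df2['Date from'].
What I have done so far: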
df1['Birthday'] = pd.to_datetime(df1['Birthday'])
df2['Date from'] = pd.to_datetime(df2['Date from'])
x1 = df1.set_index(['ID'])['Birthday']
x2 = df2.set_index(['ID'])['Date from']
x3 = x2.sub(x1,fill_value=0)
print(x3)
ID
A000 -7305 days +00:00:00.000002
A001 -7794 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A003 -3840 days +00:00:00.000002
A003 -3840 days +00:00:00.000002
A004 8905 days 00:00:00.000002
A005 0 days 00:00:00.000002
A006 0 days 00:00:00.000002
dtype: timedelta64[ns]
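The result is wrong, because ID A003 shows the same value even though its rows have different dates. I am not sure how to proceed. Thanks in advance for any help.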
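Answer 0: (score: 1)
First, I would merge the DataFrames to make sure the rows line up correctly. Then subtract the two date columns into a new column: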
import pandas
from io import StringIO
data1 = StringIO("""\
ID Birthday
A000 1990-01-01
A001 1991-05-05
A002 1970-10-01
A003 1980-07-07
A004 1945-08-15
""")
data2 = StringIO("""\
ID Date_from
A000 2010.01
A001 2012.01
A002 2010.01
A002 2010.01
A002 2010.11
A003 2009.05
A003 2010.01
A004 2010.01
A005 2007.11
A006 2017.01
""")
# Raw strings keep '\s+' from being treated as an invalid escape sequence
x1 = pandas.read_table(data1, sep=r'\s+', parse_dates=['Birthday'])
x2 = pandas.read_table(data2, sep=r'\s+', parse_dates=['Date_from'])
data = (
x2.merge(right=x1, left_on='ID', right_on='ID', how='left')
.assign(Date_diff=lambda df: df['Date_from'] - df['Birthday'])
)
print(data)
This gives me:
ID Date_from Birthday Date_diff
0 A000 2010-01-01 1990-01-01 7305 days
1 A001 2012-01-01 1991-05-05 7546 days
2 A002 2010-01-01 1970-10-01 14337 days
3 A002 2010-01-01 1970-10-01 14337 days
4 A002 2010-11-01 1970-10-01 14641 days
5 A003 2009-05-01 1980-07-07 10525 days
6 A003 2010-01-01 1980-07-07 10770 days
7 A004 2010-01-01 1945-08-15 23515 days
8 A005 2007-11-01 NaT NaT
9 A006 2017-01-01 NaT NaT
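If only the IDs that exist in df1 are wanted, as the question asks, an inner merge drops A005 and A006 instead of leaving NaT rows. The sketch below also parses the year.month values with an explicit format. It assumes the 'Date from' column holds strings such as "2010.01"; if the values are floats, pd.to_datetime treats them as nanosecond offsets from 1970-01-01, which likely explains the 00:00:00.000002 residue in the question's output. The name `matched` is made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["A000", "A001", "A002", "A003", "A004"],
    "Birthday": ["1990-01-01", "1991-05-05", "1970-10-01",
                 "1980-07-07", "1945-08-15"],
})
df2 = pd.DataFrame({
    "ID": ["A000", "A001", "A002", "A002", "A002",
           "A003", "A003", "A004", "A005", "A006"],
    # Assumption: year.month values are stored as strings, not floats
    "Date from": ["2010.01", "2012.01", "2010.01", "2010.01", "2010.11",
                  "2009.05", "2010.01", "2010.01", "2007.11", "2017.01"],
})

# Parse both columns with explicit formats so nothing is guessed
df1["Birthday"] = pd.to_datetime(df1["Birthday"], format="%Y-%m-%d")
df2["Date from"] = pd.to_datetime(df2["Date from"], format="%Y.%m")

# how="inner" keeps only the rows of df2 whose ID also appears in df1
matched = df2.merge(df1, on="ID", how="inner")
matched["Date_diff"] = matched["Date from"] - matched["Birthday"]
print(matched)
```

Because the merge pairs every row of df2 with its own birthday, repeated IDs such as A002 and A003 each get a difference based on their own date.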
Answer 1: (score: 0)
Use the dateutil package to get the difference in years, months, and days:
from dateutil import relativedelta as rdelta
from datetime import date
d1 = date(2010,5,1)
d2 = date(2012,1,1)
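rd = rdelta.relativedelta(d2,d1)
# rd holds the calendar difference split into years, months and days
print(rd.years, rd.months, rd.days)  # 1 8 0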
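To apply the same idea to a whole DataFrame, one option is to call relativedelta row by row. This is a minimal sketch, not a tuned implementation: the two sample rows are copied from the merged output in the other answer, the helper name `age_parts` is hypothetical, and rows with a missing birthday (NaT) are assumed to have been dropped beforehand, since relativedelta does not accept them.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Sample rows copied from the merged result; both dates are present
data = pd.DataFrame({
    "ID": ["A000", "A001"],
    "Date_from": pd.to_datetime(["2010-01-01", "2012-01-01"]),
    "Birthday": pd.to_datetime(["1990-01-01", "1991-05-05"]),
})

def age_parts(row):
    """Split the gap between the two dates into years, months and days."""
    rd = relativedelta(row["Date_from"], row["Birthday"])
    return pd.Series({"years": rd.years, "months": rd.months, "days": rd.days})

# axis=1 passes one row at a time to the helper
parts = data.apply(age_parts, axis=1)
print(data.join(parts))
```

Row-wise apply is slow on large frames, so this suits cases where a calendar-style breakdown matters more than speed; a plain subtraction of the columns remains the faster choice when a day count is enough.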