Suppose I have the following DataFrame df:
Start_Date End_Date
0 20201101 20201130
1 20201201 20201231
2 20210101 20210131
3 20210201 20210228
4 20210301 20210331
How can I calculate the difference in days between the two date columns?
Required output:
Start_Date End_Date Diff_in_Days
0 20201101 20201130
1 20201201 20201231
2 20210101 20210131
3 20210201 20210228
4 20210301 20210331
Answer 0 (score: 3)
The first idea is to convert both columns to datetimes with pd.to_datetime, subtract them, and convert the resulting timedeltas to days with .dt.days:
df['Diff_in_Days'] = (pd.to_datetime(df['End_Date'], format='%Y%m%d')
                        .sub(pd.to_datetime(df['Start_Date'], format='%Y%m%d'))
                        .dt.days)
print(df)
Start_Date End_Date Diff_in_Days
0 20201101 20201130 29
1 20201201 20201231 30
2 20210101 20210131 30
3 20210201 20210228 27
4 20210301 20210331 30
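For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the two columns hold integers such as 20201101 (the question does not say whether they are integers or strings; the same call should work for both). The names start and end and the Days_Inclusive column are illustrative additions, not part of the original question.

```python
import pandas as pd

# Sample data copied from the question; assumed to be stored as integers.
df = pd.DataFrame({
    "Start_Date": [20201101, 20201201, 20210101, 20210201, 20210301],
    "End_Date": [20201130, 20201231, 20210131, 20210228, 20210331],
})

# Parse the YYYYMMDD values into datetimes without modifying the original columns.
start = pd.to_datetime(df["Start_Date"], format="%Y%m%d")
end = pd.to_datetime(df["End_Date"], format="%Y%m%d")

# Subtracting two datetime Series gives timedeltas; .dt.days extracts whole days.
df["Diff_in_Days"] = (end - start).dt.days

# Hypothetical extra column: count the end date as well (an assumption about
# what might be wanted, since the plain difference excludes the end day).
df["Days_Inclusive"] = df["Diff_in_Days"] + 1

print(df)
```

The plain difference gives 29 for November because it measures the gap between the two dates; if the end date itself should be counted, adding 1 as in Days_Inclusive is one way to get it.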
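If you will be working with the datetimes later, another, better solution is to reassign the converted columns first and then apply the same approach: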
df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%Y%m%d')
df['End_Date'] = pd.to_datetime(df['End_Date'], format='%Y%m%d')
df['Diff_in_Days'] = df['End_Date'].sub(df['Start_Date']).dt.days
print(df)
Start_Date End_Date Diff_in_Days
0 2020-11-01 2020-11-30 29
1 2020-12-01 2020-12-31 30
2 2021-01-01 2021-01-31 30
3 2021-02-01 2021-02-28 27
4 2021-03-01 2021-03-31 30
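As a rough illustration of why reassigning the columns can pay off later, the sketch below converts both columns in a loop and then uses the .dt accessor on them directly. The sample data is assumed to be integers as in the question, and the variable names end_month and start_weekday are made up for this example.

```python
import pandas as pd

# Sample data copied from the question; assumed to be stored as integers.
df = pd.DataFrame({
    "Start_Date": [20201101, 20201201, 20210101, 20210201, 20210301],
    "End_Date": [20201130, 20201231, 20210131, 20210228, 20210331],
})

# Overwrite both columns with parsed datetimes.
for col in ["Start_Date", "End_Date"]:
    df[col] = pd.to_datetime(df[col], format="%Y%m%d")

# The difference no longer needs any conversion step.
df["Diff_in_Days"] = (df["End_Date"] - df["Start_Date"]).dt.days

# Later datetime work can use the .dt accessor on the stored columns.
end_month = df["End_Date"].dt.month             # month number of each end date
start_weekday = df["Start_Date"].dt.day_name()  # weekday name of each start date

print(df.dtypes)
print(end_month.tolist())
```

The trade-off is that the original YYYYMMDD integers are replaced, so keep a copy if they are still needed elsewhere.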