How to calculate the time difference between two date columns

Time: 2019-12-04 11:30:24

Tags: python pandas dataframe

Suppose I have the following DataFrame df:

   Start_Date   End_Date
0   20201101    20201130
1   20201201    20201231
2   20210101    20210131
3   20210201    20210228
4   20210301    20210331

How can I calculate the difference, in days, between the two date columns?

Required output (with Diff_in_Days filled in):

   Start_Date   End_Date   Diff_in_Days
0   20201101    20201130
1   20201201    20201231
2   20210101    20210131
3   20210201    20210228
4   20210301    20210331

1 answer:

Answer 0 (score: 3):

The first idea is to convert the columns to datetimes with pd.to_datetime, subtract them, and convert the resulting timedeltas to days with .dt.days:

import pandas as pd

# Parse both integer columns as dates, subtract, and keep the whole-day count
df['Diff_in_Days'] = (pd.to_datetime(df['End_Date'], format='%Y%m%d')
                        .sub(pd.to_datetime(df['Start_Date'], format='%Y%m%d'))
                        .dt.days)
print(df)
   Start_Date  End_Date  Diff_in_Days
0    20201101  20201130            29
1    20201201  20201231            30
2    20210101  20210131            30
3    20210201  20210228            27
4    20210301  20210331            30
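
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the two columns hold integers in YYYYMMDD form, as the printed frame in the question suggests; the intermediate names start and end are only illustrative. It uses the `-` operator, which does the same thing as .sub() here.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed to be integers, YYYYMMDD).
df = pd.DataFrame({
    "Start_Date": [20201101, 20201201, 20210101, 20210201, 20210301],
    "End_Date": [20201130, 20201231, 20210131, 20210228, 20210331],
})

# Parse each column with an explicit format so 20201101 is read as 2020-11-01.
start = pd.to_datetime(df["Start_Date"], format="%Y%m%d")
end = pd.to_datetime(df["End_Date"], format="%Y%m%d")

# Subtracting two datetime Series gives a timedelta Series;
# .dt.days extracts the whole-day component as integers.
df["Diff_in_Days"] = (end - start).dt.days

print(df["Diff_in_Days"].tolist())  # [29, 30, 30, 27, 30]
```

Note that the result counts the days between the two dates, so a range from the 1st to the 30th of the same month gives 29, not 30.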

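If you will be working with the datetimes later, a better solution is to convert the columns in place first and then apply the same subtraction: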

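# Convert the columns in place so they stay datetimes for later use
df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%Y%m%d')
df['End_Date'] = pd.to_datetime(df['End_Date'], format='%Y%m%d')

# Datetime columns can be subtracted directly
df['Diff_in_Days'] = df['End_Date'].sub(df['Start_Date']).dt.days
print(df)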
  Start_Date   End_Date  Diff_in_Days
0 2020-11-01 2020-11-30            29
1 2020-12-01 2020-12-31            30
2 2021-01-01 2021-01-31            30
3 2021-02-01 2021-02-28            27
4 2021-03-01 2021-03-31            30
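
To illustrate why keeping the columns as datetimes can pay off, here is a small sketch of possible follow-up steps. The Start_Month column and the starting_in_2021 filter are hypothetical examples and not part of the question; they only show that the .dt accessor and date comparisons work once the columns have been converted.

```python
import pandas as pd

# Sample data from the question (assumed to be integers, YYYYMMDD).
df = pd.DataFrame({
    "Start_Date": [20201101, 20201201, 20210101, 20210201, 20210301],
    "End_Date": [20201130, 20201231, 20210131, 20210228, 20210331],
})

# Convert both columns in place so later steps can reuse them.
for col in ["Start_Date", "End_Date"]:
    df[col] = pd.to_datetime(df[col], format="%Y%m%d")

df["Diff_in_Days"] = (df["End_Date"] - df["Start_Date"]).dt.days

# Hypothetical follow-up 1: derive the month name of each start date.
df["Start_Month"] = df["Start_Date"].dt.month_name()

# Hypothetical follow-up 2: keep only the rows that start in 2021 or later.
starting_in_2021 = df[df["Start_Date"] >= "2021-01-01"]

print(df["Start_Month"].tolist())
print(len(starting_in_2021))
```

With the first solution, each of these steps would need its own pd.to_datetime call, because the original columns would still hold integers.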