Calculating the duration between two dates with Python

Time: 2017-11-18 15:26:26

Tags: python spark-dataframe

I have this dataset:

          DaySchedule                  DayAppointment 
          2016-04-29 18:38:08           2016-04-29
          2016-04-29 16:08:27           2016-04-29
          2016-04-26 15:04:17           2016-04-29

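I want to calculate the duration between DaySchedule and DayAppointment. If both fall on the same day, the duration should be 0; otherwise I subtract DaySchedule from DayAppointment.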

    def duration_time(x, y):
        x = x.dt.date
        y = y.dt.date
        if x == y:
            return 0
        else:
            return x - y

    Patient["duration"] = Patient.apply(lambda Patient: duration_time(Patient["DayAppointment"], Patient["DaySchedule"]), axis=1)

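After running this code I get the following error: AttributeError: ("'Timestamp' object has no attribute 'dt'", 'occurred at index 0')

Any idea why I am getting this error?

1 answer:

Answer 0: (score: 0)

Use numpy where + dt.date + sub instead: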

    import numpy as np
    import pandas as pd

    # Make sure both columns are datetime64 so the .dt accessor is available
    Patient.DaySchedule = pd.to_datetime(Patient.DaySchedule)
    Patient.DayAppointment = pd.to_datetime(Patient.DayAppointment)

    Patient['Duration'] = np.where(Patient.DaySchedule.dt.date == Patient.DayAppointment.dt.date, 0, Patient.DaySchedule.sub(Patient.DayAppointment))

    DaySchedule          DayAppointment  Duration
    2016-04-29 18:38:08  2016-04-29      0 days 00:00:00
    2016-04-29 16:08:27  2016-04-29      0 days 00:00:00
    2016-04-26 15:04:17  2016-04-29      -3 days +15:04:17
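As for the error itself: `.dt` is an accessor on a whole Series, whereas `apply(..., axis=1)` hands the function one row at a time, so each value is a scalar `Timestamp`, which exposes `.date()` directly. The following is only a sketch of how the row-wise function from the question could be repaired, using the three sample rows shown above. Note that it keeps the question's order (DayAppointment minus DaySchedule) and compares calendar dates only, so its sign and time-of-day handling differ from the vectorized version.

```python
import pandas as pd

# Sample data copied from the question.
Patient = pd.DataFrame({
    "DaySchedule": pd.to_datetime([
        "2016-04-29 18:38:08",
        "2016-04-29 16:08:27",
        "2016-04-26 15:04:17",
    ]),
    "DayAppointment": pd.to_datetime(
        ["2016-04-29", "2016-04-29", "2016-04-29"]
    ),
})

def duration_time(x, y):
    # x and y are scalar Timestamps here, so call .date() instead of .dt.date
    x = x.date()
    y = y.date()
    if x == y:
        return pd.Timedelta(0)
    # Subtracting two datetime.date objects gives a datetime.timedelta
    return pd.Timedelta(x - y)

Patient["duration"] = Patient.apply(
    lambda row: duration_time(row["DayAppointment"], row["DaySchedule"]),
    axis=1,
)
```

Row-wise `apply` is usually slower than the vectorized approach, so this is mainly useful for understanding where the `AttributeError` came from.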

You can also get the duration in whole days:

    Patient['Duration'] = Patient.DaySchedule.sub(Patient.DayAppointment).astype('timedelta64[D]')

    DaySchedule          DayAppointment  Duration
    2016-04-29 18:38:08  2016-04-29      0.0
    2016-04-29 16:08:27  2016-04-29      0.0
    2016-04-26 15:04:17  2016-04-29      -3.0
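Newer pandas releases may reject a cast to `timedelta64[D]`. As an alternative sketch, assuming the same three sample rows, the `.dt.days` accessor on the timedelta Series returns the day component directly, as integers rather than floats:

```python
import pandas as pd

# Sample data copied from the question.
Patient = pd.DataFrame({
    "DaySchedule": pd.to_datetime([
        "2016-04-29 18:38:08",
        "2016-04-29 16:08:27",
        "2016-04-26 15:04:17",
    ]),
    "DayAppointment": pd.to_datetime(
        ["2016-04-29", "2016-04-29", "2016-04-29"]
    ),
})

# Subtracting two datetime64 columns yields a timedelta64 Series.
delta = Patient["DaySchedule"] - Patient["DayAppointment"]

# .dt.days is the day component of each Timedelta; for the third row,
# "-3 days +15:04:17" has a day component of -3.
Patient["Duration"] = delta.dt.days
```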

Using sub, it takes:

1.59 ms ± 63.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using plain subtraction takes almost a millisecond longer:

    Patient['Duration'] = np.where(Patient.DaySchedule.dt.date == Patient.DayAppointment.dt.date, 0, Patient.DaySchedule - Patient.DayAppointment)

2.51 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
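The figures above look like IPython `%timeit` output. Outside IPython, a comparable measurement could be sketched with the standard `timeit` module. The helper names `with_sub` and `with_minus` are made up for this illustration, and the sketch times only the subtraction step on the three sample rows, so its numbers will not match the ones quoted above.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
Patient = pd.DataFrame({
    "DaySchedule": pd.to_datetime([
        "2016-04-29 18:38:08",
        "2016-04-29 16:08:27",
        "2016-04-26 15:04:17",
    ]),
    "DayAppointment": pd.to_datetime(
        ["2016-04-29", "2016-04-29", "2016-04-29"]
    ),
})

def with_sub():
    # Method form of the subtraction
    return Patient["DaySchedule"].sub(Patient["DayAppointment"])

def with_minus():
    # Operator form of the same subtraction
    return Patient["DaySchedule"] - Patient["DayAppointment"]

# Best of 3 repeats, 100 calls each; results depend on machine and versions.
t_sub = min(timeit.repeat(with_sub, number=100, repeat=3))
t_minus = min(timeit.repeat(with_minus, number=100, repeat=3))

print(f"sub(): {t_sub / 100 * 1e6:.1f} µs per call")
print(f"'-'  : {t_minus / 100 * 1e6:.1f} µs per call")
```

Both forms compute the same result, so any gap between them on a tiny frame is likely to be within measurement noise.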