我在熊猫中有以下数据框
start_date start_time end_time
2018-01-01 23:55:00 00:05:00
2018-01-02 00:05:00 00:10:00
2018-01-03 23:59:00 00:05:00
我想计算时间差。但是,对于第1次和第3次观察,end_time
中有一个日期更改。
如何在熊猫中做到这一点?
当前,我正在使用end_time
小于start_time
的逻辑,我要创建另外一个称为end_date
的列,在其中它将start_date
递增1,然后减去时间。
还有其他方法吗?
答案 0 :(得分:2)
使用时间增量的解决方案-如果差异days
等于-1
,则加一天:
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
d = df['end_time'] - df['start_time']
df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
另一种解决方案:
s = df['end_time'] - df['start_time']
df['diff'] = np.where(df['end_time'] < df['start_time'],
s + pd.Timedelta(1, unit='d'),
s)
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00