Calculating the time difference between rows in Python

Time: 2019-06-30 20:33:07

Tags: python dataframe datetime date-arithmetic

I am trying to calculate the time difference between rows when several conditions hold:

df['open_time'] = pd.to_datetime(df['open_time'], errors='coerce')
df['Time_diff'] = pd.to_datetime(df['Time_diff'], errors='coerce')

for i in range(1, len(df)):
    if df.loc[i, 'JOB_ID'] == df.loc[i-1, 'JOB_ID'] and df.loc[i, 'STATION_IDX'] > df.loc[i-1, 'STATION_IDX']:
        df['Time_diff'] = df.loc[i, 'open_time'] - df.loc[i-1, 'open_time']

open_time is simply the time of day (HH:mm:ss) at which the action was performed, nothing more.

The original dataset is:

JOB_ID     DDMMYY    STATION_IDX  open_time
121663240  04-02-19  25           5:02:19
121663240  04-02-19  26           5:04:00
121663240  04-02-19  27           5:04:42
121651974  04-02-19  25           6:08:15
121651974  04-02-19  27           6:10:28

I don't understand why I keep getting NaT for every row of Time_diff:

       JOB_ID Time_diff
0   121663240       NaT
1   121663240       NaT
2   121663240       NaT
3   121651974       NaT
4   121651974       NaT
5   121682840       NaT
6   121682840       NaT

I can't seem to find an answer on Google that fits this kind of row-to-row calculation.

The result I expect for the dataset above (differences in seconds) is:

JOB_ID     ddmmyy      25 to 26  26 to 27  25 to 27
121663240  04-02-2019  101       42        143
121651974  04-02-2019  NaN       NaN       133

1 Answer:

Answer 0 (score: 0)

So you want the new column (Time_diff) to hold the difference in open_time between consecutive rows, but only when both rows have the same JOB_ID and STATION_IDX increases.

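In pandas, you should try to avoid explicit loops, because pandas is optimized for whole-column operations rather than row-by-row iteration. What you probably want here is: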

# first build a mask for rows with the same JOB_ID as the previous row and a greater STATION_IDX
mask = (df.JOB_ID==df.JOB_ID.shift())&(df.STATION_IDX>df.STATION_IDX.shift())
# then compute the difference
df.loc[mask,'Time_diff'] = df.loc[mask, 'open_time'] - df.shift().loc[mask, 'open_time']

With your sample data, this gives:

      JOB_ID    DDMMYY  STATION_IDX           open_time Time_diff
0  121663240  04-02-19           25 2019-07-01 05:02:19       NaT
1  121663240  04-02-19           26 2019-07-01 05:04:00  00:01:41
2  121663240  04-02-19           27 2019-07-01 05:04:42  00:00:42
3  121651974  04-02-19           25 2019-07-01 06:08:15       NaT
4  121651974  04-02-19           27 2019-07-01 06:10:28  00:02:13
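
The output above gives differences between consecutive rows only, whereas the question also asks for a "25 to 27" column and a one-row-per-job layout in seconds. What follows is a minimal sketch of one way to get there, not a definitive solution. It rebuilds the sample rows by hand, parses open_time as a time-of-day offset with pd.to_timedelta instead of pd.to_datetime, and pivots on STATION_IDX. The names Time_diff_s, open_s, wide and result are illustrative and do not come from the question, and the sketch assumes each JOB_ID has at most one row per station.

```python
import pandas as pd

# Sample rows copied by hand from the question.
df = pd.DataFrame({
    "JOB_ID": [121663240, 121663240, 121663240, 121651974, 121651974],
    "DDMMYY": ["04-02-19"] * 5,
    "STATION_IDX": [25, 26, 27, 25, 27],
    "open_time": ["5:02:19", "5:04:00", "5:04:42", "6:08:15", "6:10:28"],
})

# Treat the time of day as an offset from midnight, so no date gets attached.
df["open_time"] = pd.to_timedelta(df["open_time"])

# Same mask idea as in the answer: same job as the previous row, later station.
mask = (df["JOB_ID"] == df["JOB_ID"].shift()) & (
    df["STATION_IDX"] > df["STATION_IDX"].shift()
)

# Consecutive-row difference in seconds; rows outside the mask become NaN.
df["Time_diff_s"] = (
    (df["open_time"] - df["open_time"].shift()).where(mask).dt.total_seconds()
)

# Wide layout: one row per job, one column per station (seconds since midnight).
# Assumption: no duplicate (JOB_ID, STATION_IDX) pairs, otherwise pivot raises.
df["open_s"] = df["open_time"].dt.total_seconds()
wide = df.pivot(index="JOB_ID", columns="STATION_IDX", values="open_s")

# Station-to-station differences; a missing station propagates as NaN.
result = pd.DataFrame({
    "25 to 26": wide[26] - wide[25],
    "26 to 27": wide[27] - wide[26],
    "25 to 27": wide[27] - wide[25],
})

print(df[["JOB_ID", "STATION_IDX", "Time_diff_s"]])
print(result)
```

Pivoting first makes non-adjacent pairs such as "25 to 27" a plain column subtraction, and a job that skipped station 26 simply ends up with NaN in the columns that involve it. If jobs can repeat across days, the date column would presumably need to be part of the index as well.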