Time operations on DataFrame columns

Time: 2018-04-18 10:58:16

Tags: python pandas datetime dataframe

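For this dataframe, for each row whose Id is id2, I am trying to compute the time difference to the previous and the next row whose Id is id1, and then keep whichever of those two rows is closest in time.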

  Id         time              value

  id1        14:07:53.158      1
  id2        14:07:53.358      2
  id1        14:07:54.462      3
  id1        14:10:09.560      4
  id2        14:10:10.160      5
  id1        14:10:10.520      6
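Neither answer shows how the sample frame is built. The following is a minimal sketch that reconstructs it. It assumes time arrives as time-of-day strings and converts it with pd.to_timedelta (as Answer 0 does) so that rows can be subtracted:

```python
import pandas as pd

# Sample data from the question; 'time' starts out as time-of-day strings
df = pd.DataFrame({
    'Id': ['id1', 'id2', 'id1', 'id1', 'id2', 'id1'],
    'time': ['14:07:53.158', '14:07:53.358', '14:07:54.462',
             '14:10:09.560', '14:10:10.160', '14:10:10.520'],
    'value': [1, 2, 3, 4, 5, 6],
})

# Parse the strings as durations since midnight so differences can be taken
df['time'] = pd.to_timedelta(df['time'])

print(df)
```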

2 answers:

Answer 0 (score: 1)

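Here is one way. It assumes the rows come in id1, id2, id1 triples with a default integer index, and it keeps the id2 row whose surrounding id1 rows are closest together.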

import pandas as pd

# convert time column to timedelta
df['time'] = pd.to_timedelta(df['time'])

# for each id2 row (index i+1), the gap between the id1 rows before (i) and after (i+2) it
d = {i+1: df['time'].iloc[i+2] - df['time'].iloc[i] for i in range(0, len(df.index), 3)}

# map differences to dataframe
df['difference'] = df.index.map(d.get)

# filter for lowest time
res = df[df['difference'] == df['difference'].min()]

print(res)

#     Id            time  value      difference
# 4  id2 14:10:10.160000      5 00:00:00.960000
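The dictionary comprehension above only works when the rows come in strict id1, id2, id1 triples. A sketch of a position-independent variant (an assumption, not part of the original answer) masks the non-id1 times and fills them forward and backward, so every id2 row sees the nearest id1 time on each side:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['id1', 'id2', 'id1', 'id1', 'id2', 'id1'],
    'time': pd.to_timedelta(['14:07:53.158', '14:07:53.358', '14:07:54.462',
                             '14:10:09.560', '14:10:10.160', '14:10:10.520']),
    'value': [1, 2, 3, 4, 5, 6],
})

# keep only the id1 times; every other row becomes NaT
id1_time = df['time'].where(df['Id'] == 'id1')

# for each row: the latest id1 time at or before it, and the earliest at or after it
before = id1_time.ffill()
after = id1_time.bfill()

# gap between the surrounding id1 rows, kept only on id2 rows (NaT elsewhere)
is_id2 = df['Id'] == 'id2'
df['difference'] = (after - before).where(is_id2)

# id2 row(s) whose surrounding id1 rows are closest together
res = df[df['difference'] == df['difference'].min()]
print(res)
```

Because the lookup is driven by the Id values rather than by row positions, extra id1 rows between id2 rows do not break it.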

Answer 1 (score: 1)

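First, build the deltas between each id2 row and the id1 rows before and after it (this assumes time has already been parsed to datetimes, as the output below shows):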

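# id2 time minus the preceding id1 time, stored on rows 0, 3, ...
df['prev'] = (df.time.shift(-1) - df.time)[::3]

# following id1 time minus the id2 time, stored on rows 2, 5, ...
df['next'] = (df.time - df.time.shift(1))[2::3]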

df
    Id                    time  value            prev            next
0  id1 2018-04-18 14:07:53.158      1 00:00:00.200000             NaT
1  id2 2018-04-18 14:07:53.358      2             NaT             NaT
2  id1 2018-04-18 14:07:54.462      3             NaT 00:00:01.104000
3  id1 2018-04-18 14:10:09.560      4 00:00:00.600000             NaT
4  id2 2018-04-18 14:10:10.160      5             NaT             NaT
5  id1 2018-04-18 14:10:10.520      6             NaT 00:00:00.360000
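Slicing with [::3] and [2::3] again relies on the triple layout. As a sketch under the assumption that time is a timedelta (or datetime) column, the same two deltas can be computed from forward- and backward-filled id1 times without relying on row positions. Unlike the answer, this version stores both deltas on the id2 rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['id1', 'id2', 'id1', 'id1', 'id2', 'id1'],
    'time': pd.to_timedelta(['14:07:53.158', '14:07:53.358', '14:07:54.462',
                             '14:10:09.560', '14:10:10.160', '14:10:10.520']),
    'value': [1, 2, 3, 4, 5, 6],
})

# id1 times only; id2 rows become NaT
id1_time = df['time'].where(df['Id'] == 'id1')
is_id2 = df['Id'] == 'id2'

# delta from the previous id1 row to each id2 row
df['prev'] = (df['time'] - id1_time.ffill()).where(is_id2)

# delta from each id2 row to the next id1 row
df['next'] = (id1_time.bfill() - df['time']).where(is_id2)

print(df)
```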

Then fill in the NAs and work out which delta is smaller:

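# spread each group's deltas to all three of its rows
df.prev = df.prev.ffill()
df.next = df.next.bfill()

# True where the preceding id1 row is closer than the following one
df['keep'] = df.prev < df.next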

df
    Id                    time  value            prev            next   keep
0  id1 2018-04-18 14:07:53.158      1 00:00:00.200000 00:00:01.104000   True
1  id2 2018-04-18 14:07:53.358      2 00:00:00.200000 00:00:01.104000   True
2  id1 2018-04-18 14:07:54.462      3 00:00:00.200000 00:00:01.104000   True
3  id1 2018-04-18 14:10:09.560      4 00:00:00.600000 00:00:00.360000  False
4  id2 2018-04-18 14:10:10.160      5 00:00:00.600000 00:00:00.360000  False
5  id1 2018-04-18 14:10:10.520      6 00:00:00.600000 00:00:00.360000  False

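Now filter the result with these criteria: keep the id2 rows, plus every row whose index modulo 3 is 0 where keep is True, or whose index modulo 3 is 2 where keep is False: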

df[((df.Id=='id2') | ((df.index%3==0) & df.keep) | ((df.index%3==2) & ~df.keep))]

    Id                    time  value            prev            next   keep
0  id1 2018-04-18 14:07:53.158      1 00:00:00.200000 00:00:01.104000   True
1  id2 2018-04-18 14:07:53.358      2 00:00:00.200000 00:00:01.104000   True
4  id2 2018-04-18 14:10:10.160      5 00:00:00.600000 00:00:00.360000  False
5  id1 2018-04-18 14:10:10.520      6 00:00:00.600000 00:00:00.360000  False
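A different technique, not used in either answer, is pd.merge_asof with direction='nearest', which matches each id2 row to the id1 row closest in time. This is a sketch assuming time is sorted ascending (merge_asof requires this) and assuming the 2018-04-18 date seen in the output above. The id1_row column is a hypothetical helper name used to carry the original index through the merge:

```python
import pandas as pd

times = ['14:07:53.158', '14:07:53.358', '14:07:54.462',
         '14:10:09.560', '14:10:10.160', '14:10:10.520']
df = pd.DataFrame({
    'Id': ['id1', 'id2', 'id1', 'id1', 'id2', 'id1'],
    # assumed date prefix, matching the answer's output
    'time': pd.to_datetime(['2018-04-18 ' + t for t in times]),
    'value': [1, 2, 3, 4, 5, 6],
})

# id1 rows, remembering their original index labels (hypothetical helper column)
id1 = df.loc[df['Id'] == 'id1', ['time']].copy()
id1['id1_row'] = id1.index

# id2 rows to be matched
id2 = df.loc[df['Id'] == 'id2', ['time']]

# match each id2 row to the id1 row nearest in time; both frames are sorted on 'time'
matched = pd.merge_asof(id2, id1, on='time', direction='nearest')

# keep every id2 row plus its nearest id1 row, in original order
keep = sorted(set(id2.index.tolist()) | set(matched['id1_row'].tolist()))
res = df.loc[keep]
print(res)
```

On the sample data this keeps the same four rows as the filter above, without depending on the rows arriving in triples.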