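For this data frame, for each row where the Id column is id2, I am trying to compute the time difference to the previous row and to the next row where Id is id1, and then keep whichever of those two rows is closest in time.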
Id time value
id1 14:07:53.158 1
id2 14:07:53.358 2
id1 14:07:54.462 3
id1 14:10:09.560 4
id2 14:10:10.160 5
id1 14:10:10.520 6
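To reproduce the frame above, here is a minimal sketch. It assumes the sample is whitespace-separated text; the variable name `raw` is only for illustration and is not part of the original question.

```python
import io
import pandas as pd

# The sample from the question, pasted as whitespace-separated text.
raw = """Id time value
id1 14:07:53.158 1
id2 14:07:53.358 2
id1 14:07:54.462 3
id1 14:10:09.560 4
id2 14:10:10.160 5
id1 14:10:10.520 6
"""

# 'time' is read as plain strings here; it still has to be converted
# (for example with pd.to_timedelta or pd.to_datetime) before subtracting.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```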
Answer 0 (score: 1)
Here is one way.
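import pandas as pd

# convert time column to timedelta
df['time'] = pd.to_timedelta(df['time'])
# create dictionary of results, with keys as df index
# (rows are assumed to repeat as id1, id2, id1 triples; each id2 index maps to
# the gap between the id1 row after it and the id1 row before it)
d = {i+1: df['time'].iloc[i+2] - df['time'].iloc[i] for i in range(0, len(df.index), 3)}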
# map differences to dataframe
df['difference'] = df.index.map(d.get)
# filter for lowest time
res = df[df['difference'] == df['difference'].min()]
print(res)
# Id time value difference
# 4 id2 14:10:10.160000 5 00:00:00.960000
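If the rows do not arrive in strict id1, id2, id1 triples, one possible alternative is `pd.merge_asof` with `direction="nearest"`, which matches each id2 row to the id1 row closest in time. This is a sketch of a different technique, not the approach used above. The column name `matched_time` and the suffixes are made up for illustration, and the times are parsed with `pd.to_datetime`, so they get a placeholder date.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["id1", "id2", "id1", "id1", "id2", "id1"],
    "time": ["14:07:53.158", "14:07:53.358", "14:07:54.462",
             "14:10:09.560", "14:10:10.160", "14:10:10.520"],
    "value": [1, 2, 3, 4, 5, 6],
})
# Parse the clock times; the date part defaults to a placeholder day.
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

# merge_asof needs both sides sorted by the key column.
id2 = df[df["Id"] == "id2"].sort_values("time")
id1 = df[df["Id"] == "id1"].sort_values("time")
# Keep a copy of the id1 time, because the merged 'time' column comes from id2.
id1 = id1.assign(matched_time=id1["time"])

# For every id2 row, pick the id1 row nearest in time (before or after).
nearest = pd.merge_asof(
    id2, id1, on="time", direction="nearest", suffixes=("_id2", "_id1")
)
print(nearest[["value_id2", "value_id1", "matched_time"]])
```

On the sample data this pairs value 2 with value 1 and value 5 with value 6, since those id1 rows lie 0.200 s and 0.360 s away from their id2 rows.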
Answer 1 (score: 1)
First, build the deltas to the previous and the next time:
df['prev'] = (df.time.shift(-1) - df.time)[::3]
df['next'] = (df.time - df.time.shift(1))[2::3]
df
Id time value prev next
0 id1 2018-04-18 14:07:53.158 1 00:00:00.200000 NaT
1 id2 2018-04-18 14:07:53.358 2 NaT NaT
2 id1 2018-04-18 14:07:54.462 3 NaT 00:00:01.104000
3 id1 2018-04-18 14:10:09.560 4 00:00:00.600000 NaT
4 id2 2018-04-18 14:10:10.160 5 NaT NaT
5 id1 2018-04-18 14:10:10.520 6 NaT 00:00:00.360000
Then fill the NA values and work out which of the two deltas is smaller:
df.prev = df.prev.ffill()
df.next = df.next.bfill()
df['keep'] = df.prev < df.next
df
Id time value prev next keep
0 id1 2018-04-18 14:07:53.158 1 00:00:00.200000 00:00:01.104000 True
1 id2 2018-04-18 14:07:53.358 2 00:00:00.200000 00:00:01.104000 True
2 id1 2018-04-18 14:07:54.462 3 00:00:00.200000 00:00:01.104000 True
3 id1 2018-04-18 14:10:09.560 4 00:00:00.600000 00:00:00.360000 False
4 id2 2018-04-18 14:10:10.160 5 00:00:00.600000 00:00:00.360000 False
5 id1 2018-04-18 14:10:10.520 6 00:00:00.600000 00:00:00.360000 False
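Now filter the result using the following criteria: keep the id2 rows, plus every row whose index modulo 3 equals 0 where keep is True, or whose index modulo 3 equals 2 where keep is False: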
df[((df.Id=='id2') | ((df.index%3==0) & df.keep) | ((df.index%3==2) & ~df.keep))]
Id time value prev next keep
0 id1 2018-04-18 14:07:53.158 1 00:00:00.200000 00:00:01.104000 True
1 id2 2018-04-18 14:07:53.358 2 00:00:00.200000 00:00:01.104000 True
4 id2 2018-04-18 14:10:10.160 5 00:00:00.600000 00:00:00.360000 False
5 id1 2018-04-18 14:10:10.520 6 00:00:00.600000 00:00:00.360000 False
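For reference, the steps of this answer can be put together into one runnable script, sketched below. It assumes the frame has a default 0..n-1 index and that the rows repeat as id1, id2, id1 triples; the date 2018-04-18 is only borrowed from the printed output above, and the bracket-style column access and the `mask` and `res` names are choices made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["id1", "id2", "id1", "id1", "id2", "id1"],
    "time": ["14:07:53.158", "14:07:53.358", "14:07:54.462",
             "14:10:09.560", "14:10:10.160", "14:10:10.520"],
    "value": [1, 2, 3, 4, 5, 6],
})
df["time"] = pd.to_datetime("2018-04-18 " + df["time"])

# The modulo-3 logic below only works for id1, id2, id1 triples.
assert df["Id"].tolist() == ["id1", "id2", "id1"] * (len(df) // 3)

# Gap from the leading id1 row to its id2 row, stored on rows 0, 3, ...
df["prev"] = (df["time"].shift(-1) - df["time"])[::3]
# Gap from the id2 row to the trailing id1 row, stored on rows 2, 5, ...
df["next"] = (df["time"] - df["time"].shift(1))[2::3]

# Spread each gap over its whole triple, then compare the two gaps.
df["prev"] = df["prev"].ffill()
df["next"] = df["next"].bfill()
df["keep"] = df["prev"] < df["next"]

# Keep every id2 row plus whichever neighbouring id1 row is closer.
mask = (
    (df["Id"] == "id2")
    | ((df.index % 3 == 0) & df["keep"])
    | ((df.index % 3 == 2) & ~df["keep"])
)
res = df[mask]
print(res)
```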