I have a DataFrame of trip data in which each row holds the data for one point/location.
trip_id, sequence, location, start_time
101, 1, point_a, 2020-05-01 00:00:01
101, 2, point_b, 2020-05-01 00:04:01
101, 3, point_c, 2020-05-01 00:14:01
102, 1, point_x, 2020-05-11 00:13:21
102, 2, point_y, 2020-05-11 00:14:01
103, 1, point_z, 2020-05-11 00:14:01
103, 3, point_za, 2020-05-11 00:20:01
I am trying to create a new DataFrame in which each row holds the data for two consecutive points/locations of the same trip, like this:
trip_id, sequence, start_location, start_time, sequence, end_location, end_time
101, 1, point_a, 2020-05-01 00:00:01, 2, point_b, 2020-05-01 00:04:01
101, 2, point_b, 2020-05-01 00:04:01, 3, point_c, 2020-05-01 00:14:01
102, 1, point_x, 2020-05-11 00:13:21, 2, point_y, 2020-05-11 00:14:01
103, 1, point_z, 2020-05-11 00:14:01, 3, point_za, 2020-05-11 00:20:01
Answer 0: (score: 1)
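You can drop the first row and the last row of each trip to get two frames, then concatenate them side by side (this assumes the rows are already sorted by trip_id and sequence):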
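import pandas as pd

# every row except the first of each trip: the end point of each leg
bottoms = df[df.trip_id.duplicated()].reset_index(drop=True)
# every row except the last of each trip: the start point of each leg
tops = df[df.trip_id.duplicated(keep='last')].reset_index(drop=True)
# rename the columns so the start and end fields can be told apart
tops.columns = ['trip_id', 'sequence', 'start_location', 'start_time']
bottoms.columns = ['trip_id', 'sequence', 'end_location', 'end_time']
result = pd.concat((tops, bottoms.drop(columns='trip_id')), axis=1)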
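For comparison, here is a minimal sketch of a different approach, not the one given in the answer above: it shifts each trip's rows up by one with groupby(...).shift(-1), so it does not depend on two separately filtered frames lining up by position. It rebuilds the sample data from the question so that it runs on its own. The column name end_sequence and the variable names nxt and legs are illustrative assumptions; the question's desired output simply repeats the name sequence.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "trip_id": [101, 101, 101, 102, 102, 103, 103],
    "sequence": [1, 2, 3, 1, 2, 1, 3],
    "location": ["point_a", "point_b", "point_c", "point_x",
                 "point_y", "point_z", "point_za"],
    "start_time": pd.to_datetime([
        "2020-05-01 00:00:01", "2020-05-01 00:04:01", "2020-05-01 00:14:01",
        "2020-05-11 00:13:21", "2020-05-11 00:14:01",
        "2020-05-11 00:14:01", "2020-05-11 00:20:01",
    ]),
})

# Make sure consecutive points of a trip are adjacent and in order.
df = df.sort_values(["trip_id", "sequence"]).reset_index(drop=True)

# Within each trip, pull the next row's values up onto the current row.
nxt = df.groupby("trip_id")[["sequence", "location", "start_time"]].shift(-1)
# "end_sequence" is an assumed name, chosen to avoid a duplicate column label.
nxt.columns = ["end_sequence", "end_location", "end_time"]

legs = pd.concat([df.rename(columns={"location": "start_location"}), nxt], axis=1)

# The last point of each trip has no successor, so drop those rows.
legs = legs.dropna(subset=["end_location"]).reset_index(drop=True)
# shift() introduces NaN and turns the integer column into floats; restore ints.
legs["end_sequence"] = legs["end_sequence"].astype(int)

print(legs)
```

Because the shift happens inside each group, a trip with a single point simply produces no row, and a gap in sequence (as in trip 103) still pairs each point with the next one that exists.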