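I have N dataframes in which two columns hold longitude and latitude data that track the movement of a car. The general track of the car is the same in all dataframes, but because tracking sometimes starts a little later or ends a little earlier, the dataframes differ in length.
I want the dataframes to "line up", i.e. to trim the rows that correspond to non-overlapping position data. The result should be N dataframes of equal length, all containing identical position data.
Three arbitrary dataframes look like this: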
time speed longitude latitude
t00 v00 19.70 48.67
t01 v01 19.71 48.65
t02 v02 19.72 48.64
t03 v03 19.73 48.64
t04 v04 19.74 48.63
t05 v05 19.74 48.63
t06 v06 19.75 48.64
t07 v07 19.75 48.64
t08 v08 19.75 48.64
t09 v09 19.75 48.64
time speed longitude latitude
t10 v10 19.72 48.64
t11 v11 19.73 48.64
t12 v12 19.74 48.63
t13 v13 19.74 48.63
t14 v14 19.75 48.64
t15 v15 19.75 48.64
t16 v16 19.75 48.64
time speed longitude latitude
t20 v20 19.72 48.64
t21 v21 19.73 48.64
t22 v22 19.74 48.63
t23 v23 19.74 48.63
t24 v24 19.75 48.64
t25 v25 19.75 48.63
t26 v26 19.75 48.64
t27 v27 19.75 48.64
t28 v28 19.75 48.64
The result should be three new dataframes:
time speed longitude latitude
t02 v02 19.72 48.64
t03 v03 19.73 48.64
t04 v04 19.74 48.63
t05 v05 19.74 48.63
t06 v06 19.75 48.64
time speed longitude latitude
t10 v10 19.72 48.64
t11 v11 19.73 48.64
t12 v12 19.74 48.63
t13 v13 19.74 48.63
t14 v14 19.75 48.64
time speed longitude latitude
t20 v20 19.72 48.64
t21 v21 19.73 48.64
t22 v22 19.74 48.63
t23 v23 19.74 48.63
t24 v24 19.75 48.64
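In reality the number of overlapping coordinates will be much higher, but I hope this gets the gist across.
I found this post, in which the intersection between two lists is retrieved. I tried to extract the position data from the dataframes and then keep only the rows with matching coordinates in all dataframes, but this fails because the number of rows differs between dataframes.
My current code looks like this: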
import collections
import numpy as np

first_route = True
for route in routes:  # extract all route's coordinates
    lon = route["longitude"].values.tolist()
    lat = route["latitude"].values.tolist()
    if first_route:  # add first route regardless
        cropped_lon = lon
        cropped_lat = lat
        first_route = False
        continue
    old_lon = collections.Counter(cropped_lon)
    old_lat = collections.Counter(cropped_lat)
    new_lon = collections.Counter(lon)
    new_lat = collections.Counter(lat)
    cropped_lon = list((old_lon & new_lon).elements())
    cropped_lat = list((old_lat & new_lat).elements())
cropped_lon = np.asarray(cropped_lon)
cropped_lat = np.asarray(cropped_lat)
# THIS fails due to length difference
# Here I want to extract all rows which satisfy the positional restrictions
for route in routes:
    print(route[route.longitude == cropped_lon and route.latitude == cropped_lat])
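If anyone has a better idea, I'm entirely willing to throw out my whole approach.
The accepted answer solves the problem as stated in the title, but I'm looking for an extended solution. I hope it can be implemented in a similar way, which is why I'm leaving this as an update.
My actual coordinate data has a higher resolution (6 decimal places), but the measurements are not accurate enough for the coordinates to match exactly. As a result, the code in the accepted answer produces empty dataframes. I could take the shortest dataframe and "slide" all the others along it to find a least-squares fit, but I'm hoping for a solution closer to the accepted answer below.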
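One way to stay close to a merge-based approach while tolerating some measurement noise is to merge on coarsened coordinates instead of the raw values. The following is only a sketch under assumptions: the helper `add_rounded_keys`, the `decimals=4` setting and the sample values are made up for illustration and do not come from the post. Points that fall on opposite sides of a rounding boundary will still fail to match, so this is not a true tolerance-based comparison.

```python
import pandas as pd

def add_rounded_keys(df, decimals=4):
    """Return a copy of df with integer merge keys (hypothetical helper).

    Coordinates are scaled by 10**decimals and rounded, so values that agree
    to `decimals` places get the same integer key.
    """
    out = df.copy()
    scale = 10 ** decimals
    out["lon_key"] = (out["longitude"] * scale).round().astype("int64")
    out["lat_key"] = (out["latitude"] * scale).round().astype("int64")
    return out

# Made-up high-resolution tracks: same positions, slightly different noise.
a = pd.DataFrame({"time": ["t00", "t01", "t02"],
                  "longitude": [19.700012, 19.710008, 19.720011],
                  "latitude": [48.670003, 48.650009, 48.640012]})
b = pd.DataFrame({"time": ["t10", "t11"],
                  "longitude": [19.710031, 19.720004],
                  "latitude": [48.650021, 48.639996]})

# Merging on the raw floats finds nothing, merging on the keys finds the overlap.
exact = pd.merge(a, b, on=["longitude", "latitude"], how="inner")
rounded = pd.merge(add_rounded_keys(a), add_rounded_keys(b),
                   on=["lon_key", "lat_key"], how="inner",
                   suffixes=("_a", "_b"))
print(len(exact), len(rounded))
print(rounded[["time_a", "time_b"]])
```

Choosing `decimals` is a trade-off: too many places and noisy points never match, too few and distinct positions collapse onto the same key.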
Answer 0 (score: 1)
You can merge all the dataframes to keep only the overlapping part. Let's start with your sample data:
cols = ['time','speed']
group_cols = ['longitude','latitude']
input_list = [[['t00','v00',19.70,48.67],
['t01','v01',19.71,48.65],
['t02','v02',19.72,48.64],
['t03','v03',19.73,48.64],
['t04','v04',19.74,48.63],
['t05','v05',19.74,48.63],
['t06','v06',19.75,48.64],
['t07','v07',19.75,48.64],
['t08','v08',19.75,48.64],
['t09','v09',19.75,48.64]],
[['t10','v10',19.72,48.64],
['t11','v11',19.73,48.64],
['t12','v12',19.74,48.63],
['t13','v13',19.74,48.63],
['t14','v14',19.75,48.64],
['t15','v15',19.75,48.64],
['t16','v16',19.75,48.64]],
[['t20','v20',19.72,48.64],
['t21','v21',19.73,48.64],
['t22','v22',19.74,48.63],
['t23','v23',19.74,48.63],
['t24','v24',19.75,48.64],
['t25','v25',19.75,48.63],
['t26','v26',19.75,48.64],
['t27','v27',19.75,48.64],
['t28','v28',19.75,48.64]]]
import pandas as pd
df_list = [pd.DataFrame(l, columns=[c + str(i) for c in cols] + group_cols) for i, l in enumerate(input_list)]
Now merge them:
from functools import reduce
df = reduce(
    lambda x, y: pd.merge(x, y, on=group_cols, how='inner'),
    df_list)
+-----+--------+---------+------------+-----------+--------+---------+--------+--------+
| | time0 | speed0 | longitude | latitude | time1 | speed1 | time2 | speed2 |
+-----+--------+---------+------------+-----------+--------+---------+--------+--------+
| 0 | t02 | v02 | 19.72 | 48.64 | t10 | v10 | t20 | v20 |
| 1 | t03 | v03 | 19.73 | 48.64 | t11 | v11 | t21 | v21 |
| 2 | t04 | v04 | 19.74 | 48.63 | t12 | v12 | t22 | v22 |
| 3 | t04 | v04 | 19.74 | 48.63 | t12 | v12 | t23 | v23 |
| 4 | t04 | v04 | 19.74 | 48.63 | t13 | v13 | t22 | v22 |
| 5 | t04 | v04 | 19.74 | 48.63 | t13 | v13 | t23 | v23 |
| 6 | t05 | v05 | 19.74 | 48.63 | t12 | v12 | t22 | v22 |
| 7 | t05 | v05 | 19.74 | 48.63 | t12 | v12 | t23 | v23 |
| 8 | t05 | v05 | 19.74 | 48.63 | t13 | v13 | t22 | v22 |
| 9 | t05 | v05 | 19.74 | 48.63 | t13 | v13 | t23 | v23 |
| 10 | t06 | v06 | 19.75 | 48.64 | t14 | v14 | t24 | v24 |
| 11 | t06 | v06 | 19.75 | 48.64 | t14 | v14 | t26 | v26 |
| 12 | t06 | v06 | 19.75 | 48.64 | t14 | v14 | t27 | v27 |
| 13 | t06 | v06 | 19.75 | 48.64 | t14 | v14 | t28 | v28 |
| 14 | t06 | v06 | 19.75 | 48.64 | t15 | v15 | t24 | v24 |
| 15 | t06 | v06 | 19.75 | 48.64 | t15 | v15 | t26 | v26 |
| 16 | t06 | v06 | 19.75 | 48.64 | t15 | v15 | t27 | v27 |
| 17 | t06 | v06 | 19.75 | 48.64 | t15 | v15 | t28 | v28 |
| 18 | t06 | v06 | 19.75 | 48.64 | t16 | v16 | t24 | v24 |
| 19 | t06 | v06 | 19.75 | 48.64 | t16 | v16 | t26 | v26 |
| 20 | t06 | v06 | 19.75 | 48.64 | t16 | v16 | t27 | v27 |
| 21 | t06 | v06 | 19.75 | 48.64 | t16 | v16 | t28 | v28 |
| 22 | t07 | v07 | 19.75 | 48.64 | t14 | v14 | t24 | v24 |
| 23 | t07 | v07 | 19.75 | 48.64 | t14 | v14 | t26 | v26 |
| 24 | t07 | v07 | 19.75 | 48.64 | t14 | v14 | t27 | v27 |
| 25 | t07 | v07 | 19.75 | 48.64 | t14 | v14 | t28 | v28 |
| 26 | t07 | v07 | 19.75 | 48.64 | t15 | v15 | t24 | v24 |
| 27 | t07 | v07 | 19.75 | 48.64 | t15 | v15 | t26 | v26 |
| 28 | t07 | v07 | 19.75 | 48.64 | t15 | v15 | t27 | v27 |
| 29 | t07 | v07 | 19.75 | 48.64 | t15 | v15 | t28 | v28 |
| 30 | t07 | v07 | 19.75 | 48.64 | t16 | v16 | t24 | v24 |
| 31 | t07 | v07 | 19.75 | 48.64 | t16 | v16 | t26 | v26 |
| 32 | t07 | v07 | 19.75 | 48.64 | t16 | v16 | t27 | v27 |
| 33 | t07 | v07 | 19.75 | 48.64 | t16 | v16 | t28 | v28 |
| 34 | t08 | v08 | 19.75 | 48.64 | t14 | v14 | t24 | v24 |
| 35 | t08 | v08 | 19.75 | 48.64 | t14 | v14 | t26 | v26 |
| 36 | t08 | v08 | 19.75 | 48.64 | t14 | v14 | t27 | v27 |
| 37 | t08 | v08 | 19.75 | 48.64 | t14 | v14 | t28 | v28 |
| 38 | t08 | v08 | 19.75 | 48.64 | t15 | v15 | t24 | v24 |
| 39 | t08 | v08 | 19.75 | 48.64 | t15 | v15 | t26 | v26 |
| 40 | t08 | v08 | 19.75 | 48.64 | t15 | v15 | t27 | v27 |
| 41 | t08 | v08 | 19.75 | 48.64 | t15 | v15 | t28 | v28 |
| 42 | t08 | v08 | 19.75 | 48.64 | t16 | v16 | t24 | v24 |
| 43 | t08 | v08 | 19.75 | 48.64 | t16 | v16 | t26 | v26 |
| 44 | t08 | v08 | 19.75 | 48.64 | t16 | v16 | t27 | v27 |
| 45 | t08 | v08 | 19.75 | 48.64 | t16 | v16 | t28 | v28 |
| 46 | t09 | v09 | 19.75 | 48.64 | t14 | v14 | t24 | v24 |
| 47 | t09 | v09 | 19.75 | 48.64 | t14 | v14 | t26 | v26 |
| 48 | t09 | v09 | 19.75 | 48.64 | t14 | v14 | t27 | v27 |
| 49 | t09 | v09 | 19.75 | 48.64 | t14 | v14 | t28 | v28 |
| 50 | t09 | v09 | 19.75 | 48.64 | t15 | v15 | t24 | v24 |
| 51 | t09 | v09 | 19.75 | 48.64 | t15 | v15 | t26 | v26 |
| 52 | t09 | v09 | 19.75 | 48.64 | t15 | v15 | t27 | v27 |
| 53 | t09 | v09 | 19.75 | 48.64 | t15 | v15 | t28 | v28 |
| 54 | t09 | v09 | 19.75 | 48.64 | t16 | v16 | t24 | v24 |
| 55 | t09 | v09 | 19.75 | 48.64 | t16 | v16 | t26 | v26 |
| 56 | t09 | v09 | 19.75 | 48.64 | t16 | v16 | t27 | v27 |
| 57 | t09 | v09 | 19.75 | 48.64 | t16 | v16 | t28 | v28 |
+-----+--------+---------+------------+-----------+--------+---------+--------+--------+
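The merged frame has 58 rows because an inner merge pairs every row with every matching row whenever a key is repeated. As a rough check, the sketch below multiplies, for each coordinate pair, the number of times it occurs in each frame. The `coords` list is a copy of the example coordinates with the time and speed columns left out, and the variable names are introduced here only for illustration.

```python
import pandas as pd

keys = ["longitude", "latitude"]
# Coordinates of the three example frames (time/speed omitted for brevity).
coords = [
    [(19.70, 48.67), (19.71, 48.65), (19.72, 48.64), (19.73, 48.64),
     (19.74, 48.63), (19.74, 48.63), (19.75, 48.64), (19.75, 48.64),
     (19.75, 48.64), (19.75, 48.64)],
    [(19.72, 48.64), (19.73, 48.64), (19.74, 48.63), (19.74, 48.63),
     (19.75, 48.64), (19.75, 48.64), (19.75, 48.64)],
    [(19.72, 48.64), (19.73, 48.64), (19.74, 48.63), (19.74, 48.63),
     (19.75, 48.64), (19.75, 48.63), (19.75, 48.64), (19.75, 48.64),
     (19.75, 48.64)],
]
frames = [pd.DataFrame(c, columns=keys) for c in coords]

# Occurrences of each coordinate pair per frame; join="inner" drops pairs
# that are missing from at least one frame, as the inner merge does.
counts = pd.concat([f.groupby(keys).size() for f in frames],
                   axis=1, join="inner")
rows_per_key = counts.prod(axis=1)
print(rows_per_key)
print("total rows after merging:", rows_per_key.sum())
```

For the sample data the pair (19.74, 48.63) contributes 2 x 2 x 2 = 8 rows and (19.75, 48.64) contributes 4 x 3 x 4 = 48, so with long stretches of repeated positions the intermediate frame can grow very quickly.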
Finally, select each original frame's columns again and drop the duplicate rows:
df_list_out = [
    df[[c + str(i) for c in cols] + group_cols].drop_duplicates()
    for i in range(len(input_list))]
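Note that the merge keeps every row whose coordinate pair occurs in all inputs, so with repeated positions the resulting frames can still differ in length (8, 7 and 8 rows for the sample data). If equal-length frames are needed, one possible alternative, which is a different technique from the merge above, is to trim each frame to the longest contiguous run of coordinates shared by all of them. The sketch below is a brute-force illustration: `find_run`, `trim_to_common_run` and `make_frame` are hypothetical helpers, and it compares floats exactly, so it would need coarsened coordinates for noisy data.

```python
import pandas as pd

def find_run(pairs, window):
    """Return the first index where `window` occurs contiguously in `pairs`, else -1."""
    n = len(window)
    for start in range(len(pairs) - n + 1):
        if pairs[start:start + n] == window:
            return start
    return -1

def trim_to_common_run(frames):
    """Trim every frame to the longest coordinate run shared by all frames."""
    seqs = [list(zip(f["longitude"], f["latitude"])) for f in frames]
    ref = min(seqs, key=len)  # a shared run cannot exceed the shortest track
    for n in range(len(ref), 0, -1):  # try long windows first
        for start in range(len(ref) - n + 1):
            window = ref[start:start + n]
            hits = [find_run(s, window) for s in seqs]
            if all(h >= 0 for h in hits):
                return [f.iloc[h:h + n].reset_index(drop=True)
                        for f, h in zip(frames, hits)]
    return [f.iloc[0:0] for f in frames]  # no shared position at all

def make_frame(prefix, coords):
    """Build a small frame with labels like t00, t01, ... (illustrative data only)."""
    return pd.DataFrame({
        "time": [f"t{prefix}{i}" for i in range(len(coords))],
        "longitude": [c[0] for c in coords],
        "latitude": [c[1] for c in coords],
    })

# Same coordinates as the example frames in the question.
shared = [(19.72, 48.64), (19.73, 48.64), (19.74, 48.63),
          (19.74, 48.63), (19.75, 48.64)]
frames = [
    make_frame(0, [(19.70, 48.67), (19.71, 48.65)] + shared + [(19.75, 48.64)] * 3),
    make_frame(1, shared + [(19.75, 48.64)] * 2),
    make_frame(2, shared + [(19.75, 48.63)] + [(19.75, 48.64)] * 3),
]

trimmed = trim_to_common_run(frames)
for t in trimmed:
    print(t["time"].tolist())
```

On the sample data this keeps t02 to t06, t10 to t14 and t20 to t24, which matches the desired output in the question. The search is cubic in the track length in the worst case, so for long tracks a more efficient longest-common-substring algorithm would be preferable.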