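The first DataFrame, df1, contains an Id and its two corresponding coordinates. For each coordinate pair in the first DataFrame, I have to loop through the second DataFrame to find the row at the smallest distance. I tried working with the coordinates individually and computing the distance between them, but it did not work as expected. I believe the coordinates have to be treated as a pair when computing the distance. I am not sure whether Python provides a way to do this.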
For example, df1:
Id   Co1        Co2
334  30.371353  -95.384010
337  39.497448  -119.789623
df2:
Id   Co1        Co2
339  40.914585  -73.892456
441  34.760395  -77.999260
dfloc3 = [[38.991512, -77.441536],
          [40.89869, -72.37637],
          [40.936115, -72.31452],
          [30.371353, -95.38401],
          [39.84819, -75.37162],
          [36.929306, -76.20035],
          [40.682342, -73.979645]]
dfloc4 = [[40.914585, -73.892456],
          [41.741543, -71.406334],
          [50.154522, -96.88806],
          [39.743565, -121.795761],
          [30.027597, -89.91014],
          [36.51881, -82.560844],
          [30.449587, -84.23629],
          [42.920475, -85.8208]]
Answer 0: (score: 1)
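The following code creates a new column in df1 showing the Id of the nearest point in df2. (I can't tell from the question whether this is what you want.) I am assuming the coordinates are in a Euclidean space, i.e., the distance between points is given by the Pythagorean theorem. If not, you can easily substitute another calculation for dist_squared.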
import pandas as pd

df1 = pd.DataFrame(dict(Id=[334, 337], Co1=[30.371353, 39.497448], Co2=[-95.384010, -119.789623]))
df2 = pd.DataFrame(dict(Id=[339, 441], Co1=[40.914585, 34.760395], Co2=[-73.892456, -77.999260]))

def nearest(row, df):
    # squared Euclidean distance from the given row to every row of df
    # (the square root is skipped because it does not change which row is closest)
    dist_squared = (row.Co1 - df.Co1) ** 2 + (row.Co2 - df.Co2) ** 2
    # index label of the closest row of df (idxmin returns a label, which .loc expects)
    smallest_idx = dist_squared.idxmin()
    # return the Id for the closest row of df
    return df.loc[smallest_idx, 'Id']

near = df1.apply(nearest, args=(df2,), axis=1)
df1['nearest'] = near
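The sample values look like latitude/longitude pairs, although the question does not say so. If that is the case, one possible replacement for dist_squared is the haversine (great-circle) distance. The sketch below is only an illustration under that assumption: it uses plain tuples instead of DataFrames so the formula is easy to check, and the names haversine_km, nearest_id, rows1, and rows2 are made up for this example. It also assumes Co1 is latitude and Co2 is longitude, both in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Same sample rows as in the question, written as (Id, Co1, Co2).
# Assumption: Co1 is latitude and Co2 is longitude.
rows1 = [(334, 30.371353, -95.384010), (337, 39.497448, -119.789623)]
rows2 = [(339, 40.914585, -73.892456), (441, 34.760395, -77.999260)]

def nearest_id(row, candidates):
    """Return the Id of the candidate closest to row by haversine distance."""
    _, lat, lon = row
    best = min(candidates, key=lambda c: haversine_km(lat, lon, c[1], c[2]))
    return best[0]

# Map each Id in rows1 to the Id of its nearest row in rows2.
nearest_ids = {row[0]: nearest_id(row, rows2) for row in rows1}
print(nearest_ids)
```

For this sample data the nearest Ids should match the Euclidean version, but on other data the two measures can disagree, especially for points far apart in longitude at high latitudes.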
Answer 1: (score: 1)
Given that you can get your points into lists like this...
df1 = [[30.371353, -95.384010], [39.497448, -119.789623]]
df2 = [[40.914585, -73.892456], [34.760395, -77.999260]]
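Import math, then create a function to make finding the distance easier:
import math

def distance(pt1, pt2):
    # straight-line (Euclidean) distance between two [x, y] points
    return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)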
Then simply traverse your lists, saving the closest point as you go:
for pt1 in df1:
    # start with the first candidate, then keep whichever point is closer
    closestPoints = [pt1, df2[0]]
    for pt2 in df2:
        if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
            closestPoints = [pt1, pt2]
    print("Point: " + str(closestPoints[0]) + " is closest to " + str(closestPoints[1]))
Output:
Point: [30.371353, -95.38401] is closest to [34.760395, -77.99926]
Point: [39.497448, -119.789623] is closest to [34.760395, -77.99926]
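The same search can be written more compactly with the built-in min and a key function. This is just an alternative sketch of the loop above, not a different algorithm: it still compares every point in the first list with every point in the second. It assumes Python 3.8 or later for math.dist, and the names points1, points2, closest_point, and pairs are chosen for this example only.

```python
import math

points1 = [[30.371353, -95.384010], [39.497448, -119.789623]]
points2 = [[40.914585, -73.892456], [34.760395, -77.999260]]

def closest_point(pt, candidates):
    """Return the candidate with the smallest Euclidean distance to pt."""
    return min(candidates, key=lambda other: math.dist(pt, other))

# Pair every point in points1 with its closest point in points2.
pairs = [(pt, closest_point(pt, points2)) for pt in points1]

for pt, match in pairs:
    print(f"Point: {pt} is closest to {match}")
```

Keeping the result in pairs rather than only printing it makes it easier to reuse, for example to write the matches back into a DataFrame column.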