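I have two sets of geocodes stored as pandas Series, and I am trying to find the fastest way to get, for each point in set A, the point in set B with the minimum Euclidean distance. That is: find the point in the second set closest to (40.748043, -73.992953), and so on. Any suggestions or help would be greatly appreciated.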
Set A:
print(latitude1)
print(longitude1)
0 40.748043
1 42.361016
Name: latitude, dtype: float64
0 -73.992953
1 -71.020005
Name: longitude, dtype: float64
Set B:
print(latitude2)
print(longitude2)
0 42.50729
1 42.50779
2 25.56473
3 25.78953
4 25.33132
5 25.06570
6 25.59246
7 25.61955
8 25.33737
9 24.11028
Name: latitude, dtype: float64
0 1.53414
1 1.52109
2 55.55517
3 55.94320
4 56.34199
5 55.17128
6 56.26176
7 56.27291
8 55.41206
9 52.73056
Name: longitude, dtype: float64
Answer 0 (score: 3)
Here is one approach that uses only numpy.linalg.norm.
import pandas as pd, numpy as np
df1['coords1'] = list(zip(df1['latitude1'], df1['longitude1']))
df2['coords2'] = list(zip(df2['latitude2'], df2['longitude2']))
def calc_min(x):
    # Index of the df2 point with the smallest Euclidean distance to x
    amin = np.argmin([np.linalg.norm(np.array(x)-np.array(y)) for y in df2['coords2']])
    return df2['coords2'].iloc[amin]
df1['closest'] = df1['coords1'].map(calc_min)
# latitude1 longitude1 coords1 closest
# 0 40.748043 -73.992953 (40.748043, -73.992953) (42.50779, 1.52109)
# 1 42.361016 -71.020005 (42.361016, -71.020005) (42.50779, 1.52109)
# 2 25.361016 54.000000 (25.361016, 54.0) (25.0657, 55.17128)
Setup
from io import StringIO
mystr1 = """latitude1|longitude1
40.748043|-73.992953
42.361016|-71.020005
25.361016|54.0000
"""
mystr2 = """latitude2|longitude2
42.50729|1.53414
42.50779|1.52109
25.56473|55.55517
25.78953|55.94320
25.33132|56.34199
25.06570|55.17128
25.59246|56.26176
25.61955|56.27291
25.33737|55.41206
24.11028|52.73056"""
df1 = pd.read_csv(StringIO(mystr1), sep='|')
df2 = pd.read_csv(StringIO(mystr2), sep='|')
If performance is an issue, you can easily vectorize this calculation by working on the underlying numpy arrays.
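A minimal sketch of that vectorization, assuming the column names from the Setup above (latitude1/longitude1 and latitude2/longitude2). The function name closest_vectorized and the output column names are made up for illustration. It builds the full distance matrix with broadcasting, so memory grows with len(df1) * len(df2).

```python
import numpy as np
import pandas as pd

def closest_vectorized(df_a, df_b):
    """For every row of df_a, return the nearest row of df_b and its distance.

    Assumes df_a has columns latitude1/longitude1 and df_b has
    latitude2/longitude2, as in the Setup section.
    """
    a = df_a[['latitude1', 'longitude1']].to_numpy()  # shape (n, 2)
    b = df_b[['latitude2', 'longitude2']].to_numpy()  # shape (m, 2)
    # Broadcasting yields an (n, m, 2) array of coordinate differences.
    diff = a[:, None, :] - b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))           # (n, m) distance matrix
    idx = dist.argmin(axis=1)                         # nearest df_b row per df_a row
    return pd.DataFrame({
        'closest_lat': b[idx, 0],
        'closest_lon': b[idx, 1],
        'distance': dist[np.arange(len(a)), idx],
    }, index=df_a.index)
```

Something like df1.join(closest_vectorized(df1, df2)) would attach the result to the original frame. For very large inputs the (n, m) matrix may not fit in memory, in which case a tree-based lookup is likely the better choice.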
Answer 1 (score: 2)
You could try the geopy library.
https://pypi.python.org/pypi/geopy
Here is an example from the documentation.
>>> from geopy.distance import vincenty
>>> newport_ri = (41.49008, -71.312796)
>>> cleveland_oh = (41.499498, -81.695391)
>>> print(vincenty(newport_ri, cleveland_oh).miles)
538.3904451566326
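vincenty computes the Vincenty distance, a geodesic distance on an ellipsoidal model of the Earth. Note that vincenty was removed in geopy 2.0; in current releases geopy.distance.geodesic is the replacement.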
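If installing geopy is not an option, a great-circle distance can be approximated with the haversine formula. The following is only a sketch, not geopy's method: it treats the Earth as a sphere (the radius constant is an assumption), so results differ slightly from the ellipsoidal Vincenty value. The names haversine_km and closest_by_haversine are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def closest_by_haversine(point, candidates):
    """Return the candidate (lat, lon) with the smallest haversine distance to point."""
    return min(candidates, key=lambda c: haversine_km(point, c))
```

Unlike a plain Euclidean distance on raw degrees, this accounts for longitude lines converging toward the poles, which can change which point counts as nearest.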
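Answer 2 (score: 1)
For nearest-point computations like this, an efficient approach is usually one of the fast nearest-neighbor lookups based on a kd-tree. Using the Cython-powered implementation, scipy.spatial.cKDTree, one approach would be: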
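import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def closest_pts(setA_lat, setA_lng, setB_lat, setB_lng):
    a_x = setA_lat.values
    a_y = setA_lng.values
    b_x = setB_lat.values
    b_y = setB_lng.values
    # Stack the coordinates into (n, 2) and (m, 2) point arrays
    a = np.c_[a_x, a_y]
    b = np.c_[b_x, b_y]
    # query returns (distances, indices); keep the index of the nearest B point
    indx = cKDTree(b).query(a,k=1)[1]
    return pd.Series(b_x[indx]), pd.Series(b_y[indx])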
Sample run:
1) Inputs:
In [106]: setA_lat
Out[106]:
0 40.748043
1 42.361016
dtype: float64
In [107]: setA_lng
Out[107]:
0 -73.992953
1 -71.020005
dtype: float64
In [108]: setB_lat
Out[108]:
0 42.460000
1 0.645894
2 0.437587
3 40.460000
4 0.963663
dtype: float64
In [109]: setB_lng
Out[109]:
0 -71.000000
1 0.925597
2 0.071036
3 -72.000000
4 0.020218
dtype: float64
2) Outputs:
In [110]: c_x,c_y = closest_pts(setA_lat, setA_lng, setB_lat, setB_lng)
In [111]: c_x
Out[111]:
0 40.46
1 42.46
dtype: float64
In [112]: c_y
Out[112]:
0 -72.0
1 -71.0
dtype: float64
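Since the question asks for the minimum distance itself, here is a small variation that also keeps the first element returned by query. This is a sketch, not part of the original answer: the function name closest_with_distance and the output column names are made up for illustration, and the distance is Euclidean in raw degrees, just like in closest_pts.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def closest_with_distance(setA_lat, setA_lng, setB_lat, setB_lng):
    """Nearest set-B point for each set-A point, with distance and B index."""
    a = np.c_[setA_lat.to_numpy(), setA_lng.to_numpy()]
    b = np.c_[setB_lat.to_numpy(), setB_lng.to_numpy()]
    # query returns a (distances, indices) pair; with k=1 both have shape (n,)
    dist, indx = cKDTree(b).query(a, k=1)
    return pd.DataFrame({
        'lat': b[indx, 0],
        'lng': b[indx, 1],
        'distance': dist,
        'index_in_B': indx,
    }, index=setA_lat.index)
```

Building the tree once and querying all of set A in a single call avoids computing the full pairwise distance matrix, which tends to matter when both sets are large.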