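I have two arrays of latitudes and longitudes. I want to calculate the distance between each lat/long pair and every other lat/long pair in the arrays. Here are my two arrays: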
lat_array
array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132,
0.33370132, 0.33370132, 0.33371075, 0.33371075, 0.33370132,
0.33370132, 0.33370132, 0.33356488, 0.33356488, 0.33370132,
0.33370132, 0.33370132, 0.33401788, 0.33362632, 0.33362632,
0.33364007, 0.33370132, 0.33401788, 0.33401788, 0.33358399,
0.33358399, 0.33358399, 0.33370132, 0.33370132, 0.33362632,
0.33370132, 0.33370132, 0.33370132, 0.33370132, 0.33370132,
0.33356488, 0.33356456, 0.33391071, 0.33370132, 0.33356488,
0.33356488, 0.33356456, 0.33356456, 0.33356456, 0.33362632,
0.33364804, 0.3336314 , 0.33370132, 0.33370132, 0.33370132,
0.33364034, 0.33359921, 0.33370132, 0.33360397, 0.33348863,
0.33370132])
long_array
array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ,
1.2724337 , 1.2724337 , 1.27246931, 1.27246931, 1.2724337 ,
1.2724337 , 1.2724337 , 1.27254305, 1.27254305, 1.2724337 ,
1.2724337 , 1.2724337 , 1.27259085, 1.27250461, 1.27250461,
1.27251211, 1.2724337 , 1.27259085, 1.27259085, 1.27252134,
1.27252134, 1.27252134, 1.2724337 , 1.2724337 , 1.27250461,
1.2724337 , 1.2724337 , 1.2724337 , 1.2724337 , 1.2724337 ,
1.27254305, 1.27253229, 1.27266808, 1.2724337 , 1.27254305,
1.27254305, 1.27253229, 1.27253229, 1.27253229, 1.27250461,
1.27250534, 1.27250184, 1.2724337 , 1.2724337 , 1.2724337 ,
1.27251339, 1.27223739, 1.2724337 , 1.2722575 , 1.27237575,
1.2724337 ])
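Both arrays have already been converted to radians. Now I want the distance between the first lat/long pair and each of the remaining pairs, then between the second pair and the rest, and so on. I also want to print each pair together with its distance.
This is what I did in Python: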
from math import sin, cos, sqrt, atan2

distance = []
R = 6371.0
for i in range(len(lat_array)):
    for j in (i+1,len(lat_array)):
        dlon = long_array[j]-long_array[i]
        dlat = lat_array[j]-lat_array[i]
        a = sin(dlat / 2)**2 + cos(lat_array[i]) * cos(lat_array[j]) * sin(dlon / 2)**2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        distance.append(R * c)
It gives me an error: IndexError: index 56 is out of bounds for axis 0 with size 56
What am I doing wrong? And how can I make the computation faster if the arrays are large? Please help.
Answer 0 (score: 6)
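Since this is currently the top Google result for "pairwise distance", I'll add my two cents: if you have access to scikit-learn, this problem can be solved very quickly. Looking at sklearn.metrics.pairwise_distances, you'll notice that the 'haversine' metric is not supported there, but it is implemented in sklearn.neighbors.DistanceMetric.
This means you can do the following: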
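import numpy as np
# In scikit-learn >= 1.0, DistanceMetric is also importable from sklearn.metrics
from sklearn.neighbors import DistanceMetric

def sklearn_haversine(lat, lon):
    haversine = DistanceMetric.get_metric('haversine')
    # Stack into shape (n_samples, 2) with columns [lat, lon], both in radians
    latlon = np.hstack((lat[:, np.newaxis], lon[:, np.newaxis]))
    # Central angles on the unit sphere, shape (n_samples, n_samples)
    dists = haversine.pairwise(latlon)
    # Scale by the Earth's mean radius to get kilometers
    return 6371 * dists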
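Note that concatenating lat and lon is only necessary because they are separate arrays. If you already have them as a combined array of shape (n_samples, 2), you can call haversine.pairwise on it directly. Also, the multiplication by 6371 is only needed if you want the distances in kilometers; if, for example, you just want to find the closest pair of points, you can skip that step.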
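To illustrate the last point, here is a minimal pure-NumPy sketch (not scikit-learn, so it runs without that dependency) that takes a combined (n_samples, 2) array of [lat, lon] in radians, computes unit-sphere central angles like those `haversine.pairwise` returns, and finds the closest pair without ever multiplying by 6371. Because scaling by a positive radius does not change which distance is smallest, the argmin is the same either way. The function names `unit_sphere_haversine` and `closest_pair` are hypothetical.

```python
import numpy as np

def unit_sphere_haversine(latlon):
    """Pairwise central angles (radians) for an (n_samples, 2) array of [lat, lon] in radians."""
    lat, lon = latlon[:, 0], latlon[:, 1]
    dlat = lat[:, None] - lat
    dlon = lon[:, None] - lon
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat) * np.sin(dlon / 2) ** 2
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def closest_pair(latlon):
    """Indices (i, j) of the two closest distinct points; no Earth radius needed."""
    ang = unit_sphere_haversine(latlon)
    np.fill_diagonal(ang, np.inf)  # exclude each point's zero distance to itself
    i, j = np.unravel_index(np.argmin(ang), ang.shape)
    return int(i), int(j)

latlon = np.array([[0.33356456, 1.27253229],
                   [0.33355585, 1.27249141],
                   [0.33401788, 1.27259085],
                   [0.33370132, 1.2724337]])
print(closest_pair(latlon))  # (0, 1)
```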
Verification:
In [87]: lat = np.array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132])
In [88]: lng = np.array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ])
In [89]: sklearn_haversine(lat, lng)
Out[89]:
array([[ 0. , 0.25227021, 0.25227021, 2.90953323, 1.05422047],
[ 0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[ 0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[ 2.90953323, 3.00383463, 3.00383463, 0. , 2.2276139 ],
[ 1.05422047, 0.98975923, 0.98975923, 2.2276139 , 0. ]])
Performance:
In [91]: lat = np.random.randn(1000)
In [92]: lng = np.random.randn(1000)
In [93]: %timeit original_app(lat,lng)
1 loops, best of 3: 1.46 s per loop
In [94]: %timeit vectorized_app1(lat,lng)
10 loops, best of 3: 86.7 ms per loop
In [95]: %timeit vectorized_app2(lat,lng)
10 loops, best of 3: 75.7 ms per loop
In [96]: %timeit sklearn_haversine(lat,lng)
10 loops, best of 3: 76 ms per loop
In conclusion, you can get the output of Divakar's vectorized_app1 at the speed of vectorized_app2, with shorter and simpler code.
Answer 1 (score: 5)
Assuming lat and lng are the latitude and longitude arrays and that they hold data in radians, here's a vectorized solution based on this other solution -
import numpy as np

# Elementwise differences for latitudes & longitudes (broadcast to an N x N grid)
dflat = lat[:,None] - lat
dflng = lng[:,None] - lng
# Finally, calculate haversine using its distance formula
d = np.sin(dflat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(dflng/2)**2
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d))
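A small sketch of how the broadcasting above works, using three of the sample points (chosen here only for illustration): `lat[:, None]` has shape (N, 1), so subtracting `lat` of shape (N,) produces an (N, N) grid where entry [i, j] is `lat[i] - lat[j]`. The resulting distance matrix is symmetric with a zero diagonal, which is why every pair appears twice.

```python
import numpy as np

lat = np.array([0.33356456, 0.33355585, 0.33401788])
lng = np.array([1.27253229, 1.27249141, 1.27259085])

# (3, 1) minus (3,) broadcasts to (3, 3): dflat[i, j] == lat[i] - lat[j]
dflat = lat[:, None] - lat
dflng = lng[:, None] - lng
print(dflat.shape)  # (3, 3)

d = np.sin(dflat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat) * np.sin(dflng / 2) ** 2
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d))

# Symmetric with a zero diagonal, so roughly half of the matrix is redundant
print(np.allclose(hav_dists, hav_dists.T), np.all(np.diag(hav_dists) == 0))  # True True
```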
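Now, the above approach gives output for all pairs regardless of their order. So for each pair of points we get two distance outputs, (point1, point2) and (point2, point1), even though the distance is the same. To save memory and hopefully get better performance, you can use np.triu_indices to create unique pair IDs and modify the previously listed approach like so -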
import numpy as np

# Elementwise differences for latitudes & longitudes,
# computed only once per unordered pair (i < j)
N = lat.size
idx1,idx2 = np.triu_indices(N,1)
dflat = lat[idx2] - lat[idx1]
dflng = lng[idx2] - lng[idx1]
# Finally, calculate haversine using its distance formula
d = np.sin(dflat/2)**2 + np.cos(lat[idx2])*np.cos(lat[idx1]) * np.sin(dflng/2)**2
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d))
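Since the question also asks to print each pair with its distance, here is a sketch of how the condensed output maps back to pairs: entry k of `hav_dists` belongs to `(idx1[k], idx2[k])`. It also shows, as an optional extra, how to scatter the condensed result back into a symmetric N x N matrix if one is needed later.

```python
import numpy as np

lat = np.array([0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132])
lng = np.array([1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337])

N = lat.size
idx1, idx2 = np.triu_indices(N, 1)  # every pair i < j exactly once
dflat = lat[idx2] - lat[idx1]
dflng = lng[idx2] - lng[idx1]
d = np.sin(dflat / 2) ** 2 + np.cos(lat[idx2]) * np.cos(lat[idx1]) * np.sin(dflng / 2) ** 2
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d))

# Each condensed entry k belongs to the pair (idx1[k], idx2[k])
for i, j, dist in zip(idx1, idx2, hav_dists):
    print(f"({i}, {j}): {dist:.6f} km")

# Optional: rebuild the full symmetric matrix from the condensed vector
full = np.zeros((N, N))
full[idx1, idx2] = hav_dists
full[idx2, idx1] = hav_dists
```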
Function definitions -
from math import sin, cos, sqrt, atan2
import numpy as np

def original_app(lat,lng):
    # The question's double loop, with the range() fix applied
    distance = []
    R = 6371.0
    for i in range(len(lat)):
        for j in range(i+1,len(lat)):
            dlon = lng[j]-lng[i]
            dlat = lat[j]-lat[i]
            a = sin(dlat / 2)**2 + cos(lat[i]) * cos(lat[j]) * sin(dlon / 2)**2
            c = 2 * atan2(sqrt(a), sqrt(1 - a))
            distance.append(R * c)
    return distance

def vectorized_app1(lat,lng):
    # Full N x N matrix via broadcasting (each pair appears twice)
    dflat = lat[:,None] - lat
    dflng = lng[:,None] - lng
    d = np.sin(dflat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(dflng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))

def vectorized_app2(lat,lng):
    # Condensed vector over unique pairs i < j only
    N = lat.size
    idx1,idx2 = np.triu_indices(N,1)
    dflat = lat[idx2] - lat[idx1]
    dflng = lng[idx2] - lng[idx1]
    d = np.sin(dflat/2)**2 + np.cos(lat[idx2])*np.cos(lat[idx1]) * np.sin(dflng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
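Note that original_app computes the central angle as `2 * atan2(sqrt(a), sqrt(1 - a))`, while the vectorized versions use `2 * arcsin(sqrt(a))`. For `a` in [0, 1] the two forms are mathematically identical, which is why the outputs match. The sketch below checks this numerically; the clipping step is an extra safeguard added here as an assumption, not part of the answer's code, for the case where rounding nudges `a` just above 1 (near-antipodal points), which would otherwise yield `nan`. The helper names are hypothetical.

```python
import numpy as np

def central_angle_atan2(a):
    # Form used in original_app
    return 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

def central_angle_arcsin(a):
    # Form used in vectorized_app1 / vectorized_app2
    return 2 * np.arcsin(np.sqrt(a))

a = np.linspace(0.0, 1.0, 11)
print(np.allclose(central_angle_atan2(a), central_angle_arcsin(a)))  # True

# A value pushed just above 1 by rounding would give nan in both forms;
# clipping into [0, 1] first is a cheap guard.
a_bad = np.array([1.0 + 1e-15])
print(central_angle_arcsin(np.clip(a_bad, 0.0, 1.0)))  # [3.14159265]
```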
Verify outputs -
In [78]: lat
Out[78]: array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132])
In [79]: lng
Out[79]: array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ])
In [80]: original_app(lat,lng)
Out[80]:
[0.2522702110418014,
0.2522702110418014,
2.909533226553249,
1.0542204712876762,
0.0,
3.003834632906676,
0.9897592295963831,
3.003834632906676,
0.9897592295963831,
2.2276138997714474]
In [81]: vectorized_app1(lat,lng)
Out[81]:
array([[ 0. , 0.25227021, 0.25227021, 2.90953323, 1.05422047],
[ 0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[ 0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[ 2.90953323, 3.00383463, 3.00383463, 0. , 2.2276139 ],
[ 1.05422047, 0.98975923, 0.98975923, 2.2276139 , 0. ]])
In [82]: vectorized_app2(lat,lng)
Out[82]:
array([ 0.25227021, 0.25227021, 2.90953323, 1.05422047, 0. ,
3.00383463, 0.98975923, 3.00383463, 0.98975923, 2.2276139 ])
Runtime tests -
In [83]: lat = np.random.randn(1000)
In [84]: lng = np.random.randn(1000)
In [85]: %timeit original_app(lat,lng)
1 loops, best of 3: 2.11 s per loop
In [86]: %timeit vectorized_app1(lat,lng)
1 loops, best of 3: 263 ms per loop
In [87]: %timeit vectorized_app2(lat,lng)
1 loops, best of 3: 224 ms per loop
So, for performance, vectorized_app2 seems to be the way to go!
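For very large arrays, memory rather than speed may become the limit: vectorized_app1 allocates several N x N temporaries and vectorized_app2 several arrays of length N(N-1)/2. A minimal sketch, assuming you can consume results piecewise, is to process a block of rows at a time so temporaries stay around `block_size * N` floats. The generator `haversine_blocks` and its `block_size` parameter are hypothetical; the formula is the same one used above, with an added clip as a rounding guard.

```python
import numpy as np

R_KM = 6371.0  # mean Earth radius in km, as in the answers above

def haversine_blocks(lat, lng, block_size=1024):
    """Yield (rows, cols, dists) for all pairs i < j, one block of rows at a time.

    lat and lng are 1-D arrays in radians.
    """
    n = lat.size
    cos_lat = np.cos(lat)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block_rows = np.arange(start, stop)[:, None]      # shape (b, 1)
        dlat = lat[None, :] - lat[start:stop, None]       # shape (b, n)
        dlng = lng[None, :] - lng[start:stop, None]
        a = np.sin(dlat / 2) ** 2 + cos_lat[start:stop, None] * cos_lat * np.sin(dlng / 2) ** 2
        d = 2 * R_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
        # Keep only the upper triangle (j > i) so each pair appears once
        r, c = np.nonzero(np.arange(n)[None, :] > block_rows)
        yield r + start, c, d[r, c]

lat = np.array([0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132])
lng = np.array([1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337])
parts = list(haversine_blocks(lat, lng, block_size=2))
rows = np.concatenate([p[0] for p in parts])
cols = np.concatenate([p[1] for p in parts])
dists = np.concatenate([p[2] for p in parts])
```

With this ordering the concatenated pairs come out in the same order as `np.triu_indices(N, 1)`, so `dists` lines up with vectorized_app2's output; in practice you would process or write out each block instead of concatenating them.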
Answer 2 (score: 4)
You have a typo in your code. Change
for j in (i+1,len(lat_array)):
to
for j in range(i+1,len(lat_array)):
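Otherwise you are iterating over a tuple made of the two elements i+1 and len(lat_array). The second one, len(lat_array), is one past the last valid index, which causes the error.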
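A sketch of the question's loop with that fix applied, which also records each index pair next to its distance, since the question wanted to print both. The function name `pairwise_distances_loop` is hypothetical; the math is unchanged from the question.

```python
from math import sin, cos, sqrt, atan2

R = 6371.0  # mean Earth radius in km

def pairwise_distances_loop(lat_array, long_array):
    """Plain-Python haversine over every unordered pair (i < j); inputs in radians."""
    results = []  # list of ((i, j), distance_km)
    n = len(lat_array)
    for i in range(n):
        for j in range(i + 1, n):  # range(...), not the tuple (i+1, n)
            dlon = long_array[j] - long_array[i]
            dlat = lat_array[j] - lat_array[i]
            a = sin(dlat / 2) ** 2 + cos(lat_array[i]) * cos(lat_array[j]) * sin(dlon / 2) ** 2
            c = 2 * atan2(sqrt(a), sqrt(1 - a))
            results.append(((i, j), R * c))
    return results

lat_array = [0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132]
long_array = [1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337]
for (i, j), dist in pairwise_distances_loop(lat_array, long_array):
    print(f"pair ({i}, {j}): {dist:.4f} km")
```

This is still O(N²) pure-Python work, so for large arrays the vectorized approaches in the other answers are much faster.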
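Answer 3 (score: 0)
The haversine_distances function, introduced in scikit-learn 0.21.0 (released May 2019), can be used for this purpose. Like DistanceMetric, it expects each point as [latitude, longitude] in radians. Example session: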
% ipython
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import numpy as np
lat = np.array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132])
lon = np.array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ])
position = np.column_stack((lat, lon))
position
Out[1]:
array([[0.33356456, 1.27253229],
[0.33355585, 1.27249141],
[0.33355585, 1.27249141],
[0.33401788, 1.27259085],
[0.33370132, 1.2724337 ]])
In [2]:
from sklearn.metrics.pairwise import haversine_distances
R = 6371.0
D1 = R * haversine_distances(position)
D1
Out[2]:
array([[0. , 0.25227021, 0.25227021, 2.90953323, 1.05422047],
[0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[0.25227021, 0. , 0. , 3.00383463, 0.98975923],
[2.90953323, 3.00383463, 3.00383463, 0. , 2.2276139 ],
[1.05422047, 0.98975923, 0.98975923, 2.2276139 , 0. ]])