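I want to compute geographic distances between latitude-longitude pairs.
I have looked at the thread "Vectorizing Haversine distance calculation in Python", but when I apply it to two coordinate sets of different sizes I get an error.
df1 can hold millions of rows, so any other way to compute accurate geographic distances in less time would be very helpful.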
import numpy as np
import pandas as pd

length1 = 1000
d1 = np.random.uniform(-90, 90, length1)
d2 = np.random.uniform(-180, 180, length1)
length2 = 100
d3 = np.random.uniform(-90, 90, length2)
d4 = np.random.uniform(-180, 180, length2)
coords = tuple(zip(d1, d2))
df1 = pd.DataFrame({'coordinates':coords})
coords = tuple(zip(d3, d4))
df2 = pd.DataFrame({'coordinates':coords})
def get_diff(df1, df2):
    data1 = np.array(df1['coordinates'].tolist())
    data2 = np.array(df2['coordinates'].tolist())
    lat1 = data1[:,0]
    lng1 = data1[:,1]
    lat2 = data2[:,0]
    lng2 = data2[:,1]
    #print(lat1.shape)
    #print(lng1.shape)
    #print(lat2.shape)
    #print(lng2.shape)
    diff_lat = lat1[:,None] - lat2
    diff_lng = lng1[:,None] - lng2
    #print(diff_lat.shape)
    #print(diff_lng.shape)
    d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat1) * np.sin(diff_lng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
get_diff(df1, df2)
ValueError Traceback (most recent call last)
<ipython-input-58-df06c7cff72c> in <module>
----> 1 get_diff(df1, df2)
<ipython-input-57-9bd8f10189e6> in get_diff(df1, df2)
26 print(diff_lat.shape)
27 print(diff_lng.shape)
---> 28 d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat1) * np.sin(diff_lng/2)**2
29 return 2 * 6371 * np.arcsin(np.sqrt(d))
ValueError: operands could not be broadcast together with shapes (1000,1000) (1000,100)
Answer 0 (score: 2)
Here is a vectorized approach using broadcasting, based on this post:
def convert_to_arrays(df1, df2):
    d1 = np.array(df1['coordinates'].tolist())
    d2 = np.array(df2['coordinates'].tolist())
    return d1,d2

def broadcasting_based_lng_lat(data1, data2):
    # data1, data2 are the data arrays with 2 cols and they hold
    # lat., lng. values in those cols respectively
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1 = data1[:,0]
    lng1 = data1[:,1]
    lat2 = data2[:,0]
    lng2 = data2[:,1]
    diff_lat = lat1[:,None] - lat2
    diff_lng = lng1[:,None] - lng2
    d = np.sin(diff_lat/2)**2 + np.cos(lat1[:,None])*np.cos(lat2) * np.sin(diff_lng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
So, to solve your case and get all pairwise Haversine distances (in km), it would be:
broadcasting_based_lng_lat(*convert_to_arrays(df1,df2))
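Since df1 may hold millions of rows, the full pairwise matrix (one float64 per pair) may not fit in memory. A minimal sketch of one way around that, assuming the first set can be processed in blocks of rows and only a per-row reduction is needed afterwards. The names `pairwise_haversine`, `nearest_distance_chunked` and `chunk_size` are hypothetical, introduced here for illustration, and taking the minimum per row is just one example of a reduction:

```python
import numpy as np

def pairwise_haversine(data1, data2, radius=6371):
    # data1: (n, 2), data2: (m, 2); columns are lat, lng in degrees
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1, lng1 = data1[:, 0], data1[:, 1]
    lat2, lng2 = data2[:, 0], data2[:, 1]
    diff_lat = lat1[:, None] - lat2          # shape (n, m)
    diff_lng = lng1[:, None] - lng2          # shape (n, m)
    d = (np.sin(diff_lat / 2) ** 2
         + np.cos(lat1[:, None]) * np.cos(lat2) * np.sin(diff_lng / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(d))

def nearest_distance_chunked(data1, data2, chunk_size=10_000):
    # Hypothetical helper: distance from each row of data1 to its closest
    # row in data2, computed block by block so that only a
    # (chunk_size, m) matrix exists at any time.
    out = np.empty(len(data1))
    for start in range(0, len(data1), chunk_size):
        block = data1[start:start + chunk_size]
        out[start:start + chunk_size] = pairwise_haversine(block, data2).min(axis=1)
    return out

rng = np.random.default_rng(0)
data1 = np.column_stack([rng.uniform(-90, 90, 1000), rng.uniform(-180, 180, 1000)])
data2 = np.column_stack([rng.uniform(-90, 90, 100), rng.uniform(-180, 180, 100)])

full = pairwise_haversine(data1, data2)
chunked = nearest_distance_chunked(data1, data2, chunk_size=256)
print(full.shape, chunked.shape)
```

The block size trades memory for loop overhead; the per-block computation is the same broadcasting expression as above.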
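For element-wise Haversine distances between two datasets, where each holds latitude and longitude either in two columns or as lists of two elements, we skip the extension to 2D and end up with something like this: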
def broadcasting_based_lng_lat_elementwise(data1, data2):
    # data1, data2 are the data arrays with 2 cols and they hold
    # lat., lng. values in those cols respectively
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1 = data1[:,0]
    lng1 = data1[:,1]
    lat2 = data2[:,0]
    lng2 = data2[:,1]
    diff_lat = lat1 - lat2
    diff_lng = lng1 - lng2
    d = np.sin(diff_lat/2)**2 + np.cos(lat1)*np.cos(lat2) * np.sin(diff_lng/2)**2
    return 2 * 6371 * np.arcsin(np.sqrt(d))
Sample run with a dataframe that holds the two datasets in two columns:
In [42]: np.random.seed(0)
...: a = np.random.randint(10,100,(5,2)).tolist()
...: b = np.random.randint(10,100,(5,2)).tolist()
...: df = pd.DataFrame({'A':a,'B':b})
In [43]: df
Out[43]:
A B
0 [54, 57] [80, 98]
1 [74, 77] [98, 22]
2 [77, 19] [68, 75]
3 [93, 31] [49, 97]
4 [46, 97] [56, 98]
In [44]: from haversine import haversine
In [45]: [haversine(i,j) for (i,j) in zip(df.A,df.B)]
Out[45]:
[3235.9659882513424,
2399.6124657290075,
2012.0851666001824,
4702.8069773315865,
1114.1193334220534]
In [46]: broadcasting_based_lng_lat_elementwise(np.vstack(df.A), np.vstack(df.B))
Out[46]:
array([3235.96151855, 2399.60915125, 2012.08238739, 4702.80048155,
1114.11779454])
These small differences arise mainly because the haversine library uses 6371.0088 km as the Earth's radius, whereas here we take it as 6371 km.
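To see how much of the gap the radius explains, the radius can be made a parameter. This is a minimal sketch, not the library's implementation: `haversine_elementwise` and its `radius` argument are hypothetical names introduced here, and the two coordinate pairs are rows 0 and 2 of the sample dataframe above.

```python
import numpy as np

def haversine_elementwise(data1, data2, radius=6371.0):
    # data1, data2: arrays of shape (n, 2) holding lat, lng in degrees;
    # radius is the assumed Earth radius in km
    data1 = np.deg2rad(data1)
    data2 = np.deg2rad(data2)
    lat1, lng1 = data1[:, 0], data1[:, 1]
    lat2, lng2 = data2[:, 0], data2[:, 1]
    d = (np.sin((lat1 - lat2) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng1 - lng2) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(d))

a = np.array([[54, 57], [77, 19]])   # rows 0 and 2 of column A
b = np.array([[80, 98], [68, 75]])   # rows 0 and 2 of column B

d_6371 = haversine_elementwise(a, b)                    # radius used in this answer
d_mean = haversine_elementwise(a, b, radius=6371.0088)  # radius quoted for the library

# The distance scales linearly with the radius, so the ratio is the
# same for every pair: 6371.0088 / 6371
print(d_mean / d_6371)
```

Passing radius=6371.0088 should bring the results in line with the values listed for the haversine library above.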
Answer 1 (score: 0)
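Use simple print statements to show the shapes of the arguments in the equation. Some of the operands in the expression have different lengths, and the underlying broadcast operation (the vector equivalent of zip) needs compatible shapes. Here np.cos(lat1[:,None])*np.cos(lat1) produces a (1000,1000) array, which cannot be combined with the (1000,100) sin terms.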
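A small sketch of that shape check, using zero-filled placeholder arrays with the same lengths as in the question (only the shapes matter here, not the values):

```python
import numpy as np

lat1 = np.zeros(1000)   # stands in for the 1000 latitudes from df1
lat2 = np.zeros(100)    # stands in for the 100 latitudes from df2

diff_lat = lat1[:, None] - lat2
print(diff_lat.shape)   # (1000, 100)

# Expression from the question: both factors come from lat1
wrong = np.cos(lat1[:, None]) * np.cos(lat1)
print(wrong.shape)      # (1000, 1000)

# Corrected expression: the second factor comes from lat2
right = np.cos(lat1[:, None]) * np.cos(lat2)
print(right.shape)      # (1000, 100)
```

Multiplying `wrong` by a (1000, 100) array raises the ValueError shown in the question, while `right` matches the shape of `diff_lat`.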