I have a pandas DataFrame with more than 600 geographic coordinate points. An excerpt is shown below:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import sin, cos, sqrt, atan2, radians
lat_long = pd.DataFrame({'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})
lat_long
To compute the distance between two points manually, I use the following code:
lat1 = radians(lat_long['LATITUDE'][0])
lon1 = radians(lat_long['LONGITUDE'][0])
lat2 = radians(lat_long['LATITUDE'][1])
lon2 = radians(lat_long['LONGITUDE'][1])
R = 6373.0
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
distance = R * c
print("Result:", round(distance,4))
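What I need is a function that uses the formula above to compute the distance from every point to every other point, as in a matrix. However, I am having trouble working out how to write that function and how to store the distances between the points. Any help is welcome. Example output (for illustration only, in case I was not clear):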
|        | point0 | point1 | point2 |
| point0 | 0      | 2      | 3      |
| point1 | 2      | 0      | 4      |
| point2 | 3      | 4      | 0      |
(each cell is a distance)
Answer 0 (score: 1)
You can use pdist to compute the pairwise distances:
import pandas as pd
import numpy as np
from math import sin, cos, sqrt, atan2, radians
from scipy.spatial.distance import pdist, squareform
lat_long = pd.DataFrame({'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})
def dist(x, y):
    """Function to compute the distance between two points x, y"""
    lat1 = radians(x[0])
    lon1 = radians(x[1])
    lat2 = radians(y[0])
    lon2 = radians(y[1])
    R = 6373.0
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return round(distance, 4)
distances = pdist(lat_long.values, metric=dist)
points = [f'point_{i}' for i in range(1, len(lat_long) + 1)]
result = pd.DataFrame(squareform(distances), columns=points, index=points)
print(result)
Output:
         point_1  point_2  point_3  point_4  point_5
point_1   0.0000  20.5115   8.4123  15.3203  50.1784
point_2  20.5115   0.0000  16.3400  15.8341  30.0319
point_3   8.4123  16.3400   0.0000   6.9086  44.1838
point_4  15.3203  15.8341   6.9086   0.0000  40.0284
point_5  50.1784  30.0319  44.1838  40.0284   0.0000
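Note that squareform converts the condensed distance vector returned by pdist into a square (dense) matrix, so the result is stored in a NumPy array.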
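To make the shape conversion concrete, here is a minimal sketch. It uses three made-up 2-D points and the default Euclidean metric, both of which are assumptions for illustration rather than the haversine setup above. pdist returns only the n(n-1)/2 distinct pairwise distances as a flat vector, and squareform expands them into the symmetric n x n array.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative points (hypothetical data, Euclidean metric)
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

condensed = pdist(pts)          # pair order: (0,1), (0,2), (1,2)
square = squareform(condensed)  # symmetric 3 x 3 array with a zero diagonal

print(condensed.shape)  # (3,)
print(square.shape)     # (3, 3)
```

The same holds with a custom metric such as dist: only the upper-triangle pairs are evaluated, and squareform mirrors them.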
Answer 1 (score: 1)
Another possible solution is:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import sin, cos, sqrt, atan2, radians
lat_long = pd.DataFrame({'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89], 'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67]})
lat_long
test = lat_long.iloc[2:,:]
def distance(city1, city2):
    lat1 = radians(city1['LATITUDE'])
    lon1 = radians(city1['LONGITUDE'])
    lat2 = radians(city2['LATITUDE'])
    lon2 = radians(city2['LONGITUDE'])
    R = 6373.0
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return distance
dist = np.zeros([lat_long.shape[0],lat_long.shape[0]])
for i1, city1 in lat_long.iterrows():
    for i2, city2 in lat_long.iloc[i1+1:,:].iterrows():
        dist[i1,i2] = distance(city1, city2)
print(dist)
Output:
[[ 0. 20.51149047 8.41230771 15.32026132 50.17836849]
[ 0. 0. 16.33997119 15.83407186 30.03192954]
[ 0. 0. 0. 6.90864606 44.18376436]
[ 0. 0. 0. 0. 40.02842872]
[ 0. 0. 0. 0. 0. ]]
The lower triangle of the distance matrix is left empty (all zeros) because the matrix is symmetric (dist[i1,i2] == dist[i2,i1]).
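If the full symmetric matrix is needed, as in the example table in the question, the upper triangle can be mirrored afterwards. A minimal sketch, using a small hypothetical array named upper in place of dist:

```python
import numpy as np

# Hypothetical upper-triangular distances standing in for `dist`
upper = np.array([[0.0, 2.0, 3.0],
                  [0.0, 0.0, 4.0],
                  [0.0, 0.0, 0.0]])

# The diagonal is zero, so adding the transpose fills the lower triangle
# without double-counting any entry
full = upper + upper.T
print(full)
```

This relies on the diagonal being zero. If it were not, the diagonal would have to be subtracted once after the addition.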