Python: Generating a distance matrix for a large number of locations

Time: 2019-10-09 15:10:21

Tags: python haversine distance-matrix

I want to generate a 500 x 500 distance matrix from the latitude and longitude of 500 locations, using the Haversine formula.

Here is sample data for 10 locations, "coordinate.csv":

Name,Latitude,Longitude
depot1,35.492807,139.6681689
depot2,33.6625572,130.4096027
depot3,35.6159881,139.7805445
customer1,35.622632,139.732631
customer2,35.857287,139.821461
customer3,35.955313,139.615387
customer4,35.16073,136.926239
customer5,36.118163,139.509548
customer6,35.937351,139.909783
customer7,35.949508,139.676462

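After getting the distance matrix, I want to use it to find the closest depot to each customer, and then save the output (the distance from each customer to its closest depot, and the name of that depot) to a Pandas DataFrame.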

Expected output:

// Distance matrix
[ [..],[..],[..],[..],[..],[..],[..],[..],[..],[..] ]

// Closest depot to each customer (just an example)
Name,Latitude,Longitude,Distance_to_closest_depot,Closest_depot
depot1,35.492807,139.6681689,,
depot2,33.6625572,130.4096027,,
depot3,35.6159881,139.7805445,,
customer1,35.622632,139.732631,10,depot1
customer2,35.857287,139.821461,20,depot3
customer3,35.955313,139.615387,15,depot2
customer4,35.16073,136.926239,12,depot3
customer5,36.118163,139.509548,25,depot1
customer6,35.937351,139.909783,22,depot2
customer7,35.949508,139.676462,15,depot1

1 Answer:

Answer 0 (score: 1)

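There are a couple of library functions that can help you with this:

  • cdist from scipy can be used to generate a distance matrix using whichever distance metric you like.
  • There is also a haversine function (from the haversine package) which you can pass to cdist.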
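To make the metric concrete: for each pair of (latitude, longitude) points, a haversine function returns the great-circle distance between them. A minimal plain-Python sketch of the formula follows. The name haversine_km and the mean Earth radius of 6371.0088 km are assumptions made for illustration; this is not the haversine package's source.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(point1, point2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# customer1 and depot3 from the sample data
customer1 = (35.622632, 139.732631)
depot3 = (35.6159881, 139.7805445)
print(round(haversine_km(customer1, depot3), 2))  # roughly 4.39
```

Note that the points are given as (latitude, longitude), which is also the column order in the sample data.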

Once you have the distance matrix, it is just a matter of finding the row-wise minimums and adding them to your DataFrame. Full code below:

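import pandas as pd
from scipy.spatial.distance import cdist
from haversine import haversine


# Copy the sample data to the clipboard first
# (or use pd.read_csv('coordinate.csv') instead).
df = pd.read_clipboard(sep=',')
df.set_index('Name', inplace=True)
# .copy() so the new columns below are added to an independent frame
customers = df[df.index.str.startswith('customer')].copy()
depots = df[df.index.str.startswith('depot')]

# Rows are customers, columns are depots. haversine expects (latitude, longitude),
# which matches the column order of the data.
dm = cdist(customers, depots, metric=haversine)
closest = dm.argmin(axis=1)   # position of the nearest depot for each customer
distances = dm.min(axis=1)    # distance to that depot

customers['Closest Depot'] = depots.index[closest]
customers['Distance'] = distances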

Result (distances are in kilometers, the default unit of haversine):

            Latitude   Longitude Closest Depot    Distance
Name                                                      
customer1  35.622632  139.732631        depot3    4.393506
customer2  35.857287  139.821461        depot3   27.084212
customer3  35.955313  139.615387        depot3   40.565820
customer4  35.160730  136.926239        depot1  251.466152
customer5  36.118163  139.509548        depot3   60.945377
customer6  35.937351  139.909783        depot3   37.587862
customer7  35.949508  139.676462        depot3   38.255776
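The question asks for the results in the full table, with empty cells for the depots. One way to get that layout is a left join on the Name index. The sketch below uses a shortened copy of the sample data and the values from the result table above; the column names follow the question's expected output.

```python
import pandas as pd

# Full table from the question (shortened to five rows for brevity).
df = pd.DataFrame(
    {
        "Name": ["depot1", "depot2", "depot3", "customer1", "customer2"],
        "Latitude": [35.492807, 33.6625572, 35.6159881, 35.622632, 35.857287],
        "Longitude": [139.6681689, 130.4096027, 139.7805445, 139.732631, 139.821461],
    }
).set_index("Name")

# Per-customer results, copied from the result table above.
customers = pd.DataFrame(
    {
        "Closest Depot": ["depot3", "depot3"],
        "Distance": [4.393506, 27.084212],
    },
    index=pd.Index(["customer1", "customer2"], name="Name"),
)

# Rename to the question's column names and fix the column order.
new_columns = customers.rename(
    columns={"Distance": "Distance_to_closest_depot", "Closest Depot": "Closest_depot"}
)[["Distance_to_closest_depot", "Closest_depot"]]

# Left join on the Name index: depots have no match, so their new cells are NaN.
result = df.join(new_columns)

# NaN is written as an empty field, as in the expected output.
print(result.to_csv())
```

In the answer's code, the customers frame built there can take the place of the hand-written one here.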

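Based on the comments, I have created an alternative solution that uses a square distance matrix instead. I think the original solution is better, since the question states that we only want to find the closest depot for each customer, so the distances between customers and between depots do not need to be computed. However, if you need the square distance matrix for some other purpose, here is how to create it: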

import pandas as pd
import numpy as np
from scipy.spatial.distance import squareform, pdist
from haversine import haversine


df = pd.read_clipboard(sep=',')
df.set_index('Name', inplace=True)

# pdist returns the condensed pairwise distances; squareform expands them
# into the full symmetric matrix
dist = squareform(pdist(df, metric=haversine))
np.fill_diagonal(dist, np.inf)  # Makes it easier to find minimums
dm = pd.DataFrame(dist, index=df.index, columns=df.index)

# .copy() so the new columns below are added to an independent frame
customers = df[df.index.str.startswith('customer')].copy()
depots = df[df.index.str.startswith('depot')]
# Depot rows x customer columns: the column-wise minimum is the nearest depot
customers['Closest Depot'] = dm.loc[depots.index, customers.index].idxmin()
customers['Distance'] = dm.loc[depots.index, customers.index].min()

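The end result is the same as before, except that you now have a square distance matrix. If you like, you can put 0s back on the diagonal after you have extracted the minimums: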

np.fill_diagonal(dm.values, 0)
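
With 500 locations, cdist and pdist call the Python haversine function once per pair, so a vectorized version of the same formula may be worth trying. The following is a minimal sketch using NumPy broadcasting. The function name haversine_matrix and the mean Earth radius of 6371.0088 km are assumptions made for illustration; they are not part of scipy or the haversine package.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_matrix(lat_deg, lon_deg):
    """Return the n x n matrix of great-circle distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives all pairwise differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    )
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# depot1, depot2, depot3 and customer1 from the sample data
lats = [35.492807, 33.6625572, 35.6159881, 35.622632]
lons = [139.6681689, 130.4096027, 139.7805445, 139.732631]
dm = haversine_matrix(lats, lons)
print(dm.shape)
```

With the DataFrame from the answer, the call would be haversine_matrix(df['Latitude'], df['Longitude']), and the result can be wrapped in pd.DataFrame(..., index=df.index, columns=df.index) as before.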