Question

I have two arrays of different sizes, each containing point coordinates as shapely.geometry.Point objects.

For example:

[Point(X Y), Point(X Y)...]
[Point(X Y), Point(X Y)...]

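I want to build the "cross product" of these two arrays with a distance function. The distance function comes from shapely.geometry and is a simple geometric vector distance calculation. I am trying to create a distance matrix between M:N points.

Right now I have this function: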

    import geopandas as gpd
    import numpy as np
    import pandas as pd

    # source and near start out as paths to the two point layers
    source = gpd.read_file(source)
    near = gpd.read_file(near)

    # plain Python lists of shapely Point objects
    source_list = source.geometry.values.tolist()
    near_list = near.geometry.values.tolist()

    # one row per source point, one column per near point
    array = np.empty((len(source.ID_SOURCE), len(near.ID_NEAR)))

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            array[index_source, index_near] = item_source.distance(item_near)

    df_matrix = pd.DataFrame(array, index=source.ID_SOURCE, columns=near.ID_NEAR)

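This does the job fine, but it is slow: 4000 x 4000 points takes about 100 seconds, and my dataset is much larger, so speed is the main issue. I would like to avoid this double loop if possible. I tried doing it directly on the Pandas dataframe (far too slow):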

    for index_source, item_source in source.iterrows():
        for index_near, item_near in near.iterrows():
            df_matrix.at[index_source, index_near] = item_source.geometry.distance(item_near.geometry)

A bit faster (but still 4 times slower than the numpy version):

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            df_matrix.at[index_source, index_near] = item_source.distance(item_near)

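Is there a faster way to do this? I guess there is, but I have no idea how to proceed. I could probably split the dataframe into smaller chunks, send them to different cores, and concatenate the results, but that is a last resort. If we could somehow use only numpy with some indexing magic, I could send it to the GPU and be done in no time. The double for loop is a no-go for now. I would also like to avoid any library other than Pandas / Numpy. I could use SAGA processing and its "Point Distances" module (http://www.saga-gis.org/saga_tool_doc/2.2.2/shapes_points_3.html), which is very fast, but I am looking for a Python-only solution.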

Answer 1

If you can get the coordinates in separate vectors, you could try this:

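    import numpy as np

    x = np.asarray([5.6, 2.1, 6.9, 3.1])  # Replace with data
    y = np.asarray([7.2, 8.3, 0.5, 4.5])  # Replace with data

    # Column vectors (shape (n, 1)) and row vectors (shape (1, n))
    x_i = x[:, np.newaxis]
    x_j = x[np.newaxis, :]

    y_i = y[:, np.newaxis]
    y_j = y[np.newaxis, :]

    # Broadcasting yields the (n, n) matrix of squared distances
    # between every pair of points in this single set
    d = (x_i - x_j)**2 + (y_i - y_j)**2

    # Take the square root in place to avoid allocating a second array
    np.sqrt(d, out=d)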
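
The snippet above pairs a single point set with itself, while the question asks about two sets of different sizes (M and N). Here is a minimal sketch of the same broadcasting idea for that case. The function name `distance_matrix` and the sample coordinates are made up for illustration; it assumes plain planar (Euclidean) distance between points, and that the coordinates have already been pulled out of the Point objects into (M, 2) and (N, 2) arrays.

```python
import numpy as np


def distance_matrix(src_xy, near_xy):
    """Return the (M, N) matrix of Euclidean distances between two point sets.

    src_xy has shape (M, 2), near_xy has shape (N, 2).
    (Hypothetical helper, not part of shapely or geopandas.)
    """
    src_xy = np.asarray(src_xy, dtype=float)
    near_xy = np.asarray(near_xy, dtype=float)
    # (M, 1, 2) - (1, N, 2) broadcasts to an (M, N, 2) array of differences
    diff = src_xy[:, np.newaxis, :] - near_xy[np.newaxis, :, :]
    # Sum the squared x and y differences, then take the square root
    return np.sqrt((diff ** 2).sum(axis=-1))


# Made-up sample coordinates; with GeoPandas point layers they could be
# built with np.column_stack([source.geometry.x, source.geometry.y]).
src_xy = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
near_xy = np.array([[0.0, 0.0], [6.0, 8.0]])

d = distance_matrix(src_xy, near_xy)
print(d.shape)  # (3, 2): one row per source point, one column per near point
```

The result could then be wrapped as in the question, with pd.DataFrame(d, index=source.ID_SOURCE, columns=near.ID_NEAR). Note that the intermediate (M, N, 2) array is twice the size of the final matrix, so very large inputs may need to be processed in blocks of rows.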
