Question

I'm brute-forcing the shortest distance from a point on a 2D plane to many other points, using data from pandas DataFrames via df['column'].to_numpy().

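Currently, I'm using nested for loops over numpy arrays to fill a list, take the minimum of that list, and store the corresponding point name in another list.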

Checking 1,000 points (from df_point) against 25,000 (from df_compare) takes about a minute, because this approach is inefficient. My code is below.

import numpy as np

point_x = df_point['x'].to_numpy()
compare_x = df_compare['x'].to_numpy()
point_y = df_point['y'].to_numpy()
compare_y = df_compare['y'].to_numpy()
dumarr = []
minvals = []

# Brute force: calculate the closest point using the Pythagorean theorem, comparing
# each point to every compare point (squared distances are enough to find the minimum)
for k in range(len(point_x)):
    for i, j in np.nditer([compare_x, compare_y]):
        dumarr.append((point_x[k] - i)**2 + (point_y[k] - j)**2)
    # dumarr.index() gives a position, so look the name up positionally with .iloc
    minvals.append(df_compare['point_name'].iloc[dumarr.index(min(dumarr))])
    # Clear the dummy list (otherwise it keeps growing across iterations)
    dumarr = []

This isn't particularly pythonic. Is there a way to do this with vectorization, or at least without nested for loops?

Answer 1

The approach is to build a 1000 x 25000 matrix of squared distances and then find the index of the minimum in each row.

# squared distances for all combinations (1000 x 25000 matrix)
dum_arr = (point_x[:, None] - compare_x)**2 + (point_y[:, None] - compare_y)**2

# index of the minimum along each row
idx = np.argmin(dum_arr, axis=1)

# Not sure what is needed from the indices; this gets the values
# from the `point_name` column using the found indices
min_vals = df_compare['point_name'].iloc[idx]
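One practical caveat with the full matrix: 1000 x 25000 float64 values take 1000 · 25000 · 8 bytes ≈ 200 MB. A minimal sketch, assuming the same point_x / point_y / compare_x / compare_y arrays, that applies the identical broadcasting and argmin to blocks of query rows so only a chunk x 25000 matrix exists at a time; the function name nearest_index_chunked and the chunk parameter are hypothetical, not part of NumPy.

```python
import numpy as np


def nearest_index_chunked(point_x, point_y, compare_x, compare_y, chunk=256):
    """Index of the nearest compare point for every query point.

    Same broadcasting + argmin idea as above, but processed `chunk` query
    rows at a time, so the temporary matrix is chunk x len(compare_x)
    instead of len(point_x) x len(compare_x).
    """
    idx = np.empty(len(point_x), dtype=np.intp)
    for start in range(0, len(point_x), chunk):
        px = point_x[start:start + chunk, None]   # shape (c, 1)
        py = point_y[start:start + chunk, None]
        # squared distances suffice: sqrt does not change which entry is smallest
        d2 = (px - compare_x) ** 2 + (py - compare_y) ** 2
        idx[start:start + chunk] = np.argmin(d2, axis=1)
    return idx
```

The returned idx can be used exactly like the one above, e.g. df_compare['point_name'].iloc[idx].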

Answer 2

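Instead of looking for the closest point directly, you could try finding the closest point separately in the x and y directions, and then compare the two to see which is closer, using the built-in min function as in the top answer to this question: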

min(myList, key=lambda x:abs(x-myNumber))

(Linked question: "From list of integers, get number closest to a given value")

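Edit: if you do it all in one function call, the loop would end up looking like this. I'm also not sure whether min ends up iterating over the compare arrays in about the same time as your current code: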

for k, m in zip(point_x, point_y):
    closest = min(zip(compare_x, compare_y), key=lambda p: (p[0] - k)**2 + (p[1] - m)**2)

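Another option is to precompute, for every point in the compare array, the distance from (0, 0) or another reference point such as (-1000, 1000), sort the compare array by that distance, and then only check the points whose reference distance is close to that of the query point.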
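The reference-point idea can be made exact with the triangle inequality: for a query q, reference r and any compare point p, |d(p, r) - d(q, r)| ≤ d(q, p). So once some candidate at distance best is known, only compare points whose reference distance lies within best of d(q, r) can be closer. A minimal sketch, assuming points and compare are (n, 2) NumPy arrays; nearest_via_reference is a hypothetical helper, and the default reference (-1000, 1000) is just the example value from the text. How much work it saves depends on the data layout and the choice of reference point.

```python
import numpy as np


def nearest_via_reference(points, compare, ref=(-1000.0, 1000.0)):
    """Index into `compare` of the nearest point for every row of `points`.

    Both arrays have shape (n, 2). Exactness rests on the triangle
    inequality |d(p, ref) - d(q, ref)| <= d(q, p).
    """
    ref = np.asarray(ref, dtype=float)
    ref_d = np.hypot(*(compare - ref).T)   # distance of each compare point to ref
    order = np.argsort(ref_d)               # compare points sorted by that distance
    sorted_d = ref_d[order]
    result = np.empty(len(points), dtype=np.intp)
    for n, q in enumerate(points):
        dq = np.hypot(*(q - ref))
        # Upper bound from a real compare point next to dq in sorted order
        pos = min(np.searchsorted(sorted_d, dq), len(sorted_d) - 1)
        best = np.hypot(*(compare[order[pos]] - q))
        # Only points with |ref_d - dq| <= best can beat that bound
        lo = np.searchsorted(sorted_d, dq - best, side="left")
        hi = np.searchsorted(sorted_d, dq + best, side="right")
        lo, hi = min(lo, pos), max(hi, pos + 1)   # keep pos in the window despite rounding
        cand = order[lo:hi]
        d = np.hypot(*(compare[cand] - q).T)     # exact distances for the candidates only
        result[n] = cand[np.argmin(d)]
    return result
```

There is still one Python-level loop over the query points, but each iteration only computes exact distances for the candidates inside the window rather than for every compare point.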

Answer 3

Here is my approach:

1. Create a DataFrame with the columns pointID, CoordX, CoordY.
2. Create a helper DataFrame offset by 1 (oldDF.iloc[pointIDx] = newDF.iloc[pointIDx] - 1).
3. Loop this offset value from 1 up to the number of coordinates minus 1.
4. tempDF["Euclid Dist"] = np.sqrt(np.square(oldDF["CoordX"] - newDF["CoordX"]) + np.square(oldDF["CoordY"] - newDF["CoordY"]))
5. Append this tempDF to a list.

Why would this be faster?

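There is only one loop, iterating the offset from 1 up to the number of coordinates minus 1.
The vectorization happens in step 4 (the Euclid Dist calculation).
It uses numpy's sqrt and square functions, which operate on whole columns at once.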
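A minimal NumPy sketch of the offset idea, assuming a single DataFrame with the pointID, CoordX and CoordY columns described above, and finding each point's nearest other point (at least two points are required). It uses np.roll for the wrap-around offset and, as a variation on appending every tempDF to a list, keeps a running minimum so the n² distances are never stored; nearest_by_offsets is a hypothetical name.

```python
import numpy as np
import pandas as pd


def nearest_by_offsets(df):
    """For each row of df (pointID, CoordX, CoordY), find the nearest other point.

    Compares every point with the point `offset` rows further down
    (wrapping around) for offset = 1 .. n-1; needs at least two rows.
    """
    x = df["CoordX"].to_numpy(dtype=float)
    y = df["CoordY"].to_numpy(dtype=float)
    ids = df["pointID"].to_numpy()
    n = len(df)
    best_dist = np.full(n, np.inf)
    best_id = np.empty(n, dtype=ids.dtype)
    for offset in range(1, n):   # the single Python-level loop
        # np.roll(a, -offset)[i] == a[(i + offset) % n]
        dist = np.sqrt(np.square(x - np.roll(x, -offset)) + np.square(y - np.roll(y, -offset)))
        partner = np.roll(ids, -offset)
        better = dist < best_dist   # update only where this offset found a closer point
        best_dist[better] = dist[better]
        best_id[better] = partner[better]
    return pd.DataFrame({"pointID": ids, "nearestID": best_id, "dist": best_dist})
```

Note that this compares the points of one set with each other; for the question's two-set case (df_point vs df_compare) the broadcasting or cdist approaches in the other answers apply more directly.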

Answer 4

Below is an example using scipy's cdist, which is well suited to this kind of problem:

import numpy as np
from scipy.spatial.distance import cdist

point = np.array([[1, 2], [3, 5], [4, 7]])
compare = np.array([[3, 2], [8, 5], [4, 1], [2, 2], [8, 9]])

# create 3x5 distance matrix
dm = cdist(point, compare)
# get row-wise mins
mins = dm.min(axis=1)
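To get from the distance matrix back to the question's point names, argmin along the rows gives the column index of the nearest compare point. A minimal sketch, assuming DataFrames with the x, y and point_name columns used in the question; nearest_names is a hypothetical helper, and the names A to E in the usage lines are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def nearest_names(df_point, df_compare):
    """Name of, and distance to, the nearest df_compare row for every df_point row."""
    point = df_point[["x", "y"]].to_numpy()      # shape (n_point, 2)
    compare = df_compare[["x", "y"]].to_numpy()  # shape (n_compare, 2)
    dm = cdist(point, compare)                   # Euclidean distances by default
    idx = dm.argmin(axis=1)                      # column of the row-wise minimum
    names = df_compare["point_name"].iloc[idx].to_numpy()
    return names, dm[np.arange(len(point)), idx]


# Usage with the arrays from the example above (point names are made up)
df_point = pd.DataFrame([[1, 2], [3, 5], [4, 7]], columns=["x", "y"])
df_compare = pd.DataFrame([[3, 2], [8, 5], [4, 1], [2, 2], [8, 9]], columns=["x", "y"])
df_compare["point_name"] = ["A", "B", "C", "D", "E"]
names, dists = nearest_names(df_point, df_compare)
```

With these arrays the nearest names are D, A and B, at distances 1, 3 and √20 ≈ 4.47. The third point (4, 7) is exactly √20 away from both B and E; argmin returns the first occurrence, so ties resolve to the lower column index.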

Computing with multiple numpy arrays without for loops
