Question

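I have a target array A, which represents isobaric pressure levels in NCEP reanalysis data. I also have the pressure at which a cloud is observed as a long time series, B.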

What I am looking for is a k-nearest-neighbour lookup that returns the indices of those nearest neighbours, something like knnsearch in Matlab, that could be expressed the same way in Python, for example: indices, distance = knnsearch(A, B, n), where indices holds the n nearest indices in A for every value in B, and distance is how far each value in B is from those nearest values in A. A and B can have different lengths (this is the bottleneck I have hit with most solutions so far, where I have to loop over every value in B to return my indices and distance).

import numpy as np

A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10])  # this is a fixed 17-by-1 array
B = np.array([923, 584.2, 605.3, 153.2])  # this can be any n-by-1 array
n = 2

What I would like returned from indices, distance = knnsearch(A, B, n) is:

indices = [[1, 2],[4, 5] etc...]

where 923 in B is matched first to A[1]=925 and then to A[2]=850, and 584.2 in B is matched first to A[4]=600 and then to A[5]=500, and

distance = [[2, 73],[15.8, 84.2] etc...]

where 2 is the distance between the queried value in B and the nearest value in A, e.g. distance[0, 0] == np.abs(B[0] - A[1]).

The only solution I have been able to come up with is:

import numpy as np

def knnsearch(A, B, n):
    indices = np.zeros((len(B), n))
    distances = np.zeros((len(B), n))

    for i in range(len(B)):
        a = A
        for N in range(n):
            dif = np.abs(a - B[i])
            ind = np.argmin(dif)

            indices[i, N] = ind + N
            distances[i, N] = dif[ind + N]
            # remove this neighbour from future consideration
            np.delete(a, ind)

    return indices, distances

array_A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10])
array_B = np.array([923, 584.2, 605.3, 153.2])
neighbours = 2

indices, distances = knnsearch(array_A, array_B, neighbours)

print(indices)
print(distances)

which returns:

[[ 1.  2.]
 [ 4.  5.]
 [ 4.  3.]
 [10. 11.]]

[[ 2.  73. ]
 [15.8 84.2]
 [ 5.3 94.7]
 [ 3.2 53.2]]

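There has to be a way to remove the for loops, because I need the performance if my A and B arrays contain many thousands of elements with many nearest neighbours...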

Please help! Thanks :)

Answer 1

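The second loop can easily be vectorized. The most straightforward way is to use np.argsort and select the indices corresponding to the n smallest dif values. For large arrays, however, only n values actually need to be sorted, so it is better to use np.argpartition.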

So the code would look something like this:

import numpy as np

def vector_knnsearch(A, B, n):
    # Integer dtype so the result can be used directly for indexing
    indices = np.empty((len(B), n), dtype=int)
    distances = np.empty((len(B), n))

    for i, b in enumerate(B):
        dif = np.abs(A - b)
        # Indices of the n smallest distances, not necessarily sorted
        # (kth=n requires n < len(A))
        min_ind = np.argpartition(dif, n)[:n]
        # Sort just those n candidates by distance
        ind = min_ind[np.argsort(dif[min_ind])]
        indices[i, :] = ind
        distances[i, :] = dif[ind]

    return indices, distances
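
To make the difference between the two selection strategies concrete, here is a minimal sketch for a single query value. The variable names (by_argsort, candidates, by_argpartition) are illustrative only and are not part of the answer's function.

```python
import numpy as np

A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250,
              200, 150, 100, 70, 50, 30, 20, 10])
b = 605.3  # a single query value
n = 2

dif = np.abs(A - b)

# Full sort: orders all 17 distances, then keeps the first n
by_argsort = np.argsort(dif)[:n]

# Partial selection: only guarantees that the first n entries are the
# n smallest, in arbitrary order, so they are sorted afterwards
candidates = np.argpartition(dif, n)[:n]
by_argpartition = candidates[np.argsort(dif[candidates])]

print(by_argsort)       # [4 3]
print(by_argpartition)  # [4 3]
```

Both routes pick A[4]=600 and then A[3]=700. The argpartition route only has to fully sort n values instead of the whole array.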

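As mentioned in the comments, the first loop can also be removed by using meshgrid. However, the extra memory and computation time needed to construct the meshgrid made this approach slower for the dimensions I tried (and this will probably get worse for large arrays and eventually end in a memory error). The code also becomes less readable. Overall, this probably makes the approach less Pythonic.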

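import numpy as np

def mesh_knnsearch(A, B, n):
    m = len(B)
    # Column vector of row numbers, used to pair each row with its own indices
    rng = np.arange(m).reshape((m, 1))
    # Both grids have shape (len(B), len(A))
    Amesh, Bmesh = np.meshgrid(A, B)
    dif = np.abs(Amesh - Bmesh)
    # n smallest distances per row, not necessarily sorted
    min_ind = np.argpartition(dif, n, axis=1)[:, :n]
    # Sort those n candidates by distance within each row
    ind = min_ind[rng, np.argsort(dif[rng, min_ind], axis=1)]

    return ind, dif[rng, ind]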
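
As a possible variation, the same distance matrix can be built with NumPy broadcasting instead of meshgrid, and the row-wise lookups can be written with np.take_along_axis. This is only a sketch: the function name broadcast_knnsearch is hypothetical, it swaps in a different technique from the answer's code, and it still allocates a full len(B)-by-len(A) matrix, so the memory concern above applies to it as well.

```python
import numpy as np

def broadcast_knnsearch(A, B, n):
    # Hypothetical helper, not part of the original answer
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # (len(B), 1) minus (1, len(A)) broadcasts to (len(B), len(A))
    dif = np.abs(B[:, None] - A[None, :])
    # n smallest distances per row, unsorted (requires n < len(A))
    part = np.argpartition(dif, n, axis=1)[:, :n]
    # Order those n candidates by distance within each row
    order = np.argsort(np.take_along_axis(dif, part, axis=1), axis=1)
    ind = np.take_along_axis(part, order, axis=1)
    return ind, np.take_along_axis(dif, ind, axis=1)

A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250,
              200, 150, 100, 70, 50, 30, 20, 10])
B = np.array([923, 584.2, 605.3, 153.2])

indices, distances = broadcast_knnsearch(A, B, 2)
print(indices)
print(distances)
```

For the sample data this gives [1, 2], [4, 5], [4, 3] and [10, 9] as index pairs. The second neighbour of 153.2 is A[9]=200, at a distance of 46.8.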

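Note that it is important to define rng as a 2D array so that the indexing retrieves a[rng[0],ind[0]], a[rng[1],ind[1]], etc. and keeps the dimensions of the array, as opposed to a[:,ind], which retrieves a[:,ind[0]], a[:,ind[1]], etc.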
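
The effect of the 2D rng can be checked on a small array. The sketch below uses made-up values for a and ind purely for illustration. It compares the result of pairing each row with its own indices against what plain a[:, ind] returns.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)            # 3 rows, 4 columns
ind = np.array([[0, 1], [2, 3], [1, 0]])   # 2 column indices per row

rng = np.arange(3).reshape(3, 1)           # column vector of row numbers

paired = a[rng, ind]   # row i is indexed with ind[i] only
crossed = a[:, ind]    # every row is indexed with every ind[j]

print(paired)          # [[0 1] [6 7] [9 8]]
print(paired.shape)    # (3, 2)
print(crossed.shape)   # (3, 3, 2)
```

Because rng has shape (3, 1), it broadcasts against ind of shape (3, 2), so the result keeps one row of n values per query.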

How to find nearest-neighbour indices from one series to another
