Question

我有二维坐标的三维ndarray，例如：

[[[1704 1240]
  [1745 1244]
  [1972 1290]
  [2129 1395]
  [1989 1332]]

 [[1712 1246]
  [1750 1246]
  [1964 1286]
  [2138 1399]
  [1989 1333]]

 [[1721 1249]
  [1756 1249]
  [1955 1283]
  [2145 1399]
  [1990 1333]]]

最终目标是从5个坐标的每个“组”中移除最接近给定点（[1989 1332]）的点。我的想法是生成一个类似形状的距离数组，然后使用argmin来确定要删除的值的索引。但是，我不知道如何应用一个函数，比如计算到给定点的距离，到ndarray中的每个元素，至少以NumPythonic方式。

Answer 1

列表推导是处理numpy数组的一种非常低效的方法。它们是距离计算的一个特别糟糕的选择。

要查找数据和某个点之间的差异，您只需执行data - point。然后，您可以使用np.hypot计算距离，或者如果您愿意，可以将其平方，求和，并取平方根。

如果为了计算的目的而使它成为Nx2数组会更容易。

基本上，你想要这样的东西：

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print dist

这会产生：

array([[[ 299.48121811],
        [ 259.38388539],
        [  45.31004304],
        [ 153.5219854 ],
        [   0.        ]],

       [[ 290.04310025],
        [ 254.0019685 ],
        [  52.35456045],
        [ 163.37074401],
        [   1.        ]],

       [[ 280.55837182],
        [ 247.34186868],
        [  59.6405902 ],
        [ 169.77926846],
        [   1.41421356]]])

现在，删除最接近的元素比简单地获取最接近的元素要困难得多。

使用numpy，您可以使用布尔索引来相当容易地完成此操作。

但是，您需要担心轴的对齐。

关键是要理解最后轴上的numpy“广播”操作。在这种情况下，我们希望沿着中轴进行brodcast。

此外，-1可用作轴大小的占位符。当-1作为轴的大小放入时，Numpy将计算允许的大小。

我们需要做的事情看起来有点像这样：

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

你可以把它改成一行，我只是为了便于阅读而将其分解。关键是dist != something产生一个布尔数组，然后您可以使用它来索引原始数组。

所以，把它们放在一起：

import numpy as np

data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],

                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],

                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])

point = [1989, 1332]

#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)

# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)

#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]

# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

print filtered

收率：

array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

另一方面，如果多于一个点同样接近，则无效。 Numpy数组必须在每个维度上具有相同数量的元素，因此在这种情况下您需要重新进行分组。

Answer 2

如果我理解你的问题，我认为你正在寻找apply_along_axis。使用numpy的内置广播，我们可以简单地从数组中减去该点：

>>> a - numpy.array([1989, 1332])
array([[[-285,  -92],
        [-244,  -88],
        [ -17,  -42],
        [ 140,   63],
        [   0,    0]],

       [[-277,  -86],
        [-239,  -86],
        [ -25,  -46],
        [ 149,   67],
        [   0,    1]],

       [[-268,  -83],
        [-233,  -83],
        [ -34,  -49],
        [ 156,   67],
        [   1,    1]]])

然后我们可以将numpy.linalg.norm应用于它：

>>> dist = a - numpy.array([1989, 1332])
>>> numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
array([[ 299.48121811,  259.38388539,   45.31004304,  
         153.5219854 ,    0.        ],
       [ 290.04310025,  254.0019685 ,   52.35456045,  
         163.37074401,    1.        ],
       [ 280.55837182,  247.34186868,   59.6405902 ,  
         169.77926846,    1.41421356]])

最后，一些布尔掩码技巧，以及几个reshape调用：

>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])

但是乔金顿的答案更快了。那好吧。我会把它留给子孙后代。

def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))

def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))

>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop

Answer 3

有多种方法可以做到这一点，但这里有一个使用列表推导的方法：

距离函数：

In [35]: from numpy.linalg import norm

In [36]: dist = lambda x,y:norm(x-y)

输入数据：

In [39]: GivenMatrix = scipy.rand(3, 5, 2)

In [40]: GivenMatrix
Out[40]: 
array([[[ 0.83798666,  0.90294439],
        [ 0.8706959 ,  0.88397176],
        [ 0.91879085,  0.93512921],
        [ 0.15989245,  0.57311869],
        [ 0.82896003,  0.53589968]],

       [[ 0.0207089 ,  0.9521768 ],
        [ 0.94523963,  0.31079109],
        [ 0.41929482,  0.88559614],
        [ 0.87885236,  0.45227422],
        [ 0.58365369,  0.62095507]],

       [[ 0.14757177,  0.86101539],
        [ 0.58081214,  0.12632764],
        [ 0.89958321,  0.73660852],
        [ 0.3408943 ,  0.45420989],
        [ 0.42656333,  0.42770216]]])

In [41]: q = scipy.rand(2)

In [42]: q
Out[42]: array([ 0.03280889,  0.71057403])

计算输出距离：

In [44]: distances = [[dist(x, q) for x in SubMatrix] 
                      for SubMatrix in GivenMatrix]

In [45]: distances
Out[45]: 
[[0.82783910695733931,
  0.85564093542511577,
  0.91399620574915652,
  0.18720096539588818,
  0.81508758596405939],
 [0.24190557184498068,
  0.99617079746515047,
  0.42426891258164884,
  0.88459501973012633,
  0.55808740166908177],
 [0.18921712490174292,
  0.80103146210692744,
  0.86716521557255788,
  0.40079819635686459,
  0.48482888965287363]]

对每个子矩阵的结果进行排名：

In [46]: scipy.argsort(distances)
Out[46]: 
array([[3, 4, 0, 1, 2],
       [0, 2, 4, 3, 1],
       [0, 3, 4, 1, 2]])

关于删除，我个人认为最简单的方法是将GivenMatrix转换为list，然后使用del：

>>> GivenList = GivenMatrix.tolist()

>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix

NumPy：对每个ndarray元素执行函数

3 个答案: