Remove elements from a 2D numpy array based on proximity

Time: 2015-04-05 21:26:26

Tags: python arrays numpy

I have an ndarray of the following shape:

[[5 2][6 2][10 2][10 10]]

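These are X Y coordinates. How can I iterate over this array and remove all subsequent elements that lie within some arbitrary threshold distance (2D Euclidean distance) of the current element? For example, with the distance threshold set to 1, I would get the following array: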

[[5 2][10 2][10 10]]

[6 2] would be removed because its distance to [5 2] is 1, which does not exceed the threshold (1).

Real-world data:
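To make the rule concrete, here is a small sketch that computes the distances from the first point to the points that follow it. It assumes the threshold is inclusive (a point at exactly the threshold distance is dropped), which is what the example above implies; the variable names are only illustrative.

```python
import numpy as np

pts = np.array([[5, 2], [6, 2], [10, 2], [10, 10]])
threshold = 1

# Euclidean distance from the first point to each subsequent point.
dists = np.linalg.norm(pts[1:] - pts[0], axis=1)
print(dists)               # distances to [6 2], [10 2] and [10 10]
print(dists <= threshold)  # True marks a point that would be dropped
```

Only [6 2] is flagged here, since it is 1 unit away from [5 2], while [10 2] is 5 units away.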

array([[ 25, 478],
   [ 26, 366],
   [ 26, 478],
   [ 27, 183],
   [ 28, 367],
   [ 28, 477],
   [ 29, 477],
   [ 43, 374],
   [ 44, 374],
   [ 45, 374],
   [ 46, 374],
   [ 47, 374],
   [ 47, 375],
   [ 57,  82],
   [ 58, 133],
   [ 60,  25],
   [ 86, 445],
   [ 89, 226],
   [ 89, 227],
   [ 89, 228],
   [ 89, 229],
   [ 89, 230],
   [ 96, 286],
   [105, 404],
   [106, 404],
   [107, 403],
   [108, 403],
   [117, 355],
   [119, 355],
   [121,  43],
   [122,  42],
   [122,  43],
   [122, 127],
   [122, 490],
   [123, 489],
   [123, 490],
   [137, 438],
   [138, 437],
   [151, 229],
   [162, 149],
   [163, 326],
   [188, 465],
   [188, 466],
   [189, 115],
   [189, 116],
   [218, 291],
   [230, 174],
   [230, 175],
   [230, 176],
   [230, 177],
   [231, 173],
   [231, 174],
   [231, 175],
   [231, 176],
   [231, 177],
   [231, 178],
   [240,  33],
   [241,  33],
   [242,  34],
   [249, 118],
   [250, 256],
   [260, 208],
   [260, 209],
   [260, 210],
   [274, 372],
   [277,  39],
   [302, 216],
   [302, 217],
   [302, 218],
   [302, 219],
   [302, 220],
   [302, 221],
   [302, 222],
   [302, 223],
   [315, 325],
   [322, 258],
   [322, 259],
   [341, 172],
   [346, 457],
   [359, 388],
   [360, 389],
   [361, 390],
   [386, 307],
   [392, 372],
   [393, 136],
   [393, 360],
   [393, 374],
   [394, 134],
   [394, 135],
   [394, 136],
   [394, 137],
   [394, 138],
   [394, 139],
   [394, 140],
   [394, 141],
   [394, 142],
   [394, 143],
   [394, 144],
   [409, 266],
   [437, 132],
   [439, 131],
   [467, 100],
   [471, 236],
   [472, 235],
   [474, 234],
   [479, 104]])

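2 answers:

Answer 0: (score: 1)

Removing elements from a list while iterating over it is bad practice, because the loop can silently skip elements. For that reason, I think it is more explicit to call a function and return a new list from it.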

import numpy as np

def reduce_tail(l, index, threshold=1):
    # Drop every element after `index` that lies within `threshold` of l[index].
    elm = l[index]
    mask = np.linalg.norm(elm - l, axis=1) > threshold
    mask[:index+1] = True  # ensure the head of the array is returned unchanged
    return l[mask]

def my_reduce(z, threshold=1):
    z = np.array(z)
    index = 0
    # z shrinks as points are dropped, so re-check its length on every pass
    while index < z.shape[0]:
        z = reduce_tail(z, index, threshold)
        index += 1
    return z.tolist()

Demo:

>>> z = [[5, 2],[6, 2],[5,1],[10, 2],[10, 10]]
>>> x = [[5, 2],[6, 2],[6,3],[10, 2],[10, 10]]
>>> l = [[5, 2],[6, 2],[10, 2],[10, 10]]
>>> my_reduce(l)
[[5, 2], [10, 2], [10, 10]]
>>> my_reduce(x)
[[5, 2], [6, 3], [10, 2], [10, 10]]
>>> my_reduce(z)
[[5, 2], [10, 2], [10, 10]]
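The pitfall mentioned at the start of this answer can be seen in a small sketch. The two helper functions below are hypothetical and use plain integers rather than coordinates, just to keep the effect easy to follow.

```python
def remove_evens_in_place(values):
    # Anti-pattern: the list shrinks while the for loop is walking over it,
    # so the element that slides into the freed slot is never visited.
    for v in values:
        if v % 2 == 0:
            values.remove(v)
    return values

def remove_evens_new_list(values):
    # Safer: build and return a new list, leaving the input untouched.
    return [v for v in values if v % 2 != 0]

data = [1, 2, 4, 5, 6, 8]
print(remove_evens_in_place(list(data)))  # [1, 4, 5, 8] (4 and 8 were skipped)
print(remove_evens_new_list(data))        # [1, 5]
```

Returning a new array from reduce_tail avoids this problem in the same way as the second helper.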

Answer 1: (score: 0)

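You can check the distance with np.linalg.norm and use zip to get the pairs of adjacent elements. For the example list l = [[5, 2], [6, 2], [10, 2], [10, 10]] this gives the expected result, but note that it only compares neighbouring elements and is not a general solution:

>>> [i if np.linalg.norm(np.array(i)-np.array(j)) <= 1 else j for i,j in zip(l,l[1:])]
[[5, 2], [10, 2], [10, 10]]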

Alternatively, as a more complete approach, you can use a recursive function:

>>> l = [[5, 2], [6, 2], [6, 3], [10, 2], [10, 10]]
>>> def a(l):
...     for i, j in zip(l, l[1:]):
...         if np.linalg.norm(np.array(i)-np.array(j)) <= 1:
...             l.remove(j)  # modifies l in place
...             return a(l)  # restart the scan on the shortened list
...     return l
... 
>>> a(l)
[[5, 2], [6, 3], [10, 2], [10, 10]]
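The functions in this answer compare each element only with its immediate successor. If a close point can appear later in the list rather than right next to the current one, a variant that checks every later point may be needed. The following is a minimal sketch of that idea, not part of the original answer: drop_close_followers is a hypothetical name, and it assumes that a point at exactly the threshold distance should be dropped.

```python
import numpy as np

def drop_close_followers(points, threshold=1):
    pts = np.asarray(points, dtype=float)
    keep = np.ones(len(pts), dtype=bool)  # True = point is still kept
    for i in range(len(pts)):
        if not keep[i]:
            continue  # a dropped point does not eliminate anything
        # Distances from point i to every point that comes after it.
        d = np.linalg.norm(pts[i + 1:] - pts[i], axis=1)
        # Later points stay only if they are farther away than the threshold.
        keep[i + 1:] &= d > threshold
    return [list(p) for p, k in zip(points, keep) if k]

# [0, 1] is close to [0, 0] and [5, 6] is close to [5, 5],
# but neither pair is adjacent in the list.
print(drop_close_followers([[0, 0], [5, 5], [0, 1], [5, 6]]))  # [[0, 0], [5, 5]]
```

An adjacent-pair check would leave this list unchanged, because no two neighbouring points are within 1 unit of each other.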