I have a 3D ndarray of 2D coordinates, for example:
[[[1704 1240]
[1745 1244]
[1972 1290]
[2129 1395]
[1989 1332]]
[[1712 1246]
[1750 1246]
[1964 1286]
[2138 1399]
[1989 1333]]
[[1721 1249]
[1756 1249]
[1955 1283]
[2145 1399]
[1990 1333]]]
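The end goal is to remove, from each "group" of 5 coordinates, the point closest to a given point ([1989 1332]). My idea was to produce a similarly shaped array of distances and then use argmin to determine the indices of the values to remove. However, I am not sure how to apply a function, such as one that computes the distance to a given point, to every element of an ndarray, at least not in a NumPythonic way.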
Answer 0 (score: 4)
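List comprehensions are a very inefficient way to work with numpy arrays. They are an especially poor choice for distance calculations.
To find the difference between your data and a point, you just do data - point. You can then compute the distance with np.hypot or, if you prefer, square the differences, sum them, and take the square root.
It is a bit easier if you turn the data into an Nx2 array for the purposes of the calculation.
Basically, you want something like this: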
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print(dist)
This yields:
array([[[ 299.48121811],
[ 259.38388539],
[ 45.31004304],
[ 153.5219854 ],
[ 0. ]],
[[ 290.04310025],
[ 254.0019685 ],
[ 52.35456045],
[ 163.37074401],
[ 1. ]],
[[ 280.55837182],
[ 247.34186868],
[ 59.6405902 ],
[ 169.77926846],
[ 1.41421356]]])
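As a side sketch (not part of the original answer), the same distances can probably be computed without the intermediate reshape by indexing the x and y components along the last axis. The name `diff` is introduced here purely for illustration, and the result has shape (3, 5) rather than (3, 5, 1):

```python
import numpy as np

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

# Broadcasting subtracts `point` from every (x, y) pair; shape stays (3, 5, 2)
diff = data - point

# diff[..., 0] holds all x offsets, diff[..., 1] all y offsets, each of shape (3, 5)
dist = np.hypot(diff[..., 0], diff[..., 1])
print(dist.shape)
```

If the trailing length-1 axis is needed, `dist[..., np.newaxis]` adds it back.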
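Now, removing the closest element is harder than simply getting the closest element.
With numpy, you can use boolean indexing to do this fairly easily.
However, you need to pay attention to how your axes line up.
The key is to understand that numpy "broadcasts" operations by aligning shapes starting from the last axis. In this case, we want to broadcast along the middle axis.
Also, -1 can be used as a placeholder for the size of an axis. When -1 is given as the size of an axis, numpy works out the size that fits.
What we need to do looks a bit like this: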
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
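You could write that as a single line; I have only split it up for readability. The key is that dist != something yields a boolean array, which you can then use to index the original array.
So, putting it all together: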
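To make the axis alignment concrete, here is a minimal sketch of the shapes involved in that comparison. The distance values are made up for illustration, and `keepdims=True` is used here as one way to keep the per-group minimum as a column; the original code gets the same shape from the trailing length-1 axis of `dist`:

```python
import numpy as np

# Hypothetical distances: 2 groups of 3 points each
dist = np.array([[3.0, 1.0, 2.0],
                 [0.5, 4.0, 6.0]])

# Per-group minimum, kept as a column of shape (2, 1)
group_min = dist.min(axis=1, keepdims=True)

# (2, 3) compared with (2, 1): the column is broadcast across each row
mask = dist != group_min
print(mask.shape)
```

Each row of `mask` is False only where that group's minimum sits, so indexing with it keeps every other point.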
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
print(filtered)
This yields:
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
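On a side note, this does not work if more than one point in a group is equally close. Numpy arrays must have the same number of elements along each dimension, so in that case the mask removes a different number of points per group and you would need to redo the grouping.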
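One possible way around the tie problem, sketched here as an assumption rather than as part of the original answer, is to follow the argmin idea from the question: `argmin` returns only the first minimum in each group, so exactly one point is dropped per group. The function name `remove_closest` and the `tied` sample data are hypothetical:

```python
import numpy as np

def remove_closest(data, point):
    """Drop exactly one point per group: the first one at minimal distance."""
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])   # shape (groups, points)
    closest = dist.argmin(axis=1)                 # index of first minimum per group
    keep = np.ones(dist.shape, dtype=bool)
    keep[np.arange(dist.shape[0]), closest] = False
    # Every group loses exactly one point, so the reshape is always valid
    return data[keep].reshape(data.shape[0], data.shape[1] - 1, data.shape[2])

# Hypothetical data: the first group has two points equally close to the origin
tied = np.array([[[0, 1], [1, 0], [5, 5]],
                 [[2, 2], [0, 3], [4, 4]]])
result = remove_closest(tied, [0, 0])
print(result)
```

Which of the tied points gets removed is then simply the one that comes first in the group; whether that is acceptable depends on the application.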
Answer 1 (score: 1)
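If I understand your question correctly, I think you are looking for apply_along_axis. Using numpy's built-in broadcasting, we can simply subtract the point from the array: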
>>> a - numpy.array([1989, 1332])
array([[[-285, -92],
[-244, -88],
[ -17, -42],
[ 140, 63],
[ 0, 0]],
[[-277, -86],
[-239, -86],
[ -25, -46],
[ 149, 67],
[ 0, 1]],
[[-268, -83],
[-233, -83],
[ -34, -49],
[ 156, 67],
[ 1, 1]]])
Then we can apply numpy.linalg.norm to it:
>>> dist = a - numpy.array([1989, 1332])
>>> numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
array([[ 299.48121811, 259.38388539, 45.31004304,
153.5219854 , 0. ],
[ 290.04310025, 254.0019685 , 52.35456045,
163.37074401, 1. ],
[ 280.55837182, 247.34186868, 59.6405902 ,
169.77926846, 1.41421356]])
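Finally, some boolean mask trickery, along with a couple of reshape calls:
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))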
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
But Joe Kington's answer is faster. Oh well. I will leave this here for posterity.
import numpy
import numpy as np  # both spellings are used below

def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))

def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
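Much of that gap likely comes from apply_along_axis calling the norm function once per coordinate pair from Python. As a hedged sketch, `numpy.linalg.norm` also accepts an `axis` argument, which should give the same distances in a single vectorized call; the names `slow` and `fast` are only for illustration, and no timing is claimed here:

```python
import numpy as np

a = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
              [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]]])
point = np.array([1989, 1332])

# One Python-level call to norm per (x, y) pair
slow = np.apply_along_axis(np.linalg.norm, 2, a - point)

# One vectorized call over the last axis
fast = np.linalg.norm(a - point, axis=2)

print(np.allclose(slow, fast))
```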
Answer 2 (score: 0)
There are several ways to do this, but here is one that uses list comprehensions:
The distance function:
In [35]: from numpy.linalg import norm
In [36]: dist = lambda x,y:norm(x-y)
The input data:
In [39]: GivenMatrix = scipy.rand(3, 5, 2)
In [40]: GivenMatrix
Out[40]:
array([[[ 0.83798666, 0.90294439],
[ 0.8706959 , 0.88397176],
[ 0.91879085, 0.93512921],
[ 0.15989245, 0.57311869],
[ 0.82896003, 0.53589968]],
[[ 0.0207089 , 0.9521768 ],
[ 0.94523963, 0.31079109],
[ 0.41929482, 0.88559614],
[ 0.87885236, 0.45227422],
[ 0.58365369, 0.62095507]],
[[ 0.14757177, 0.86101539],
[ 0.58081214, 0.12632764],
[ 0.89958321, 0.73660852],
[ 0.3408943 , 0.45420989],
[ 0.42656333, 0.42770216]]])
In [41]: q = scipy.rand(2)
In [42]: q
Out[42]: array([ 0.03280889, 0.71057403])
Compute the distances:
In [44]: distances = [[dist(x, q) for x in SubMatrix]
   ....:              for SubMatrix in GivenMatrix]
In [45]: distances
Out[45]:
[[0.82783910695733931,
0.85564093542511577,
0.91399620574915652,
0.18720096539588818,
0.81508758596405939],
[0.24190557184498068,
0.99617079746515047,
0.42426891258164884,
0.88459501973012633,
0.55808740166908177],
[0.18921712490174292,
0.80103146210692744,
0.86716521557255788,
0.40079819635686459,
0.48482888965287363]]
Rank the results for each submatrix:
In [46]: scipy.argsort(distances)
Out[46]:
array([[3, 4, 0, 1, 2],
[0, 2, 4, 3, 1],
[0, 3, 4, 1, 2]])
As for the deletion, I personally think the easiest way is to convert GivenMatrix to a list and then use del:
>>> GivenList = GivenMatrix.tolist()
>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
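To tie the ranking and deletion steps together, here is a rough sketch that deletes the closest row from every sublist and converts the result back to an array. It uses `numpy.random.default_rng` to generate sample data, and the names `given`, `closest`, and `trimmed`, as well as the seed, are assumptions made for this example:

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, only for repeatability
given = rng.random((3, 5, 2))         # stand-in for GivenMatrix
q = rng.random(2)                     # the query point

distances = np.linalg.norm(given - q, axis=2)   # shape (3, 5)
closest = distances.argmin(axis=1)              # closest index per submatrix

given_list = given.tolist()
for sub, idx in zip(given_list, closest):
    del sub[idx]                      # remove the closest row from this sublist

trimmed = np.array(given_list)        # back to an array of shape (3, 4, 2)
print(trimmed.shape)
```

Because exactly one row is deleted from each sublist, the sublists stay the same length and the conversion back to a regular array works.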