Question

I have the following indices, in the form you would get from np.where(...):

import numpy as np

coords = (
  np.asarray([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6]),
  np.asarray([2, 2, 8, 2, 2, 4, 4, 6, 2, 2, 6, 2, 2, 4, 6, 2, 2, 6, 2, 2, 4, 4, 6, 2, 2, 6]),
  np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
  np.asarray([0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
)

A second tuple of indices is used to select tuples from coords:

index = (
  np.asarray([0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6]),
  np.asarray([2, 8, 2, 4, 4, 6, 2, 2, 6, 2, 2, 4, 6, 2, 2, 6, 2, 2, 4, 4, 6, 2, 2, 6]),
  np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
  np.asarray([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
)

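So, for example, coords[0] (the coordinate tuple (0, 2, 0, 0)) is selected because it occurs in index (at position 0), but coords[1] (the tuple (0, 2, 0, 1)) is not selected because it does not occur in index.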

I can easily compute the mask with [x in zip(*index) for x in zip(*coords)] (converted from bool to int for readability):

[1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

But this is not very efficient for larger arrays. Is there a more "numpy-based" way to compute the mask?

Answer 1

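I'm not sure about the efficiency, but since you are basically comparing pairs of coordinates, you could use the scipy distance functions. Something along these lines: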

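import numpy as np
from scipy.spatial.distance import cdist

# One row per coordinate tuple: c has shape (26, 4), i has shape (24, 4)
c = np.stack(coords).T
i = np.stack(index).T

# Pairwise distances between every row of c and every row of i
d = cdist(c, i)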

In [113]: np.any(d == 0, axis=1).astype(int)
Out[113]: 
array([1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1])

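By default it uses the L2 norm. You could also pass a simpler distance function, although a Python callable is evaluated once per pair of rows, so it is worth timing it against the built-in metric before assuming it is faster, e.g.:

# The callable returns True (stored as 1.0) when two rows are identical
d = cdist(c, i, lambda u, v: np.all(np.equal(u, v)))
np.any(d != 0, axis=1).astype(int)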
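
To illustrate what the distance check amounts to, here is a minimal sketch of the same all-pairs comparison written with plain NumPy broadcasting instead of cdist. The small arrays are made up for illustration and are not the data from the question.

```python
import numpy as np

# Made-up example data: one row per coordinate tuple.
c = np.array([[0, 2, 0, 0],
              [0, 2, 0, 1],
              [0, 8, 0, 0]])
i = np.array([[0, 2, 0, 0],
              [0, 8, 0, 0]])

# Compare every row of c with every row of i: shape (len(c), len(i), 4).
equal = c[:, None, :] == i[None, :, :]

# A row of c is selected if all of its components match some row of i.
mask = equal.all(axis=2).any(axis=1)
print(mask.astype(int))
```

Like cdist, this builds an intermediate whose size grows with len(c) * len(i), so it is only reasonable for moderately sized inputs.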

Answer 2

You can use np.ravel_multi_index to compress the columns into unique numbers, which are easier to handle:

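import numpy as np

# Per-dimension maxima over both tuples give a shape large enough for all coordinates
cmx = *map(np.max, coords),
imx = *map(np.max, index),
shape = np.maximum(cmx, imx) + 1

# Flatten each coordinate tuple into a single unique integer
ct = np.ravel_multi_index(coords, shape)
it = np.ravel_multi_index(index, shape)

it.sort()

# Clip the insertion points so that values of ct above it.max() do not index past the end
pos = np.minimum(it.searchsorted(ct), it.size - 1)
result = ct == it[pos]
print(result.view(np.int8))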

Prints:

[1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
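
As a possible variation on the same idea, the membership test on the flattened indices could also be done with np.isin instead of the manual sort and searchsorted. This is only a sketch: membership_mask is a hypothetical helper name, the example arrays are made up, and it assumes non-empty arrays of non-negative integers.

```python
import numpy as np

def membership_mask(coords, index):
    """Hypothetical helper: True where a coordinate tuple of `coords` occurs in `index`."""
    # Shape large enough to hold every coordinate from both tuples.
    maxima = np.maximum([a.max() for a in coords], [a.max() for a in index])
    shape = tuple(int(m) + 1 for m in maxima)
    # Flatten each coordinate tuple into one integer, then test membership.
    ct = np.ravel_multi_index(coords, shape)
    it = np.ravel_multi_index(index, shape)
    return np.isin(ct, it)

# Made-up example: the tuples (0, 2, 0, 1) and (1, 2, 0, 0) are missing from index.
coords = (np.array([0, 0, 0, 1]), np.array([2, 2, 8, 2]),
          np.array([0, 0, 0, 0]), np.array([0, 1, 0, 0]))
index = (np.array([0, 0, 1]), np.array([2, 8, 2]),
         np.array([0, 0, 0]), np.array([0, 0, 1]))

mask = membership_mask(coords, index)
print(mask.astype(int))
```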

Efficient way to get a subset of indices in numpy
