Question

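I'm trying to reduce noise in a binary Python array by removing all completely isolated single cells, i.e. setting "1" cells to 0 if they are completely surrounded by "0" cells. I have a working solution that labels blobs and removes those of size 1 in a loop, but this seems very inefficient for large arrays: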

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)  # np.int was removed from NumPy; use the builtin int
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(number_of_ids + 1):  # range replaces Python 2's xrange
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

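Eroding and then dilating my array won't work in this case, because that would also remove features with a width of 1. I suspect the solution lies somewhere in the scipy.ndimage package, but so far I haven't been able to crack it. Any help would be greatly appreciated!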

Answer 1

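Thanks to Jaime and Kazemakase for their replies. The manual neighbour-checking method did remove all isolated patches, but it also removed patches attached to other patches by a single corner (i.e. the one at the upper-right corner of the square in the sample array). The summed area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.

I ended up with an approach using ndimage that seems to work efficiently for very large, sparse arrays (0.91 s for a 5000 x 5000 array, versus 1.17 s for the summed area table approach). I first generate a labelled array of unique IDs for each discrete region, compute the size of each ID, mask the size array to keep only the size == 1 blobs, then index into the original array and set the IDs with size == 1 to 0: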

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')

Result: [image: resulting array]
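To make the effect of the struct argument concrete, here is a minimal sketch on a tiny array. It is not the code from the answer: the helper name remove_single_cells and the demo array are made up for illustration, and region sizes are counted with np.bincount instead of ndimage.sum, which is an assumption of this sketch rather than something the answer prescribes.

```python
import numpy as np
from scipy import ndimage

def remove_single_cells(array, struct):
    """Zero out labelled regions that contain exactly one cell (illustrative helper)."""
    labels, num = ndimage.label(array, structure=struct)
    sizes = np.bincount(labels.ravel())   # sizes[0] counts background cells
    single = sizes == 1
    single[0] = False                     # never treat the background as a blob
    out = array.copy()
    out[single[labels]] = 0               # look up each cell's "is single" flag
    return out

# Hypothetical demo data: a diagonal pair, two lone cells, and a horizontal pair
demo = np.array([[1, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0],
                 [1, 0, 0, 1, 1]])

# 8-connectivity: diagonal neighbours count, so the diagonal pair survives
eight = remove_single_cells(demo, np.ones((3, 3)))
# 4-connectivity: diagonal neighbours do not count, so the diagonal pair is removed
four = remove_single_cells(demo, ndimage.generate_binary_structure(2, 1))
```

With the full 3x3 structure, cells that touch only at a corner belong to the same region and are kept, which is the behaviour asked for in the question. With the cross-shaped structure they are treated as isolated.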

Answer 2

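The typical way of getting rid of isolated pixels in image processing is a morphological opening, for which there is a ready-made implementation in scipy.ndimage.morphology.binary_opening (also available as scipy.ndimage.binary_opening). Note, however, that this will also affect the contours of your larger regions.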
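As a rough illustration of that caveat, the sketch below applies scipy.ndimage.binary_opening with its default cross-shaped structuring element to a small made-up image. The array img and its contents are assumptions chosen only to show the side effects.

```python
import numpy as np
from scipy import ndimage

# Hypothetical test image: a 3x3 block, a 1-pixel-wide line, and a lone pixel
img = np.zeros((7, 7), dtype=bool)
img[1:4, 1:4] = True      # 3x3 block
img[5, 1:6] = True        # horizontal line, one pixel wide
img[1, 5] = True          # isolated pixel

# Opening = erosion followed by dilation, here with the default structure
opened = ndimage.binary_opening(img)

print(opened.astype(int))
```

The isolated pixel disappears, but so does the whole one-pixel-wide line, and the 3x3 block loses its corners. This is why opening is not a good fit when thin features must be preserved.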

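For a DIY solution, I would use a summed area table to count the number of items in every 3x3 subimage, subtract the value of the central pixel from that count, and then zero every centre pixel for which the result is zero. To handle the borders properly, first pad the array with zeros: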

sat = np.pad(square, pad_width=1, mode='constant', constant_values=0)
sat = np.cumsum(np.cumsum(sat, axis=0), axis=1)
sat = np.pad(sat, ((1, 0), (1, 0)), mode='constant', constant_values=0)
# These are all the possible overlapping 3x3 windows sums
sum3x3 = sat[3:, 3:] + sat[:-3, :-3] - sat[3:, :-3] - sat[:-3, 3:]
# This takes away the central pixel value
sum3x3 -= square
# This zeros all the isolated pixels
square[sum3x3 == 0] = 0

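The implementation above works, but it is not especially careful about avoiding intermediate arrays, so you can probably shave off some execution time by refactoring it appropriately.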

Answer 3

You could check the neighbours manually and use vectorization to avoid the loop.

has_neighbor = np.zeros(square.shape, bool)
has_neighbor[:, 1:] = np.logical_or(has_neighbor[:, 1:], square[:, :-1] > 0)  # left
has_neighbor[:, :-1] = np.logical_or(has_neighbor[:, :-1], square[:, 1:] > 0)  # right
has_neighbor[1:, :] = np.logical_or(has_neighbor[1:, :], square[:-1, :] > 0)  # above
has_neighbor[:-1, :] = np.logical_or(has_neighbor[:-1, :], square[1:, :] > 0)  # below

square[np.logical_not(has_neighbor)] = 0

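This way the looping over square is performed internally by numpy, which is considerably more efficient than looping in Python. There are two shortcomings of this solution:

If your array is very sparse, there may be more efficient ways to check the neighbourhood of the non-zero points.
If your array is very large, the has_neighbor array might consume too much memory. In that case you could loop over smaller sub-arrays (a trade-off between Python loops and vectorization).

I have no experience with ndimage, so there may be a better solution built in somewhere.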

Remove completely isolated cells from a Python array?