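I have a 2D numpy array filled with integer values from 0 to N. How can I get the indices of all entries that are directly connected and share the same value?
Addition: most entries are zero and can be ignored!
Example input array: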
[ 0 0 0 0 0 ]
[ 1 1 0 1 1 ]
[ 0 1 0 1 1 ]
[ 1 0 0 0 0 ]
[ 2 2 2 2 2 ]
Desired output indices:
1: [ [1 0] [1 1] [2 1] [3 0] ] # first 1 cluster
[ [1 3] [1 4] [2 3] [2 4] ] # second 1 cluster
2: [ [4 0] [4 1] [4 2] [4 3] [4 4] ] # only 2 cluster
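The formatting of the output doesn't matter; I just need the clusters of each value kept separate, in a form where individual indices can be addressed.
My first thought was: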
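import numpy as np
N = numberClusters                  # largest value in the array
x = myArray                         # 2D integer array with values 0..N
for c in range(1, N + 1):           # skip 0, include N
    idx = np.argwhere(x == c)       # (row, col) of every entry equal to c
    # fill output array with idx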
But this fails to separate distinct clusters that share the same value.
Answer 0 (score: 1)
You can use skimage.measure.label (install it with pip install scikit-image if needed):
import numpy as np
from skimage import measure
# Setup some data
np.random.seed(42)
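# Note: the third positional argument of np.random.choice is `replace`, not `p`,
# so the values here are drawn uniformly. Pass p=[0.7, 0.2, 0.1] to weight them
# (the arrays shown below would then differ).
img = np.random.choice([0, 1, 2], (5, 5), [0.7, 0.2, 0.1])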
# [[2 0 2 2 0]
# [0 2 1 2 2]
# [2 2 0 2 1]
# [0 1 1 1 1]
# [0 0 1 1 0]]
# Label each region of equal value; connectivity=1 treats only directly
# adjacent (non-diagonal) pixels as connected, and 0 is the background
img_labeled = measure.label(img, connectivity=1)
# [[1 0 2 2 0]
# [0 3 4 2 2]
# [3 3 0 2 5]
# [0 5 5 5 5]
# [0 0 5 5 0]]
# Get the indices for each region, excluding zeros
idx = [np.where(img_labeled == label)
       for label in np.unique(img_labeled)
       if label]
# [(array([0]), array([0])),
# (array([0, 0, 1, 1, 2]), array([2, 3, 3, 4, 3])),
# (array([1, 2, 2]), array([1, 0, 1])),
# (array([1]), array([2])),
# (array([2, 3, 3, 3, 3, 4, 4]), array([4, 1, 2, 3, 4, 2, 3]))]
# Get the bounding boxes of each region (ignoring zeros)
bboxes = [area.bbox for area in measure.regionprops(img_labeled)]
# [(0, 0, 1, 1),
# (0, 2, 3, 5),
# (1, 0, 3, 2),
# (1, 2, 2, 3),
# (2, 1, 5, 5)]
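The idx list above is keyed by region label, not by the original value. As a minimal sketch of how the labelled regions could be grouped back by value, here is the same approach applied to the example array from the question. The helper name clusters_by_value is hypothetical, not part of scikit-image. Note that the desired output in the question puts [3 0] in the first 1 cluster even though it only touches [2 1] diagonally, so connectivity=2 (diagonals included) appears to be what reproduces it, while connectivity=1 leaves [3 0] as a cluster of its own.

```python
import numpy as np
from skimage import measure

# Example array from the question
x = np.array([[0, 0, 0, 0, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 0, 1, 1],
              [1, 0, 0, 0, 0],
              [2, 2, 2, 2, 2]])


def clusters_by_value(arr, connectivity=1):
    """Map each non-zero value to a list of (n, 2) index arrays, one per cluster.

    Hypothetical helper for illustration only.
    """
    labeled, n_regions = measure.label(
        arr, background=0, connectivity=connectivity, return_num=True)
    out = {}
    for lab in range(1, n_regions + 1):
        coords = np.argwhere(labeled == lab)    # (row, col) pairs of this region
        value = int(arr[tuple(coords[0])])      # original value of the region
        out.setdefault(value, []).append(coords)
    return out


four = clusters_by_value(x, connectivity=1)     # edge neighbours only
eight = clusters_by_value(x, connectivity=2)    # edge and diagonal neighbours

for value, clusters in eight.items():
    for coords in clusters:
        print(value, coords.tolist())
```

With connectivity=1 the value 1 ends up in three clusters, with connectivity=2 in two; the value 2 forms a single cluster either way.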
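The bounding boxes can be found with the very useful function skimage.measure.regionprops, which provides a lot of information about each region. For the bounding box it returns a (min_row, min_col, max_row, max_col) tuple, where the pixels belonging to the bounding box lie in the half-open intervals [min_row; max_row) and [min_col; max_col).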
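Because the upper bounds are exclusive, the bbox tuple can be used directly as slice limits. The following is a small sketch under that assumption. It hard-codes the 5x5 array printed in the answer so that it does not depend on the random seed, and the crops dictionary is just an illustrative name.

```python
import numpy as np
from skimage import measure

# The array printed in the answer, written out explicitly
img = np.array([[2, 0, 2, 2, 0],
                [0, 2, 1, 2, 2],
                [2, 2, 0, 2, 1],
                [0, 1, 1, 1, 1],
                [0, 0, 1, 1, 0]])
img_labeled = measure.label(img, connectivity=1)

crops = {}  # region label -> boolean mask of that region inside its bounding box
for region in measure.regionprops(img_labeled):
    min_row, min_col, max_row, max_col = region.bbox
    # max_row and max_col are exclusive, so they work as slice ends unchanged
    crop = img_labeled[min_row:max_row, min_col:max_col]
    crops[region.label] = crop == region.label

for label, mask in crops.items():
    print(label, mask.shape, int(mask.sum()))
```

Each mask has the shape of its bounding box and is True only where the region itself lies, which is handy when a box also covers pixels of neighbouring regions.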