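I have a multidimensional array like this:
array = [[a, b, b, a, a, b],
         [a, a, b, a, b, a]]
What I want to do is identify clusters of similar elements: look at each element and group the 'a's together based on whether another 'a' sits directly above, below, to the left of, or to the right of it. For an array like this:
[a, b, b, a, a, b]
[a, a, b, a, b, a]
the program would return an array like ["0:0", "1:0", "1:1"] for the first cluster of 'a's.
My question is: what is the most efficient way to do this in Python?
FYI: I am using Python 2.7.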
Answer 0 (score: 2)
import numpy as np
from scipy import ndimage

# Assumed input: the grid from the question as a NumPy array of strings.
array = np.array([['a', 'b', 'b', 'a', 'a', 'b'],
                  ['a', 'a', 'b', 'a', 'b', 'a']])

def find_clusters(array):
    # Integer output: each cell receives the id of the cluster it belongs to.
    clustered = np.empty_like(array, dtype=int)
    unique_vals = np.unique(array)
    cluster_count = 0
    for val in unique_vals:
        # Label connected regions (up/down/left/right) of cells equal to val.
        labelling, label_count = ndimage.label(array == val)
        for k in range(1, label_count + 1):
            clustered[labelling == k] = cluster_count
            cluster_count += 1
    return clustered, cluster_count

clusters, cluster_count = find_clusters(array)
print("Found {} clusters:".format(cluster_count))
print(clusters)

# Size and centre of mass of every cluster.
ones = np.ones_like(array, dtype=int)
cluster_sizes = ndimage.sum(ones, labels=clusters, index=range(cluster_count)).astype(int)
com = ndimage.center_of_mass(ones, labels=clusters, index=range(cluster_count))
for i, (size, center) in enumerate(zip(cluster_sizes, com)):
    print("Cluster #{}: {} elements at {}".format(i, size, center))
This yields:
Found 6 clusters:
[[0 3 3 1 1 4]
[0 0 3 1 5 2]]
Cluster #0: 3 elements at (0.66666666666666663, 0.33333333333333331)
Cluster #1: 3 elements at (0.33333333333333331, 3.3333333333333335)
Cluster #2: 1 elements at (1.0, 5.0)
Cluster #3: 3 elements at (0.33333333333333331, 1.6666666666666667)
Cluster #4: 1 elements at (0.0, 5.0)
Cluster #5: 1 elements at (1.0, 4.0)
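The grouping above relies on the default neighbourhood of ndimage.label, which in 2D joins cells only through their up, down, left and right neighbours, as the question asks. A minimal sketch of how that choice affects the result for the 'a' cells; the names grid, mask, cross and the 8-connected variant are illustrative and not part of the original answer:

```python
import numpy as np
from scipy import ndimage

# Same grid as in the question; mask is True where the element is 'a'.
grid = np.array([list("abbaab"), list("aababa")])
mask = grid == "a"

# Default structure: only up/down/left/right neighbours are connected.
labels4, count4 = ndimage.label(mask)

# The same neighbourhood written out explicitly as a cross-shaped structure.
cross = ndimage.generate_binary_structure(2, 1)
labels_cross, count_cross = ndimage.label(mask, structure=cross)

# For comparison: a full 3x3 structure also connects diagonal neighbours.
labels8, count8 = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))

print(count4, count_cross, count8)
```

With the default structure the 'a' at the end of the second row stays a cluster of its own; with the full 3x3 structure it would merge with its diagonal neighbour, which is not what the question wants.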
To get the positions of the elements in each cluster, you can evaluate clusters == cluster_id, for example:
In [126]:
clusters == 3
Out[126]:
array([[False,  True,  True, False, False, False],
       [False, False,  True, False, False, False]], dtype=bool)
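The question asks for positions as "row:col" strings rather than a boolean mask. One possible way to convert, sketched here with np.argwhere; the helper name cluster_positions is hypothetical, and the label array is typed in by hand from the output above so the snippet runs on its own:

```python
import numpy as np

# Label array copied from the output above.
clusters = np.array([[0, 3, 3, 1, 1, 4],
                     [0, 0, 3, 1, 5, 2]])

def cluster_positions(clusters, cluster_id):
    """Return a 'row:col' string for every cell in the given cluster."""
    # np.argwhere yields one (row, col) pair per True cell, in row-major order.
    return ["{}:{}".format(r, c) for r, c in np.argwhere(clusters == cluster_id)]

print(cluster_positions(clusters, 0))  # ['0:0', '1:0', '1:1']
```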
Alternatively, to get the bounding box of each cluster, you can use find_objects from the same SciPy package:
In [128]:
# +1 because find_objects ignores label 0, which is a valid cluster id here
ndimage.find_objects(clusters + 1)
Out[128]:
[(slice(0, 2, None), slice(0, 2, None)),
(slice(0, 2, None), slice(3, 5, None)),
(slice(1, 2, None), slice(5, 6, None)),
(slice(0, 2, None), slice(1, 3, None)),
(slice(0, 1, None), slice(5, 6, None)),
(slice(1, 2, None), slice(4, 5, None))]
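Each entry in that list is a tuple of slices, so it can be used directly as an index into the label array. A short sketch, assuming the same clusters array as above (typed in by hand here); the names boxes, box, patch and inside are illustrative:

```python
import numpy as np
from scipy import ndimage

# Label array copied from the earlier output.
clusters = np.array([[0, 3, 3, 1, 1, 4],
                     [0, 0, 3, 1, 5, 2]])

# Shift labels by one so that cluster 0 is not skipped; boxes[k] then
# holds the bounding box of cluster k.
boxes = ndimage.find_objects(clusters + 1)

box = boxes[3]          # (slice(0, 2, None), slice(1, 3, None))
patch = clusters[box]   # the rectangle of labels covered by the bounding box
inside = patch == 3     # cells of that rectangle that belong to cluster 3

print(patch)
print(inside)
```

A bounding box can contain cells from other clusters (here one cell of cluster 0), which is why the extra comparison against the cluster id is needed.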