Question

This example comes from Data Science for Dummies:

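import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()
X = digits.data
ground_truth = digits.target

pca = PCA(n_components=40)
Cx = pca.fit_transform(scale(X))

# random_state=1 from the book is omitted here: DBSCAN in recent
# scikit-learn releases does not accept that argument.
DB = DBSCAN(eps=4.35, min_samples=25)
DB.fit(Cx)

for k, cl in enumerate(np.unique(DB.labels_)):
    if cl >= 0:  # skip the noise label -1
        example = np.min(np.where(DB.labels_ == cl))  # question 1
        plt.subplot(2, 3, k)
        plt.imshow(digits.images[example], cmap='binary',  # question 2
                   interpolation='none')
        plt.title('cl ' + str(cl))
plt.show()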

My questions are:

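1. np.where(DB.labels_ == cl): I don't understand which array np.where is applied to. When I print np.where(DB.labels_ == cl), it looks as if it is applied to DB.core_sample_indices_, but I don't understand why. From what I understand of the np.where documentation, np.where(DB.labels_ == cl) should be applied to DB.labels_.
2. Why does np.min(np.where(DB.labels_ == cl)) give me the index that selects the correct image in digits.images? Thanks.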

Answer 1

The expression DB.labels_ == cl produces a boolean array in which (DB.labels_ == cl)[i] is True exactly when DB.labels_[i] == cl.

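np.where is therefore applied to the array DB.labels_ == cl. Called with a single array, it returns the indices of that array's nonzero elements (as a tuple holding one index array per dimension), that is, the positions where the value is True.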
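
A small sketch of these two steps, using a made-up labels array in place of DB.labels_ (the values are hypothetical and chosen only for illustration):

```python
import numpy as np

# Hypothetical labels standing in for DB.labels_ (-1 marks noise).
labels = np.array([-1, 0, 1, 0, -1, 1, 0])
cl = 0

mask = labels == cl          # boolean array, same length as labels
idx_tuple = np.where(mask)   # tuple with one index array per dimension
idx = idx_tuple[0]           # positions where mask is True

print(mask.tolist())  # [False, True, False, True, False, False, True]
print(idx.tolist())   # [1, 3, 6]
```

The printed indices are positions in labels itself, not in any other attribute of the estimator.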

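So np.where(DB.labels_ == cl) returns the indices of the elements of DB.labels_ that are equal to cl. These are the rows of the data passed to fit that DB has labelled as part of cluster cl.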
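np.min then returns the smallest index in that array, which means it picks the first element of the dataset that was assigned to cluster cl. By looping over all the clusters, you retrieve one example image from each cluster.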
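
The loop can be sketched without scikit-learn, again with a hypothetical labels array standing in for DB.labels_; the name first_example is introduced here only for illustration:

```python
import numpy as np

# Hypothetical labels standing in for DB.labels_ (-1 marks noise).
labels = np.array([-1, 2, 0, 1, 0, 2, -1, 1, 0])

first_example = {}
for cl in np.unique(labels):
    if cl >= 0:  # skip the noise label -1
        members = np.where(labels == cl)[0]          # indices of all points in cluster cl
        first_example[int(cl)] = int(members.min())  # earliest such point

print(first_example)  # {0: 2, 1: 3, 2: 1}
```

For a cluster that has at least one member, np.argmax(labels == cl) would return the same first index, since argmax on a boolean array stops at the first True.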

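This index is also valid for digits.images because DB.labels_ holds one label for each point of the dataset you pass to DB.fit, and that dataset is in the same order as digits.images: scale and PCA transform the rows without reordering them.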
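
As a rough check of that alignment, the sketch below refits the pipeline from the question (without random_state, which is an assumption about the installed scikit-learn version) and compares lengths. It also looks at DB.core_sample_indices_, which is likewise an array of row indices and may be why the two outputs look alike, although it lists only core points:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()
Cx = PCA(n_components=40).fit_transform(scale(digits.data))
DB = DBSCAN(eps=4.35, min_samples=25).fit(Cx)

# One label per input row, in the same order as the rows passed to fit.
same_length = len(DB.labels_) == len(digits.images) == len(digits.target)

# Every core sample belongs to some cluster, so its index must appear
# among the indices whose label is not the noise label -1.
clustered = np.where(DB.labels_ >= 0)[0]
core_in_clustered = np.isin(DB.core_sample_indices_, clustered).all()

print(same_length, core_in_clustered)
```

The indices returned by np.where for one cluster can include border points as well as core points, so they are not in general a slice of DB.core_sample_indices_.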

DBSCAN in scikit-learn of Python: trouble understanding the result of DBSCAN
