Question

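I am using clValid to assess the quality of a hierarchical clustering. My code is below. The clustering always produces one noise cluster containing 70% of the elements, so I recursively cluster the elements within that noise cluster.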

intern <- clValid(primaryDataSource, 2:10,clMethods = c("Hierarchical"),
                  validation="internal", maxitems = 2200)
summary(intern)

Output of summary(intern):

Clustering Methods:
 hierarchical 

Cluster sizes:
 2 3 4 5 6 7 8 9 10 

Validation Measures:
                                 2       3       4       5       6       7       8       9      10

hierarchical Connectivity   3.8738  3.8738  8.2563 10.9452 16.0286 18.6452 20.6452 22.6452 24.6452
             Dunn           4.0949  0.8810  0.6569  0.8694  0.8808  1.0416  1.0230  1.0262  1.3724
             Silhouette     0.9592  0.9879  0.9785  0.9751  0.9727  0.9729  0.9727  0.9726  0.9725

Optimal Scores:

             Score  Method       Clusters
Connectivity 3.8738 hierarchical 2       
Dunn         4.0949 hierarchical 2       
Silhouette   0.9879 hierarchical 3

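In each iteration I have to run clValid() and choose the number of clusters that gives the highest Silhouette value (3 in the example above). I am trying to automate this recursive clustering approach, so I want to select the number of clusters with the highest Silhouette value programmatically. Could you help me extract this piece of information? Thanks.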

P.S. I tried converting the result to a data frame or a table, but that did not work.

Update: after running str()

> str(intern)

Formal class 'clValid' [package "clValid"] with 14 slots
  ..@ clusterObjs:List of 1
  .. ..$ hierarchical:List of 7
  .. .. ..$ merge      : int [1:2173, 1:2] -1673 -714 -1121 -1688 -1876 -1123 -1689 -1228 -429 -535 ...
  .. .. ..$ height     : num [1:2173] 0 0.001 0.001 0.001 0.001 ...
  .. .. ..$ order      : int [1:2174] 2165 2166 1950 1951 1954 1955 1577 1565 1564 1576 ...
  .. .. ..$ labels     : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
  .. .. ..$ method     : chr "average"
  .. .. ..$ call       : language hclust(d = Dist, method = method)
  .. .. ..$ dist.method: chr "euclidean"
  .. .. ..- attr(*, "class")= chr "hclust"
  ..@ measures   : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
  .. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
  .. .. ..$ : chr "hierarchical"
  ..@ measNames  : chr [1:3] "Connectivity" "Dunn" "Silhouette"
  ..@ clMethods  : chr "hierarchical"
  ..@ labels     : chr [1:2174] "out_M_aacald_c_boundary" "out_M_12ppd_DASH_R_e_boundary" "out_M_12ppd_DASH_S_e_boundary" "in_M_14glucan_e_boundary" ...
  ..@ nClust     : num [1:9] 2 3 4 5 6 7 8 9 10
  ..@ validation : chr "internal"
  ..@ metric     : chr "euclidean"
  ..@ method     : chr "average"
  ..@ neighbSize : num 10
  ..@ annotation : NULL
  ..@ GOcategory : chr "all"
  ..@ goTermFreq : num 0.05
  ..@ call       : language clValid(obj = primaryDataSource, nClust = 2:10, clMethods = c("Hierarchical"), validation = "internal",      maxitems = 2200)

I think the important part is

@ measures   : num [1:3, 1:9, 1] 3.874 4.095 0.959 3.874 0.881 ...
      .. ..- attr(*, "dimnames")=List of 3
      .. .. ..$ : chr [1:3] "Connectivity" "Dunn" "Silhouette"
      .. .. ..$ : chr [1:9] "2" "3" "4" "5" ...
      .. .. ..$ : chr "hierarchical"

When I run intern@measures, I get the following result.

                     2         3         4          5          6          7          8         9
Connectivity 3.8738095 3.8738095 8.2563492 10.9452381 16.0285714 18.6452381 20.6452381 22.645238
Dunn         4.0948837 0.8810494 0.6568857  0.8694067  0.8808228  1.0415614  1.0230197  1.026192
Silhouette   0.9591803 0.9879153 0.9784684  0.9751393  0.9727454  0.9728736  0.9727153  0.972622
                     10
Connectivity 24.6452381
Dunn          1.3724494
Silhouette    0.9725379

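I can access individual entries by index and take the overall maximum, as shown below. What I want is the maximum of the Silhouette values only.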

intern@measures[1]
max(intern@measures)

Answer 1

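Some additional explanation: the @ in the output is S4 notation, which indicates that the object you are inspecting is an instance of the clValid class with attributes (slots). I am not familiar with clValid, but a quick look at the source code shows that it is defined as an S4 class.

You can access these attributes with object@attribute. In general, they can hold anything.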
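
As a minimal sketch of that slot access applied to the question: the str() output shows that the measures slot is a 3 x 9 x 1 array with dimnames, so it can be indexed by name instead of by position. The array below is rebuilt from the values printed in the question so that the snippet runs without clValid; measures_arr is a hypothetical stand-in, and with the real object one would presumably use intern@measures in its place.

```r
# Values copied from the intern@measures printout in the question,
# in column-major order: Connectivity, Dunn, Silhouette for each k.
vals <- c(
   3.8738095, 4.0948837, 0.9591803,  # k = 2
   3.8738095, 0.8810494, 0.9879153,  # k = 3
   8.2563492, 0.6568857, 0.9784684,  # k = 4
  10.9452381, 0.8694067, 0.9751393,  # k = 5
  16.0285714, 0.8808228, 0.9727454,  # k = 6
  18.6452381, 1.0415614, 0.9728736,  # k = 7
  20.6452381, 1.0230197, 0.9727153,  # k = 8
  22.645238,  1.026192,  0.972622,   # k = 9
  24.6452381, 1.3724494, 0.9725379   # k = 10
)

# Hypothetical stand-in for intern@measures (same dims and dimnames as in str()).
measures_arr <- array(
  vals,
  dim = c(3, 9, 1),
  dimnames = list(
    c("Connectivity", "Dunn", "Silhouette"),
    as.character(2:10),
    "hierarchical"
  )
)

# Pull out the Silhouette row as a named vector (names are the cluster counts).
sil <- measures_arr["Silhouette", , "hierarchical"]

# The name of the largest entry is the cluster count with the best Silhouette.
best_k <- as.numeric(names(which.max(sil)))
best_k
```

Indexing by dimnames rather than by position keeps the lookup valid if the set of validation measures or the range of cluster counts changes between iterations.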

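Looking at clValid's print function, it appears you can get these measures through the convenience accessor measures(object). The rest of the clValid source also contains utility functions that may be useful to you; have a look at optimalScores().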
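
A hedged sketch of how optimalScores() could be used for the automation described in the question. It assumes that optimalScores() returns a data frame with one row per validation measure and a Clusters column, which matches the "Optimal Scores" table in the summary output above but has not been checked against every clValid version. The helper name best_k_silhouette is hypothetical, and the example uses the mouse data set shipped with clValid rather than the asker's primaryDataSource.

```r
# Hypothetical helper: cluster count with the best Silhouette score.
# Assumes `cl` is a clValid object created with validation = "internal".
best_k_silhouette <- function(cl) {
  opt <- clValid::optimalScores(cl)
  # Clusters may be a factor or a character column depending on the R version,
  # so go through as.character() before converting to a number.
  as.numeric(as.character(opt["Silhouette", "Clusters"]))
}

# Only run the example when clValid is installed.
if (requireNamespace("clValid", quietly = TRUE)) {
  library(clValid)
  data(mouse)  # example data set bundled with clValid
  express <- mouse[1:25, c("M1", "M2", "M3", "NC1", "NC2", "NC3")]
  rownames(express) <- mouse$ID[1:25]

  cl <- clValid(express, 2:6, clMethods = "hierarchical",
                validation = "internal")
  best_k <- best_k_silhouette(cl)
  print(best_k)
}
```

In a recursive setup, the returned value could be passed to cutree() on the hclust object stored in the clusterObjs slot, although that step is not covered by the answer above.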
