Interpreting Weka output after clustering

Time: 2019-05-04 15:14:12

Tags: machine-learning cluster-analysis weka data-mining

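In a data mining assignment, I was asked: "Apply k-means clustering to the original data set. Set the number of desired clusters to the number of known classes. Output the cluster for each instance. Discuss the difference between this classification and the available ground truth."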

The output of clustering the data set with the known number of attributes is as follows:

kMeans
======

Number of iterations: 8
Within cluster sum of squared errors: 62.4309244109214

Initial starting points (random):

Cluster 0: 4,31,2,1,3
Cluster 1: 5,52,4,3,3
Cluster 2: 5,33,2,4,3
Cluster 3: 3,65,4,5,3
Cluster 4: 4,56,1,1,3
Cluster 5: 5,60,4,4,3

Missing values globally replaced with mean/mode

Final cluster centroids:
                         Cluster#
Attribute    Full Data          0          1          2          3          4          5
               (961.0)    (173.0)    (143.0)    (110.0)    (126.0)    (186.0)    (223.0)
========================================================================================
BI-RADS         4.3483     3.9595     4.8486     4.2364     4.7222     3.9247     4.5262
Age            55.4874    48.0867    60.3984    56.3364    61.8372    47.4462    60.7802
Shape           2.7215     2.1313     3.6371     1.7267     3.8858          1      3.861
Margin          2.7963     1.0289     2.7784     3.6537          5     1.0108          4
Density         2.9107     2.8457     2.9117     2.8929     2.9661     2.9004     2.9467




Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      173 ( 18%)
1      143 ( 15%)
2      110 ( 11%)
3      126 ( 13%)
4      186 ( 19%)
5      223 ( 23%)


Class attribute: class
Classes to Clusters:

   0   1   2   3   4   5  <-- assigned to cluster
 154  50  70  20 164  58 | 0
  19  93  40 106  22 165 | 1

Cluster 0 <-- No class
Cluster 1 <-- No class
Cluster 2 <-- No class
Cluster 3 <-- No class
Cluster 4 <-- 0
Cluster 5 <-- 1

Incorrectly clustered instances :   632.0    65.7648 %
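
As a sanity check on the summary above, here is a minimal Python sketch that recomputes the "Incorrectly clustered instances" figure from the Classes to Clusters table. It assumes Weka pairs each class with a different cluster so that the number of agreeing instances is as large as possible, and counts everything else as an error; that assumption reproduces the reported numbers, but the function name `best_one_to_one` and the brute-force search are only illustrative, not Weka's own code.

```python
from itertools import permutations

# Rows are the true classes (0, 1); columns are clusters 0..5,
# copied from the "Classes to Clusters" table above.
counts = [
    [154, 50, 70, 20, 164, 58],   # class 0
    [19, 93, 40, 106, 22, 165],   # class 1
]


def best_one_to_one(counts):
    """Pair every class with a distinct cluster, maximising agreement.

    Returns (matched, mapping), where mapping is {cluster: class}.
    Brute force is fine here: 2 classes and 6 clusters give 30 pairings.
    """
    n_classes = len(counts)
    n_clusters = len(counts[0])
    best_matched, best_mapping = -1, {}
    for clusters in permutations(range(n_clusters), n_classes):
        matched = sum(counts[c][k] for c, k in enumerate(clusters))
        if matched > best_matched:
            best_matched = matched
            best_mapping = {k: c for c, k in enumerate(clusters)}
    return best_matched, best_mapping


total = sum(map(sum, counts))
matched, mapping = best_one_to_one(counts)
incorrect = total - matched
error_pct = 100.0 * incorrect / total

print(mapping)                         # {4: 0, 5: 1}
print(incorrect, round(error_pct, 4))  # 632 65.7648
```

Under this reading, only cluster 4 (164 instances of class 0) and cluster 5 (165 instances of class 1) count as correct, and every instance in the four "No class" clusters is counted as an error, which is why the percentage is so high with six clusters and two classes.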

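I'm not sure what "ground truth" means, or what comments can be made about the notable differences between this classification and the available ground truth.

Any input is appreciated.

Thanks.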

0 answers:

No answers yet