How to interpret the output of EM in Weka

Time: 2017-08-31 09:00:09

Tags: machine-learning weka data-mining

I ran the EM algorithm on my data in WEKA with the default parameters, but I can't work out how to interpret the output.

    === Run information ===

Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     Chronic_Kidney_Disease-weka.filters.unsupervised.attribute.Remove-R12-weka.filters.unsupervised.attribute.Remove-R3-weka.filters.unsupervised.attribute.Remove-R3-4-weka.filters.unsupervised.attribute.Remove-R5-10,12-20
Instances:    800
Attributes:   6
              age
              bp
              rbc
              pc
              hemo
              class
Test mode:    evaluate on training data


=== Clustering model (full training set) ===


EM
==

Number of clusters selected by cross validation: 6
Number of iterations performed: 100


              Cluster
Attribute           0        1        2        3        4        5
               (0.29)   (0.22)   (0.38)   (0.02)   (0.04)   (0.05)
===================================================================
age
  mean         53.5869  65.0962    46.44  51.3652  56.1297   10.939
  std. dev.    12.4505   7.9718   15.546   3.7759  10.2604   6.7004

bp
  mean         77.3114     79.7  71.4394  115.138  92.1235  66.5196
  std. dev.    11.7858  12.1008   8.4722  31.4278   5.8351  10.0583

rbc
  normal      185.8341 165.6585 306.8285  14.0588   7.3129  32.3071
  abnormal     45.4643  13.3988   1.0652   3.3197  29.7885   6.9635
  [total]     231.2984 179.0574 307.8937  17.3785  37.1015  39.2706
pc
  normal       152.713 147.8797 306.8886  13.0467   1.9999  31.4721
  abnormal     78.5854  31.1776    1.005   4.3319  35.1016   7.7985
  [total]     231.2984 179.0574 307.8937  17.3785  37.1015  39.2706
hemo
  mean         10.6591  11.7665  15.0745   9.5796   8.1499  12.0494
  std. dev.     2.1313   1.1677   1.3496   2.5159   2.1512   1.5108

class
  ckd         230.1835  177.972   7.2109  16.3651  36.1014   38.167
  notckd        1.1149   1.0853 300.6828   1.0134        1   1.1036
  [total]     231.2984 179.0574 307.8937  17.3785  37.1015  39.2706


Time taken to build model (full training data) : 13.21 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      218 ( 27%)
1      196 ( 25%)
2      302 ( 38%)
3       12 (  2%)
4       34 (  4%)
5       38 (  5%)


Log likelihood: -11.18988

Please help me understand the output.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

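It gives you six clusters, containing 27%, 25%, 38%, 2%, 4%, and 5% of the data respectively. (These add up to 101%, which is more than 100% because each figure is rounded.)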
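The rounding can be checked from the counts in the "Clustered Instances" section. This is a minimal sketch that assumes the percentages are rounded half up to whole numbers; the helper name `round_half_up_percent` is made up for illustration.

```python
# Hard-assignment counts copied from the "Clustered Instances" section.
counts = [218, 196, 302, 12, 34, 38]
total = sum(counts)  # number of instances

def round_half_up_percent(count, total):
    # Assumption: percentages are rounded half up to whole numbers.
    # Integer arithmetic avoids floating-point surprises at exact .5 cases.
    return (200 * count + total) // (2 * total)

exact = [100 * c / total for c in counts]
rounded = [round_half_up_percent(c, total) for c in counts]

print(exact)                  # unrounded shares of the data
print(rounded, sum(rounded))  # rounded shares and their sum
```

The unrounded shares sum to 100; only the displayed whole-number percentages overshoot.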

The number of clusters, 6, was arrived at by cross-validation (repeatedly training on part of the data and testing on the rest).

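For each cluster, the table gives the mean and standard deviation of every numeric attribute (age, bp, hemo). For the nominal attributes (rbc, pc, class) it lists a fractional count per value instead.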
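The fractional counts suggest soft assignments, where each instance contributes its membership probability to every cluster. The `[total]` row sums to about 812 rather than 800. That is consistent with one extra smoothing count per value per cluster (6 clusters × 2 values), although the output itself does not say so. Under that assumption, this sketch recovers the weights printed in parentheses under the cluster numbers.

```python
# [total] row copied from the rbc block of the cluster table.
totals = [231.2984, 179.0574, 307.8937, 17.3785, 37.1015, 39.2706]
n_instances = 800  # from "Instances: 800"
n_values = 2       # rbc takes two values: normal, abnormal

# Assumption (not stated in the output): every count includes +1 smoothing
# per value per cluster, so each total is inflated by n_values.
grand_total = sum(totals)
priors = [round((t - n_values) / n_instances, 2) for t in totals]

print(round(grand_total, 2))  # close to n_instances + 6 * n_values
print(priors)                 # compare with the (0.29) (0.22) ... row
```

These weights differ slightly from the "Clustered Instances" percentages, because those count each instance once, in its most probable cluster.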

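The log likelihood is a measure of how well the clusters fit the data; training tries to maximize it. It is used to compare which of several possible clusterings is better, and means little on its own.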
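To make the comparison concrete, here is a minimal sketch of an average per-instance log likelihood under a one-dimensional Gaussian mixture. The data and both sets of components are invented for illustration, and the function is not Weka's implementation. The model that matches the data gets the higher (less negative) value.

```python
import math

def gaussian_pdf(x, mean, std):
    # Density of a normal distribution at x.
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def avg_log_likelihood(data, components):
    # components: list of (weight, mean, std) tuples; weights sum to 1.
    total = 0.0
    for x in data:
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
        total += math.log(density)
    return total / len(data)

# Hypothetical toy data: two tight groups around 10 and 50.
data = [9.0, 10.0, 11.0, 49.0, 50.0, 51.0]

good_fit = [(0.5, 10.0, 1.0), (0.5, 50.0, 1.0)]  # one component per group
poor_fit = [(1.0, 30.0, 20.0)]                   # one broad component

ll_good = avg_log_likelihood(data, good_fit)
ll_poor = avg_log_likelihood(data, poor_fit)
print(ll_good, ll_poor)  # the better-fitting model scores higher
```

A value such as -11.18988 is only meaningful next to the value another model obtains on the same data.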