I ran the EM algorithm in WEKA on my data using the default parameters, but I can't work out how to interpret the output.
    === Run information ===

    Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
    Relation:     Chronic_Kidney_Disease-weka.filters.unsupervised.attribute.Remove-R12-weka.filters.unsupervised.attribute.Remove-R3-weka.filters.unsupervised.attribute.Remove-R3-4-weka.filters.unsupervised.attribute.Remove-R5-10,12-20
    Instances:    800
    Attributes:   6
                  age
                  bp
                  rbc
                  pc
                  hemo
                  class
    Test mode:    evaluate on training data

    === Clustering model (full training set) ===

    EM
    ==

    Number of clusters selected by cross validation: 6
    Number of iterations performed: 100

                     Cluster
    Attribute              0         1         2         3         4         5
                      (0.29)    (0.22)    (0.38)    (0.02)    (0.04)    (0.05)
    ===========================================================================
    age
      mean           53.5869   65.0962     46.44   51.3652   56.1297    10.939
      std. dev.      12.4505    7.9718    15.546    3.7759   10.2604    6.7004

    bp
      mean           77.3114      79.7   71.4394   115.138   92.1235   66.5196
      std. dev.      11.7858   12.1008    8.4722   31.4278    5.8351   10.0583

    rbc
      normal        185.8341  165.6585  306.8285   14.0588    7.3129   32.3071
      abnormal       45.4643   13.3988    1.0652    3.3197   29.7885    6.9635
      [total]       231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

    pc
      normal         152.713  147.8797  306.8886   13.0467    1.9999   31.4721
      abnormal       78.5854   31.1776     1.005    4.3319   35.1016    7.7985
      [total]       231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

    hemo
      mean           10.6591   11.7665   15.0745    9.5796    8.1499   12.0494
      std. dev.       2.1313    1.1677    1.3496    2.5159    2.1512    1.5108

    class
      ckd           230.1835   177.972    7.2109   16.3651   36.1014    38.167
      notckd          1.1149    1.0853  300.6828    1.0134         1    1.1036
      [total]       231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

    Time taken to build model (full training data) : 13.21 seconds

    === Model and evaluation on training set ===

    Clustered Instances

    0      218 ( 27%)
    1      196 ( 25%)
    2      302 ( 38%)
    3       12 (  2%)
    4       34 (  4%)
    5       38 (  5%)

    Log likelihood: -11.18988
Please help me understand the output.
Thanks in advance.
Answer 0 (score: 0)
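It gave you six clusters, containing 27%, 25%, 38%, 2%, 4% and 5% of the instances respectively. (That adds up to 101%, which is just rounding.) These percentages come from the "Clustered Instances" section, where each instance is assigned to its most probable cluster. The numbers in parentheses under the cluster numbers in the model table, (0.29), (0.22) and so on, are the estimated prior probabilities of the clusters, so they can differ slightly from the hard-assignment percentages.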
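To see where the 101% comes from, here is a small sketch that recomputes the percentages from the counts in the "Clustered Instances" section. The round-half-up rule is an assumption chosen because it reproduces the printed figures; it is not taken from WEKA's source.

```python
import math

# Hard-assignment counts copied from the "Clustered Instances" section.
counts = [218, 196, 302, 12, 34, 38]
total = sum(counts)  # should equal the 800 instances in the data set

exact = [100 * c / total for c in counts]
# Assumption: round half up (Python's built-in round() rounds 24.5 to 24,
# which would not match the 25% shown in the output).
rounded = [math.floor(p + 0.5) for p in exact]

for cluster, (c, p, r) in enumerate(zip(counts, exact, rounded)):
    print(f"cluster {cluster}: {c:3d} instances, {p:5.2f}% -> {r}%")

print("sum of exact percentages:  ", sum(exact))
print("sum of rounded percentages:", sum(rounded))
```

The exact percentages sum to 100; only the rounded ones overshoot.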
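The number 6 was arrived at by cross-validation: because the scheme was run with -N -1, the number of clusters was not fixed in advance. Instead, models with different numbers of clusters were repeatedly trained on part of the data and tested on the rest, and 6 was the number that came out best.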
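For each numeric attribute (age, bp, hemo) the table gives the mean and standard deviation of that attribute within each cluster. For each nominal attribute (rbc, pc, class) it gives a weighted count of each value within each cluster. The counts are fractional because EM assigns every instance to every cluster with some probability rather than to a single cluster. The [total] rows add up to about 812 rather than 800, which is consistent with each count starting at 1 (800 instances plus 6 clusters times 2 values). For example, cluster 2 is almost entirely notckd, while clusters 0, 1, 3, 4 and 5 are almost entirely ckd.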
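As a rough illustration of how to read the nominal rows, the sketch below turns the weighted counts for the class attribute into per-cluster proportions. The numbers are copied from the output above; the variable names are made up for this example.

```python
# Weighted counts for the "class" attribute, copied from the model table.
ckd = [230.1835, 177.972, 7.2109, 16.3651, 36.1014, 38.167]
notckd = [1.1149, 1.0853, 300.6828, 1.0134, 1.0, 1.1036]

# Proportion of each cluster's weight that falls on "ckd".
p_ckd = [a / (a + b) for a, b in zip(ckd, notckd)]

for cluster, p in enumerate(p_ckd):
    print(f"cluster {cluster}: P(ckd) = {p:.3f}, P(notckd) = {1 - p:.3f}")
```

Read this way, cluster 2 looks like the "healthy" group and the other five look like different subgroups of ckd patients, distinguished by age, bp, hemo and so on.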
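The log likelihood is a measure of how well the clustering fits the data. Training tries to maximize it, so a higher (less negative) value is better. It is useful for comparing alternative clusterings of the same data and does not mean much on its own.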
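To make "higher is better" concrete, here is a minimal sketch that computes the average per-instance log likelihood of a one-dimensional Gaussian mixture for two candidate models. The data values and both sets of mixture parameters are invented for illustration, and averaging over instances is an assumption about how the reported figure is scaled, not something confirmed by the output above.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def avg_log_likelihood(data, components):
    """Average log density of the data under a mixture.

    components is a list of (weight, mean, std) tuples.
    """
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, s) for w, m, s in components)
        total += math.log(density)
    return total / len(data)

# Made-up values forming two obvious groups (not taken from the question).
data = [9.5, 10.2, 11.0, 14.8, 15.1, 15.6]

good_fit = [(0.5, 10.2, 0.7), (0.5, 15.2, 0.4)]  # centred on the groups
poor_fit = [(0.5, 5.0, 0.7), (0.5, 20.0, 0.4)]   # centred far from the data

good = avg_log_likelihood(data, good_fit)
poor = avg_log_likelihood(data, poor_fit)
print(f"good fit: {good:.3f}")
print(f"poor fit: {poor:.3f}")
```

The model whose components sit on the data gets the higher value. The absolute number depends on the scale and number of attributes, which is why -11.18988 only becomes informative when compared with another clustering of the same data.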