正如你可以看到底部结果我有两个不同的簇使用不同的种子。我想从两个集群中选择最好的集群。
我知道最小平方误差越大越好。但是,尽管我使用了不同的种子,但它显示了相同的方差。我想知道为什么它显示类似的方差。我还想知道在选择最佳集群时需要考虑的其他事项。
*******************************************************************
kMeans
======
Number of iterations: 10
Within cluster sum of squared errors: 527.6988818392938
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(4898) (2781) (2117)
=====================================================
fixedacidity 6.8548 6.9565 6.7212
volatileacidity 0.2782 0.2826 0.2725
citricacid 0.3342 0.3389 0.3279
residualsugar 6.3914 8.2678 3.9265
chlorides 0.0458 0.0521 0.0374
freesulfurdioxide 35.3081 38.6897 30.8658
totalsulfurdioxide 138.3607 155.2585 116.1627
density 0.994 0.9958 0.9916
pH 3.1883 3.1691 3.2134
sulphates 0.4898 0.492 0.4871
alcohol 10.5143 9.6325 11.6726
quality 5.8779 5.4779 6.4034
Time taken to build model (full training data) : 0.19 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2781 ( 57%)
1 2117 ( 43%)
***********************************************************************
kMeans
======
Number of iterations: 7
Within cluster sum of squared errors: 527.6993178146143
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(4898) (2122) (2776)
=====================================================
fixedacidity 6.8548 6.7208 6.9572
volatileacidity 0.2782 0.2723 0.2828
citricacid 0.3342 0.3281 0.3389
residualsugar 6.3914 3.9451 8.2614
chlorides 0.0458 0.0374 0.0522
freesulfurdioxide 35.3081 30.9105 38.6697
totalsulfurdioxide 138.3607 116.2175 155.2871
density 0.994 0.9917 0.9958
pH 3.1883 3.2137 3.1689
sulphates 0.4898 0.4876 0.4916
alcohol 10.5143 11.6695 9.6312
quality 5.8779 6.4043 5.4755
Time taken to build model (full training data) : 0.15 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2122 ( 43%)
1 2776 ( 57%)
答案 0 :(得分:1)
定义"最佳结果"。
任何其他因k-means而更糟糕 - 但这并不意味着不同的质量标准(或聚类算法)对您的实际问题更有帮助
。答案 1 :(得分:0)
使用不同的种子并不能保证结果中的不同聚类。