Question

I got the following results from clusterdump, in CSV and TEXT format.

CSV:

0,Sports_38.txt
1,Sports_23.txt
2,Sports_36.txt
3,Sports_13.txt
4,Sports_31.txt,Sports_32.txt
5,Sports_28.txt,Sports_29.txt
6,Sports_2.txt
9,Sports_15.txt

TEXT:

{"identifier":"VL-1","r":[],"c":[...,"n":7}
Top Terms: 
    什                                       =>  15.829998016357422
    利物浦                                     =>  13.629814147949219
    克                                       =>  11.317766189575195
    格                                       =>  10.938775062561035
    特                                       =>  10.842317581176758
    尔                                       =>  10.447234153747559
    切尔西                                     =>   9.742402076721191
    比赛                                      =>   8.247735023498535
    表现                                      =>   7.909337520599365
    批评                                      =>   7.462332725524902

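I noticed that the CSV file lists only one point for VL-1, but the TEXT file reports 7 points for VL-1 (VL-1's "n" equals 7).

Why do some points disappear? How can I get the cluster of every point?

Thanks a lot.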

Answer 1

I also got empty clusteredPoints when the data was larger.

I finally found the reason.

The 8th parameter of Kmeans.run, clusterClassificationThreshold, should be 0 (mahout 1.0).

Please check: http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700@windwardsolutions.com%3E
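
To illustrate why a non-zero threshold can make points vanish, here is a minimal, self-contained sketch. It is not Mahout's code. It assumes the threshold acts as a minimum membership score: a point is written to clusteredPoints only if its best cluster score is at least the threshold. The class name `ThresholdSketch`, the helpers `assign` and `countKept`, and the scores themselves are hypothetical and made up for the example.

```java
public class ThresholdSketch {

    // Hypothetical helper: returns the index of the best-scoring cluster,
    // or -1 if even the best score is below the threshold (the point is dropped).
    static int assign(double[] scores, double threshold) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return scores[best] >= threshold ? best : -1;
    }

    // Counts how many points survive the threshold check.
    static int countKept(double[][] allScores, double threshold) {
        int kept = 0;
        for (double[] scores : allScores) {
            if (assign(scores, threshold) >= 0) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up membership scores for three points over two clusters.
        double[][] scores = {
            {0.9, 0.1},
            {0.55, 0.45},
            {0.4, 0.6}
        };
        System.out.println("kept with threshold 0.7: " + countKept(scores, 0.7));
        System.out.println("kept with threshold 0.0: " + countKept(scores, 0.0));
    }
}
```

Under this assumption, a threshold of 0 keeps every point, which would match the observation that the cluster's "n" (counted during clustering) can be larger than the number of points listed for it in the CSV output.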

clusteredPoints of cluster result disappears [mahout]