I am new to the Mahout environment. I ran clusterdump and got the following output:
/opt/hadoop/mahout-distribution-0.9/bin$ mahout clusterdump \
> -d /app/hadoop/dmacs/training_set1_sparseout/dictionary.file-0 \
> -dt sequencefile \
> -i /app/hadoop/dmacs/training_set1_sparseout/kmeans-clusters/clusters-2-final \
> -n 20 \
> -b 100 \
> -o /app/hadoop/dmacs/kmeans_final_output/cdump.txt \
> -dm org.apache.mahout.common.distance.CosineDistanceMeasure
:VL-1480{n=150 c=[1000062,3,2005:0.098, 1000079,1,2002:0.080, 1000079,2,2002:0.078, 1000079,3,2002:0.
Top Terms:
25 => 10.670724073251089
31 => 7.999464999039968
1664010,5,2005 => 1.2396535428365072
2439493,1,2003 => 1.184131249586741
507603,1,2005 => 0.9944797229766845
199257,3,2005 => 0.9928587055206299
2602249,3,2004 => 0.9890585215886434
184705,3,2004 => 0.9728035926818848
447759,5,2005 => 0.9652122163772583
1152594,3,2004 => 0.9619592666625977
104237,5,2005 => 0.9515269517898559
1473980,3,2005 => 0.9478832610448201
2118461,4,2005 => 0.9315701317787171
1037245,3,2005 => 0.9236405754089355
1639792,1,2002 => 0.9183504740397136
1227322,1,2003 => 0.9121313015619914
2019240,3,2004 => 0.909924259185791
1117152,5,2005 => 0.9050878302256267
2040853,3,2004 => 0.9025738382339478
1309838,5,2005 => 0.8964522886276245
What does "Top Terms" actually mean in this output? Thanks in advance!
Answer 0 (score: 1)
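"Top Terms" are the highest-ranked terms from the documents that belong to the cluster, listed in descending order of the value shown after "=>". You can control how many of them are printed with the -n / --numWords flag of the clusterdump command; you passed -n 20, which is why 20 terms are listed.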
For details on the flags, see the help:
mahout-distribution-0.9$ bin/mahout clusterdump -h
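To make the idea concrete, here is a minimal Python sketch of how a "Top Terms" list could be produced. It assumes that the number after "=>" is the term's weight in the cluster's centroid vector and that the -d dictionary maps term indices to term strings; this is an illustration under those assumptions, not Mahout's actual code. The names top_terms, centroid and dictionary are hypothetical, and the toy weights are only loosely modelled on the output in the question.

```python
# Illustrative sketch only; not Mahout's actual implementation.
def top_terms(centroid, dictionary, n=20):
    """Return the n highest-weighted (term, weight) pairs of a cluster centroid.

    centroid:   hypothetical dict mapping term index -> weight in the centroid
    dictionary: hypothetical dict mapping term index -> term string
    n:          plays the role of clusterdump's -n / --numWords value
    """
    # Sort term indices by weight, largest first, and keep the first n.
    ranked = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)
    # Fall back to the raw index if the dictionary has no entry for it.
    return [(dictionary.get(idx, str(idx)), weight) for idx, weight in ranked[:n]]


# Toy values, rounded and shortened for the example.
centroid = {0: 10.67, 1: 8.0, 2: 1.24, 3: 0.08}
dictionary = {0: "25", 1: "31", 2: "1664010,5,2005", 3: "1000079,1,2002"}

for term, weight in top_terms(centroid, dictionary, n=3):
    print(f"{term} => {weight}")
```

With n=3 only the three heaviest terms are printed, which mirrors how a smaller -n value would shorten the list in cdump.txt.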