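Hi all, I am trying to run clusterdump on the output of the k-means clustering algorithm, but it is not working. Any ideas? This is the example from Mahout in Action, run on a pseudo-distributed cluster.
Also, are there any tools or methods for visualizing the clusterdump output or the k-means output?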
[186946@01HW534064 bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0-i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10
Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5
No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
-archives <paths> comma separated archives to be unarchived
on the compute machines.
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-files <paths> comma separated files to be copied to the
map reduce cluster
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-libjars <paths> comma separated jar files to include in
the classpath.
-tokenCacheFile <tokensFile> name of the file with the tokens
Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific
Options:
Usage:
[--seqFileDir <seqFileDir> --output <output> --substring <substring>
--numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary>
--dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
Job-Specific Options:
--seqFileDir (-s) seqFileDir The directory containing Sequence
Files for the Clusters
--output (-o) output Optional output directory. Default
is to output to the console.
--substring (-b) substring The number of chars of the
asFormatString() to print
--numWords (-n) numWords The number of top terms to print
--pointsDir (-p) pointsDir The directory containing points
sequence files mapping input vectors
to their cluster. If specified,
then the program will output the
points associated with a cluster
--dictionary (-d) dictionary The dictionary file
--dictionaryType (-dt) dictionaryType The dictionary file type
(text|sequencefile)
--help (-h) Print out help
--tempDir tempDir Intermediate output directory
--startPhase startPhase First phase to run
--endPhase endPhase Last phase to run
13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms
Thanks
Answer 0 (score: 0)
mahout clusterdump \
-d output/vectors/dictionary.file-0 \
-dt sequencefile \
-i output/clusters/clusters-2-final/part-00000 \
-n 20 \
-b 100 \
-o cdump.txt \
-p output/clusters/clusteredPoints/
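Just copy and paste all of the lines above into a text editor, then carefully replace the arguments to -d, -dt, -i and -p (shown here with my values) with your own.
P.S. The paths are on HDFS.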
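One caveat, going only by the usage text printed in the question: that build (mahout-0.5-cdh3u5) lists --seqFileDir (-s) for the clusters directory and shows no -i option, and the error "Unexpected reuters-fkmeans-clusters/clusters-3" suggests the directory was passed without any flag in front of it. So on that version the command may need -s rather than -i. Below is a minimal sketch of the question's command with the clusters directory attached to -s. The dictionary name dictionary.file-0 is an assumption on my part (the question shows dictionary.file-0-i, which looks like a "-i" glued onto the file name); the other paths are copied from the question. The script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Paths copied from the question; adjust them to your own setup.
CLUSTERS_DIR="reuters-fkmeans-clusters/clusters-3"
# Assumption: the real dictionary file is dictionary.file-0, not dictionary.file-0-i.
DICTIONARY="/home/186946/reuters-vectors/dictionary.file-0"
OUTPUT="/home/186946/clusters.txt"

# Every value is attached to a named flag. -s (--seqFileDir) is the flag
# shown in the usage text for the directory holding the cluster sequence files.
CMD="./mahout clusterdump -s $CLUSTERS_DIR -d $DICTIONARY -dt sequencefile -o $OUTPUT -b 10 -n 10"

# Dry run: print the command for review. To actually run it: eval "$CMD"
echo "$CMD"
```

If your Mahout build prints a different usage text (for example one that lists -i), follow the flag names it prints instead.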