Mahout clusterdump does not read its input

Asked: 2013-03-08 11:54:53

Tags: machine-learning cluster-computing mahout

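Hi all, I am trying to run clusterdump on the output of the k-means clustering algorithm, but it is not working. Any ideas? This is the example from Mahout in Action, run on a pseudo-distributed cluster.

Also, are there any tools or methods for visualizing the clusterdump output or the k-means output?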

[186946@01HW534064 bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0-i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10
Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5
No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf 
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific    
Options:                                                                        
Usage:                                                                          
 [--seqFileDir <seqFileDir> --output <output> --substring <substring>           
--numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary>         
--dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase       
<startPhase> --endPhase <endPhase>]                                             
Job-Specific Options:                                                           
  --seqFileDir (-s) seqFileDir             The directory containing Sequence    
                                           Files for the Clusters               
  --output (-o) output                     Optional output directory. Default   
                                           is to output to the console.         
  --substring (-b) substring               The number of chars of the           
                                           asFormatString() to print            
  --numWords (-n) numWords                 The number of top terms to print     
  --pointsDir (-p) pointsDir               The directory containing points      
                                           sequence files mapping input vectors 
                                           to their cluster.  If specified,     
                                           then the program will output the     
                                           points associated with a cluster     
  --dictionary (-d) dictionary             The dictionary file                  
  --dictionaryType (-dt) dictionaryType    The dictionary file type             
                                           (text|sequencefile)                  
  --help (-h)                              Print out help                       
  --tempDir tempDir                        Intermediate output directory        
  --startPhase startPhase                  First phase to run                   
  --endPhase endPhase                      Last phase to run                    
13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms

Thanks

1 answer:

Answer 0 (score: 0)

mahout clusterdump \
-d output/vectors/dictionary.file-0 \
-dt sequencefile \
-i output/clusters/clusters-2-final/part-00000 \
-n 20 \
-b 100 \
-o cdump.txt \
-p output/clusters/clusteredPoints/

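Just copy and paste all of the lines above into a text editor, taking care to replace the arguments of -d, -dt, -i, and -p with your own.

P.S. The paths are on HDFS.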
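A note on the error itself: the usage text printed in the question lists the cluster directory option as --seqFileDir (-s), and the failing command passes reuters-fkmeans-clusters/clusters-3 with no flag in front of it, which is what "Unexpected ... while processing Job-Specific Options" complains about. The -i flag used in this answer does not appear in that usage text, so it may belong to a different Mahout release. A minimal sketch of the question's command adapted to the 0.5 options follows. It assumes the dictionary file is really dictionary.file-0 and that the trailing "-i" in the question was a mistyped flag; both are guesses, so check the actual file names first. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: paths are copied from the question and must be adapted.
# Assumption: the dictionary is dictionary.file-0 and the "-i" glued to it
# in the question was meant to be the cluster-directory flag.
DICT=/home/186946/reuters-vectors/dictionary.file-0
CLUSTERS=reuters-fkmeans-clusters/clusters-3
OUT=/home/186946/clusters.txt

# The usage text in the question names the cluster directory
# option --seqFileDir (-s), so it is passed with -s here.
CMD="./mahout clusterdump -dt sequencefile -d $DICT -s $CLUSTERS -o $OUT -b 10 -n 10"

# Print the command for review instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into the shell from the Mahout bin directory, as in the question.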