Running SVD on Mahout

Time: 2014-03-20 18:58:11

Tags: mahout

I am running the SVD application on Mahout with this command:

/usr/local/mahout/bin/mahout svd -i /user/hduser/reuters-vectors/tfidf-vectors -o svd_output -nr 41702 -nc 20863 -r 10000 -sym "false" -wd temp_svd --cleansvd "true" -mem "false"

However, I get the following error:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/examples/target/mahout-examples-0.6-job.jar
14/03/20 14:51:27 INFO common.AbstractJob: Command line arguments: {--cleansvd=true, --endPhase=2147483647, --inMemory=false, --input=/user/hduser/reuters-vectors/tfidf-vectors, --maxError=0.05, --minEigenvalue=0.0, --numCols=20863, --numRows=41702, --output=svd_output, --rank=10000, --startPhase=0, --symmetric=false, --tempDir=temp, --workingDir=temp_svd}
14/03/20 14:51:28 WARN decomposer.HdfsBackedLanczosState: temp_svd/projections exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/norms exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/scaleFactor exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/projections exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/norms exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/scaleFactor exists, will overwrite
14/03/20 14:51:29 INFO lanczos.LanczosSolver: Finding 10000 singular vectors of matrix with 41702 rows, via Lanczos
14/03/20 14:51:30 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/20 14:51:30 INFO mapred.JobClient: Running job: job_201403201104_0045
14/03/20 14:51:31 INFO mapred.JobClient:  map 0% reduce 0%
14/03/20 14:51:43 INFO mapred.JobClient:  map 100% reduce 0%
14/03/20 14:51:55 INFO mapred.JobClient:  map 100% reduce 50%
14/03/20 14:51:58 INFO mapred.JobClient:  map 100% reduce 100%
14/03/20 14:52:00 INFO mapred.JobClient: Job complete: job_201403201104_0045
14/03/20 14:52:00 INFO mapred.JobClient: Counters: 18
14/03/20 14:52:00 INFO mapred.JobClient:   Job Counters 
14/03/20 14:52:00 INFO mapred.JobClient:     Launched reduce tasks=2
14/03/20 14:52:00 INFO mapred.JobClient:     Launched map tasks=1
14/03/20 14:52:00 INFO mapred.JobClient:     Data-local map tasks=1
14/03/20 14:52:00 INFO mapred.JobClient:   FileSystemCounters
14/03/20 14:52:00 INFO mapred.JobClient:     FILE_BYTES_READ=12
14/03/20 14:52:00 INFO mapred.JobClient:     HDFS_BYTES_READ=167104
14/03/20 14:52:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80
14/03/20 14:52:00 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=196
14/03/20 14:52:00 INFO mapred.JobClient:   Map-Reduce Framework
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce input groups=0
14/03/20 14:52:00 INFO mapred.JobClient:     Combine output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map input records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Spilled Records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map output bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map input bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Combine input records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce input records=0
Exception in thread "main" java.util.NoSuchElementException
    at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
    at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:200)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:152)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:111)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:283)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:289)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Please tell me how to solve this problem.

2 answers:

Answer 0 (score: 0)

Which version of Mahout are you working with? Also note that you will need to use ssvd to achieve your goal. See http://mahout.apache.org/users/dim-reduction/ssvd.html
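As a rough illustration of this suggestion, the sketch below assembles an ssvd invocation for the input path from the question. The output directory ssvd_output and the rank of 100 are hypothetical values chosen for illustration (far smaller than the 10000 requested in the question), and the -k, -p and -q options (rank, oversampling, power iterations) are assumptions that should be checked against `mahout ssvd --help` for the installed version, since option support may differ between releases. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints an ssvd command; runs it only when RUN=1.
MAHOUT=/usr/local/mahout/bin/mahout                 # launcher path from the question
INPUT=/user/hduser/reuters-vectors/tfidf-vectors    # input path from the question
OUTPUT=ssvd_output      # hypothetical output directory
RANK=100                # hypothetical decomposition rank (-k)
OVERSAMPLING=15         # assumed oversampling parameter (-p)
POWER_ITER=1            # assumed number of power iterations (-q)

# Store the arguments as positional parameters so they stay correctly quoted.
set -- ssvd -i "$INPUT" -o "$OUTPUT" -k "$RANK" -p "$OVERSAMPLING" -q "$POWER_ITER"

echo "$MAHOUT $*"

if [ "${RUN:-0}" = "1" ]; then
  "$MAHOUT" "$@"
fi
```

Printing the command first makes it easy to review the arguments before submitting a job to the cluster.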

Answer 1 (score: 0)

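Have you saved the vector files in HDFS, and is the path to them specified correctly? If you are running locally, set export MAHOUT_LOCAL="TRUE" (the variable the startup log above reports as not set) and rerun the command.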
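The job counters in the question report "Map input records=0", so the job appears to have read nothing from the input path, which fits the suggestion to check the path first. A minimal shell sketch of that check follows. It reuses the input path from the question. The run_mode helper is hypothetical, and it assumes that the launcher treats any non-empty MAHOUT_LOCAL (the variable named in the startup log) as a request for local mode. The HDFS listing only runs if hadoop is on the PATH.

```shell
#!/bin/sh
INPUT=/user/hduser/reuters-vectors/tfidf-vectors   # input path from the question

# Hypothetical helper: reports which mode a given MAHOUT_LOCAL value selects,
# assuming any non-empty value means "run locally".
run_mode() {
  if [ -n "$1" ]; then
    echo "local"
  else
    echo "hadoop"
  fi
}

echo "mode: $(run_mode "${MAHOUT_LOCAL:-}")"

# List the input directory in HDFS. A missing or empty directory would be
# consistent with the "Map input records=0" counter in the log.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$INPUT" || echo "cannot list $INPUT"
else
  echo "hadoop not found on PATH; skipping HDFS check"
fi
```

In local mode the same path would presumably be resolved against the local filesystem rather than HDFS, so the listing step would need to be adapted accordingly.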