Mahout LDA gives a FileNotFoundException

Date: 2011-09-05 14:45:11

Tags: hadoop mahout lda

I created my term vectors as described here:

~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq

Then I ran:

~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100

I got:

MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
    at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:426)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:226)
    at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:174)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Indeed, that file does not exist. How do I create it?

1 answer:

Answer 0 (score: 0)

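The vectors may be empty because something went wrong while they were being created. Check whether the vectors were actually written to their output folder (i.e. the files there are not 0 bytes). This error can occur if your input folder is missing some files: in that case both steps (seqdirectory and seq2sparse) still complete without an error, even though they produce no valid output.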
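A minimal sketch of that check, assuming Hadoop ran in local mode so the output directories live on the local filesystem (the `file:/home/...` path in the stack trace suggests this). The function name `check_nonempty` is hypothetical; the paths are the ones from the question. It skips Hadoop's `.crc` checksum files so they are not mistaken for real output, and it only applies the answer's "not 0 bytes" heuristic; it does not validate the SequenceFile contents.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if DIR exists and contains at least one
# non-empty regular file that is not a Hadoop .crc checksum file.
check_nonempty() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "MISSING: $dir"
    return 1
  fi
  local count
  # -size +0c selects files larger than 0 bytes; '*.crc' also matches
  # hidden names like .part-r-00000.crc.
  count=$(find "$dir" -type f ! -name '*.crc' -size +0c | wc -l)
  if [ "$count" -eq 0 ]; then
    echo "EMPTY: $dir"
    return 1
  fi
  echo "OK: $dir ($count non-empty files)"
}

# Input folder, seqdirectory output, and seq2sparse output from the question.
for d in /home/ben/Scripts/eipi/files \
         /home/ben/Scripts/eipi/mahout_out \
         /home/ben/Scripts/eipi/termvecs; do
  check_nonempty "$d"
done
```

If the data is on HDFS instead, the same idea applies, but you would inspect the directories with `hadoop fs -ls` rather than `find`.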