Nutch 2.0 crawl error - java.io.IOException: No input paths specified in job

Time: 2012-08-07 12:17:20

Tags: nutch web-crawler

I tried to crawl some URLs with Nutch 2.0, but it fails as follows:

org.apache.nutch.crawl.Crawler urls -dir crawls -depth 5 -topN 100
Exception in thread "main" java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:193)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.gora.mapreduce.GoraMapReduceUtils.getSplits(GoraMapReduceUtils.java:67)
    at org.apache.gora.store.impl.FileBackedDataStoreBase.getPartitions(FileBackedDataStoreBase.java:148)
    at org.apache.gora.mapreduce.GoraInputFormat.getSplits(GoraInputFormat.java:93)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:43)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:180)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Can anyone help me? Thanks a lot!

0 answers:

No answers yet