Running Apache Nutch on Windows 7

Time: 2013-10-01 13:49:40

Tags: nutch

I am trying to run Nutch under Cygwin, and I run into a problem when crawling.

My command is

$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5

The response is

cygpath: can't convert empty path

InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user1249593824.staging to 0700

    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Please help.

1 answer:

Answer 0: (score: 1)

I ran into the same problem two days ago. Here is the solution I followed:

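  1. Download hadoop-core 0.20.2 from http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/0.20.2
  2. Replace (nutch-directory)/lib/hadoop-core-1.2.0.jar with the downloaded file, renaming the downloaded file so it keeps the same name, hadoop-core-1.2.0.jar.
  3. That's it.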
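
The jar swap in step 2 could be scripted roughly as follows from the Cygwin shell. This is only a sketch: the function name `swap_hadoop_core` and both arguments are hypothetical, it assumes the 0.20.2 jar has already been downloaded by hand, and it adds a backup copy of the original jar (an extra precaution, not part of the steps above).

```shell
#!/bin/sh
# Hypothetical helper: replace the Hadoop core jar bundled with Nutch.
# Both arguments are assumptions; adjust them to your own setup.
swap_hadoop_core() {
    nutch_home=$1        # the (nutch-directory) from step 2
    downloaded_jar=$2    # the hadoop-core 0.20.2 jar downloaded in step 1
    target="$nutch_home/lib/hadoop-core-1.2.0.jar"

    # Refuse to continue if either file is missing.
    [ -f "$downloaded_jar" ] || { echo "missing: $downloaded_jar" >&2; return 1; }
    [ -f "$target" ] || { echo "missing: $target" >&2; return 1; }

    # Keep a copy of the original jar so the change can be undone.
    cp "$target" "$target.bak" || return 1

    # Overwrite the bundled jar, keeping the original file name.
    cp "$downloaded_jar" "$target"
}
```

It would be called with your own paths, for example `swap_hadoop_core /path/to/nutch /path/to/hadoop-core-0.20.2.jar` (both paths are placeholders). To revert, copy `hadoop-core-1.2.0.jar.bak` back over `hadoop-core-1.2.0.jar`.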