When I type the following command in Cygwin:
bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/*
the binary works fine. When I put exactly the same line into my bash script:
#!/bin/bash/
bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/*
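I get error messages saying that some files do not exist. This may be specific to Nutch, the program I am running, but I think it has more to do with how I invoke the command from the script. Any ideas about what is wrong and how to fix it? (Yes, I am using tab completion.)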
Edit:
Script:
#!/bin/bash
/home/Dan/apache-nutch-1.2/bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
I run the commands:
$ pwd
/home/Dan/apache-nutch-1.2
$ ./nutch.sh
The output I get is:
Indexer: starting at 2010-11-29 15:15:44
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_fetch
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_parse
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_data
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_text
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
Regards, ~DS
Answer 0 (score: 1)
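Two things:
1. The shebang line should be #!/bin/bash, with no trailing slash. Also double-check that bash really is in /bin.
2. You are executing nutch via a relative path into the bin directory. So if you are in $HOME, and assuming the path $HOME/bin/nutch exists, you will be fine. But if you change to /tmp, it will fail, because there is no such path as /tmp/bin/nutch. You are better off giving nutch its full, absolute path name in the first place.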
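Both points can be checked from a small wrapper. The following is only a sketch under stated assumptions: check_interpreter and resolve_nutch are hypothetical helper names (they are not part of Nutch or bash), and the install path in the usage comment is simply the one shown in the question.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical helpers, not part of Nutch.

# check_interpreter SCRIPT
# Succeeds if SCRIPT's first line is a shebang that names an executable file.
check_interpreter() {
    local first interp
    # Read the first line; tolerate a missing trailing newline.
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case "$first" in
        '#!'*) ;;
        *) return 1 ;;
    esac
    interp=${first#??}               # drop the leading "#!"
    read -r interp _ <<< "$interp"   # keep only the interpreter path
    [ -n "$interp" ] && [ -f "$interp" ] && [ -x "$interp" ]
}

# resolve_nutch DIR
# Prints the absolute path of DIR/bin/nutch, or fails if it is not executable.
resolve_nutch() {
    local home_dir candidate
    # The cd happens in a subshell, so the caller's working directory is untouched.
    home_dir="$(cd "$1" 2>/dev/null && pwd)" || return 1
    candidate="$home_dir/bin/nutch"
    [ -x "$candidate" ] || return 1
    printf '%s\n' "$candidate"
}

# Example use inside a wrapper script (paths taken from the question):
#   check_interpreter "$0" || echo "suspicious shebang line" >&2
#   nutch="$(resolve_nutch /home/Dan/apache-nutch-1.2)" || exit 1
#   "$nutch" index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
```

Note that an absolute path to nutch only fixes how the program itself is found. The crawl/... arguments are still resolved against the current directory, so the wrapper would also need to cd into the Nutch directory or pass absolute paths for them.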