Nutch is not working in a Linux environment

Time: 2014-03-11 12:32:59

Tags: linux nutch

I installed Nutch on a Linux system. When I go into the 'bin' directory and run ./nutch, it prints the following:

Usage: nutch COMMAND
where COMMAND is one of:
  crawl             one-step crawler for intranets (DEPRECATED - USE CRAWL SCRIPT INSTEAD)
  readdb            read / dump crawl db
  mergedb           merge crawldb-s, with optional filtering
  readlinkdb        read / dump link db
  inject            inject new urls into the database
  generate          generate new segments to fetch from crawl db
  freegen           generate new segments to fetch from text files
  fetch             fetch a segment's pages
  parse             parse a segment's pages
  readseg           read / dump segment data
  mergesegs         merge several segments, with optional filtering and slicing
  updatedb          update crawl db from segments after fetching
  invertlinks       create a linkdb from parsed segments
  mergelinkdb       merge linkdb-s, with optional filtering
  index             run the plugin-based indexer on parsed segments and linkdb
  solrindex         run the solr indexer on parsed segments and linkdb
  solrdedup         remove duplicates from solr
  solrclean         remove HTTP 301 and 404 documents from solr
  clean             remove HTTP 301 and 404 documents from indexing backends configured via plugins
  parsechecker      check the parser for a given url
  indexchecker      check the indexing filters for a given url
  domainstats       calculate domain statistics from crawldb
  webgraph          generate a web graph from existing segments
  linkrank          run a link analysis program on the generated web graph
  scoreupdater      updates the crawldb with linkrank scores
  nodedumper        dumps the web graph's node scores
  plugin            load a plugin and run one of its classes main()
  junit             runs the given JUnit test
 or
  CLASSNAME         run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

The output above shows that Nutch is installed correctly on this system. But when I then run ./nutch crawlresult urls -dir crawl -depth 3, I get the following output:

./nutch: line 272: /usr/java/jdk1.7.0/bin/java: Success

I expected Nutch to start crawling and display its logs. Can you tell me what is wrong?

1 Answer:

Answer 0: (score: 0)

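You cannot run the command "crawlresult". You must use one of the commands from the list printed by the nutch script. If you want to crawl with Nutch, you can follow the Nutch tutorials.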
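To illustrate the point, here is a minimal shell sketch that checks a word against the commands in the usage listing from the question before passing it to ./nutch. The helper name is_listed_command is hypothetical and not part of Nutch, and the command list is copied from the output above, so it may differ in other Nutch versions. Per that listing, anything not in the list is treated as a CLASSNAME to run, which is presumably what happened to "crawlresult".

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeeds only if the word is one
# of the commands shown in the usage listing quoted in the question.
is_listed_command() {
  case "$1" in
    crawl|readdb|mergedb|readlinkdb|inject|generate|freegen|fetch|parse|\
    readseg|mergesegs|updatedb|invertlinks|mergelinkdb|index|solrindex|\
    solrdedup|solrclean|clean|parsechecker|indexchecker|domainstats|\
    webgraph|linkrank|scoreupdater|nodedumper|plugin|junit)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# The question used "crawlresult"; "crawl" is the listed command
# (marked deprecated in favor of the crawl script in that listing).
cmd="crawl"
if is_listed_command "$cmd"; then
  # Only prints the command line; remove "echo" to actually run it.
  echo "would run: ./nutch $cmd urls -dir crawl -depth 3"
else
  echo "'$cmd' is not a listed command; nutch would treat it as a CLASSNAME"
fi
```

With cmd set to "crawlresult" the check fails, which matches the answer: the remaining arguments from the question (urls -dir crawl -depth 3) are left as they were, and only the command name changes.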