I installed Nutch on a Linux system. When I go into the 'bin' directory and run ./nutch,
it prints the following:
Usage: nutch COMMAND
where COMMAND is one of:
  crawl             one-step crawler for intranets (DEPRECATED - USE CRAWL SCRIPT INSTEAD)
  readdb            read / dump crawl db
  mergedb           merge crawldb-s, with optional filtering
  readlinkdb        read / dump link db
  inject            inject new urls into the database
  generate          generate new segments to fetch from crawl db
  freegen           generate new segments to fetch from text files
  fetch             fetch a segment's pages
  parse             parse a segment's pages
  readseg           read / dump segment data
  mergesegs         merge several segments, with optional filtering and slicing
  updatedb          update crawl db from segments after fetching
  invertlinks       create a linkdb from parsed segments
  mergelinkdb       merge linkdb-s, with optional filtering
  index             run the plugin-based indexer on parsed segments and linkdb
  solrindex         run the solr indexer on parsed segments and linkdb
  solrdedup         remove duplicates from solr
  solrclean         remove HTTP 301 and 404 documents from solr
  clean             remove HTTP 301 and 404 documents from indexing backends configured via plugins
  parsechecker      check the parser for a given url
  indexchecker      check the indexing filters for a given url
  domainstats       calculate domain statistics from crawldb
  webgraph          generate a web graph from existing segments
  linkrank          run a link analysis program on the generated web graph
  scoreupdater      updates the crawldb with linkrank scores
  nodedumper        dumps the web graph's node scores
  plugin            load a plugin and run one of its classes main()
  junit             runs the given JUnit test
 or
  CLASSNAME         run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
The output above suggests that Nutch is installed correctly on this system. But when I then run ./nutch crawlresult urls -dir crawl -depth 3,
I get the following output:
./nutch: line 272: /usr/java/jdk1.7.0/bin/java: Success
I expected Nutch to start crawling and print its logs. Can you tell me what is wrong?
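Answer 0 (score: 0)
You cannot run the command "crawlresult"; it is not a Nutch command. You must use one of the commands from the nutch script's command list. If you want to crawl with Nutch, you can follow this tutorial.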
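
To illustrate the point about the command list, here is a minimal shell sketch that checks a name against the commands printed in the usage output above. The helper `is_nutch_command` and the variable `NUTCH_COMMANDS` are hypothetical names made up for this example; they are not part of Nutch, and the list is simply copied from the question.

```shell
#!/bin/sh
# Command names copied from the usage output in the question.
# NUTCH_COMMANDS is a hypothetical variable for this sketch, not something Nutch defines.
NUTCH_COMMANDS="crawl readdb mergedb readlinkdb inject generate freegen fetch parse
readseg mergesegs updatedb invertlinks mergelinkdb index solrindex solrdedup solrclean
clean parsechecker indexchecker domainstats webgraph linkrank scoreupdater nodedumper
plugin junit"

# is_nutch_command NAME
# Hypothetical helper: succeeds (exit status 0) only if NAME is in the list above.
is_nutch_command() {
    for c in $NUTCH_COMMANDS; do
        [ "$c" = "$1" ] && return 0
    done
    return 1
}

# Compare the name used in the question with a listed one.
for cmd in crawlresult crawl; do
    if is_nutch_command "$cmd"; then
        echo "$cmd: listed command"
    else
        echo "$cmd: not listed"
    fi
done
# prints:
#   crawlresult: not listed
#   crawl: listed command
```

Because "crawlresult" is not listed, the usage text implies the script falls through to the "CLASSNAME" case and tries to run a class with that name. Assuming the question meant the one-step crawler, the same arguments with the listed name would be `./nutch crawl urls -dir crawl -depth 3`; note that the usage output itself marks `crawl` as deprecated in favour of the crawl script.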