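I have configured Nutch and Solr, and both are up and running. I am using Solr to index the documents crawled by Nutch. However, the communication between the two fails at the indexing step (the nutch index command, run with -linkdb). I found similar threads, but none of their solutions worked for me. Similar thread: "Nutch job failing when sending data to Solr".
I set up the configuration files by following this guide: https://www.cs.toronto.edu/~muuo/blog/build-yourself-a-mini-search-engine/
Versions: Nutch 1.14 (https://archive.apache.org/dist/nutch/1.14/apache-nutch-1.14-bin.tar.gz), Solr 6.6 (http://mirror.dsrg.utoronto.ca/apache/lucene/solr/6.6.5/solr-6.6.5.tgz)
I have already tried the most recent schema.xml from https://github.com/apache/nutch/blob/master/conf/schema.xml, as given in the Nutch wiki.
I start the crawl with the following command: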
nutch/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s nutch/urls/ Crawl 2
which breaks at
/home/sk/SearchEngine/nutch/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch Crawl/crawldb -linkdb Crawl/linkdb Crawl/segments/20190112160715
Failed with exit value 255.
Error:
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance
solr.zookeeper.hosts : URL of the Zookeeper quorum
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication
Indexing 87/87 documents
Deleting 0 documents
Indexing 87/87 documents
Deleting 0 documents
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
Error running:
/home/sk/SearchEngine/nutch/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch Crawl/crawldb -linkdb Crawl/linkdb Crawl/segments/20190112160715
Failed with exit value 255.
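The "Job failed!" trace above comes from the Hadoop job client and does not name the underlying cause; in local mode, Nutch normally writes the real exception to its own log file. Below is a minimal sketch of how the relevant lines might be pulled out of that log. The helper name last_errors is hypothetical, and the path nutch/logs/hadoop.log is an assumption based on the default log location of the binary distribution, not something taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the last few error-related lines from a log file.
# $1 = path to the log file, $2 = maximum number of lines to show (default 20).
last_errors() {
    log_file="$1"
    count="${2:-20}"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 1
    fi
    # Keep only lines that look like errors or exception causes,
    # then show the most recent ones.
    grep -E 'ERROR|Exception|Caused by' "$log_file" | tail -n "$count"
}

# Assumed usage, run from the directory that contains nutch/:
# last_errors nutch/logs/hadoop.log
```

If the log shows a Solr-side error (for example a field that is missing from the schema), that would point at the schema.xml in the Solr core rather than at Nutch itself.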