Problem integrating Solr with Nutch

Time: 2014-08-05 11:13:31

Tags: solr nutch

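I am following the tutorial here. I installed Solr and Nutch separately, and both work fine on their own. The problem comes when I have to integrate them. From earlier posts on this site I gathered that the schema file may be the issue. As the tutorial says, I copied Nutch's schema.xml over Solr's schema.xml and restarted Solr. Solr then stopped running because of a configuration problem. So I instead merged the two files, copying the contents of each into the other alongside the existing contents. Now (as before) I get this error: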

Indexer: starting at 2014-08-05 11:10:21
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication


Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Can anyone suggest what should be done? I am using apache-nutch-1.8 and solr-4.9.0. This is what my hadoop.log file looks like:

2014-08-05 12:50:05,032 INFO  crawl.Injector - Injector: starting at 2014-08-05 12:50:05
2014-08-05 12:50:05,033 INFO  crawl.Injector - Injector: crawlDb: -dir/crawldb
2014-08-05 12:50:05,033 INFO  crawl.Injector - Injector: urlDir: urls
.
.
.
.
.
2014-08-05 13:04:21,255 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN  mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

2014-08-05 13:10:37,855 INFO  crawl.Injector - Injector: starting at 2014-08-05 13:10:37
.
.
.

1 Answer:

Answer 0 (score: 1)

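This is probably a version mismatch: the tutorial says to copy conf/schema.xml, but for this particular version of Solr you should copy the file schema-solr4.xml instead, and then add <field name="_version_" type="long" indexed="true" stored="true"/> at line 351. Restart Solr and it works! Hope this helps someone!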
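
For anyone who would rather not count lines by hand, here is a minimal Python sketch of the same fix. It checks whether a schema already declares the _version_ field and adds it if it is missing. The function name ensure_version_field is made up for illustration. The sketch assumes the field declarations sit inside a <fields> element (falling back to the root element otherwise) and that the schema defines a "long" field type. Note that ElementTree drops XML comments when it rewrites the text, so treat this as a check or a starting point rather than a replacement for editing the file yourself.

```python
import xml.etree.ElementTree as ET

# Attributes of the field named in the answer above.
VERSION_FIELD = {
    "name": "_version_",
    "type": "long",
    "indexed": "true",
    "stored": "true",
}


def ensure_version_field(schema_xml: str) -> str:
    """Return schema XML text that declares _version_, adding it if absent."""
    root = ET.fromstring(schema_xml)

    # Assumption: <field> declarations live under <fields>; if that wrapper
    # is missing, look for them directly under the root element instead.
    container = root.find("fields")
    if container is None:
        container = root

    declared = {field.get("name") for field in container.findall("field")}
    if "_version_" not in declared:
        ET.SubElement(container, "field", VERSION_FIELD)

    return ET.tostring(root, encoding="unicode")
```

To use it, read the text of the schema.xml in your Solr core's conf directory (the exact path depends on your installation), pass it to ensure_version_field, write the result back, and restart Solr as described above.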