Error when running solrindexer

Date: 2013-02-05 14:01:21

Tags: solr nutch

I'm using Solr 3.6.1 and Nutch 1.5, and it had been working well: I crawl my site, index the data into Solr, and search with Solr. But two weeks ago it stopped working. When I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100` it works, but when I run `./nutch crawl urls -solr http://localhost:8080/solr/ -depth 5 -topN 100000` it throws an exception. In my log file I found this:

2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-05 17:04:20,697 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-05 17:04:21,275 WARN  mapred.LocalJobRunner - job_local_0029
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-05 17:04:21,883 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-05 17:04:21
2013-02-05 17:04:21,887 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`    

It was working fine two weeks ago... Has anyone run into a similar problem?

Hi, I just finished another crawl and hit the same exception, but when I looked at my logs/hadoop.log file I found this:

2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Indexing 250 documents
2013-02-06 22:02:14,111 INFO  solr.SolrWriter - Deleting 0 documents
2013-02-06 22:02:14,902 WARN  mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: `http://localhost:8080/solr/update?wt=javabin&version=2`
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:497)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:304)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2013-02-06 22:02:15,027 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-02-06 22:02:15
2013-02-06 22:02:15,032 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: `http://localhost:8080/solr/`
2013-02-06 22:02:21,281 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-02-06 22:02:22,263 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2013-02-06 22:02:22, elapsed: 00:00:07
2013-02-06 22:02:22,263 INFO  crawl.Crawl - crawl finished: crawl-20130206205733 

I hope this helps in understanding the problem...
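For anyone digging through the same logs, the Solr exceptions and their surrounding lines can be pulled out of hadoop.log in one pass. This is a minimal sketch; `logs/hadoop.log` is the default location for a local Nutch 1.x run, so adjust the path to your installation:

```shell
# Show each SolrException in the Nutch log with a few lines of context;
# degrade gracefully if the log is not at the assumed default path.
LOG="${LOG:-logs/hadoop.log}"
if [ -f "$LOG" ]; then
  grep -n -B 1 -A 3 "SolrException" "$LOG"
else
  echo "log not found: $LOG"
fi
```

The `-B 1 -A 3` context flags are what surface the `request:` line and the top of the stack trace next to each exception.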

1 Answer:

Answer 0 (score: 0)

Based on the logs you've shown, I think the answer lies on the Solr side. There you should find an exception trace that tells you which component stopped the processing. Since it worked two weeks ago, either something in your environment has changed (jar versions?), or you have a specific document that is the problem.

If the problem occurs with every document (try several different ones), then you probably have an environment change (jars, properties, etc.). If it happens with one subset of documents but not another, then those specific documents are likely the issue (for example, a bad encoding).
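One way to run the single-document test suggested above is to post a minimal document straight to the same update handler that appears in the logs and look only at the HTTP status. This is a sketch under assumptions: the URL is the one from your log output, and `id` is assumed to be a valid required field in your schema. A 400 whose response body names a field points at a schema/document problem; a 500 points at the server side; "000" means Solr is unreachable.

```shell
# Post one minimal document to the Solr update handler from the logs
# and print only the HTTP status code ("000" if Solr is unreachable).
# The field name "id" is an assumption about your schema.
URL="http://localhost:8080/solr/update?commit=true"
curl -s -o /dev/null -w "%{http_code}\n" "$URL" \
  -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="id">test-doc-1</field></doc></add>' \
  || true  # keep going even when the server is down
```

Drop the `-o /dev/null` to see the full error body, which for a Bad Request usually names the offending field.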

Either way, the Solr-side stack trace is the first thing to check.