Nutch 1.8 and Apache Solr 4.8 integration: job failed

Time: 2014-07-01 21:17:15

Tags: apache solr nutch

I am trying to crawl web pages using Nutch 1.8 and Solr 4.8 on Windows 7.

bin/crawl urls newsolr http://localhost:8983/solr/ 1 -depth 1

I keep getting the following error:

Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is part of the log file:

2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: content dest: content
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: title dest: title
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: host dest: host
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: url dest: id
2014-07-01 16:58:33,613 INFO  solr.SolrMappingReader - source: url dest: url
2014-07-01 16:58:33,643 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-07-01 16:58:33,773 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Method Not Allowed

Method Not Allowed

request: http://localhost:8983/solr/
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2014-07-01 16:58:34,628 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Finally, the Solr error log:

`org.apache.solr.common.SolrException: ERROR: [doc=http://.com/] unknown field 'tstamp' `

This is my first Solr/Nutch setup. Any help is greatly appreciated. Thanks in advance!

1 answer:

Answer 0: (score: 0)

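Stop the Solr instance and restart it. That should fix your problem. The error occurs because you changed the schema file but did not restart Solr afterwards, so the changes never took effect and Solr cannot "see" the newly added field.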
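
Before restarting, it may help to confirm that the schema file Solr is loading really declares every field Nutch sends. The following is a minimal sketch, not part of the original answer: it parses a schema given as a string and reports which destination fields from the SolrMappingReader log lines above have no `<field>` declaration. The function name `missing_fields` and the trimmed `SAMPLE_SCHEMA` are hypothetical and exist only for illustration; the check ignores `<dynamicField>` patterns, so a field reported as missing could still be accepted by a dynamic rule.

```python
import xml.etree.ElementTree as ET

# Destination field names copied from the "source: ... dest: ..." log lines
# in the question (SolrMappingReader output).
NUTCH_DEST_FIELDS = ["content", "title", "host", "segment", "boost",
                     "digest", "tstamp", "id", "url"]


def missing_fields(schema_xml, required=NUTCH_DEST_FIELDS):
    """Return the required names that have no <field name="..."> in the schema.

    Hypothetical helper: it only looks at explicit <field> elements and does
    not evaluate <dynamicField> patterns or <copyField> rules.
    """
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    return [name for name in required if name not in declared]


# Hypothetical, heavily trimmed schema used only to exercise the helper.
# A real schema.xml is much longer; the type names here are placeholders.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="false"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
  </fields>
</schema>
"""

if __name__ == "__main__":
    # prints ['host', 'segment', 'boost', 'digest', 'tstamp']
    print(missing_fields(SAMPLE_SCHEMA))
```

To check a real installation, read the contents of the schema file your Solr core actually uses and pass that string to `missing_fields`. If `tstamp` is declared there and the error persists, the running instance is probably still using the old schema, which is the case the answer describes.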