nutch 1.5.1 solrindex java.io.IOException: Job failed

Time: 2012-11-01 12:59:58

Tags: solr import nutch

When I try to run the following command on my SLES 11 system:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

I get this error: java.io.IOException: Job failed!

I'm using Nutch 1.5.1 and Solr 1.6.0.

The only log I could find is hadoop.log, which shows the following:

2012-11-01 13:42:38,375 INFO  solr.SolrIndexer - SolrIndexer: starting at 2012-11-01 13:42:38
2012-11-01 13:42:38,915 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2012-11-01 13:42:38,915 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2012-11-01 13:42:38,915 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20121101124801
2012-11-01 13:42:39,558 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20121101125604
2012-11-01 13:42:39,599 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20121101130601
2012-11-01 13:42:40,083 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-11-01 13:42:43,811 INFO  plugin.PluginRepository - Plugins: looking in: /srv/apache-nutch-1.5.1/plugins
2012-11-01 13:42:44,760 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2012-11-01 13:42:44,760 INFO  plugin.PluginRepository - Registered Plugins:
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository - Registered Extension-Points:
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2012-11-01 13:42:44,770 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2012-11-01 13:42:44,771 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2012-11-01 13:42:44,815 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:42:44,822 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:42:44,822 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:42:54,725 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:42:54,725 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:42:54,725 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:03,827 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:03,827 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:03,827 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:12,518 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:12,518 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:12,518 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:24,757 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:24,758 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:24,758 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:34,697 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:34,698 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:34,698 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:44,882 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:44,882 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:44,882 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:50,458 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:50,458 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:50,458 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:43:59,148 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:43:59,148 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:43:59,148 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:04,299 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:04,299 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:04,299 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:11,093 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:11,093 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:11,093 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:19,633 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:19,633 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:19,633 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:30,885 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:30,885 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:30,885 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:39,637 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:39,637 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:39,637 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:47,905 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-01 13:44:47,906 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-01 13:44:47,906 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-01 13:44:48,104 INFO  solr.SolrMappingReader - source: content dest: content
2012-11-01 13:44:48,105 INFO  solr.SolrMappingReader - source: title dest: title
2012-11-01 13:44:48,105 INFO  solr.SolrMappingReader - source: host dest: host
2012-11-01 13:44:48,105 INFO  solr.SolrMappingReader - source: segment dest: segment
2012-11-01 13:44:48,106 INFO  solr.SolrMappingReader - source: boost dest: boost
2012-11-01 13:44:48,106 INFO  solr.SolrMappingReader - source: digest dest: digest
2012-11-01 13:44:48,106 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2012-11-01 13:44:48,106 INFO  solr.SolrMappingReader - source: url dest: id
2012-11-01 13:44:48,107 INFO  solr.SolrMappingReader - source: url dest: url
2012-11-01 13:44:48,398 INFO  solr.SolrWriter - Indexing 11 documents
2012-11-01 13:44:49,082 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Severe errors in solr configuration.  Check your log files for more detailed information on what may be wrong.  If you want solr to continue after configuration errors, change:    <abortOnConfigurationError>false</abortOnConfigurationError>  in solr.xml
-------------------------------------------------------------
org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:123)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:481)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:219)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doSt

request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:142)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:466)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2012-11-01 13:45:01,864 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

As it suggests, I checked the Solr logs, but I couldn't find any :/ Any ideas?

Greetings
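A separate Solr log turns out not to be necessary: the root cause is already in hadoop.log. Solr's error page is flattened onto a single line, and the innermost message, "Schema Parsing Failed: multiple points", sits in the middle of it. A minimal sketch for pulling it out, assuming the flattened "SolrException: ... at ..." layout shown in the log above; RootCauseFinder and lastSolrError are hypothetical names, not part of Nutch or Solr:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading hadoop.log; not part of Nutch or Solr. */
public class RootCauseFinder {
    private static final String MARKER = "SolrException: ";
    // In the flattened Solr error page, the first stack frame after a message
    // is separated from it by whitespace followed by "at ".
    private static final Pattern FIRST_FRAME = Pattern.compile("\\s+at\\s");

    /**
     * Returns the text after the last "SolrException: " found in the given lines,
     * cut off before the first stack frame. For a nested, flattened error this is
     * the innermost message. Empty if no line contains the marker.
     */
    public static Optional<String> lastSolrError(List<String> logLines) {
        String result = null;
        for (String line : logLines) {
            int idx = line.lastIndexOf(MARKER);
            if (idx < 0) {
                continue;
            }
            String rest = line.substring(idx + MARKER.length());
            Matcher frame = FIRST_FRAME.matcher(rest);
            result = (frame.find() ? rest.substring(0, frame.start()) : rest).trim();
        }
        return Optional.ofNullable(result);
    }

    public static void main(String[] args) {
        // Shortened copies of lines from the hadoop.log excerpt above.
        List<String> lines = Arrays.asList(
            "2012-11-01 13:44:48,398 INFO  solr.SolrWriter - Indexing 11 documents",
            "org.apache.solr.common.SolrException: Severe errors in solr configuration."
                + "  ------------------------------------------------------------- "
                + "org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points"
                + "         at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:688)",
            "2012-11-01 13:45:01,864 ERROR solr.SolrIndexer - java.io.IOException: Job failed!");
        System.out.println(lastSolrError(lines).orElse("no SolrException found"));
    }
}
```

With the shortened lines in main, it prints "Schema Parsing Failed: multiple points". On a differently formatted log (for example one where "Caused by:" lines carry the cause) this heuristic would need adjusting.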

2 Answers:

Answer 0 (score: 0)

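  1. Check that the schema is the same on the Nutch side and the SOLR side.
  2. Change the version in schema.xml (the second "." in "1.5.1" causes the problem; this is a bug in 1.5.1).
  3. If the error persists, try downloading and using SOLR 3.4.0 instead, since Nutch 1.5.1 uses the SOLR 3.4.0 client jars.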
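Suggestion 2 can be checked in isolation. Assuming, as the "multiple points" text in the log suggests, that Solr reads the version attribute of the <schema> element as a Java float, any value with two dots such as "1.5.1" is rejected: Float.parseFloat throws a NumberFormatException whose message is exactly "multiple points". The hypothetical SchemaVersionCheck class below applies the same parse to a schema.xml string before Solr loads it; the two schemas in main are made-up one-line examples, not the real Nutch schema.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical pre-flight check for schema.xml; not part of Nutch or Solr. */
public class SchemaVersionCheck {

    /** Returns true if the root element's version attribute is absent or parses as a Java float. */
    public static boolean versionParsesAsFloat(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema XML", e);
        }
        String version = doc.getDocumentElement().getAttribute("version");
        if (version.isEmpty()) {
            return true; // no version attribute: nothing to check here
        }
        try {
            Float.parseFloat(version);
            return true;
        } catch (NumberFormatException e) {
            // For "1.5.1" the message is "multiple points", as in the Solr error.
            System.out.println("version=\"" + version + "\" rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Minimal made-up schemas, not the real Nutch schema.xml.
        System.out.println(versionParsesAsFloat("<schema name=\"nutch\" version=\"1.5.1\"/>"));
        System.out.println(versionParsesAsFloat("<schema name=\"nutch\" version=\"1.5\"/>"));
    }
}
```

main prints the rejection message and false for version="1.5.1", then true for version="1.5". Whether "1.5" is the right replacement value for a given setup is an assumption here; under this reading the attribute only has to be something Float.parseFloat accepts.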

Answer 1 (score: 0)

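If you scroll down, you can see that it complains about an error in schema.xml. If you post your schema.xml file, I'll look for the problem. I came here looking for a solution to the Hadoop native-platform error.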
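Suggestion 1 of Answer 0 and the offer to look over schema.xml both come down to inspecting the schema. One mechanical part of that can be sketched: each destination field that solr.SolrMappingReader lists in the log (content, title, host, segment, boost, digest, tstamp, id, url) presumably needs a matching <field> declaration in the schema.xml that Solr loads. The hypothetical MappingVsSchema class below reports the mapped names with no such declaration. It ignores dynamicField patterns, and it would not catch the "multiple points" failure itself, which appears to come from the schema's version attribute rather than its fields.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical consistency check; not part of Nutch or Solr. */
public class MappingVsSchema {

    /**
     * Returns the mapped destination fields that schema.xml does not declare as
     * a <field name="..."/> element. dynamicField patterns are ignored.
     */
    public static Set<String> missingFields(List<String> mappedFields, String schemaXml) {
        NodeList fieldNodes;
        try {
            fieldNodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("field");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema XML", e);
        }
        Set<String> declared = new LinkedHashSet<>();
        for (int i = 0; i < fieldNodes.getLength(); i++) {
            declared.add(((Element) fieldNodes.item(i)).getAttribute("name"));
        }
        // Keep the mapping order so the report reads like the log.
        Set<String> missing = new LinkedHashSet<>(mappedFields);
        missing.removeAll(declared);
        return missing;
    }

    public static void main(String[] args) {
        // Destination fields printed by solr.SolrMappingReader in the log above.
        List<String> mapped = Arrays.asList(
                "content", "title", "host", "segment", "boost", "digest", "tstamp", "id", "url");
        // Made-up, deliberately incomplete schema for illustration.
        String schema = "<schema name=\"nutch\" version=\"1.5\"><fields>"
                + "<field name=\"id\" type=\"string\"/>"
                + "<field name=\"url\" type=\"string\"/>"
                + "</fields></schema>";
        System.out.println("Missing in schema.xml: " + missingFields(mapped, schema));
    }
}
```

Against the deliberately incomplete schema in main, it prints "Missing in schema.xml: [content, title, host, segment, boost, digest, tstamp]"; pointing it at the real schema.xml text would be the actual check.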