我想使用Nutch 1.9和Solr 4.10.2进行网络爬虫 爬行工作正在进行,但是当涉及到索引时会出现问题。我找了问题,我尝试了很多方法,但似乎没什么用。这就是我得到的:
Indexer: starting at 2015-03-13 20:51:08
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
当我看到日志文件时,这就是我得到的:
2015-03-13 20:51:08,768 INFO indexer.IndexingJob - Indexer: starting at 2015-03-13 20:51:08
2015-03-13 20:51:08,846 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2015-03-13 20:51:08,846 INFO indexer.IndexingJob - Indexer: URL filtering: false
2015-03-13 20:51:08,846 INFO indexer.IndexingJob - Indexer: URL normalizing: false
2015-03-13 20:51:09,117 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-03-13 20:51:09,117 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
2015-03-13 20:51:09,121 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: testCrawl/crawldb
2015-03-13 20:51:09,122 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: testCrawl/linkdb
2015-03-13 20:51:09,122 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: testCrawl/segments/20150311221258
2015-03-13 20:51:09,234 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: testCrawl/segments/20150311222328
2015-03-13 20:51:09,235 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: testCrawl/segments/20150311222727
2015-03-13 20:51:09,236 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: testCrawl/segments/20150312085908
2015-03-13 20:51:09,282 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-03-13 20:51:09,747 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2015-03-13 20:51:20,904 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: content dest: content
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: title dest: title
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: host dest: host
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: segment dest: segment
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: boost dest: boost
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: digest dest: digest
2015-03-13 20:51:20,929 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2015-03-13 20:51:21,192 INFO solr.SolrIndexWriter - Indexing 250 documents
2015-03-13 20:51:21,192 INFO solr.SolrIndexWriter - Deleting 0 documents
2015-03-13 20:51:21,342 INFO solr.SolrIndexWriter - Indexing 250 documents
2015-03-13 20:51:21,437 WARN mapred.LocalJobRunner - job_local1194740690_0001
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2015-03-13 20:51:21,607 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
那么请帮忙吗?