Edit: solved by switching to Elasticsearch 5.3.3.
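I have tried everything. I have modified index-writers.xml in every way I could think of, put my Elasticsearch settings in nutch-site.xml, and set up my plugins correctly in nutch-site.xml.
All of this used to work with Nutch 2.0, and I have heard it worked with Nutch 1.0 and earlier. Is something different in 1.15? Does it simply not work? Am I missing something?
This is the error I keep getting: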
Indexing job did not succeed, job status:FAILED, reason: NA
Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:152)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:235)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:244)
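The trace above only reports that the job failed (`reason: NA`), so the underlying exception has to come from somewhere else. As far as I know, a local-mode Nutch run writes it to `logs/hadoop.log`, but treat that path as an assumption about the install. A minimal Python sketch for pulling the relevant lines out of that file; the helper name `last_error_lines` is made up for illustration:

```python
from pathlib import Path


def last_error_lines(log_path, limit=20):
    """Return up to `limit` trailing lines that mention ERROR or 'Caused by'."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    hits = [ln for ln in text.splitlines() if "ERROR" in ln or "Caused by" in ln]
    return hits[-limit:]


if __name__ == "__main__":
    # Assumed location, relative to the Nutch runtime directory.
    log = Path("logs/hadoop.log")
    if log.exists():
        for line in last_error_lines(log):
            print(line)
    else:
        print(f"{log} not found; adjust the path for your install")
```

The lines it prints should name the exception behind the failed indexing job, which is more useful to post than the generic trace.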
Here are my various settings:
index-writers.xml:
<writer id="indexer_elastic_1" class="org.apache.nutch.indexwriter.elastic.ElasticIndexWriter">
<parameters>
<param name="host" value="localhost"/>
<param name="port" value="9300"/>
<param name="cluster" value="elasticsearch"/>
<param name="index" value="nutch"/>
<param name="max.bulk.docs" value="250"/>
<param name="max.bulk.size" value="2500500"/>
<param name="exponential.backoff.millis" value="100"/>
<param name="exponential.backoff.retries" value="10"/>
<param name="bulk.close.timeout" value="600"/>
</parameters>
<mapping>
<copy>
<field source="title" dest="title,search"/>
</copy>
<rename />
<remove />
</mapping>
</writer>
<writer id="indexer_elastic_rest_1" class="org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter">
<parameters>
<param name="host" value="localhost"/>
<param name="port" value="9200"/>
<param name="index" value="nutch"/>
<param name="max.bulk.docs" value="250"/>
<param name="max.bulk.size" value="2500500"/>
<param name="type" value="doc"/>
<param name="https" value="false"/>
<param name="trustallhostnames" value="false"/>
<param name="languages" value=""/>
<param name="separator" value="_"/>
<param name="sink" value="others"/>
</parameters>
<mapping>
<copy>
<field source="title" dest="search"/>
</copy>
<rename />
<remove />
</mapping>
</writer>
nutch-site.xml plugins:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
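To see which indexer plugins this `plugin.includes` value would activate, here is a small Python sketch. It assumes Nutch treats the value as a single regular expression that must match the whole plugin id; that is my understanding of the plugin loader, not something checked against the 1.15 source. The helper `active_plugins` and the list of candidate ids are made up for illustration.

```python
import re

# The plugin.includes value from nutch-site.xml, copied as-is.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def active_plugins(includes, candidate_ids):
    """Return the candidate ids that the includes regex matches in full."""
    pattern = re.compile(includes)
    return [pid for pid in candidate_ids if pattern.fullmatch(pid)]


# Hypothetical candidate ids, chosen to compare the two Elasticsearch writers.
candidates = ["indexer-elastic", "indexer-elastic-rest", "indexer-solr"]
print(active_plugins(PLUGIN_INCLUDES, candidates))  # ['indexer-elastic']
```

Under that assumption only `indexer-elastic` is enabled, so the `indexer_elastic_rest_1` writer defined in index-writers.xml would presumably have no loaded plugin behind it.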
And my nutch-site.xml Elasticsearch settings:
<property>
<name>elastic.host</name>
<value></value>
<description>localhost</description>
</property>
<property>
<name>elastic.port</name>
<value>9300</value>
<description>The port to connect to using TransportClient.</description>
</property>
<property>
<name>elastic.cluster</name>
<value>elasticsearch</value>
<description>The cluster name to discover. Either host and port must be defined
or cluster.</description>
</property>
<property>
<name>elastic.index</name>
<value>nutch</value>
<description>Default index to send documents to.</description>
</property>
<property>
<name>elastic.max.bulk.docs</name>
<value>250</value>
<description>Maximum size of the bulk in number of documents.</description>
</property>
<property>
<name>elastic.max.bulk.size</name>
<value>2500500</value>
<description>Maximum size of the bulk in bytes.</description>
</property>
<property>
<name>elastic.exponential.backoff.millis</name>
<value>100</value>
<description>Initial delay for the BulkProcessor's exponential backoff policy.
</description>
</property>
<property>
<name>elastic.exponential.backoff.retries</name>
<value>10</value>
<description>Number of times the BulkProcessor's exponential backoff policy
should retry bulk operations.</description>
</property>
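One detail in the settings above: `elastic.host` has an empty `<value>`, and `localhost` sits in its `<description>` instead. I am not sure whether Nutch 1.15 still reads these `elastic.*` properties at all now that index-writers.xml exists, so this may be harmless. A minimal Python sketch for spotting such entries follows; the helper `empty_properties` is hypothetical, and the sample wraps two of the properties in a `<configuration>` root, which the snippets above omit.

```python
import xml.etree.ElementTree as ET


def empty_properties(xml_text):
    """Return names of <property> entries whose <value> is missing or blank."""
    root = ET.fromstring(xml_text)
    names = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name and not value:
            names.append(name)
    return names


# Two properties from the settings above, wrapped in an assumed root element.
SAMPLE = """<configuration>
  <property>
    <name>elastic.host</name>
    <value></value>
    <description>localhost</description>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
</configuration>"""

print(empty_properties(SAMPLE))  # ['elastic.host']
```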