Nutch 1.15 won't index to Elasticsearch at all

Date: 2018-12-24 20:48:55

Tags: apache elasticsearch nutch

Edit: Solved by switching to Elasticsearch version 5.3.3.

I have tried everything. I have modified index-writers.xml in every way I can think of, put my Elasticsearch settings in nutch-site.xml, and set up my plugins correctly in nutch-site.xml.

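All of this used to work with Nutch 2.0, and I have heard that it worked with Nutch 1.0 and earlier. Is something different in 1.15? Does it simply not work? Am I missing something?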

This is the error I keep getting:

Indexing job did not succeed, job status:FAILED, reason: NA
Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:152)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:235)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:244)

Here are my various settings:

index-writers.xml:

<writer id="indexer_elastic_1" class="org.apache.nutch.indexwriter.elastic.ElasticIndexWriter">
    <parameters>
      <param name="host" value="localhost"/>
      <param name="port" value="9300"/>
      <param name="cluster" value="elasticsearch"/>
      <param name="index" value="nutch"/>
      <param name="max.bulk.docs" value="250"/>
      <param name="max.bulk.size" value="2500500"/>
      <param name="exponential.backoff.millis" value="100"/>
      <param name="exponential.backoff.retries" value="10"/>
      <param name="bulk.close.timeout" value="600"/>
    </parameters>
    <mapping>
      <copy>
        <field source="title" dest="title,search"/>
      </copy>
      <rename />
      <remove />
    </mapping>
  </writer>
  <writer id="indexer_elastic_rest_1" class="org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter">
    <parameters>
      <param name="host" value="localhost"/>
      <param name="port" value="9200"/>
      <param name="index" value="nutch"/>
      <param name="max.bulk.docs" value="250"/>
      <param name="max.bulk.size" value="2500500"/>
      <param name="type" value="doc"/>
      <param name="https" value="false"/>
      <param name="trustallhostnames" value="false"/>
      <param name="languages" value=""/>
      <param name="separator" value="_"/>
      <param name="sink" value="others"/>
    </parameters>
    <mapping>
      <copy>
        <field source="title" dest="search"/>
      </copy>
      <rename />
      <remove />
    </mapping>
  </writer>

nutch-site.xml plugins:

<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
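
Note that the index-writers.xml above configures two writers (transport and REST), while this plugin list enables only indexer-elastic. As a rough sketch, assuming the REST writer is also meant to be active and that its plugin id is indexer-elastic-rest (an assumption; verify it against the plugins/ directory of the Nutch installation), the include pattern could be widened like this:

```xml
<!-- Sketch only: same plugin list as in the question, with the REST indexer added.
     The id "indexer-elastic-rest" is an assumption; check your plugins/ directory. -->
<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-(elastic|elastic-rest)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

If only the transport client on port 9300 is wanted, the list can stay unchanged.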

And my nutch-site.xml Elasticsearch settings:

<property>
  <name>elastic.host</name>
  <value></value>
  <description>localhost</description>
</property>

<property> 
  <name>elastic.port</name>
  <value>9300</value>
  <description>The port to connect to using TransportClient.</description>
</property>

<property> 
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
  <description>The cluster name to discover. Either host and port must be defined
  or cluster.</description>
</property>

<property> 
  <name>elastic.index</name>
  <value>nutch</value> 
  <description>Default index to send documents to.</description>
</property>

<property> 
  <name>elastic.max.bulk.docs</name>
  <value>250</value> 
  <description>Maximum size of the bulk in number of documents.</description>
</property>

<property> 
  <name>elastic.max.bulk.size</name>
  <value>2500500</value> 
  <description>Maximum size of the bulk in bytes.</description>
</property>

<property>
  <name>elastic.exponential.backoff.millis</name>
  <value>100</value>
  <description>Initial delay for the BulkProcessor's exponential backoff policy.
  </description>
</property>

<property>
  <name>elastic.exponential.backoff.retries</name>
  <value>10</value>
  <description>Number of times the BulkProcessor's exponential backoff policy
  should retry bulk operations.</description>
</property>
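
One detail in the settings above: the elastic.host property has an empty value, and "localhost" sits in the description instead. A minimal sketch of what was presumably intended follows; the description text is illustrative wording, not copied from Nutch. Nutch 1.15 introduced index-writers.xml for writer configuration, so these nutch-site.xml properties may no longer be read at all. Treat that as an assumption to check against the 1.15 release notes.

```xml
<!-- Sketch: hostname moved from <description> into <value>.
     The description wording here is illustrative only. -->
<property>
  <name>elastic.host</name>
  <value>localhost</value>
  <description>Hostname of the Elasticsearch node to send documents to.</description>
</property>
```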

0 answers:

No answers