Question

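I have set up Apache Nutch 1.13 with Solr 5.5.0 and HBase 0.90.6 in Eclipse. I can run the jobs from inject through invertlinks, but when I run the indexing job it throws the error "Missing elastic.cluster and elastic.host ....". I have added indexer-solr to plugin.includes in my nutch-site.xml, yet I still get this error. Can anyone explain why this happens?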

Answer 1

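The problem is nutch-site.xml. There are actually two nutch-site.xml files: one in the conf folder and another in the src/test folder. We normally configure the one under conf, but when the project is imported into Eclipse, the file under src/test is the one that gets picked up. So the fix is to put your settings in the nutch-site.xml under src/test. That file usually contains only a very basic configuration, and you need to replace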

<property>
    <name>plugin.includes</name>
    <value>.*</value>
    <description>Enable all plugins during unit testing.</description>
</property>

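with the following lines (the value .* matches every plugin directory, so indexer-elastic is loaded as well, and that plugin is what complains about the missing elastic.cluster and elastic.host settings):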

<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description>Regular expression naming plugin directory names to
    include.  Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin. By
    default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins. In order to use HTTPS please enable 
    protocol-httpclient, but be aware of possible intermittent problems with the 
    underlying commons-httpclient library. Set parsefilter-naivebayes for classification based focused crawler.
    </description>
</property>
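With indexer-solr selected, the Solr index writer also has to know where Solr is running. A minimal sketch, assuming the solr.server.url property read by the indexer-solr plugin in Nutch 1.13 and a hypothetical core named "nutch" on Solr's default port (adjust both to your setup):

<!-- Assumed property name from Nutch 1.x indexer-solr; the URL and core name are placeholders -->
<property>
    <name>solr.server.url</name>
    <value>http://localhost:8983/solr/nutch</value>
</property>

The same value can usually be passed on the command line with -D solr.server.url=... instead of being stored in nutch-site.xml.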

So if you want to use Solr, include indexer-solr; for Elasticsearch, include indexer-elastic; and so on.
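For the Elasticsearch case, a sketch of the matching nutch-site.xml entries might look like this. It assumes the indexer-elastic plugin and the elastic.host, elastic.port and elastic.cluster properties defined in Nutch 1.13's nutch-default.xml (check that file for your version); the host, port and cluster name below are placeholders:

<!-- Swap indexer-solr for indexer-elastic in the plugin list -->
<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<!-- Placeholder connection settings; at least one of elastic.host or elastic.cluster must be set -->
<property>
    <name>elastic.host</name>
    <value>localhost</value>
</property>
<property>
    <name>elastic.port</name>
    <value>9300</value>
</property>
<property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
</property>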

Hope this helps others.

Error "Missing elastic.cluster and elastic.host ...." when running the indexer job of Nutch in Eclipse
