Error "Missing elastic.cluster and elastic.host ...." when running the indexer job in Nutch from Eclipse

Time: 2017-10-05 07:45:54

Tags: eclipse solr web-scraping web-crawler nutch

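I have set up Apache Nutch 1.13 with Solr 5.5.0 and HBase 0.90.6 in Eclipse. I can run the jobs from the injector through invertlinks, but the indexing job throws the error "Missing elastic.cluster and elastic.host ....". I have set indexer-solr under plugin.includes in nutch-site.xml, but I still get the error. Can anyone explain why this happens?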

1 Answer:

Answer 0: (score: 0)

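The problem is with nutch-site.xml. If you look, there are two nutch-site.xml files: one under the conf folder and another under the src/test folder. We normally configure the nutch-site.xml under conf, but when the project is imported into Eclipse, the copy under src/test is the one that gets picked up. So the way to fix this error is to put your settings in the file under src/test. That file usually contains only a very basic configuration, and you need to replace the following property: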

<property>
    <name>plugin.includes</name>
    <value>.*</value>
    <description>Enable all plugins during unit testing.</description>
</property>

with the following:

<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description>Regular expression naming plugin directory names to
    include.  Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin. By
    default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins. In order to use HTTPS please enable 
    protocol-httpclient, but be aware of possible intermittent problems with the 
    underlying commons-httpclient library. Set parsefilter-naivebayes for classification based focused crawler.
    </description>
</property>
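To illustrate why the `.*` value triggers the Elasticsearch error, here is a rough sketch in Python (not Nutch code). It assumes the plugin.includes value is matched against each whole plugin directory name, and the list of plugin names is just a hypothetical sample for illustration:

```python
import re

# Value found in the src/test copy of nutch-site.xml
TEST_INCLUDES = r".*"

# Value from the replacement property above
SOLR_INCLUDES = (
    r"protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    r"|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# Hypothetical sample of plugin directory names, for illustration only
PLUGIN_IDS = [
    "protocol-http",
    "parse-html",
    "index-basic",
    "indexer-solr",
    "indexer-elastic",
]


def active_plugins(includes, plugin_ids):
    """Return the plugin names that the includes regex matches in full."""
    pattern = re.compile(includes)
    return [p for p in plugin_ids if pattern.fullmatch(p)]


print(active_plugins(TEST_INCLUDES, PLUGIN_IDS))
print(active_plugins(SOLR_INCLUDES, PLUGIN_IDS))
```

With `.*`, indexer-elastic is selected along with everything else, so the Elasticsearch indexer is loaded and complains about its missing settings. With the explicit list, only indexer-solr is selected as the index writer.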

So if you want to use Solr, use indexer-solr; for Elasticsearch, use indexer-elastic; and so on.
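If you are not sure which copies of nutch-site.xml exist in your checkout or what each one sets, a small script along these lines can list them. This is only a sketch: the function name `find_plugin_includes` is made up for this example, and it assumes the usual `<property><name>...</name><value>...</value></property>` layout shown above.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_plugin_includes(root_dir):
    """Map each nutch-site.xml under root_dir to its plugin.includes value.

    The value is None when the file does not define plugin.includes.
    """
    results = {}
    for path in sorted(Path(root_dir).rglob("nutch-site.xml")):
        value = None
        # Walk every <property> element and pick out plugin.includes
        for prop in ET.parse(path).getroot().iter("property"):
            if prop.findtext("name", "").strip() == "plugin.includes":
                value = (prop.findtext("value") or "").strip()
        results[str(path)] = value
    return results
```

Calling `find_plugin_includes` on the Nutch source directory should show one entry for the conf copy and one for the src/test copy, which makes it easy to spot a leftover `.*`.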

Hope this helps others.