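I have set up Apache Nutch 1.13 with Solr 5.5.0 and HBase 0.90.6 in Eclipse. I can run the jobs from the injector through invert links, but when I run the indexing job it throws the error "Missing elastic.cluster and elastic.host ....". I have set indexer-solr under plugin.includes in nutch-site.xml, but I still get this error. Can anyone help me understand why this happens?
Answer 0 (score: 0)
The problem is in nutch-site.xml. There are two nutch-site.xml files: one under the conf folder and another under the src/test folder. We normally configure the one under conf, but when the project is imported into Eclipse, the copy under src/test is the one that gets picked up. So the fix is to put your settings in the file under src/test. That file usually contains only a very basic configuration, and you need to replace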
<property>
  <name>plugin.includes</name>
  <value>.*</value>
  <description>Enable all plugins during unit testing.</description>
</property>
with the following lines
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library. Set parsefilter-naivebayes for classification based focused crawler.
  </description>
</property>
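So if you want to use Solr, use indexer-solr; if you want Elasticsearch, use indexer-elastic; and so on. With the `.*` value every indexer plugin is enabled, including the Elasticsearch one, which is why the job complains about the missing elastic.cluster and elastic.host settings.
Hope this helps others.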
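If you are not sure which copy of nutch-site.xml Eclipse actually resolves, a small check like the one below may help. It is only a sketch: the class name `WhichNutchSite` and the helper `locate` are made up for illustration, and it assumes the configuration file is looked up through the thread's context class loader, so that the first URL listed is the copy that takes effect. Run it from inside the same Eclipse project so it sees the same classpath as the indexing job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Nutch: lists every copy of a resource
// that is visible on the current classpath, in class loader lookup order.
public class WhichNutchSite {

    static List<URL> locate(String resourceName) {
        List<URL> hits = new ArrayList<>();
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Enumeration<URL> found = loader.getResources(resourceName);
            while (found.hasMoreElements()) {
                hits.add(found.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<URL> hits = locate("nutch-site.xml");
        if (hits.isEmpty()) {
            System.out.println("nutch-site.xml is not on the classpath");
        }
        // Assumption: the first entry is the copy that wins at runtime.
        for (URL url : hits) {
            System.out.println(url);
        }
    }
}
```

If the first path printed points at src/test rather than conf, that is the file whose plugin.includes value needs to be changed.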