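I followed the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial and ran the Nutch crawler, but when I tried to load the crawl into Solr I got the message "No IndexWriters activated - check your configuration":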
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -dir crawl/segments/
Indexer: starting at 2013-07-15 08:09:13
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
**No IndexWriters activated - check your configuration**
Indexer: finished at 2013-07-15 08:09:21, elapsed: 00:00:07
Answer 0 (score: 6)
Make sure the indexer-solr plugin is included. Open conf/nutch-site.xml and add the plugin to the plugin.includes property, for example:
protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)
After I added the plugin, the "No IndexWriters activated - check your configuration" warning disappeared in my case.
See this thread: http://lucene.472066.n3.nabble.com/a-plugin-extending-IndexWriter-td4074353.html
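For reference, the example value in this answer could be placed in conf/nutch-site.xml roughly as follows. This is only a sketch: it assumes the property goes inside the file's existing <configuration> element, and the plugin list is the one from this answer, which you may need to adjust for your own crawl.

```xml
<!-- Goes inside the existing <configuration> element of conf/nutch-site.xml -->
<property>
  <name>plugin.includes</name>
  <!-- indexer-solr is the entry that activates the Solr IndexWriter -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The value is a regular expression over plugin IDs, so every alternative has to be separated by |, with no whitespace or line breaks inside the value.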
Answer 1 (score: 2)
@Tryskele + @Scott101 worked for me:
Add the plugin.includes property to both /conf/nutch-site.xml and runtime/local/conf/nutch-site.xml:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
</property>
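Answer 2 (score: 0)
Not sure whether this is still an issue, but I ran into it and then realized that my src/plugin/build.xml was missing the indexer-solr plugin. Adding the following line and then recompiling Nutch fixed it for me: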
<ant dir="indexer-solr" target="deploy"/>
Answer 3 (score: 0)
Add the following property to conf/nutch-site.xml to enable the plugins:
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
</property>
Let me know if this solves your problem.