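I integrated Nutch 1.16 with Solr 7.3.1 step by step, following the Nutch tutorial at https://cwiki.apache.org/confluence/display/nutch/NutchTutorial. I did not integrate Hadoop, and I did not edit indexwriters.xml.
I pushed a segment to Solr, and the indexing run appears to complete without errors: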
bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20191119102434
Segment dir is complete: crawl/segments/20191119102434.
Indexer: starting at 2019-11-26 11:59:18
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
No exchange was configured. The documents will be routed to all index writers.
Active IndexWriters :
SolrIndexWriter:
Indexer: number of documents indexed, deleted, or skipped:
Indexer: finished at 2019-11-26 11:59:23, elapsed: 00:00:05
But when I run a query against Solr on its port, no documents are found in my core. I also tried:
bin/crawl -i -s urls/ TestCrawl/ 2
Nothing changed; I get the same result. I am not sure whether the problem is on the Nutch side or the Solr side.
Thanks.
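To help narrow down which side is at fault, the document count in the core can be checked directly over HTTP instead of through the admin UI. The following is only a minimal sketch: the base URL http://localhost:8983/solr and the core name nutch are assumptions (neither is stated in this post), and the helper names are hypothetical. It asks Solr's select handler for zero rows and reads the numFound field from the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, core, query="*:*"):
    """Build a Solr /select URL that returns only the match count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def num_found(response_text):
    """Extract the total match count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(base_url, core):
    """Query Solr over HTTP and return how many documents the core holds."""
    with urlopen(select_url(base_url, core), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))


# Example call (assumed values: default Solr port, core named "nutch"):
# print(count_documents("http://localhost:8983/solr", "nutch"))
```

If this returns 0 right after an indexing run, the documents most likely never reached the core. That would fit the log above, where nothing is listed after the "number of documents indexed, deleted, or skipped" line.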