Why does indexing go to the wrong Solr collection even though the solr.server.url parameter is set?

Time: 2019-03-23 06:51:59

Tags: solr nutch

I am integrating Nutch 1.15 with Solr 8.0, but when I run the following command

nutch/bin/crawl -i -D solr.server.url=http://192.168.199.109:8983/solr/csdn -s ./csdn-seed/ ./data/csdn 1

to index the data crawled by Nutch into Solr, it throws this exception in hadoop.log:

2019-03-23 02:03:07,491 WARN  mapred.LocalJobRunner - job_local1877827743_0001
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/nutch/update. Reason:
<pre>    Not Found</pre></p>
</body>
</html>

    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/nutch/update. Reason:
<pre>    Not Found</pre></p>
</body>
</html>

But I did set solr.server.url to /solr/csdn, didn't I? So why does the log say it is indexing to /solr/nutch?

1 answer:

Answer 0: (score: 0)

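Nutch 1.15 changed how index writer plugins are configured: all index writers are now configured in a single XML file (conf/index-writers.xml), and their configuration parameters can no longer be set or overridden through Nutch properties. That is why -D solr.server.url is ignored and the writer falls back to the URL in that file. See https://wiki.apache.org/nutch/IndexWriters for how to configure the Solr server URL. The change was made to allow multiple index writers of the same type (for example, multiple Solr instances).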
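As a rough illustration, the Solr writer entry in conf/index-writers.xml could be edited along the following lines. This is a sketch, not a verified copy of the shipped file: the writer id indexer_solr_1 and the parameter names type and url are assumptions based on the default configuration distributed with Nutch 1.15, so check them against your own copy and the wiki page above. Only the URL value is taken from the question.

```xml
<!-- conf/index-writers.xml (excerpt). The id and parameter names are assumed
     from the Nutch 1.15 defaults; verify them against your own file. -->
<writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
  <parameters>
    <param name="type" value="http"/>
    <!-- The shipped default appears to be http://localhost:8983/solr/nutch,
         which matches the URL reported in the exception. -->
    <param name="url" value="http://192.168.199.109:8983/solr/csdn"/>
    <!-- Leave the remaining <param> entries as shipped. -->
  </parameters>
  <!-- Leave the <mapping> section as shipped. -->
</writer>
```

With the URL set in this file, the -D solr.server.url=... option can be dropped from the crawl command, since it has no effect in 1.15.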