早上好,
我来找你是因为Nutch (1.14)
和Solr (7.2)
所以在我放置SSL之前一切正常。
使用Solr in http,一旦爬网完成,我执行此命令
bin/nutch index -Dsolr.server.url=http://127.0.0.1:8983/solr/CORENAME crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
而且效果很好
但是,一旦激活SSL并使用HTTPS中的solr服务器,就无法将数据发送到Solr。 我在nutch网站上添加了以下属性
<name>solr.auth</name>
<value>true</value>
<property>
<name>solr.auth.username</name>
<value>xxxx</value>
<property>
<name>solr.auth.password</name>
<value>xxxx</value>
property>
<name>solr.server.type</name>
<value>https</value>
property>
<name>solr.server.url</name>
<value>https://127.0.0.1:8983/solr/CORENAME</value>
但是当我执行上一个命令时,我得到了这种类型的错误
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:8983/solr/CORENAME
&安培;
caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
&安培;
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
您是否成功将数据发送到HTTPS solr? 感谢
修改 要在SSL过程https://lucene.apache.org/solr/guide/7_0/enabling-ssl.html
之后修复此错误最后执行此操作
keytool -import -file /path/to/solr/solr-ssl.pem -alias solr_cert -keystore /path/to/java-cacert
(jre / lib / security / cacerts)
默认密码为changeit
答案 0 :(得分:0)
它前进了一点,一旦证书导入cacerts我没有任何更多这个错误。
仍在相同的上下文中,在solr服务器上激活SSL和身份验证之后。我使用Nutch来抓取网址并将数据发送到solr。 由于SSL的实现,我不能再向SOLR发送数据。
当我执行this bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/CORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我有以下两个错误:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
编辑: 第一个错误是由于身份验证错误。 填写正确的值后,我有一个我不明白的新错误。你有什么想法吗?
2018-06-20 09:47:18,116 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2018-06-20 09:47:19,151 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: content dest: content
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: title dest: title
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: host dest: host
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: segment dest: segment
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: boost dest: boost
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: digest dest: digest
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,808 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,809 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,951 WARN mapred.LocalJobRunner - job_local146539832_0001
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/ESRF-EXTERNAL
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-20 09:47:20,873 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
Stacktrace
name cpuTime / userTime
process reaper (37)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.8587ms
0.0000ms
process reaper (36)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.2672ms
0.0000ms
Scheduler-201556483 (31) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6202c0c1
1.1534ms
0.0000ms
searcherExecutor-7-thread-1 (30) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@663d5859
63.2030ms
50.0000ms
DestroyJavaVM (27)
1164.4748ms
1040.0000ms
Thread-12 (25)
java.lang.Object@233fcafa
0.1211ms
0.0000ms
Connection evictor (23)
0.9319ms
0.0000ms
Connection evictor (22)
2.0995ms
0.0000ms
org.eclipse.jetty.server.session.HashSessionManager@1a052a00Timer (21) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6626f9cf
4.2127ms
0.0000ms
qtp2012232625-20 (20) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
56.7955ms
50.0000ms
qtp2012232625-19 (19)
47.6864ms
40.0000ms
qtp2012232625-18 (18) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
79.3320ms
70.0000ms
qtp2012232625-17 (17) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
100.9593ms
90.0000ms
qtp2012232625-16-acceptor-0@2d033cc4-ServerConnector@23c4c714{SSL,[ssl, http/1.1]}{0.0.0.0:8983} (16)
4.5898ms
0.0000ms
qtp2012232625-15 (15) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
73.3096ms
60.0000ms
qtp2012232625-14 (14) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
18.7950ms
10.0000ms
qtp2012232625-13 (13)
79.7804ms
70.0000ms
qtp2012232625-12 (12)
70.2385ms
60.0000ms
qtp2012232625-11 (11) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
22.1012ms
10.0000ms
ShutdownMonitor (10)
0.3055ms
0.0000ms
Signal Dispatcher (5)
0.0873ms
0.0000ms
Finalizer (3)
java.lang.ref.ReferenceQueue$Lock@1e254491
8.2575ms
0.0000ms
Reference Handler (2)
java.lang.ref.Reference$Lock@431035b5
6.3846ms
0.0000ms
<强> EDIT2 强> 测试我禁用身份验证以查看问题是否来自https。如果没有身份验 我试图更改文件并将其包含在jetty-https.xml而不是jetty.xml中。
我有2个帐号配置就像这样
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
security.json
{
"authentication":{
"blockUnknown": true,
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{"name":"security-edit",
"role":"admin"}],
"user-role":{"solr":"admin"}
}}
当我执行以下命令时
bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='admin' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到此错误
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/MYCORE
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-25 09:38:41,870 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
当我执行此
时bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到此错误
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:544)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-06-25 09:45:20,106 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
或现在:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
Solr日志:
2018-06-25 14:18:44.352 INFO (main) [ ] o.e.j.s.Server jetty-9.3.20.v20170531
2018-06-25 14:18:44.597 WARN (main) [ ] o.e.j.w.WebAppContext Failed startup of context o.e.j.w.WebAppContext@5891e32e{/solr,file:///app/solr-7.2.1/server/solr-webapp/webapp/,UNAVAILABLE}{/app/solr-7.2.1/server/solr-webapp/webapp}
java.lang.IllegalStateException: No LoginService for org.eclipse.jetty.security.authentication.BasicAuthenticator@64c87930 in org.eclipse.jetty.security.ConstraintSecurityHandler@400cff1a
at org.eclipse.jetty.security.authentication.LoginAuthenticator.setConfiguration(LoginAuthenticator.java:76)
at org.eclipse.jetty.security.SecurityHandler.doStart(SecurityHandler.java:354)
at org.eclipse.jetty.security.ConstraintSecurityHandler.doStart(ConstraintSecurityHandler.java:448)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.session.SessionHandler.doStart(SessionHandler.java:116)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:809)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:345)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:561)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:422)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:389)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1520)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1442)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:215)
at org.eclipse.jetty.start.Main.start(Main.java:458)
at org.eclipse.jetty.start.Main.main(Main.java:76)
2018-06-25 14:18:44.745 INFO (main) [ ] o.e.j.s.Server Started @799ms
我解决了#34;没有登录服务&#34;通过在/ var / solr / data /(SOLR_HOME)上移动security.json
<强> EDIT3:强> 现在我只收到一条错误消息&#34; No allow&#34;当我想将nutch数据发送到solr时。此外,我不能再连接到管理界面我得到相同的错误。我认为它来自security.json文件
{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin"
"permissions":[{"name":"security-edit","role":"adminRole"},{"name":"collection-admin-edit","role":"adminRole"},{"name":"update","role":"adminRole"},{"name":"all","role":"adminRole"},{"name":"core-admin-edit","role":"adminRole"},{"name":"read","role":"adminRole"},{"name":"config-edit","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"}]
"user-role":{"solr":"adminRole"}
}}
我做错了什么?感谢
答案 1 :(得分:0)
我添加了一个新答案,因为前一个答案太长了
已解决的API身份验证但不带NUTCH:
因此,为了通过API进行身份验证,我删除了jetty-https.xml
,webdefault.xml
中的配置,并删除了realm.properties
文件以及solr.in.sh
中的基本身份验证选项
我只处理SOLR HOME中的security.json
文件
事实上,最大的问题是我没有使用加密的密码来测试连接,而没有连接的话。另一方面,我仍然有不允许通过胡说八道的问题。
这是security.json
文件
`{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"solr":"hzMjhfgN4b9X8KR0QgLB2Um3cUzqDzJygtEBL/O7g5E= CkP7HyXjYvqKNF3F4hBjnVvKGQOkLc/ta4FaNIkqgII="
}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{
"name":"security-edit",
"role":"adminRole"
},
{
"name":"collection-admin-edit",
"role":"adminRole"
},
{
"name":"update",
"role":"adminRole"
},
{
"name":"config-edit",
"role":"adminRole"
},
{
"name":"core-admin-edit",
"role":"adminRole"
},
{
"name":"core-admin-read",
"role":"adminRole"
{
"name":"schema-edit",
"role":"adminRole"
},
{
"name":"all",
"role":"adminRole"
}
],
"user-role":{
"solr":"adminRole"
}
}
}
`
加密的密码表示值为“ test”
要测试文件代码,我建议使用此http://json.parser.online.fr/
我能错过什么才能通过螺母更新solr?
已解决 添加用于数据导入的更新角色路径
{
"name":"update",
"path":"/dataimport",
"role":"adminRole"
},
但是现在我可以索引到solr了,但是我在爬行时遇到了一个新错误...
`Thu Jun 28 09:21:03 CEST 2018 : Iteration 2 of 5
Generating a new segment
/app/nutch-external/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true crawl//crawldb crawl//segments -topN 50000 -numFetchers 1 -noFilter
Generator: starting at 2018-06-28 09:21:04
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: 0 records selected for fetching, exiting ...
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now
`
在第一次迭代中,我遇到了这个错误
`Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
No form element found with 'name' = adminRole
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Challenge for ntlm authentication scheme not available
Challenge for ntlm authentication scheme not available
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:506)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:183)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:276)
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:342)
Caused by: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:504)
... 3 more
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
`
它仍然是security.json
文件。你有什么想法吗?谢谢
已解决
我通过在httpclient-auth.xml
/nutch/conf/
解决了这个问题
<auth-configuration>
<credentials username="solr" password="xxxxx">
<authscope host="localhost" port="8983"/>
</credentials>
</auth-configuration>
感谢您的帮助