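I have an Elasticsearch cluster with 2 nodes in my AWS VPC, with a load balancer in front of the nodes. In the same VPC I have a microservice that accesses Elasticsearch through RestHighLevelClient version 7.5.2.
I create the client in the following way: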
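    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;
    import lombok.Getter;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;
    import org.elasticsearch.client.RestHighLevelClient;

    public class ESClientWrapper {
        @Getter
        private final RestHighLevelClient client;

        public ESClientWrapper() throws IOException {
            // Read the Elasticsearch host and port from the properties file.
            Properties properties = new Properties();
            try (FileInputStream propertiesFile = new FileInputStream("/var/elastic.properties")) {
                properties.load(propertiesFile);
            }
            RestClientBuilder builder = RestClient.builder(new HttpHost(
                    properties.getProperty("host"),
                    Integer.parseInt(properties.getProperty("port"))));
            this.client = new RestHighLevelClient(builder);
        }
    }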
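When my microservice has not received any requests for a long time (12h or so), the following happens: the first request sent (or one of the next few) fails with the following error: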
2020-09-09 07:03:13.106 INFO 1 --- [nio-8080-exec-1] c.a.a.services.CustomersMetadataService : Trying to add the following role : {role=a2}
2020-09-09 07:03:13.106 INFO 1 --- [nio-8080-exec-1] c.a.a.e.repositories.ESRepository : Trying to insert the following document to app-index : {role=a2}
2020-09-09 07:03:13.109 ERROR 1 --- [nio-8080-exec-1] c.a.a.e.dal.ESRepository : Failed to add customer : {role=a2}
java.io.IOException: Connection reset by peer
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:871) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
....
....
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.35.jar!/:9.0.35]
at java.base/java.lang.Thread.run(Thread.java:836) ~[na:na]
Caused by: java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:na]
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:na]
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[na:na]
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[na:na]
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[na:na]
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[na:na]
at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:231) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
... 1 common frames omitted
2020-09-09 07:06:55.109 INFO 1 --- [nio-8080-exec-2] c.a.a.services.MyService : Trying to add the following role : {role=a2}
2020-09-09 07:06:55.109 INFO 1 --- [nio-8080-exec-2] c.a.a.e.repositories.ESRepository : Trying to insert the following document to index app-index: {role=a2}
2020-09-09 07:06:55.211 INFO 1 --- [nio-8080-exec-2] c.a.a.e.dal.ESRepository : IndexResponse[index=app-index,type=_doc,id=x532323272533321870287,version=1,result=created,seqNo=70,primaryTerm=1,shards={"total":2,"successful":2,"failed":0}]
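As you can see, the next request, sent about 3 minutes after the failed one, was processed successfully by ES. What could be killing the request? I checked the Elasticsearch logs and found no sign of a connection being terminated. The microservice is in the same VPC as Elasticsearch, so the traffic does not pass through any firewall that could kill it.
I found the following issue on GitHub, which suggests increasing the default connection timeout, but I wonder whether this is really a timeout problem and whether raising the default is really the best solution.
In addition, I found this bug opened in the repository about the same problem, with no answers.
Update: I noticed that this happens even 10 minutes after my service starts. The service started and sent queries to ES, and everything worked. Ten minutes later I sent an insert request, and it failed with "Connection reset by peer".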
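For reference, the timeout-related settings discussed in the question can be set on the builder itself. The following configuration fragment is only a sketch: the class name `TunedClientFactory` and all three values are assumptions made for illustration, and nothing in this thread establishes that these settings remove the resets. The keep-alive strategy limits how long an idle pooled connection may be reused, which would only help under the assumption that something between the client and the nodes silently drops idle connections.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical factory, not taken from the original post.
final class TunedClientFactory {
    // Assumed values, chosen only for illustration.
    private static final int CONNECT_TIMEOUT_MS = 5_000;
    private static final int SOCKET_TIMEOUT_MS = 60_000;
    private static final long MAX_IDLE_REUSE_MS = 30_000L;

    private TunedClientFactory() {}

    static RestHighLevelClient create(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port))
                // Connect and socket (read) timeouts for every request.
                .setRequestConfigCallback(requestConfig -> requestConfig
                        .setConnectTimeout(CONNECT_TIMEOUT_MS)
                        .setSocketTimeout(SOCKET_TIMEOUT_MS))
                // Stop reusing a pooled connection once it has been idle
                // for longer than MAX_IDLE_REUSE_MS.
                .setHttpClientConfigCallback(httpClient -> httpClient
                        .setKeepAliveStrategy((response, context) -> MAX_IDLE_REUSE_MS));
        return new RestHighLevelClient(builder);
    }
}
```

If this route is tried, the idle limit would presumably need to be shorter than whatever idle timeout applies on the network path, which is not known from the information given here.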
Answer 0 (score: 0)
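In the end, I did not find a problem in my configuration or implementation. It looks like a bug in the implementation of Elasticsearch's RestHighLevelClient.
I implemented a retry mechanism that wraps RestHighLevelClient and retries the request when it hits the same error. I used Spring Retry's @Retryable annotation for this solution.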
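The retry idea can also be sketched without Spring. The helper below is a minimal illustration, not the code used in this answer: `IoRetry`, `IoCall`, and `withRetry` are hypothetical names, and the attempt count and back-off are assumptions. It retries a call that throws `IOException` (the type of the "Connection reset by peer" error above) up to a fixed number of attempts.

```java
import java.io.IOException;

// Hypothetical helper; not part of the Elasticsearch client or Spring Retry.
final class IoRetry {
    private IoRetry() {}

    /** A call that may fail with an IOException, e.g. a client request. */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the call, retrying on IOException until it succeeds or
     * maxAttempts calls have been made; then rethrows the last failure.
     */
    static <T> T withRetry(int maxAttempts, long backoffMillis, IoCall<T> call)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts && backoffMillis > 0) {
                    try {
                        Thread.sleep(backoffMillis); // fixed pause between attempts
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e; // stop retrying if the thread is interrupted
                    }
                }
            }
        }
        throw last;
    }
}
```

With the wrapper from the question, a call might look like `IoRetry.withRetry(3, 200, () -> client.index(request, RequestOptions.DEFAULT))`, where `request` is the IndexRequest being sent. One caveat: if a failed attempt actually reached Elasticsearch, retrying an index request that has no explicit document ID could create a duplicate document.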