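Elasticsearch RestClient: Connection reset by peer

Time: 2020-09-09 08:20:15

Tags: java amazon-web-services elasticsearch

I have an ES cluster with 2 nodes in my AWS VPC, with a load balancer in front of those nodes. In the same VPC I have a microservice that accesses Elasticsearch through RestHighLevelClient version 7.5.2.

I create the client in the following way: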

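import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import lombok.Getter;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

public class ESClientWrapper {

    @Getter
    private final RestHighLevelClient client;

    public ESClientWrapper() throws IOException {
        // Load host and port from the properties file, closing the stream afterwards.
        Properties properties = new Properties();
        try (InputStream propertiesFile = new FileInputStream("/var/elastic.properties")) {
            properties.load(propertiesFile);
        }
        RestClientBuilder builder = RestClient.builder(new HttpHost(
                properties.getProperty("host"),
                Integer.parseInt(properties.getProperty("port"))
        ));

        this.client = new RestHighLevelClient(builder);
    }
}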

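When my microservice has not received any request for a long time (12h or so), the following happens: the first request it sends (or one of the next few) fails with the following error: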

2020-09-09 07:03:13.106  INFO 1 --- [nio-8080-exec-1] c.a.a.services.CustomersMetadataService  : Trying to add the following role : {role=a2}
2020-09-09 07:03:13.106  INFO 1 --- [nio-8080-exec-1] c.a.a.e.repositories.ESRepository        : Trying to insert the following document to app-index : {role=a2}
2020-09-09 07:03:13.109 ERROR 1 --- [nio-8080-exec-1] c.a.a.e.dal.ESRepository       : Failed to add customer : {role=a2}


java.io.IOException: Connection reset by peer
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
        at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:871) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
    ....
    ....
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.35.jar!/:9.0.35]
        at java.base/java.lang.Thread.run(Thread.java:836) ~[na:na]
Caused by: java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:na]
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[na:na]
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[na:na]
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[na:na]
        at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:231) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
        ... 1 common frames omitted

2020-09-09 07:06:55.109  INFO 1 --- [nio-8080-exec-2] c.a.a.services.MyService  : Trying to add the following role : {role=a2}
2020-09-09 07:06:55.109  INFO 1 --- [nio-8080-exec-2] c.a.a.e.repositories.ESRepository        : Trying to insert the following document to index app-index: {role=a2}
2020-09-09 07:06:55.211  INFO 1 --- [nio-8080-exec-2] c.a.a.e.dal.ESRepository       : IndexResponse[index=app-index,type=_doc,id=x532323272533321870287,version=1,result=created,seqNo=70,primaryTerm=1,shards={"total":2,"successful":2,"failed":0}]

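As you can see, about 3 minutes after the failed request, ES handled the next request successfully. What could be killing the request? I checked the Elasticsearch logs and found no sign of a connection being terminated. The microservice is in the same VPC as Elasticsearch, so the traffic does not pass through any firewall that could kill it.

I found the following issue on GitHub, which suggests increasing the default connection timeout, but I wonder whether my problem is really a timeout issue and whether raising the default is really the best solution.

In addition, I found this bug opened in the repository about the same problem, but it has no answers.

Update: I noticed that this happens even when my service has only been up for 10 minutes. The service started and sent queries to ES, and everything worked. 10 minutes later I sent an insert request, and it failed with "Connection reset by peer".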
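For reference, the timeout increase suggested in the GitHub issue mentioned above could look roughly like the sketch below, using the callback hooks of the 7.x `RestClientBuilder`. The class name `TunedClientFactory` and all timeout values are illustrative assumptions, not recommendations from the issue. The keep-alive strategy is a further assumption: it only helps if an intermediary such as the load balancer silently drops idle connections, which the post does not confirm.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical factory; the numbers below are placeholders to tune per environment.
final class TunedClientFactory {

    static RestHighLevelClient create(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port))
                // Raise the connect and socket timeouts (milliseconds).
                .setRequestConfigCallback(requestConfig -> requestConfig
                        .setConnectTimeout(5_000)
                        .setSocketTimeout(60_000))
                // Assumption: stop reusing pooled connections that have been idle
                // longer than 60 s, so a connection dropped upstream is not picked up again.
                .setHttpClientConfigCallback(httpClient -> httpClient
                        .setKeepAliveStrategy((response, context) -> 60_000L));
        return new RestHighLevelClient(builder);
    }
}
```

Whether this removes the resets depends on what actually closes the connection, so it is worth verifying against the idle timeout of the load balancer in use.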

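1 Answer:

Answer 0: (score: 0)

In the end I found no problem in my configuration or implementation. It looks like a bug in the implementation of Elasticsearch's RestHighLevelClient.

I implemented a retry mechanism that wraps the RestHighLevelClient and retries the query when it hits the same error. I used the Spring @Retry annotation for this solution (in Spring Retry, the annotation is named @Retryable).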
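The answer does not include its code, so the following is only a minimal plain-Java sketch of the same idea, without Spring. `ConnectionResetRetry`, `IoCall`, and the attempt and backoff parameters are hypothetical names introduced here for illustration. The helper retries only when the exception chain contains the "Connection reset by peer" message seen in the stack trace and rethrows every other `IOException` immediately.

```java
import java.io.IOException;

// Hypothetical helper, not part of the Elasticsearch client or of Spring.
final class ConnectionResetRetry {

    /** An operation that may throw IOException, such as an index call. */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    private ConnectionResetRetry() {}

    /** Runs the call, retrying up to maxAttempts times on "Connection reset by peer". */
    static <T> T execute(IoCall<T> call, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (!isConnectionReset(e)) {
                    throw e; // unrelated failure: do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // fixed pause before the next attempt
                }
            }
        }
        throw last; // every attempt failed with a connection reset
    }

    /** Walks the cause chain looking for the reset message from the stack trace. */
    static boolean isConnectionReset(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String message = cur.getMessage();
            if (message != null && message.contains("Connection reset by peer")) {
                return true;
            }
        }
        return false;
    }
}
```

With the client from the question, a call might then be wrapped as `ConnectionResetRetry.execute(() -> client.index(request, RequestOptions.DEFAULT), 3, 200L)`. Retrying is safest for idempotent requests, for example index requests that carry an explicit document id, because a reset can arrive after the server has already applied the write.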