Question

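I have a Spark job that calls a service through a connection pool of size 1000. There is one pool per executor, so connections can be shared and reused. The pool uses PoolingHttpClientConnectionManager.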
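To illustrate what "one pool per executor" usually means: each Spark executor runs in its own JVM, so a lazily initialized static holder gives one shared instance per executor. The following is only a minimal sketch under that assumption. `LazyHolder` and `SharedPoolDemo` are hypothetical names, and a plain `Object` stands in for the real HttpClient so the example needs no external libraries.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: builds its value at most once per JVM,
// which in Spark means at most once per executor.
final class LazyHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking: only the first caller runs the factory.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class SharedPoolDemo {
    // Counts how many times the factory actually ran.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Stand-in for the shared client; in the real job the factory would
    // build the pooled HttpClient instead of a plain Object.
    static final LazyHolder<Object> CLIENT = new LazyHolder<>(() -> {
        BUILDS.incrementAndGet();
        return new Object();
    });

    public static void main(String[] args) throws InterruptedException {
        // Simulate several concurrent tasks on one executor sharing the client.
        Thread[] tasks = new Thread[8];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> CLIENT.get());
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("clients built: " + BUILDS.get());
    }
}
```

Tasks running in the same executor then call `SharedPoolDemo.CLIENT.get()` and all receive the same instance, instead of creating a pool per task or per partition.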

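When Spark reaches the stage that calls the service, which has 85 tasks, it runs smoothly until the last task, which hangs for more than an hour, and I see this error: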

Connection to wn40-tsl01.ljeffgae40cedhvccpi1rosolb.cx.internal.cloudapp.net/10.115.52.174:37893 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.

I also SSHed into the executor and found about 50 outbound connections to the database in the TIME_WAIT state, all of them created with the keep-alive header.

How can I change the pool so that Spark does not hang because of the TIME_WAIT connections?

The pool is created with:

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(maxPoolSize);
connectionManager.setDefaultMaxPerRoute(maxPoolSize);
connectionManager.setValidateAfterInactivity(10000);
connectionManager.setDefaultSocketConfig(SocketConfig.custom()
        .setSoTimeout(60000)
        .setTcpNoDelay(true)
        .build());

The shared HttpClient is created with:

HttpClientBuilder httpClientBuilder = HttpClients.custom()
        .setConnectionManager(connectionManager)
        .disableAutomaticRetries()
        .disableRedirectHandling()
        .disableCookieManagement()
        .setDefaultRequestConfig(RequestConfig.custom()
                .setSocketTimeout(60000)
                .setConnectTimeout(60000)
                .setConnectionRequestTimeout(60000)
                .build())
        .setDefaultSocketConfig(SocketConfig.custom()
                .setSoTimeout(60000)
                .build());

The Spark executor spends a long time on a single task, which eventually succeeds.

0 answers: