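Handling TargetDisconnectedException with Hazelcast

Date: 2017-04-30 08:54:51

Tags: hazelcast

I am having a problem with HazelcastClient (Java) when the cluster goes down. Both the client and the cluster run the latest Hazelcast version, 3.8.1.

I periodically run the following code: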

getMap().executeOnEntries(new MyProcessor<>(), Predicates.equal("field", var));

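The problem is that when the cluster is down, the error raised inside Hazelcast is only logged as a warning; no exception is thrown to my code: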

2017-04-28 18:32:19,905 [WARN] from com.hazelcast.client.connection.ClientConnectionManager in hz.client_0.internal-1 - hz.client_0 [aa-api] [3.8.1] Heartbeat failed to connection : ClientConnection{alive=true, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=never, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1}
2017-04-28 18:32:20,884 [WARN] from com.hazelcast.client.spi.ClientPartitionService in hz.client_0.internal-3 - hz.client_0 [aa-api] [3.8.1] Error while fetching cluster partition table!
java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=never, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1}
at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolve(ClientInvocationFuture.java:73)
at com.hazelcast.spi.impl.AbstractInvocationFuture$1.run(AbstractInvocationFuture.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at com.hazelcast.util.executor.LoggingScheduledExecutor$LoggingDelegatingFuture.run(LoggingScheduledExecutor.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92)
Caused by: com.hazelcast.spi.exception.TargetDisconnectedException: Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=never, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1}
at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport$CleanResourcesTask.notifyException(ClientInvocationServiceSupport.java:229)
at com.hazelcast.client.spi.impl.ClientInvocationServiceSupport$CleanResourcesTask.run(ClientInvocationServiceSupport.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
... 6 common frames omitted
Caused by: com.hazelcast.spi.exception.TargetDisconnectedException: Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=never, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1}
at com.hazelcast.client.spi.impl.ClusterListenerSupport.heartbeatStopped(ClusterListenerSupport.java:259)
at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl$Heartbeat.fireHeartbeatStopped(ClientConnectionManagerImpl.java:503)
at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl$Heartbeat.run(ClientConnectionManagerImpl.java:462)
... 10 common frames omitted
2017-04-28 18:32:22,904 [WARN] from com.hazelcast.client.connection.nio.ClientConnection in hz.client_0.internal-1 - hz.client_0 [aa-api] [3.8.1] ClientConnection{alive=false, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=2017-04-28 18:32:19.905, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1} lost. Reason: com.hazelcast.spi.exception.TargetDisconnectedException[Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, socketChannel=DefaultSocketChannelWrapper{socketChannel=java.nio.channels.SocketChannel[connected local=/xxx.xxx.4.125:49688 remote=/xxx.xxx.8.118:5701]}, remoteEndpoint=[xxx.xxx.8.118]:5701, lastReadTime=2017-04-28 18:31:15.445, lastWriteTime=2017-04-28 18:32:14.905, closedTime=never, lastHeartbeatRequested=2017-04-28 18:32:14.905, lastHeartbeatReceived=2017-04-28 18:31:14.905, connected server version=3.8.1}]

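How can I handle this exception so that I can take action?

Thanks,

Edit: The problem also occurs when the node the client is connected to goes down. The client does not connect to another node (AWS Discovery).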

1 Answer:

Answer 0: (score: 2)

The problem was mainly in the configuration. Some timeouts and health-check intervals were too high.

Below are the client's default properties:

hazelcast.client.heartbeat.interval = 10000ms

hazelcast.client.heartbeat.timeout = 300000ms

hazelcast.client.invocation.timeout.seconds = 120s

Here are my new values:

hazelcast.client.heartbeat.interval = 2000

hazelcast.client.heartbeat.timeout = 5000

hazelcast.client.invocation.timeout.seconds = 10
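As a rough sketch, the tuned values could be collected in code as follows. The property names and values are the ones listed above; the class name `ClientTimeouts` and the consistency check are illustrative assumptions. The resulting `Properties` would then be handed to the client configuration (for example through `ClientConfig.setProperty`), which is left out here so that the snippet runs without Hazelcast on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: gathers the tuned client properties in one place.
final class ClientTimeouts {

    static Properties tuned() {
        Properties props = new Properties();
        props.setProperty("hazelcast.client.heartbeat.interval", "2000"); // milliseconds
        props.setProperty("hazelcast.client.heartbeat.timeout", "5000");  // milliseconds
        props.setProperty("hazelcast.client.invocation.timeout.seconds", "10");
        return props;
    }

    // Sanity check (an assumption of this sketch): the heartbeat interval should be
    // shorter than the heartbeat timeout, so the timeout cannot expire between two heartbeats.
    static boolean isConsistent(Properties props) {
        long interval = Long.parseLong(props.getProperty("hazelcast.client.heartbeat.interval"));
        long timeout = Long.parseLong(props.getProperty("hazelcast.client.heartbeat.timeout"));
        return interval > 0 && interval < timeout;
    }
}
```

Keeping the values in one place makes it harder to lower one setting and forget the others.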

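In addition, I completely changed the way I obtain maps, topics and, more generally, the Hazelcast instance.

On instantiation

I handle every exception (mostly ones extending RuntimeException), and I use this step to notify every class that the instance is now available.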

try {
    hazelcastInstance = HazelcastClient.newHazelcastClient(config);
    eventListeners.forEach(HazelcastEventListener::onConnect);
} catch (Throwable e) {
    Logger.error(e.getMessage(), e);
    return null;
}

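Before every request that uses the instance

I call code that verifies the instance is available; if an error occurs, I use it to notify every class that the instance is down.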

public boolean isClientActive() {
    if (getInstance() == null) {
        return false;
    }

    try {
        getMap("registration").isLocked("a");
    } catch (Throwable e) {
        hazelcastInstance = null;
        eventListeners.forEach(HazelcastEventListener::onDisconnect);
        return false;
    }

    return true;
}

Getting notified when a member leaves

// add a membership listener on the cluster
// to get notified when a member is removed
hazelcastInstance.getCluster().addMembershipListener(new MembershipListener() {
    @Override
    public void memberAdded(MembershipEvent membershipEvent) {}

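    @Override
    public void memberRemoved(MembershipEvent membershipEvent) {
        // restart the client once the cluster has no members left
        if (membershipEvent.getMembers().isEmpty()) {
            restartInstance();
        }
    }

    @Override
    public void memberAttributeChanged(MemberAttributeEvent memberAttributeEvent) {}
});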

Handling my HazelcastEventListener

Every class that uses Hazelcast registers an event listener:

    hazelcastManager.addEventListener(new HazelcastEventListener() {
        @Override
        public void onConnect() {
            map = hazelcastManager.getMap(mapName);
        }

        @Override
        public void onDisconnect() {
            map = null;
        }
    });
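The manager side of this registration is not shown in the answer. A minimal sketch of what it might look like follows, assuming `HazelcastEventListener` has exactly the two callbacks used above; the class name `ListenerRegistry` and the `fireConnect`/`fireDisconnect` methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Callback interface named in the answer; its two methods are taken from the usage above.
interface HazelcastEventListener {
    void onConnect();
    void onDisconnect();
}

// Hypothetical registry part of the manager; connection handling is omitted.
final class ListenerRegistry {
    // CopyOnWriteArrayList allows listeners to be added while a notification is iterating.
    private final List<HazelcastEventListener> eventListeners = new CopyOnWriteArrayList<>();

    void addEventListener(HazelcastEventListener listener) {
        eventListeners.add(listener);
    }

    // Tell every registered class that the instance is available again.
    void fireConnect() {
        eventListeners.forEach(HazelcastEventListener::onConnect);
    }

    // Tell every registered class that the instance is down.
    void fireDisconnect() {
        eventListeners.forEach(HazelcastEventListener::onDisconnect);
    }
}
```

A copy-on-write list is chosen here because listeners are registered rarely but may be notified from several threads.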

Reconnecting the Hazelcast client

When hazelcastInstance is null, calling getInstance() will try to reconnect.
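The lazy reconnect could be sketched as below. This is not the answer's actual code: the class `ReconnectingHolder` is hypothetical, the type parameter `T` stands in for `HazelcastInstance`, and the `Supplier` stands in for a call such as `HazelcastClient.newHazelcastClient(config)`, so the example runs without Hazelcast.

```java
import java.util.function.Supplier;

// Hypothetical holder: T stands in for HazelcastInstance, and the Supplier
// stands in for a call such as HazelcastClient.newHazelcastClient(config).
final class ReconnectingHolder<T> {
    private final Supplier<T> connector;
    private T instance; // null means "currently disconnected"

    ReconnectingHolder(Supplier<T> connector) {
        this.connector = connector;
    }

    // synchronized so that concurrent callers do not open several clients at once.
    synchronized T getInstance() {
        if (instance == null) {
            try {
                instance = connector.get();
            } catch (RuntimeException e) {
                // Connection failed: stay disconnected and let the next call retry.
                return null;
            }
        }
        return instance;
    }

    // Called when a health check fails, so that the next getInstance() reconnects.
    synchronized void invalidate() {
        instance = null;
    }
}
```

Callers still have to handle a null result, just as isClientActive() does above.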

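Issues

This avoids a lot of errors, but there is still some work to do to manage concurrency issues. In fact, I consider this solution a workaround: it is not very efficient, and it is mostly a patch for a feature missing from Hazelcast.

That is why I will not "accept" this solution. If anyone has a better one, please let us know.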