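Today one of our web application servers crashed (the crash itself was not Hazelcast related), and it caused all the other Hazelcast cluster members to stall for a few minutes as well. We use Hazelcast for session replication, caching, and distributed statistics.
How can we minimize the impact on the rest of the cluster when one member crashes?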
Our stack:
Hazelcast 3.7.4
Spring Boot 1.5.1.RELEASE
Spring Framework 4.3.6.RELEASE
Spring Websocket 4.3.6.RELEASE
Apache Tomcat 8.5.11
Java 1.8.0_112
Windows Server 2012 R2
Logs from a surviving Hazelcast member:
2017-07-07 14:41:20.177 ERROR 1600 --- [https-jsse-nio-8443-exec-20] c.p.p.r.GlobalControllerExceptionHandler : Unhandled Exception, null
com.hazelcast.core.OperationTimeoutException: GetOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2017-07-07 14:41:20.177. Total elapsed time: 121392 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2017-07-07 14:39:09.296. Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService', identityHash=790688984, partitionId=11, replicaIndex=0, callId=0, invocationTime=1499438359178 (2017-07-07 14:39:19.178), waitTimeout=-1, callTimeout=60000, name=subscription-type-by-subscription-level}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1499438358785, firstInvocationTime='2017-07-07 14:39:18.785', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.0.0.4]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=6, /10.0.0.7:5702->/10.0.0.4:49193, endpoint=[10.0.0.4]:5702, alive=true, type=MEMBER]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:150)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:98)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrow(InvocationFuture.java:74)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:158)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:376)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:307)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
...
2017-07-07 14:41:40.181 WARN 1600 --- [https-jsse-nio-8443-exec-32] c.h.map.impl.query.MapQueryEngineImpl : [10.0.0.7]:5702 [app-v19] [3.7.4] Could not get results
java.util.concurrent.ExecutionException: QueryOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2017-07-07 14:41:40.181. Total elapsed time: 120624 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2017-07-07 14:39:09.296. Invocation{op=com.hazelcast.map.impl.query.QueryOperation{serviceName='hz:impl:mapService', identityHash=2056544024, partitionId=-1, replicaIndex=0, callId=0, invocationTime=1499438379950 (2017-07-07 14:39:39.950), waitTimeout=-1, callTimeout=60000, name=online-messaging-session-containers}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=60000, firstInvocationTimeMs=1499438379557, firstInvocationTime='2017-07-07 14:39:39.557', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 00:00:00.000', target=[10.0.0.4]:5702, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=6, /10.0.0.7:5702->/10.0.0.4:49193, endpoint=[10.0.0.4]:5702, alive=true, type=MEMBER]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:150)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:98)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrow(InvocationFuture.java:74)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:158)
...
2017-07-07 14:42:58.385 WARN 1600 --- [https-jsse-nio-8443-exec-191] c.h.map.impl.query.MapQueryEngineImpl : [10.0.0.7]:5702 [app-v19] [3.7.4] Could not get results
com.hazelcast.core.MemberLeftException: Member [10.0.0.4]:5702 - 14a2452c-45bc-40c9-bf77-ce4d73bf6f7e has left cluster!
at com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor$OnMemberLeftTask.run0(InvocationMonitor.java:379)
at com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor$MonitorTask.run(InvocationMonitor.java:221)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
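
For reference, the "Total elapsed time" values in the two timeout messages above appear to be the gap between `firstInvocationTime` and the "Current time" at which the exception was raised, which in both cases is roughly twice the 60000 ms `callTimeout`. Below is a minimal sketch of that arithmetic, using only timestamps copied from the log. The class name `ElapsedFromLog` and the method `elapsedMillis` are hypothetical helpers for illustration and are not part of Hazelcast.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ElapsedFromLog {
    // Timestamp layout used in the Hazelcast log messages above.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

    // Milliseconds between the first invocation and the time the timeout was reported.
    public static long elapsedMillis(String firstInvocationTime, String currentTime) {
        LocalDateTime start = LocalDateTime.parse(firstInvocationTime, FMT);
        LocalDateTime end = LocalDateTime.parse(currentTime, FMT);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // GetOperation on map "subscription-type-by-subscription-level"
        System.out.println(elapsedMillis("2017-07-07 14:39:18.785", "2017-07-07 14:41:20.177"));
        // QueryOperation on map "online-messaging-session-containers"
        System.out.println(elapsedMillis("2017-07-07 14:39:39.557", "2017-07-07 14:41:40.181"));
    }
}
```

Both results match the logged values (121392 ms and 120624 ms), so each request thread was blocked for about two minutes before failing, while the member at 10.0.0.4 was not reported as having left the cluster until 14:42:58.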